On Fri, 17 Oct 2014 at 20:58:22, Lukas Fleischer wrote:
[...] Anatol and I were able to reproduce this issue. It seems to be related to __memcpy_avx_unaligned() in glibc, which means that it only occurs on architectures with the AVX extension.
If you have a look at the memcpy-avx-unaligned.S source code [1], you will notice that there are several branches that copy blocks of different sizes. Now, for some reason, 6.8.9.8-1 always (or almost always) picks the L(less_32bytes) branch, which means that only small blocks are copied, while 6.8.9.7-1 copies larger blocks. I do not have the time to debug this in detail, but maybe you can add this information to the upstream report?
After a more thorough analysis by Jan, we found out that the issue is caused by a bug in the new OpenCL benchmark code. On modern CPUs, the benchmark is executed every time ImageMagick starts, which leads to a huge number of __memcpy_avx_unaligned() calls with a small block size. Jan prepared a patch and will submit it upstream.
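
To illustrate why the block size matters, here is a minimal sketch in C. It is not the ImageMagick benchmark code or the glibc implementation; copy_in_blocks() is a hypothetical helper I made up for illustration. It copies the same buffer with memcpy() calls of a given maximum size and counts how many calls that takes:

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of ImageMagick or glibc): copy `total`
 * bytes from `src` to `dst` using memcpy() calls of at most `block`
 * bytes each. Returns the number of memcpy() calls that were made.
 */
size_t copy_in_blocks(unsigned char *dst, const unsigned char *src,
                      size_t total, size_t block)
{
    size_t calls = 0;

    if (block == 0)
        return 0; /* nothing sensible to do with a zero block size */

    for (size_t off = 0; off < total; off += block) {
        /* The last block may be shorter than `block`. */
        size_t n = (total - off < block) ? total - off : block;

        memcpy(dst + off, src + off, n);
        calls++;
    }

    return calls;
}
```

With this sketch, copying 1024 bytes in 16-byte blocks takes 64 memcpy() calls, while a block size equal to the buffer size needs only one. The data copied is the same in both cases; only the number of calls (and with it the per-call overhead) differs.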
Regards, Lukas
[1] https://sourceware.org/git/?p=glibc.git;a=blob_plain;f=sysdeps/x86_64/multia...