On Fri, 17 Oct 2014 at 20:58:22, Lukas Fleischer wrote:
> [...]
> Anatol and I were able to reproduce this issue. It seems to be related
> to __memcpy_avx_unaligned() in glibc which means that it only occurs on
> architectures with the AVX extension.
> If you have a look at the memcpy-avx-unaligned.S source code [1], you
> will notice that there are several branches that copy blocks of
> different sizes. Now, for some reason, [...] always (or almost
> always) picks the L(less_32bytes) branch which means that only small
> blocks are copied, while [...] copies larger blocks. I do not have
> the time to debug this in detail but maybe you can add this information
> to the upstream report?

After a more thorough analysis by Jan, we found out that the issue is
caused by a bug in the new OpenCL benchmark code. On modern CPUs, the
benchmark is executed on every start of ImageMagick, which leads to a
huge number of __memcpy_avx_unaligned() calls with a small block size.
Jan prepared a patch and he will submit it upstream.
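
To illustrate why the block size matters here, below is a minimal C
sketch. It is not taken from ImageMagick or glibc; copy_in_blocks() is a
hypothetical helper made up for this example. It copies a buffer using
memcpy() calls of at most a given block size and returns the number of
calls it needed:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (an assumption for illustration only): copy n
 * bytes from src to dst using memcpy() calls of at most `block` bytes
 * each, and return how many memcpy() calls were made.
 */
static size_t copy_in_blocks(void *dst, const void *src, size_t n,
                             size_t block)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t calls = 0;

    assert(block > 0);
    while (n > 0) {
        /* The last chunk may be shorter than the block size. */
        size_t chunk = n < block ? n : block;

        memcpy(d, s, chunk);
        d += chunk;
        s += chunk;
        n -= chunk;
        calls++;
    }
    return calls;
}
```

With this helper, copying 4096 bytes in 16-byte blocks takes 256
memcpy() calls, whereas a block size of 4096 needs a single call. Each
call below 32 bytes would presumably end up in the small-copy branch
mentioned above, so the per-call overhead is paid over and over again.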

> Regards,
> Lukas
> [1] https://sourceware.org/git/?p=glibc.git;a=blob_plain;f=sysdeps/x86_64/multiarch/memcpy-avx-unaligned.S;hb=HEAD

