[arch-general] imagemagick 6.8.9.8-1 is slower than 6.8.9.7-1

newer
[arch-general] Catalyst and GNOME...

Matthew Wynn

17 Oct 2014 17 Oct '14

2:39 a.m.

After upgrading to imagemagick 6.8.9.8-1, I've found it to be a lot slower than 6.8.9.7-1. I only get this issue when downloading from the repositories or using the PKGBUILD, not when compiling using the instructions at imagemagic.org. I've discussed the issue with the ImageMagick developers, which you can find at http://www.imagemagick.org/discourse-server/viewtopic.php?f=3&t=26369&sid=a2.... Even when trying to convert a 500x500 image to 100x100, it appears that an image that is 2048x1536 is created. I've tried this on multiple Arch machines with the same result.

Show replies by date

Anatol Pomozov

17 Oct 17 Oct

2:54 a.m.

Hi On Thu, Oct 16, 2014 at 7:39 PM, Matthew Wynn <m-wynn@live.com> wrote:

...

After upgrading to imagemagick 6.8.9.8-1, I've found it to be a lot slower than 6.8.9.7-1. I only get this issue when downloading from the repositories or using the PKGBUILD, not when compiling using the instructions at imagemagic.org.

I've discussed the issue with the ImageMagick developers, which you can find at http://www.imagemagick.org/discourse-server/viewtopic.php?f=3&t=26369&sid=a2.... Even when trying to convert a 500x500 image to 100x100, it appears that an image that is 2048x1536 is created.

I've tried this on multiple Arch machines with the same result.

Use code profiler (e.g. 'perf') to record information where the tool spends its CPU cycles. Then compare what is the difference and why newer version is slower.

Matthew Wynn

3:30 a.m.

As shown in the forum post I linked, here is a strace for 6.8.9.7: $ strace -c convert /tmp/test.jpg -limit thread 4 -thumbnail 100x100 -gravity center -background none -extent 100x100 /tmp/mpdcover.png % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 100.00 0.000033 1 47 read 0.00 0.000000 0 3 write 0.00 0.000000 0 84 31 open 0.00 0.000000 0 54 close 0.00 0.000000 0 12 3 stat 0.00 0.000000 0 52 fstat 0.00 0.000000 0 10 lseek 0.00 0.000000 0 117 mmap 0.00 0.000000 0 72 mprotect 0.00 0.000000 0 25 munmap 0.00 0.000000 0 9 brk 0.00 0.000000 0 11 rt_sigaction 0.00 0.000000 0 19 rt_sigprocmask 0.00 0.000000 0 4 1 access 0.00 0.000000 0 1 clone 0.00 0.000000 0 1 execve 0.00 0.000000 0 2 getdents 0.00 0.000000 0 1 getcwd 0.00 0.000000 0 1 readlink 0.00 0.000000 0 2 getrlimit 0.00 0.000000 0 18 times 0.00 0.000000 0 1 arch_prctl 0.00 0.000000 0 8 futex 0.00 0.000000 0 1 sched_getaffinity 0.00 0.000000 0 1 set_tid_address 0.00 0.000000 0 2 1 openat 0.00 0.000000 0 1 set_robust_list ------ ----------- ----------- --------- --------- ---------------- 100.00 0.000033 559 36 total and for 6.8.9.8 $ strace -c convert /tmp/test.jpg -limit thread 4 -thumbnail 100x100 -gravity center -background none -extent 100x100 /tmp/mpdcover.png % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 56.99 0.001325 29 46 munmap 43.01 0.001000 18 56 futex 0.00 0.000000 0 50 read 0.00 0.000000 0 4 write 0.00 0.000000 0 89 31 open 0.00 0.000000 0 59 close 0.00 0.000000 0 23 13 stat 0.00 0.000000 0 55 fstat 0.00 0.000000 0 10 lseek 0.00 0.000000 0 139 mmap 0.00 0.000000 0 76 mprotect 0.00 0.000000 0 12 brk 0.00 0.000000 0 11 rt_sigaction 0.00 0.000000 0 19 rt_sigprocmask 0.00 0.000000 0 5 1 access 0.00 0.000000 0 9 madvise 0.00 0.000000 0 3 clone 0.00 0.000000 0 1 execve 0.00 0.000000 0 2 getdents 0.00 0.000000 0 1 getcwd 0.00 0.000000 0 1 readlink 0.00 0.000000 0 2 getrlimit 0.00 0.000000 0 56 times 0.00 0.000000 0 1 arch_prctl 0.00 0.000000 0 1 sched_getaffinity 0.00 0.000000 0 1 set_tid_address 0.00 0.000000 0 2 1 openat 0.00 0.000000 0 1 set_robust_list ------ ----------- ----------- --------- --------- ---------------- 100.00 0.002325 735 46 total If you have a specific perf command that would be more revealing?

...

Date: Thu, 16 Oct 2014 19:54:21 -0700 From: anatol.pomozov@gmail.com To: arch-general@archlinux.org Subject: Re: [arch-general] imagemagick 6.8.9.8-1 is slower than 6.8.9.7-1

Hi

On Thu, Oct 16, 2014 at 7:39 PM, Matthew Wynn <m-wynn@live.com> wrote:

...
After upgrading to imagemagick 6.8.9.8-1, I've found it to be a lot slower than 6.8.9.7-1. I only get this issue when downloading from the repositories or using the PKGBUILD, not when compiling using the instructions at imagemagic.org.

I've discussed the issue with the ImageMagick developers, which you can find at http://www.imagemagick.org/discourse-server/viewtopic.php?f=3&t=26369&sid=a2.... Even when trying to convert a 500x500 image to 100x100, it appears that an image that is 2048x1536 is created.

I've tried this on multiple Arch machines with the same result.

Use code profiler (e.g. 'perf') to record information where the tool spends its CPU cycles. Then compare what is the difference and why newer version is slower.

Anatol Pomozov

4:10 a.m.

Hi On Thu, Oct 16, 2014 at 8:30 PM, Matthew Wynn <m-wynn@live.com> wrote:

...

As shown in the forum post I linked, here is a strace for 6.8.9.7:

$ strace -c convert /tmp/test.jpg -limit thread 4 -thumbnail 100x100 -gravity center -background none -extent 100x100 /tmp/mpdcover.png % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 100.00 0.000033 1 47 read 0.00 0.000000 0 3 write 0.00 0.000000 0 84 31 open 0.00 0.000000 0 54 close 0.00 0.000000 0 12 3 stat 0.00 0.000000 0 52 fstat 0.00 0.000000 0 10 lseek 0.00 0.000000 0 117 mmap 0.00 0.000000 0 72 mprotect 0.00 0.000000 0 25 munmap 0.00 0.000000 0 9 brk 0.00 0.000000 0 11 rt_sigaction 0.00 0.000000 0 19 rt_sigprocmask 0.00 0.000000 0 4 1 access 0.00 0.000000 0 1 clone 0.00 0.000000 0 1 execve 0.00 0.000000 0 2 getdents 0.00 0.000000 0 1 getcwd 0.00 0.000000 0 1 readlink 0.00 0.000000 0 2 getrlimit 0.00 0.000000 0 18 times 0.00 0.000000 0 1 arch_prctl 0.00 0.000000 0 8 futex 0.00 0.000000 0 1 sched_getaffinity 0.00 0.000000 0 1 set_tid_address 0.00 0.000000 0 2 1 openat 0.00 0.000000 0 1 set_robust_list ------ ----------- ----------- --------- --------- ---------------- 100.00 0.000033 559 36 total

and for 6.8.9.8 $ strace -c convert /tmp/test.jpg -limit thread 4 -thumbnail 100x100 -gravity center -background none -extent 100x100 /tmp/mpdcover.png % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 56.99 0.001325 29 46 munmap 43.01 0.001000 18 56 futex 0.00 0.000000 0 50 read 0.00 0.000000 0 4 write 0.00 0.000000 0 89 31 open 0.00 0.000000 0 59 close 0.00 0.000000 0 23 13 stat 0.00 0.000000 0 55 fstat 0.00 0.000000 0 10 lseek 0.00 0.000000 0 139 mmap 0.00 0.000000 0 76 mprotect 0.00 0.000000 0 12 brk 0.00 0.000000 0 11 rt_sigaction 0.00 0.000000 0 19 rt_sigprocmask 0.00 0.000000 0 5 1 access 0.00 0.000000 0 9 madvise 0.00 0.000000 0 3 clone 0.00 0.000000 0 1 execve 0.00 0.000000 0 2 getdents 0.00 0.000000 0 1 getcwd 0.00 0.000000 0 1 readlink 0.00 0.000000 0 2 getrlimit 0.00 0.000000 0 56 times 0.00 0.000000 0 1 arch_prctl 0.00 0.000000 0 1 sched_getaffinity 0.00 0.000000 0 1 set_tid_address 0.00 0.000000 0 2 1 openat 0.00 0.000000 0 1 set_robust_list ------ ----------- ----------- --------- --------- ---------------- 100.00 0.002325 735 46 total

The strace output says that cpu time spent by the syscall in kernel is small (2.3 ms) so the rest is spent in userspase. You need a profiler (e.g. perf) get more information. Run "perf record -g $YOUR_COMMAND" then "perf report" and "perf annotate". "perf annotate" shows better results if you recompile imagemagick with debug symbols enabled. I highly recommend to learn perf (independently of this issue), it is very powerful and useful development tool. See more information here https://perf.wiki.kernel.org/index.php/Tutorial

...

If you have a specific perf command that would be more revealing?

...
Date: Thu, 16 Oct 2014 19:54:21 -0700 From: anatol.pomozov@gmail.com To: arch-general@archlinux.org Subject: Re: [arch-general] imagemagick 6.8.9.8-1 is slower than 6.8.9.7-1

Hi

On Thu, Oct 16, 2014 at 7:39 PM, Matthew Wynn <m-wynn@live.com> wrote:

...
After upgrading to imagemagick 6.8.9.8-1, I've found it to be a lot slower than 6.8.9.7-1. I only get this issue when downloading from the repositories or using the PKGBUILD, not when compiling using the instructions at imagemagic.org.

I've discussed the issue with the ImageMagick developers, which you can find at

http://www.imagemagick.org/discourse-server/viewtopic.php?f=3&t=26369&sid=a2... .

...
Even when trying to convert a 500x500 image to 100x100, it appears that an image that is 2048x1536 is created.

I've tried this on multiple Arch machines with the same result.

Use code profiler (e.g. 'perf') to record information where the tool spends its CPU cycles. Then compare what is the difference and why newer version is slower.

Lukas Fleischer

6:58 p.m.

On Fri, 17 Oct 2014 at 04:39:33, Matthew Wynn wrote:

...

After upgrading to imagemagick 6.8.9.8-1, I've found it to be a lot slower than 6.8.9.7-1. I only get this issue when downloading from the repositories or using the PKGBUILD, not when compiling using the instructions at imagemagic.org.

I've discussed the issue with the ImageMagick developers, which you can find at http://www.imagemagick.org/discourse-server/viewtopic.php?f=3&t=26369&sid=a2.... Even when trying to convert a 500x500 image to 100x100, it appears that an image that is 2048x1536 is created.

I've tried this on multiple Arch machines with the same result.

Anatol and I were able to reproduce this issue. It seems to be related to __memcpy_avx_unaligned() in glibc which means that it only occurs on architectures with the AVX extension. If you have a look at the memcpy-avx-unaligned.S source code [1], you will notice that there are several branches that copy blocks of different sizes. Now, for some reason, 6.8.9.8-1 always (or almost always) picks the L(less_32bytes) branch which means that only small blocks are copied, while 6.8.9.7-1 copies larger blocks. I do not have the time to debug this in detail but maybe you can add this information to the upstream report? Regards, Lukas [1] https://sourceware.org/git/?p=glibc.git;a=blob_plain;f=sysdeps/x86_64/multia...

Lukas Fleischer

18 Oct 18 Oct

6:22 a.m.

On Fri, 17 Oct 2014 at 20:58:22, Lukas Fleischer wrote:

...

[...] Anatol and I were able to reproduce this issue. It seems to be related to __memcpy_avx_unaligned() in glibc which means that it only occurs on architectures with the AVX extension.

If you have a look at the memcpy-avx-unaligned.S source code [1], you will notice that there are several branches that copy blocks of different sizes. Now, for some reason, 6.8.9.8-1 always (or almost always) picks the L(less_32bytes) branch which means that only small blocks are copied, while 6.8.9.7-1 copies larger blocks. I do not have the time to debug this in detail but maybe you can add this information to the upstream report?

After a more thorough analysis by Jan, we found out that the issue is caused by a bug in the new OpenCL benchmark code. On modern CPUs, the benchmark is executed on every start of ImageMagick which leads to huge number of __memcpy_avx_unaligned() calls with a small block size. Jan prepared a patch and he will submit it upstream.

...

Regards, Lukas

[1] https://sourceware.org/git/?p=glibc.git;a=blob_plain;f=sysdeps/x86_64/multia...

ProgAndy

6:41 a.m.

Am 18.10.2014 um 08:22 schrieb Lukas Fleischer:

...

After a more thorough analysis by Jan, we found out that the issue is caused by a bug in the new OpenCL benchmark code. On modern CPUs, the benchmark is executed on every start of ImageMagick which leads to huge number of __memcpy_avx_unaligned() calls with a small block size. Jan prepared a patch and he will submit it upstream. If you wish to use the current imagemagick version without a recompile you can set the opencl cache to a missing directory. Then imagemagick cannot write the cachefile and skips the benchmark.

env MAGICK_OPENCL_CACHE_DIR=/tmp/CACHE_DOESNT_EXIST convert ...

3984

Age (days ago)

3985

Last active (days ago)

List overview

Download

6 comments

4 participants

participants (4)

Anatol Pomozov
Lukas Fleischer
Matthew Wynn
ProgAndy