[arch-general] imagemagick 6.8.9.8-1 is slower than 6.8.9.7-1
After upgrading to imagemagick 6.8.9.8-1, I've found it to be a lot slower than 6.8.9.7-1. I only get this issue when downloading from the repositories or using the PKGBUILD, not when compiling using the instructions at imagemagic.org. I've discussed the issue with the ImageMagick developers, which you can find at http://www.imagemagick.org/discourse-server/viewtopic.php?f=3&t=26369&sid=a29400f51bd723e98da6b74c0819e9b3. Even when trying to convert a 500x500 image to 100x100, it appears that an image that is 2048x1536 is created. I've tried this on multiple Arch machines with the same result.
Hi On Thu, Oct 16, 2014 at 7:39 PM, Matthew Wynn <m-wynn@live.com> wrote:
After upgrading to imagemagick 6.8.9.8-1, I've found it to be a lot slower than 6.8.9.7-1. I only get this issue when downloading from the repositories or using the PKGBUILD, not when compiling using the instructions at imagemagic.org.
I've discussed the issue with the ImageMagick developers, which you can find at http://www.imagemagick.org/discourse-server/viewtopic.php?f=3&t=26369&sid=a29400f51bd723e98da6b74c0819e9b3. Even when trying to convert a 500x500 image to 100x100, it appears that an image that is 2048x1536 is created.
I've tried this on multiple Arch machines with the same result.
Use code profiler (e.g. 'perf') to record information where the tool spends its CPU cycles. Then compare what is the difference and why newer version is slower.
As shown in the forum post I linked, here is a strace for 6.8.9.7: $ strace -c convert /tmp/test.jpg -limit thread 4 -thumbnail 100x100 -gravity center -background none -extent 100x100 /tmp/mpdcover.png % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 100.00 0.000033 1 47 read 0.00 0.000000 0 3 write 0.00 0.000000 0 84 31 open 0.00 0.000000 0 54 close 0.00 0.000000 0 12 3 stat 0.00 0.000000 0 52 fstat 0.00 0.000000 0 10 lseek 0.00 0.000000 0 117 mmap 0.00 0.000000 0 72 mprotect 0.00 0.000000 0 25 munmap 0.00 0.000000 0 9 brk 0.00 0.000000 0 11 rt_sigaction 0.00 0.000000 0 19 rt_sigprocmask 0.00 0.000000 0 4 1 access 0.00 0.000000 0 1 clone 0.00 0.000000 0 1 execve 0.00 0.000000 0 2 getdents 0.00 0.000000 0 1 getcwd 0.00 0.000000 0 1 readlink 0.00 0.000000 0 2 getrlimit 0.00 0.000000 0 18 times 0.00 0.000000 0 1 arch_prctl 0.00 0.000000 0 8 futex 0.00 0.000000 0 1 sched_getaffinity 0.00 0.000000 0 1 set_tid_address 0.00 0.000000 0 2 1 openat 0.00 0.000000 0 1 set_robust_list ------ ----------- ----------- --------- --------- ---------------- 100.00 0.000033 559 36 total and for 6.8.9.8 $ strace -c convert /tmp/test.jpg -limit thread 4 -thumbnail 100x100 -gravity center -background none -extent 100x100 /tmp/mpdcover.png % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 56.99 0.001325 29 46 munmap 43.01 0.001000 18 56 futex 0.00 0.000000 0 50 read 0.00 0.000000 0 4 write 0.00 0.000000 0 89 31 open 0.00 0.000000 0 59 close 0.00 0.000000 0 23 13 stat 0.00 0.000000 0 55 fstat 0.00 0.000000 0 10 lseek 0.00 0.000000 0 139 mmap 0.00 0.000000 0 76 mprotect 0.00 0.000000 0 12 brk 0.00 0.000000 0 11 rt_sigaction 0.00 0.000000 0 19 rt_sigprocmask 0.00 0.000000 0 5 1 access 0.00 0.000000 0 9 madvise 0.00 0.000000 0 3 clone 0.00 0.000000 0 1 execve 0.00 0.000000 0 2 getdents 0.00 0.000000 0 1 getcwd 0.00 0.000000 0 1 readlink 0.00 0.000000 0 2 getrlimit 0.00 0.000000 0 56 times 0.00 0.000000 0 1 arch_prctl 0.00 0.000000 0 1 sched_getaffinity 0.00 0.000000 0 1 set_tid_address 0.00 0.000000 0 2 1 openat 0.00 0.000000 0 1 set_robust_list ------ ----------- ----------- --------- --------- ---------------- 100.00 0.002325 735 46 total If you have a specific perf command that would be more revealing?
Date: Thu, 16 Oct 2014 19:54:21 -0700 From: anatol.pomozov@gmail.com To: arch-general@archlinux.org Subject: Re: [arch-general] imagemagick 6.8.9.8-1 is slower than 6.8.9.7-1
Hi
On Thu, Oct 16, 2014 at 7:39 PM, Matthew Wynn <m-wynn@live.com> wrote:
After upgrading to imagemagick 6.8.9.8-1, I've found it to be a lot slower than 6.8.9.7-1. I only get this issue when downloading from the repositories or using the PKGBUILD, not when compiling using the instructions at imagemagic.org.
I've discussed the issue with the ImageMagick developers, which you can find at http://www.imagemagick.org/discourse-server/viewtopic.php?f=3&t=26369&sid=a29400f51bd723e98da6b74c0819e9b3. Even when trying to convert a 500x500 image to 100x100, it appears that an image that is 2048x1536 is created.
I've tried this on multiple Arch machines with the same result.
Use code profiler (e.g. 'perf') to record information where the tool spends its CPU cycles. Then compare what is the difference and why newer version is slower.
Hi On Thu, Oct 16, 2014 at 8:30 PM, Matthew Wynn <m-wynn@live.com> wrote:
As shown in the forum post I linked, here is a strace for 6.8.9.7:
$ strace -c convert /tmp/test.jpg -limit thread 4 -thumbnail 100x100 -gravity center -background none -extent 100x100 /tmp/mpdcover.png % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 100.00 0.000033 1 47 read 0.00 0.000000 0 3 write 0.00 0.000000 0 84 31 open 0.00 0.000000 0 54 close 0.00 0.000000 0 12 3 stat 0.00 0.000000 0 52 fstat 0.00 0.000000 0 10 lseek 0.00 0.000000 0 117 mmap 0.00 0.000000 0 72 mprotect 0.00 0.000000 0 25 munmap 0.00 0.000000 0 9 brk 0.00 0.000000 0 11 rt_sigaction 0.00 0.000000 0 19 rt_sigprocmask 0.00 0.000000 0 4 1 access 0.00 0.000000 0 1 clone 0.00 0.000000 0 1 execve 0.00 0.000000 0 2 getdents 0.00 0.000000 0 1 getcwd 0.00 0.000000 0 1 readlink 0.00 0.000000 0 2 getrlimit 0.00 0.000000 0 18 times 0.00 0.000000 0 1 arch_prctl 0.00 0.000000 0 8 futex 0.00 0.000000 0 1 sched_getaffinity 0.00 0.000000 0 1 set_tid_address 0.00 0.000000 0 2 1 openat 0.00 0.000000 0 1 set_robust_list ------ ----------- ----------- --------- --------- ---------------- 100.00 0.000033 559 36 total
and for 6.8.9.8 $ strace -c convert /tmp/test.jpg -limit thread 4 -thumbnail 100x100 -gravity center -background none -extent 100x100 /tmp/mpdcover.png % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 56.99 0.001325 29 46 munmap 43.01 0.001000 18 56 futex 0.00 0.000000 0 50 read 0.00 0.000000 0 4 write 0.00 0.000000 0 89 31 open 0.00 0.000000 0 59 close 0.00 0.000000 0 23 13 stat 0.00 0.000000 0 55 fstat 0.00 0.000000 0 10 lseek 0.00 0.000000 0 139 mmap 0.00 0.000000 0 76 mprotect 0.00 0.000000 0 12 brk 0.00 0.000000 0 11 rt_sigaction 0.00 0.000000 0 19 rt_sigprocmask 0.00 0.000000 0 5 1 access 0.00 0.000000 0 9 madvise 0.00 0.000000 0 3 clone 0.00 0.000000 0 1 execve 0.00 0.000000 0 2 getdents 0.00 0.000000 0 1 getcwd 0.00 0.000000 0 1 readlink 0.00 0.000000 0 2 getrlimit 0.00 0.000000 0 56 times 0.00 0.000000 0 1 arch_prctl 0.00 0.000000 0 1 sched_getaffinity 0.00 0.000000 0 1 set_tid_address 0.00 0.000000 0 2 1 openat 0.00 0.000000 0 1 set_robust_list ------ ----------- ----------- --------- --------- ---------------- 100.00 0.002325 735 46 total
The strace output says that cpu time spent by the syscall in kernel is small (2.3 ms) so the rest is spent in userspase. You need a profiler (e.g. perf) get more information. Run "perf record -g $YOUR_COMMAND" then "perf report" and "perf annotate". "perf annotate" shows better results if you recompile imagemagick with debug symbols enabled. I highly recommend to learn perf (independently of this issue), it is very powerful and useful development tool. See more information here https://perf.wiki.kernel.org/index.php/Tutorial
If you have a specific perf command that would be more revealing?
Date: Thu, 16 Oct 2014 19:54:21 -0700 From: anatol.pomozov@gmail.com To: arch-general@archlinux.org Subject: Re: [arch-general] imagemagick 6.8.9.8-1 is slower than 6.8.9.7-1
Hi
On Thu, Oct 16, 2014 at 7:39 PM, Matthew Wynn <m-wynn@live.com> wrote:
After upgrading to imagemagick 6.8.9.8-1, I've found it to be a lot slower than 6.8.9.7-1. I only get this issue when downloading from the repositories or using the PKGBUILD, not when compiling using the instructions at imagemagic.org.
I've discussed the issue with the ImageMagick developers, which you can find at
Even when trying to convert a 500x500 image to 100x100, it appears
image that is 2048x1536 is created.
I've tried this on multiple Arch machines with the same result.
Use code profiler (e.g. 'perf') to record information where the tool spends its CPU cycles. Then compare what is the difference and why newer version is slower.
On Fri, 17 Oct 2014 at 04:39:33, Matthew Wynn wrote:
After upgrading to imagemagick 6.8.9.8-1, I've found it to be a lot slower than 6.8.9.7-1. I only get this issue when downloading from the repositories or using the PKGBUILD, not when compiling using the instructions at imagemagic.org.
I've discussed the issue with the ImageMagick developers, which you can find at http://www.imagemagick.org/discourse-server/viewtopic.php?f=3&t=26369&sid=a29400f51bd723e98da6b74c0819e9b3. Even when trying to convert a 500x500 image to 100x100, it appears that an image that is 2048x1536 is created.
I've tried this on multiple Arch machines with the same result.
Anatol and I were able to reproduce this issue. It seems to be related to __memcpy_avx_unaligned() in glibc which means that it only occurs on architectures with the AVX extension. If you have a look at the memcpy-avx-unaligned.S source code [1], you will notice that there are several branches that copy blocks of different sizes. Now, for some reason, 6.8.9.8-1 always (or almost always) picks the L(less_32bytes) branch which means that only small blocks are copied, while 6.8.9.7-1 copies larger blocks. I do not have the time to debug this in detail but maybe you can add this information to the upstream report? Regards, Lukas [1] https://sourceware.org/git/?p=glibc.git;a=blob_plain;f=sysdeps/x86_64/multia...
On Fri, 17 Oct 2014 at 20:58:22, Lukas Fleischer wrote:
[...] Anatol and I were able to reproduce this issue. It seems to be related to __memcpy_avx_unaligned() in glibc which means that it only occurs on architectures with the AVX extension.
If you have a look at the memcpy-avx-unaligned.S source code [1], you will notice that there are several branches that copy blocks of different sizes. Now, for some reason, 6.8.9.8-1 always (or almost always) picks the L(less_32bytes) branch which means that only small blocks are copied, while 6.8.9.7-1 copies larger blocks. I do not have the time to debug this in detail but maybe you can add this information to the upstream report?
After a more thorough analysis by Jan, we found out that the issue is caused by a bug in the new OpenCL benchmark code. On modern CPUs, the benchmark is executed on every start of ImageMagick which leads to huge number of __memcpy_avx_unaligned() calls with a small block size. Jan prepared a patch and he will submit it upstream.
Regards, Lukas
[1] https://sourceware.org/git/?p=glibc.git;a=blob_plain;f=sysdeps/x86_64/multia...
Am 18.10.2014 um 08:22 schrieb Lukas Fleischer:
After a more thorough analysis by Jan, we found out that the issue is caused by a bug in the new OpenCL benchmark code. On modern CPUs, the benchmark is executed on every start of ImageMagick which leads to huge number of __memcpy_avx_unaligned() calls with a small block size. Jan prepared a patch and he will submit it upstream. If you wish to use the current imagemagick version without a recompile you can set the opencl cache to a missing directory. Then imagemagick cannot write the cachefile and skips the benchmark.
env MAGICK_OPENCL_CACHE_DIR=/tmp/CACHE_DOESNT_EXIST convert ...
participants (4)
-
Anatol Pomozov
-
Lukas Fleischer
-
Matthew Wynn
-
ProgAndy