[arch-general] IPC slowness
Dear Arch, I've noticed a 20x slowdown on a simple IPC task on a couple of recently updated Arch systems. Here is a script to exhibit the problem: #!/bin/bash SCREEN=/usr/bin/screen $SCREEN -dmS x time $SCREEN -S x -X setenv A B $SCREEN -S x -X quit On my laptop, which is a slower Arch system, it takes about 0.4 seconds, with 0.2s for user and 0.2s for system. On a faster Arch system, it's 0.11s, 0.03s user and 0.09s system. On a *really* slow Arch system that hasn't been updated since August, it takes 0.015s (0.01s user and 0.00s system). On a Raspbian system it is 0.014s, and on a Debian-based virtual server it is 0.006s. In summary, the recent Arch systems are about 20x slower than the Debian systems for this command, but the August 2018 Arch laptop is just as fast. Any ideas what might be causing this? I tried to debug with 'strace -r -p' and it looks like most of the time is getting spent in a 'select' call. 0.000171 geteuid() = 1000 0.000075 getegid() = 1000 0.000080 select(1024, [3 4 5 6], [], NULL, NULL) = 1 (in [3]) 0.399534 fcntl(3, F_SETFL, O_RDONLY) = 0 0.000171 read(3, "\0gsm\4\0\0\0/dev/pts/33\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 12336) = 12336 0.000169 close(3) = 0 Thanks, Frederick
On 03/07/2019 03:51 PM, frederik@ofb.net wrote:
Any ideas what might be causing this? I tried to debug with 'strace -r -p' and it looks like most of the time is getting spent in a 'select' call.
0.000171 geteuid() = 1000 0.000075 getegid() = 1000 0.000080 select(1024, [3 4 5 6], [], NULL, NULL) = 1 (in [3]) 0.399534 fcntl(3, F_SETFL, O_RDONLY) = 0 0.000171 read(3, "\0gsm\4\0\0\0/dev/pts/33\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 12336) = 12336 0.000169 close(3) = 0
Are you using Arch on bare-metal or in VirtualBox 6.x by chance? I saw similar slowdowns with VirtualBox 6.0 (and 6.2) when run --headless and similar strace errors (although with poll/write), e.g. [pid 32452] poll([{fd=3, events=POLLIN|POLLOUT}], 1, -1) = 1 ([{fd=3, revents=POLLOUT}]) [pid 32452] writev(3, [{iov_base="b\0\4\0\6\0@\4", iov_len=8}, {iov_base="RENDER", iov_len=6}, {iov_base="\0\0", io v_len=2}], 3) = 16 [pid 32452] futex(0x7fffd1f3f468, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...> [pid 32493] +++ killed by SIGABRT (core dumped) +++ -- David C. Rankin, J.D.,P.E.
Are you using Arch on bare-metal or in VirtualBox 6.x by chance? I saw similar slowdowns with VirtualBox 6.0 (and 6.2) when run --headless and similar strace errors (although with poll/write), e.g.
There is no virtualization on my machines. Did anyone try to reproduce it using my script? Thanks, Frederick
On 08.03.2019 07:15, frederik@ofb.net wrote:
There is no virtualization on my machines. Did anyone try to reproduce it using my script?
$ bash archtest.sh real 0m0,004s user 0m0,004s sys 0m0,000s $ inxi CPU: Dual Core Intel Core i5-3320M (-MT MCP-) speed/min/max: 1559/1200/3300 MHz Kernel: 4.20.13-arch1-1-ARCH x86_64 Up: 7m System updated 2019-03-03
On 3/8/19 7:15 AM, frederik@ofb.net wrote:
There is no virtualization on my machines.
Did anyone try to reproduce it using my script?
./bench-test.sh real 0m0.111s user 0m0.040s sys 0m0.070s Intel Core i7-8550U, Kernel: 5.0.0-arch1-1-ARCH
$ bash sc.sh real 0m0,140s user 0m0,044s sys 0m0,097s $inxi CPU: Quad Core Intel Xeon E5-1620 v2 (-MT MCP-) speed/min/max: 1497/1200/3900 MHz Kernel: 5.0.0-arch1-1-ARCH x86_64 Up: 1d 1h 34m Mem: 2791.9/64368.0 MiB (4.3%) Storage: 1.83 TiB (23.8% used) Procs: 230 Shell: bash 5.0.0 inxi: 3.0.30 The same on my laptop. It gets slightly better if you remove any .screenrc, but that's not very significant.
+ grep -m 1 'model name' /proc/cpuinfo | awk -F ': ' '{print $2}';uname -r; screen -v; for i in {1..5}; do ./test.sh; sleep 1; done Intel(R) Core(TM) i5-7500T CPU @ 2.70GHz 4.19.26-1-lts Screen version 4.06.02 (GNU) 23-Oct-17 real 0m0.139s user 0m0.060s sys 0m0.079s real 0m0.140s user 0m0.050s sys 0m0.090s real 0m0.140s user 0m0.050s sys 0m0.090s real 0m0.140s user 0m0.090s sys 0m0.050s real 0m0.139s user 0m0.050s sys 0m0.089s
Hi Frederick,
#!/bin/bash SCREEN=/usr/bin/screen $SCREEN -dmS x time $SCREEN -S x -X setenv A B $SCREEN -S x -X quit
Perhaps it's closing more file descriptors now than before? $ strace -c screen -dmS x Directory '/run/screens' must have mode 777. % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 99.98 11.762280 22 524299 524285 close 0.00 0.000328 16 20 8 openat 0.00 0.000323 10 30 mmap 0.00 0.000304 16 18 read 0.00 0.000192 14 13 fstat 0.00 0.000132 66 2 2 connect 0.00 0.000130 32 4 lstat 0.00 0.000104 26 4 rt_sigaction 0.00 0.000094 94 1 readlink 0.00 0.000092 46 2 socket 0.00 0.000071 7 9 7 stat 0.00 0.000063 31 2 munmap 0.00 0.000063 21 3 brk 0.00 0.000048 48 1 write 0.00 0.000041 3 11 mprotect 0.00 0.000041 10 4 4 access 0.00 0.000032 32 1 ioctl 0.00 0.000028 7 4 lseek 0.00 0.000025 25 1 umask 0.00 0.000025 25 1 getuid 0.00 0.000025 25 1 getgid 0.00 0.000025 25 1 geteuid 0.00 0.000024 24 1 getegid 0.00 0.000000 0 1 execve 0.00 0.000000 0 3 fcntl 0.00 0.000000 0 2 1 arch_prctl 0.00 0.000000 0 1 prlimit64 ------ ----------- ----------- --------- --------- ---------------- 100.00 11.764490 524440 524307 total $ And that's because of prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=1024, rlim_max=512*1024}) = 0 close(524287) = -1 EBADF (Bad file descriptor) close(524286) = -1 EBADF (Bad file descriptor) close(524285) = -1 EBADF (Bad file descriptor) ... -- Cheers, Ralph.
Hi again Frederick,
% time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 99.98 11.762280 22 524299 524285 close
Ah, here we are. https://savannah.gnu.org/bugs/?55618 -- Cheers, Ralph.
Big thank you to Ralph for tracking that down and for teaching me about strace. Also thanks to those who reproduced the problem, which was not after all as general my message subject had advertised. I recall noticing years ago that Screen tends to start somewhat slowly as a result of closing 4000 file descriptors. I never bothered to figure out why it did this. I guess it's good that Systemd exercised this a bit, since now we can fix a decades-old bug. Frederick On Fri, Mar 08, 2019 at 11:49:29AM +0000, Ralph Corderoy wrote:
Hi again Frederick,
% time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 99.98 11.762280 22 524299 524285 close
Ah, here we are. https://savannah.gnu.org/bugs/?55618
-- Cheers, Ralph.
participants (7)
-
Dan R Beste
-
David C. Rankin
-
frederik@ofb.net
-
Jens Hausdorf
-
Marco
-
Ralph Corderoy
-
Sławomir Szczyrba