[arch-general] GDB does not work, sometimes
Hi!

I'm having a very weird problem with gdb. I hadn't used the debugger for some time, so I'm not sure how long it has been broken...

The problem is that the debugged program does not stop at any breakpoint, exception, signal... nothing. Just as if it weren't being debugged.

At first I thought that it was a problem with GDB and I was ready to submit a bug report, but then... I have gdb-7.6.1-1 and that has been in the repositories for several months. Surely someone would have noticed if it hadn't worked at all for so long!

So I started investigating, but how do you debug a broken debugger? Long story short:
* gdb fails even with the simplest program (`int main() {}`). I even disabled the start-up scripts with "gdb -n", no difference.
* it fails with programs compiled with both gcc and clang.
* it does not fail if I debug as root.
* it does not fail if I debug in a VT instead of a PTY.

So it looks like some kind of permission issues. Maybe something related to the recent SELinux changes?

Googling around I've found that setting /proc/sys/kernel/yama/ptrace_scope to 0 solves the issue. But it should not work this way: the idea of this key is that by default only the parent of a process can ptrace/debug it. And certainly, gdb is the parent of the processes it spawns (yes, I've checked it). Moreover, running "strace ls" works fine, so why not "gdb ls"? And what's the difference between a VT and a PTY? That I cannot understand.

For now I can work around it with "yama/ptrace_scope", but does anybody know what is happening?

Best regards.
--
Rodrigo
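For reference, the Yama setting discussed above can also be inspected from a script. This is only a minimal Python sketch and not something posted in the thread: the helper names `describe_ptrace_scope` and `read_ptrace_scope` are made up for illustration, and the level descriptions paraphrase the kernel's Yama documentation. The file does not exist on kernels built without Yama, in which case the reader returns `None`.

```python
from pathlib import Path
from typing import Optional

# Path of the Yama knob mentioned in the message above.
PTRACE_SCOPE = Path("/proc/sys/kernel/yama/ptrace_scope")

# Meaning of each level, paraphrased from the kernel's Yama documentation.
LEVELS = {
    0: "classic: a process may attach to any other process running under the same uid",
    1: "restricted: a process may only attach to its own descendants",
    2: "admin-only: using ptrace requires CAP_SYS_PTRACE",
    3: "no attach: ptrace is disabled and the setting cannot be changed again",
}


def describe_ptrace_scope(level: int) -> str:
    """Return a one-line description of a ptrace_scope level (hypothetical helper)."""
    return LEVELS.get(level, f"unknown level {level}")


def read_ptrace_scope(path: Path = PTRACE_SCOPE) -> Optional[int]:
    """Read the current level, or None if the file is missing or unreadable."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    level = read_ptrace_scope()
    if level is None:
        print("Yama ptrace_scope is not available on this system")
    else:
        print(f"ptrace_scope = {level}: {describe_ptrace_scope(level)}")
```

At level 1 a debugger that starts the program itself is still debugging one of its own descendants, which is why the observed failure is surprising and points away from Yama as the real cause.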
Rodrigo Rivas <rodrigorivascosta <at> gmail.com> writes:
Hi!
I'm having a very weird problem with gdb. I hadn't used the debugger for some time, so I'm not sure how long it has been broken... The problem is that the debugged program does not stop at any breakpoint, exception, signal... nothing. Just as if it weren't being debugged. ...
No problem here.
gdb 7.6.1-1
$ cat /proc/sys/kernel/yama/ptrace_scope
1
$

Perhaps you should start with some basic steps.
Are you using any repos beyond the 3 basic ones (core, extra, community)?

Re-install gdb the "hard" way:
pacman -Suy
pacman -R gdb
pacman -S gdb
find /etc -iname "*.pac*"

Verify dependencies:
pacman -Qi gdb

Verify:
$ ls -al /usr/bin/gdb
-rwxr-xr-x 1 root root 4634932 Aug 31 06:13 /usr/bin/gdb
$ ldd /usr/bin/gdb
        linux-gate.so.1 (0xb77b0000)
        libreadline.so.6 => /usr/lib/libreadline.so.6 (0xb775b000)
        libdl.so.2 => /usr/lib/libdl.so.2 (0xb7756000)
        libncursesw.so.5 => /usr/lib/libncursesw.so.5 (0xb76f7000)
        libz.so.1 => /usr/lib/libz.so.1 (0xb76e0000)
        libm.so.6 => /usr/lib/libm.so.6 (0xb769a000)
        libpthread.so.0 => /usr/lib/libpthread.so.0 (0xb767d000)
        libpython2.7.so.1.0 => /usr/lib/libpython2.7.so.1.0 (0xb74e5000)
        libexpat.so.1 => /usr/lib/libexpat.so.1 (0xb74bc000)
        liblzma.so.5 => /usr/lib/liblzma.so.5 (0xb7495000)
        libc.so.6 => /usr/lib/libc.so.6 (0xb72e5000)
        /lib/ld-linux.so.2 (0xb77b1000)
        libutil.so.1 => /usr/lib/libutil.so.1 (0xb72e0000)
$

Verify:
$ cat /etc/gdb/gdbinit
$ mv .gdbinit .gdbinit-save

You can try debugging gdb itself first:
$ gdb gdb
or
$ strace gdb

jb
On Sat, Nov 23, 2013 at 3:18 PM, jb <jb.1234abcd@gmail.com> wrote:
Rodrigo Rivas <rodrigorivascosta <at> gmail.com> writes:
Hi!
I'm having a very weird problem with gdb. I hadn't used the debugger for some time, so I'm not sure how long it has been broken... The problem is that the debugged program does not stop at any breakpoint, exception, signal... nothing. Just as if it weren't being debugged. ...
No problem here.
gdb 7.6.1-1
$ cat /proc/sys/kernel/yama/ptrace_scope
1
$

Perhaps you should start with some basic steps.
Are you using any repos beyond the 3 basic ones (core, extra, community)?

Re-install gdb the "hard" way:
pacman -Suy
pacman -R gdb
pacman -S gdb
find /etc -iname "*.pac*"

Verify dependencies:
pacman -Qi gdb

Verify:
$ ls -al /usr/bin/gdb
-rwxr-xr-x 1 root root 4634932 Aug 31 06:13 /usr/bin/gdb
$ ldd /usr/bin/gdb
        linux-gate.so.1 (0xb77b0000)
        libreadline.so.6 => /usr/lib/libreadline.so.6 (0xb775b000)
        libdl.so.2 => /usr/lib/libdl.so.2 (0xb7756000)
        libncursesw.so.5 => /usr/lib/libncursesw.so.5 (0xb76f7000)
        libz.so.1 => /usr/lib/libz.so.1 (0xb76e0000)
        libm.so.6 => /usr/lib/libm.so.6 (0xb769a000)
        libpthread.so.0 => /usr/lib/libpthread.so.0 (0xb767d000)
        libpython2.7.so.1.0 => /usr/lib/libpython2.7.so.1.0 (0xb74e5000)
        libexpat.so.1 => /usr/lib/libexpat.so.1 (0xb74bc000)
        liblzma.so.5 => /usr/lib/liblzma.so.5 (0xb7495000)
        libc.so.6 => /usr/lib/libc.so.6 (0xb72e5000)
        /lib/ld-linux.so.2 (0xb77b1000)
        libutil.so.1 => /usr/lib/libutil.so.1 (0xb72e0000)
$

Verify:
$ cat /etc/gdb/gdbinit
$ mv .gdbinit .gdbinit-save

You can try debugging gdb itself first:
$ gdb gdb
or
$ strace gdb
jb
Thanks for the suggestions!

I've already done that: latest versions of everything, no extra repositories... I've even recompiled "gdb" using the ABS (with debugging symbols), with the same result.

The main difference between my machine and yours is that I'm using x86_64, so the ldd output is different. I'm also using Cinnamon; maybe the session management has something to do with it...

I've strace'd two runs of gdb with the same program, one working (from a VT) and one not working (from a PTY). The main difference that I can see is that the non-working one does not call `ptrace()` at all.

I've tried debugging two instances of gdb in parallel, one working and one not working. It was actually quite confusing: 4 instances of gdb running at once, working or failing for no apparent reason... What I've noticed is that if I'm debugging gdb and I put a breakpoint in certain places (`br vfork`, for example) then the problem fixes itself for that execution.

Now I think that it may actually be a bug in gdb... I think I'll report it upstream and see what happens.

Best regards
--
Rodrigo
On 23/11/2013 13:14, Rodrigo Rivas wrote:
So it looks like some kind of permission issues. Maybe something related to the recent SELinux changes?
Nothing related to SELinux support has been merged in official Arch packages for now (as far as I know). And even if it had been, SELinux would certainly not be enabled by default without a big news item. You can cross this one off.

Tim
On Sun, Nov 24, 2013 at 1:03 AM, Timothée Ravier <siosm99@gmail.com> wrote:
On 23/11/2013 13:14, Rodrigo Rivas wrote:
So it looks like some kind of permission issues. Maybe something related to the recent SELinux changes?
Nothing related to SELinux support has been merged in official Arch packages for now (as far as I know). And even if it had been, SELinux would certainly not be enabled by default without a big news item. You can cross this one off.
Tim
Oh! I had read something about that, but did not check the details...

Anyway, I like mysteries, so I kept on trying to find out the reason for this... and I think that I found something, after one of the hardest debugging sessions I can remember.

The problem is in the "signal mask". It looks like some process masks the signals in the early boot, and then the signal mask is inherited by all the processes in my session. And, as it seems, `gdb` needs a lot of signals to work properly, but it assumes that they are not masked at the beginning. I don't know if this should be considered a gdb bug or not, but the real problem is elsewhere.

And that also explains why from a VT (or an SSH session, BTW) there is no problem: there, the shell is not part of the graphical session, and so the signal mask is correct (and actually `yama/ptrace_scope` made no difference at all).

For example, running:
$ grep SigBlk /proc/$$/status
SigBlk: 00007ffe597b0408

The funny thing is that this number looks nothing like a signal mask, and everything like a memory address:
$ ldd /usr/bin/true
        linux-vdso.so.1 (0x00007fffc03ac000)
        libc.so.6 => /usr/lib/libc.so.6 (0x00007f7942fea000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f7943395000)

So I am now pretty sure that some process in the session is corrupting the signal mask. The only thing left is to find out which one...
--
Rodrigo
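To make the `SigBlk` value easier to read, the mask can be decoded into signal numbers. The following is a minimal Python sketch rather than anything used in the thread; `decode_sigmask` and `signal_name` are hypothetical helper names. It relies only on the procfs convention that bit 0 of the mask stands for signal 1, bit 1 for signal 2, and so on.

```python
import signal
from typing import List


def decode_sigmask(hex_mask: str) -> List[int]:
    """Return the signal numbers whose bits are set in a SigBlk-style hex mask.

    Bit 0 of the mask stands for signal 1, bit 1 for signal 2, and so on.
    """
    mask = int(hex_mask, 16)
    return [bit + 1 for bit in range(mask.bit_length()) if mask & (1 << bit)]


def signal_name(signum: int) -> str:
    """Best-effort symbolic name for a signal number on the current platform."""
    try:
        return signal.Signals(signum).name
    except ValueError:
        # Real-time and platform-specific signals may have no name here.
        return f"signal {signum}"


if __name__ == "__main__":
    # The value quoted in the message above.
    for signum in decode_sigmask("00007ffe597b0408"):
        print(signum, signal_name(signum))
    # Signals blocked in this very process: an empty SIG_BLOCK call only queries.
    print(sorted(signal.pthread_sigmask(signal.SIG_BLOCK, [])))
```

For the value quoted above, the low bits alone select signals 4 and 11 (SIGILL and SIGSEGV on Linux x86), which no sane session would block on purpose and which fits the suspicion that the mask is garbage.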
Rodrigo Rivas <rodrigorivascosta <at> gmail.com> writes:
... The problem is in the "signal mask". It looks like some process masks the signals in the early boot, and then the signal mask is inherited by all the processes in my session. And, as it seems, `gdb` needs a lot of signals to work properly, but it assumes that they are not masked at the beginning. I don't know if this should be considered a gdb bug or not, but the real problem is elsewhere. ... So I am now pretty sure that some process in the session is corrupting the signal mask. The only thing left is to find out which one...
Review it:
$ pstree -p

Your signal blocking could come from:
- GUI Login Manager, DE session, ...
  Check for the gnome-session PID and for its parent PID:
  $ grep SigBlk /proc/$pid/status
- gnome-terminal (I guess) in which you run gdb
  Check as above.

This entry gives you an overview of signal states; it will help you match processes to e.g. a SigBlk pattern:
$ ps axs | grep fffffffe7ffbfeff

Btw, a similar problem occurred there:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=499569

jb

PS:
Why is this mailing list so slow in reflecting posts?
Is it managed but neglected, or delayed on purpose?
jb
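The manual procedure above (walk the process tree and compare `SigBlk` values) can be automated. This is a hedged Python sketch, not a tool anyone in the thread used: all function names are hypothetical, and it assumes a Linux procfs with the usual `Name`, `PPid` and `SigBlk` fields in `/proc/<pid>/status`. Starting from a given process, it follows parent links for as long as the parent carries the same blocked mask and reports the topmost process of that chain.

```python
import os
from typing import Dict, Optional, Tuple

# pid -> (parent pid, process name, SigBlk as an integer)
ProcTable = Dict[int, Tuple[int, str, int]]


def parse_status(text: str) -> Dict[str, str]:
    """Turn the 'Key:<tab>value' lines of /proc/<pid>/status into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def blocked_mask(fields: Dict[str, str]) -> int:
    """Return SigBlk as an integer (0 when the field is missing)."""
    return int(fields.get("SigBlk", "0"), 16)


def scan_proc(proc_root: str = "/proc") -> ProcTable:
    """Collect parent, name and blocked mask for every readable process."""
    table: ProcTable = {}
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return table  # no procfs here (for example, not Linux)
    for entry in entries:
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "status")) as handle:
                fields = parse_status(handle.read())
            table[int(entry)] = (
                int(fields["PPid"]),
                fields.get("Name", "?"),
                blocked_mask(fields),
            )
        except (OSError, KeyError, ValueError):
            continue  # process exited, is unreadable, or has odd contents
    return table


def mask_origin(procs: ProcTable, pid: int) -> Optional[int]:
    """Follow parent links while the parent has the same SigBlk as pid.

    Returns the topmost pid of that unbroken chain, or None if pid is unknown.
    """
    if pid not in procs:
        return None
    wanted = procs[pid][2]
    current = pid
    while True:
        parent = procs[current][0]
        if parent not in procs or procs[parent][2] != wanted:
            return current
        current = parent


if __name__ == "__main__":
    table = scan_proc()
    origin = mask_origin(table, os.getpid())
    if origin is not None:
        _, name, mask = table[origin]
        print(f"mask {mask:016x} first seen at pid {origin} ({name})")
```

Programs may change their own mask for legitimate reasons (shells briefly block SIGCHLD, for instance), so the result is a starting point for investigation, not proof of where a bad mask was introduced.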
On Mon, Nov 25, 2013 at 7:30 AM, jb <jb.1234abcd@gmail.com> wrote:
Rodrigo Rivas <rodrigorivascosta <at> gmail.com> writes:
... The problem is in the "signal mask". It looks like some process masks the signals in the early boot, and then the signal mask is inherited by all the processes in my session. And, as it seems, `gdb` needs a lot of signals to work properly, but it assumes that they are not masked at the beginning. I don't know if this should be considered a gdb bug or not, but the real problem is elsewhere. ... So I am now pretty sure that some process in the session is corrupting the signal mask. The only thing left is to find out which one...
Review it: $ pstree -p
Your signal blocking could come from:
- GUI Login Manager, DE session, ...
  Check for the gnome-session PID and for its parent PID:
  $ grep SigBlk /proc/$pid/status
- gnome-terminal (I guess) in which you run gdb
  Check as above.

This entry gives you an overview of signal states; it will help you match processes to e.g. a SigBlk pattern:
$ ps axs | grep fffffffe7ffbfeff

Btw, a similar problem occurred there:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=499569
I think I've finally solved it!

The first process in my session that gets the wrong signal mask is cinnamon-session, so that's where I started looking. I managed to trace the bug to the connection to logind via dbus. But the connection is made using glib/gio, so I looked there. Then I traced the glib code down to a deeply buried call to `pthread_create()`. That is in glibc! So I got the sources and debugged again.

There things get complicated... The bug happens somewhere between glib calling pthread_create() and glibc's implementation of that very same function. I got a few stack traces and they all pointed to one suspect...

/usr/lib/libnvidia-tls.so.331.20

Alas, the source of that file is not available, so my investigation ends here.

I did a rollback to nvidia-325.15-11 and {nvidia-libgl,nvidia-utils,opencl-nvidia}-325.15-1 and all is back to normal 8-).

A quick search on the web shows that it has happened before [1] [2] [3]. I'm reporting my findings there.

To anybody who is still reading, thank you for your attention.
--
Rodrigo.

[1] https://devtalk.nvidia.com/default/topic/638521/linux/gnome-terminal-problem...
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1028272
[3] https://bbs.archlinux.org/viewtopic.php?pid=1350302
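Nobody in the thread tried the following; it is only a sketch of a possible stopgap for sessions that are already running with a bad mask, assuming the diagnosis above is right and gdb misbehaves because it inherits blocked signals. The idea is to clear the signal mask in the child just before the program is executed. The function name `run_with_clean_mask` and the script name `clean_mask.py` are hypothetical; the calls themselves (`signal.pthread_sigmask`, `subprocess.call` with `preexec_fn`) are standard Python on Unix.

```python
import signal
import subprocess
import sys
from typing import List


def _clear_signal_mask() -> None:
    """Runs in the child between fork and exec: unblock every signal."""
    signal.pthread_sigmask(signal.SIG_SETMASK, [])


def run_with_clean_mask(argv: List[str]) -> int:
    """Run argv with an empty signal mask and return its exit status."""
    # preexec_fn is only safe in single-threaded programs, which this sketch is.
    return subprocess.call(argv, preexec_fn=_clear_signal_mask)


if __name__ == "__main__":
    # Example invocation: python clean_mask.py gdb ./a.out
    if len(sys.argv) > 1:
        sys.exit(run_with_clean_mask(sys.argv[1:]))
    print("usage: clean_mask.py COMMAND [ARGS...]")
```

This only hides the symptom for one command. The rollback described above, or a fixed driver, addresses the process that corrupts the mask in the first place.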
participants (3)
- jb
- Rodrigo Rivas
- Timothée Ravier