[arch-general] Debugging a third-party library's segfault if it's caused by a system update?
I use the openni2 library to access an Asus Xtion Pro Live camera, installed from the AUR and working fine up till 2+ weeks ago.
After a 2 week holiday, the most recent system update caused segfaults to happen within the library (both before and after rebuilding it), without any change to the code calling the library. Same segfault happens with the simple sample applications included in the library (previously running fine).
How do I track down the issue? The library's source code is available, but without knowing it well I'm unsure where to even begin.
Normally I'd contact the authors, but as this issue was caused (on my system) by a system update I think I'd need to do some tracking down first.
On 06/09/2015 09:26 AM, Oon-Ee Ng wrote:
How do I track down the issue? The library's source code is available, but without knowing it well I'm unsure where to even begin.
Normally I'd contact the authors, but as this issue was caused (on my system) by a system update I think I'd need to do some tracking down first.
Hi,
Have you read [1]? Also, tell the developers how to reproduce the bug and since which version the bug occurs. You can pacman -U /var/cache/pacman/pkg/<old-package> to install the old version again.
[1] https://wiki.archlinux.org/index.php/Step_By_Step_Debugging_Guide
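For the downgrade step, it can help to list which older versions of a package are still in the pacman cache before passing one to pacman -U. The following is a minimal Python sketch, assuming the usual name-version-release-arch.pkg.tar.* file naming in the cache. The function names split_pkg_filename and cached_versions are hypothetical helpers for illustration, not part of pacman.

```python
from pathlib import Path


def split_pkg_filename(filename):
    """Split 'name-version-release-arch.pkg.tar.*' into (name, 'version-release', arch)."""
    base = filename.split(".pkg.tar", 1)[0]
    # The package name may itself contain dashes, so split from the right.
    name, version, release, arch = base.rsplit("-", 3)
    return name, f"{version}-{release}", arch


def cached_versions(pkgname, cache_dir="/var/cache/pacman/pkg"):
    """Return (version, path) pairs for one package, oldest file first."""
    found = []
    for path in Path(cache_dir).glob(f"{pkgname}-*.pkg.tar.*"):
        if path.name.endswith(".sig"):
            continue  # skip detached signatures
        try:
            name, version, _arch = split_pkg_filename(path.name)
        except ValueError:
            continue  # file name does not follow the assumed pattern
        if name == pkgname:
            found.append((path.stat().st_mtime, version, str(path)))
    return [(version, path) for _mtime, version, path in sorted(found)]
```

Any path returned by cached_versions could then be handed to pacman -U as described above. Ordering by file modification time is only an approximation of version order.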
On Tue, Jun 9, 2015 at 9:26 PM, Florian Pelz <pelzflorian@googlemail.com> wrote:
Hi,
Have you read [1]? Also, tell the developers how to reproduce the bug and since which version the bug occurs. You can pacman -U /var/cache/pacman/pkg/<old-package> to install the old version again.
[1] https://wiki.archlinux.org/index.php/Step_By_Step_Debugging_Guide
Yes I have. Does not help as I do not know what is really causing the crash (as in, which update). The library was not updated, nor are there any missing dependencies or linked libraries (checked with ldd).
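The ldd check mentioned here can be scripted so that it is easy to repeat over the library, its drivers, and the sample binaries. This is a minimal sketch, assuming ldd marks unresolved libraries with "=> not found". The helper names missing_libraries and check_binary are made up for this example.

```python
import subprocess


def missing_libraries(ldd_output):
    """Return the names of shared libraries that ldd reported as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        if "=> not found" in line:
            # The library name is everything before the '=>' arrow.
            missing.append(line.split("=>", 1)[0].strip())
    return missing


def check_binary(path):
    """Run ldd on a binary or shared object and return its unresolved libraries."""
    result = subprocess.run(["ldd", path], capture_output=True, text=True, check=False)
    return missing_libraries(result.stdout)
```

An empty result only shows that every dependency resolves to some file. It does not rule out a behaviour change inside an updated dependency, which is the case this thread is chasing.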
On 06/10/2015 01:43 AM, Oon-Ee Ng wrote:
Yes I have. Does not help as I do not know what is really causing the crash (as in, which update). The library was not updated, nor are there any missing dependencies or linked libraries (checked with ldd).
Have you tried running the application or the library's sample application with gdb as described on the wiki? This way you would get the exact place where it crashes. For more precise output (i.e. line numbers in the source code), you might have to adapt the PKGBUILD as described in [1]. Since the library and application are from the AUR, you won't need ABS to get the PKGBUILD.
[1] https://wiki.archlinux.org/index.php/Step_By_Step_Debugging_Guide#Improved_g...
Hi
On Tue, Jun 9, 2015 at 12:26 AM, Oon-Ee Ng <ngoonee.talk@gmail.com> wrote:
I use the openni2 library to access an Asus Xtion Pro Live camera, installed from the AUR and working fine up till 2+ weeks ago.
After a 2 week holiday, the most recent system update caused segfaults to happen within the library (both before and after rebuilding it), without any change to the code calling the library. Same segfault happens with the simple sample applications included in the library (previously running fine).
How do I track down the issue? The library's source code is available, but without knowing it well I'm unsure where to even begin.
Normally I'd contact the authors, but as this issue was caused (on my system) by a system update I think I'd need to do some tracking down first.
Do you use an Intel CPU? Try setting up microcode updates and see if it helps: https://wiki.archlinux.org/index.php/Microcode
On Wed, Jun 10, 2015 at 7:48 AM, Anatol Pomozov <anatol.pomozov@gmail.com> wrote:
Do you use Intel CPU? Try to setup microcode and see if it helps https://wiki.archlinux.org/index.php/Microcode
I do, but that has not changed, and as CPU problems would likely affect the whole system rather than a specific library I'm hesitant to try something so low-level.
On 10 June 2015 at 03:12, Oon-Ee Ng <ngoonee.talk@gmail.com> wrote:
I do, but that has not changed, and as CPU problems would likely affect the whole system rather than a specific library I'm hesitant to try something so low-level.
You should definitely set up the microcode updates; bugs in μcode could affect any part of the system, and you'll need to do it sooner or later. It's not hard to do, at least with syslinux; I hear the GNU bootloader is more complicated to use.
(Assuming by GNU bootloader you mean GRUB) In Arch, GRUB is patched to search for intel-ucode.img automatically. If you write your own grub.cfg instead of using grub-mkconfig, you just put the image name on the same "initrd" line, before the main initramfs.
On 11 June 2015 at 00:45, Neven Sajko <nsajko@gmail.com> wrote:
You should definitely set the microcode updates up, bugs in μcode could affect any part of the system and you'll need to do it sooner or later. It's not hard to do, at least with syslinux, I hear the GNU bootloader is more complicated to use.
On Thu, Jun 11, 2015 at 12:45 AM, Neven Sajko <nsajko@gmail.com> wrote:
You should definitely set the microcode updates up, bugs in μcode could affect any part of the system and you'll need to do it sooner or later. It's not hard to do, at least with syslinux, I hear the GNU bootloader is more complicated to use.
Set it up, but nothing seems to change after reboot in terms of this behaviour. However, it is possible there is a bigger issue in my recent updates, as I now have segfaults with the latest intel driver updates.
On Tue, Jun 09, 2015 at 03:26:35PM +0800, Oon-Ee Ng wrote:
How do I track down the issue? The library's source code is available, but without knowing it well I'm unsure where to even begin.
Try running a small program using the library in valgrind. The output should provide you with some hints. GDB (like Florian suggested) is also an option but, personally, I find valgrind a bit more convenient for such first quick checks. It also flags invalid memory accesses that do not cause your program to get killed.
If valgrind turns out to be unusably slow, you might also try to build with ASan/AddressSanitizer enabled. Search the web for more info.
On Fri, Jun 12, 2015 at 2:25 PM, Lars Seipel <lars.seipel@gmail.com> wrote:
Try running a small program using the library in valgrind. The output should provide you with some hints. GDB (like Florian suggested) is also an option but, personally, I find valgrind a bit more convenient for such first quick checks. It also flags invalid memory accesses that do not cause your program to get killed.
Tried that, and the error comes in a thread reading an uninitialized pointer (I believe, valgrind output shown below). This isn't the sort of error which should be triggered by upgrading a different package though, is it?
Which leads to my follow-up question, how likely is it that a glibc update causes a crash?
==10236== Using Valgrind-3.10.1 and LibVEX; rerun with -h for copyright info
==10236== Command: SimpleViewer
==10236==
==10236== Thread 3:
==10236== Invalid read of size 1
==10236==    at 0x59E9784: ____strtoul_l_internal (in /usr/lib/libc-2.21.so)
==10236==    by 0xA654637: ??? (in /usr/lib/OpenNI2/Drivers/libPS1080.so)
==10236==    by 0xA654BD8: ??? (in /usr/lib/OpenNI2/Drivers/libPS1080.so)
==10236==    by 0x90CE353: start_thread (in /usr/lib/libpthread-2.21.so)
==10236==    by 0x5A99BFC: clone (in /usr/lib/libc-2.21.so)
==10236==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==10236==
==10236==
==10236== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==10236==  Access not within mapped region at address 0x0
==10236==    at 0x59E9784: ____strtoul_l_internal (in /usr/lib/libc-2.21.so)
==10236==    by 0xA654637: ??? (in /usr/lib/OpenNI2/Drivers/libPS1080.so)
==10236==    by 0xA654BD8: ??? (in /usr/lib/OpenNI2/Drivers/libPS1080.so)
==10236==    by 0x90CE353: start_thread (in /usr/lib/libpthread-2.21.so)
==10236==    by 0x5A99BFC: clone (in /usr/lib/libc-2.21.so)
==10236==  If you believe this happened as a result of a stack
==10236==  overflow in your program's main thread (unlikely but
==10236==  possible), you can try to increase the size of the
==10236==  main thread stack using the --main-stacksize= flag.
==10236==  The main thread stack size used in this run was 8388608.
==10236==
==10236== HEAP SUMMARY:
==10236==     in use at exit: 284,816 bytes in 1,875 blocks
==10236==   total heap usage: 6,845 allocs, 4,970 frees, 2,388,919 bytes allocated
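When comparing several valgrind runs, it can be handy to pull the stack frames out of the report and see which shared object called into libc. Here is a rough Python sketch, assuming frames of the form "at/by 0xADDR: function (in /path/to/object)" as in the trace above. The names FRAME_RE, parse_frames, and first_frame_outside are invented for this example.

```python
import re

# Matches frames such as:
#   ==10236==    by 0xA654637: ??? (in /usr/lib/OpenNI2/Drivers/libPS1080.so)
# Frames that carry source locations, e.g. "(file.c:123)", are not matched.
FRAME_RE = re.compile(
    r"^==\d+==\s+(?:at|by) 0x[0-9A-Fa-f]+: (?P<func>.+?) \(in (?P<obj>[^)]+)\)\s*$"
)


def parse_frames(valgrind_text):
    """Return (function, shared_object) pairs in the order they appear."""
    frames = []
    for line in valgrind_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            frames.append((match.group("func"), match.group("obj")))
    return frames


def first_frame_outside(frames, prefixes=("/usr/lib/libc-", "/usr/lib/libpthread-")):
    """Return the first frame whose object path does not start with a given prefix."""
    for func, obj in frames:
        if not obj.startswith(prefixes):
            return func, obj
    return None
```

For the trace above, the first frame outside libc belongs to libPS1080.so, and its function name is "???", which suggests the driver was built without debug symbols. Rebuilding with debug symbols, as suggested earlier in the thread, should turn that into a function name.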
On Tue, Jun 9, 2015 at 3:26 PM, Oon-Ee Ng <ngoonee.talk@gmail.com> wrote:
I use the openni2 library to access an Asus Xtion Pro Live camera, installed from the AUR and working fine up till 2+ weeks ago.
After a 2 week holiday, the most recent system update caused segfaults to happen within the library (both before and after rebuilding it), without any change to the code calling the library. Same segfault happens with the simple sample applications included in the library (previously running fine).
How do I track down the issue? The library's source code is available, but without knowing it well I'm unsure where to even begin.
Normally I'd contact the authors, but as this issue was caused (on my system) by a system update I think I'd need to do some tracking down first.
Dear all,
This is becoming curiouser and curiouser. I used the Arch Rollback Machine to repeat the upgrades (without [testing]) and got all the way to today without the problem recurring. Some more selective upgrading revealed the problem to be libsystemd, systemd, and systemd-sysvcompat (upgrading from 219-6 to 220-1 brings the problem back reliably).
I am reporting this at https://bugs.archlinux.org/task/45343
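The selective upgrading described here is essentially a bisection over the list of pending upgrades. This is a small Python sketch of that idea, not the exact procedure used in this thread. It assumes the system is good before any step, that the bug stays present once the offending step is applied, and that is_broken_after is a hypothetical callback which restores a known-good state, applies the first n steps, runs a test such as SimpleViewer, and reports whether it crashed.

```python
def first_bad_step(steps, is_broken_after):
    """Return the index of the first upgrade step that triggers the bug, or None.

    steps: ordered list of upgrade steps; a step may be a group of packages
           that must be upgraded together.
    is_broken_after: callable taking n and returning True if the bug shows up
                     once the first n steps have been applied (hypothetical).
    """
    if not steps or not is_broken_after(len(steps)):
        return None  # nothing to bisect, or the full upgrade is fine
    lo, hi = 0, len(steps)  # invariant: good after lo steps, broken after hi steps
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_broken_after(mid):
            hi = mid
        else:
            lo = mid
    return hi - 1
```

Treating libsystemd, systemd, and systemd-sysvcompat as one step keeps packages that must move together in a single unit. Each call to the callback costs one upgrade and test cycle, so the number of cycles grows with the logarithm of the number of steps.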
participants (6)
- Anatol Pomozov
- Florian Pelz
- Lars Seipel
- Neven Sajko
- Oon-Ee Ng
- Tom Yan