[arch-general] ldconfig -> Aborted.
this might be a bit brief as it's really late and i'm already in trouble :-/ ... will expand as needed.

i upgrade my machines and VMs very regularly, at least once a week. this last batch of updates broke all of my VMs in particular ... hardware devices still seem to function correctly. they are not really broken, but i am unable to regenerate `/etc/ld.so.cache` on any of them (3 ATM).

upgraded packages (in order, from pacman.log):

    augeas
    linux-api-headers
    glibc <---------------------- prime suspect!!!!
    binutils
    fakeroot
    gcc-libs
    gcc
    pcre
    git
    keyutils
    krb5
    libmysqlclient
    polkit
    yajl
    libvirt
    module-init-tools
    mkinitcpio
    linux
    pm-quirks
    postgresql-libs
    shadow
    systemd-arch-units
    texinfo

... during the upgrade i saw:

    upgraded linux-api-headers (3.0.1-1 -> 3.1.4-1)
    /tmp/alpm_gfNxsS/.INSTALL: line 4: 575 Aborted sbin/ldconfig -r .
    Generating locales...
    en_US.UTF-8...cannot map archive header: Invalid argument
    upgraded glibc (2.14.1-1 -> 2.14.1-2)
    upgraded binutils (2.21.1-2 -> 2.22-2)
    /tmp/alpm_dwiY8r/.INSTALL: line 5: 654 Aborted sbin/ldconfig -r .
    upgraded fakeroot (1.18.1-1 -> 1.18.2-1)

... the locale stuff is just a side effect of the ldconfig failure IIRC -- locales are a little borked because the archive file was blown away, not a problem tho. for some reason ldconfig refuses to update on all the VMs (each has worked without issue until today, for several months):

    # ldconfig
    Aborted

... i tried blacklisting some virtio modules (balloon in particular) and tripling memory, no change, and i'm not convinced it's 100% related to virtio yet.

i tried removing `/var/cache/ldconfig/aux-cache` to force ldconfig to rescan everything (vs. stat checks) -- again, works on hardware but not VM.

i tried removing the last library it processes before failing (per strace), no change, it just fails on another (libgcrypt -> libsysfs).

i tried reinstalling glibc and whatnot ... these VMs are all pure 64bit, no multilib, and pure systemd, original initscript stuff purged.

the only thing i didn't try was downgrading, because i would have to use the ARM ... i use 9p2000.L passthru for the rootfs of each VM, bindmount a local mirror into each VM's VFS, then configure pacman to use the `pool` directory of the bound mirror as a cachedir -- the net effect is pacman never downloads anything, ever, because it believes it already has :-) slightly odd perhaps, but working well for quite some time.

any ideas? i can't find anything out of place, or any significant differences, and i'm not sure what to try next -- nothing unusual in dmesg or logs, on the VMs or the host. host is completely current as of Dec 14 00:00 CST. reduced strace follows, limited to files and signals.

thanks for your time so far, if you made it to this point legitimately :-)

C Anthony

    # LANG=C strace -ff -s256 -etrace=file,signal ldconfig
    execve("/sbin/ldconfig", ["ldconfig"], [/* 29 vars */]) = 0
    open("/etc/ld.so.conf", O_RDONLY) = 3
    open("/etc/ld.so.conf.d", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 4
    open("/etc/ld.so.conf.d/fakeroot.conf", O_RDONLY) = 4
    stat("/usr/lib/libfakeroot", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
    open("/etc/ld.so.conf.d/perl.conf", O_RDONLY) = 4
    stat("/usr/lib/perl5/core_perl/CORE", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
    stat("/lib", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
    stat("/lib64", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
    stat("/usr/lib", {st_mode=S_IFDIR|0755, st_size=36864, ...}) = 0
    stat("/usr/lib64", 0x7fff85a41d90) = -1 ENOENT (No such file or directory)
    open("/var/cache/ldconfig/aux-cache", O_RDONLY) = -1 ENOENT (No such file or directory)
    open("/usr/lib/libfakeroot", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
    lstat("/usr/lib/libfakeroot/libfakeroot.so", {st_mode=S_IFLNK|0777, st_size=16, ...}) = 0
    open("/usr/lib/libfakeroot/libfakeroot.so", O_RDONLY) = 4
    lstat("/usr/lib/libfakeroot/libfakeroot-0.so", {st_mode=S_IFREG|0755, st_size=40400, ...}) = 0
    stat("/usr/lib/libfakeroot/libfakeroot-0.so", {st_mode=S_IFREG|0755, st_size=40400, ...}) = 0
    stat("/usr/lib/libfakeroot/libfakeroot.so", {st_mode=S_IFREG|0755, st_size=40400, ...}) = 0
    open("/usr/lib/perl5/core_perl/CORE", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
    lstat("/usr/lib/perl5/core_perl/CORE/libperl.so", {st_mode=S_IFREG|0555, st_size=1642088, ...}) = 0
    open("/usr/lib/perl5/core_perl/CORE/libperl.so", O_RDONLY) = 4
    stat("/usr/lib/perl5/core_perl/CORE/libperl.so", {st_mode=S_IFREG|0555, st_size=1642088, ...}) = 0
    stat("/usr/lib/perl5/core_perl/CORE/libperl.so", {st_mode=S_IFREG|0555, st_size=1642088, ...}) = 0
    open("/lib", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
    lstat("/lib/libpopt.so", {st_mode=S_IFLNK|0777, st_size=12, ...}) = 0
    open("/lib/libpopt.so", O_RDONLY) = 4
    lstat("/lib/libudev.so.0.13.0", {st_mode=S_IFREG|0755, st_size=55952, ...}) = 0
    open("/lib/libudev.so.0.13.0", O_RDONLY) = 4
    lstat("/lib/libutil.so.1", {st_mode=S_IFLNK|0777, st_size=17, ...}) = 0
    open("/lib/libutil.so.1", O_RDONLY) = 4
    lstat("/lib/libgcrypt.so", {st_mode=S_IFLNK|0777, st_size=19, ...}) = 0
    open("/lib/libgcrypt.so", O_RDONLY) = 4
    lstat("/lib/libpamc.so.0", {st_mode=S_IFLNK|0777, st_size=17, ...}) = 0
    open("/lib/libpamc.so.0", O_RDONLY) = 4
    lstat("/lib/libsysfs.so.2.0.1", {st_mode=S_IFREG|0755, st_size=47640, ...}) = 0
    open("/lib/libsysfs.so.2.0.1", O_RDONLY) = 4
    lstat("/lib/libhandle.so.1.0.3", {st_mode=S_IFREG|0644, st_size=14400, ...}) = 0
    open("/lib/libhandle.so.1.0.3", O_RDONLY) = 4
    lstat("/lib/libBrokenLocale.so.1", {st_mode=S_IFLNK|0777, st_size=25, ...}) = 0
    open("/lib/libBrokenLocale.so.1", O_RDONLY) = 4
    lstat("/lib/libBrokenLocale-2.14.1.so", {st_mode=S_IFREG|0755, st_size=6264, ...}) = 0
    lstat("/lib/libnss_mdns_minimal.so.2", {st_mode=S_IFREG|0755, st_size=9904, ...}) = 0
    open("/lib/libnss_mdns_minimal.so.2", O_RDONLY) = 4
    lstat("/lib/libpam.so.0", {st_mode=S_IFLNK|0777, st_size=16, ...}) = 0
    open("/lib/libpam.so.0", O_RDONLY) = 4
    lstat("/lib/liblvm2app.so.2.2", {st_mode=S_IFREG|0555, st_size=667916, ...}) = 0
    open("/lib/liblvm2app.so.2.2", O_RDONLY) = 4
    lstat("/lib/libutil-2.14.1.so", {st_mode=S_IFREG|0755, st_size=10656, ...}) = 0
    lstat("/lib/libbz2.so.1.0", {st_mode=S_IFLNK|0777, st_size=15, ...}) = 0
    open("/lib/libbz2.so.1.0", O_RDONLY) = 4
    lstat("/lib/libsystemd-login.so.0", {st_mode=S_IFLNK|0777, st_size=25, ...}) = 0
    open("/lib/libsystemd-login.so.0", O_RDONLY) = 4
    lstat("/lib/libanl-2.14.1.so", {st_mode=S_IFREG|0755, st_size=14928, ...}) = 0
    open("/lib/libanl-2.14.1.so", O_RDONLY) = 4
    lstat("/lib/libgpg-error.so", {st_mode=S_IFLNK|0777, st_size=21, ...}) = 0
    open("/lib/libgpg-error.so", O_RDONLY) = 4
    lstat("/lib/libacl.so", {st_mode=S_IFLNK|0777, st_size=11, ...}) = 0
    open("/lib/libacl.so", O_RDONLY) = 4
    lstat("/lib/libpam.so.0.83.1", {st_mode=S_IFREG|0755, st_size=55856, ...}) = 0
    lstat("/lib/libgcrypt.so.11", {st_mode=S_IFLNK|0777, st_size=19, ...}) = 0
    open("/lib/libgcrypt.so.11", O_RDONLY) = 4
    rt_sigprocmask(SIG_UNBLOCK, [ABRT], NULL, 8) = 0
    tgkill(2559, 2559, SIGABRT) = 0
    --- {si_signo=SIGABRT, si_code=SI_TKILL, si_pid=2559, si_uid=0, si_value={int=2164749013, ptr=0xffffffff810772d5}} (Aborted) ---
    +++ killed by SIGABRT +++
    Aborted
    [root@archlinux spinymouse]# ldconfig
    [root@archlinux spinymouse]#

It's ok on my machine. The repositories I'm using are

    [root@archlinux spinymouse]# pacman -Syy
    :: Synchronizing package databases...
     core 102.0K 425.1K/s 00:00:00 [######################################] 100%
     extra 1151.8K 424.4K/s 00:00:03 [######################################] 100%
     community 960.0K 424.1K/s 00:00:02 [######################################] 100%

and my machine is upgraded using those repositories. Did you get the bug by using only those repositories?

Regards,
Ralf
On Wed, Dec 14, 2011 at 5:46 AM, C Anthony Risinger <anthony@xtfx.me> wrote:
> any ideas? i can't find anything out of place, or any significant differences, and i'm not sure what to try next -- nothing unusual in dmesg or logs, on the VMs or the host. host is completely current as of Dec 14 00:00 CST. reduce strace follows, limited to files and signals.
at the last second i looked at the locale-gen stuff again; the trace shows mmap() failing with EINVAL:

    # strace -ff -s256 -etrace=mmap localedef -i en_US -c -f ISO-8859-1 -A /usr/share/locale/locale.alias en_US
    ......
    mmap(NULL, 536870912, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fb3aac63000
    mmap(0x7fb3aac63000, 103860, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_FIXED, 3, 0) = -1 EINVAL (Invalid argument)
    cannot map archive header: Invalid argument

... i'm thinking it's probably related to 9p2000.L passthru at this point (ehm, under KVM if i didn't already mention it), but if anyone has some additional input, or better debug commands (eg. strace) that would be awesome. the ldconfig trace, by contrast, does *not* show any failing syscalls at all (other than ENOENT for missing files/etc).

i might have created one of these from scratch on 9p2000.L, but i think they were all rsync'ed from existing installs on LVM partitions (as i was converting my setup to use passthru for many benefits) ... it's possible this is the first time glibc/locale-gen has been run since the conversion.

-- 
C Anthony
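The failing call in the localedef trace is a shared, writable mapping of a regular file, so one way to narrow things down is to try the same kind of mapping on a scratch file in the directory in question. What follows is only a minimal Python sketch under stated assumptions: the function name `shared_mmap_works` and its `length` parameter are made up for illustration, and the `MAP_FIXED` part of the original call is not reproduced because Python's `mmap.mmap()` takes no target address.

```python
import mmap
import os
import tempfile


def shared_mmap_works(directory, length=4096):
    """Try a MAP_SHARED read/write mapping of a scratch file in `directory`.

    Returns (True, None) on success, or (False, exc) where exc is the
    OSError raised by mmap (e.g. EINVAL, as seen in the localedef trace).
    Hypothetical helper, not part of glibc or any tool in this thread.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        os.ftruncate(fd, length)  # an empty file cannot be mapped
        try:
            mapping = mmap.mmap(fd, length, flags=mmap.MAP_SHARED,
                                prot=mmap.PROT_READ | mmap.PROT_WRITE)
        except OSError as exc:
            return False, exc
        mapping[:5] = b"probe"  # write through the mapping
        mapping.flush()         # push the dirty page back to the file
        mapping.close()
        return True, None
    finally:
        os.close(fd)
        os.unlink(path)
```

Running `shared_mmap_works("/usr/lib/locale")` as root in the guest, and again against the same tree on the host, would show whether the filesystem itself refuses the mapping. A failure in the guest only would point toward the 9p mount rather than glibc, though that is a hint and not proof.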
On Wed, 14 Dec 2011 06:01:37 -0600 C Anthony Risinger <anthony@xtfx.me> wrote:
Erm, have you actually tried to run ldconfig -v?

-- 
Leonid Isaev
GnuPG key ID: 164B5A6D
Key fingerprint: C0DF 20D0 C075 C3F1 E1BE 775A A7AE F6CB 164B 5A6D
On Wed, Dec 14, 2011 at 10:56 AM, Leonid Isaev <lisaev@umail.iu.edu> wrote:
> Erm, have you actually tried to run ldconfig -v?
heh ... uh, no. no i didn't. i guess my mind skipped right to the heavy artillery.

    # ldconfig -v
    ldconfig: Can't stat /usr/lib64: No such file or directory
    /usr/lib/libfakeroot:
            libfakeroot-0.so -> libfakeroot.so
    /usr/lib/perl5/core_perl/CORE:
            libperl.so -> libperl.so
    /lib:
    Aborted

... nothing useful i'm afraid :-(

-- 
C Anthony
On Wed, 14 Dec 2011 14:56:25 -0600 C Anthony Risinger <anthony@xtfx.me> wrote:
> heh ... uh, no. no i didn't. i guess my mind skipped right to the heavy artillery.
>
>     # ldconfig -v
>     ldconfig: Can't stat /usr/lib64: No such file or directory
>     /usr/lib/libfakeroot:
>             libfakeroot-0.so -> libfakeroot.so
>     /usr/lib/perl5/core_perl/CORE:
>             libperl.so -> libperl.so
>     /lib:
>     Aborted
>
> ... nothing useful i'm afraid :-(
So it basically receives SIGABRT. Have you already run strace on ldconfig, or only on locale-gen? If not, try that, and also try removing /var/cache/ldconfig...

-- 
Leonid Isaev
GnuPG key ID: 164B5A6D
Key fingerprint: C0DF 20D0 C075 C3F1 E1BE 775A A7AE F6CB 164B 5A6D
C Anthony Risinger wrote:
>     # ldconfig -v
>     ldconfig: Can't stat /usr/lib64: No such file or directory
>     /usr/lib/libfakeroot:
>             libfakeroot-0.so -> libfakeroot.so
>     /usr/lib/perl5/core_perl/CORE:
>             libperl.so -> libperl.so
>     /lib:
>     Aborted
I think there's no harm in "mkdir /usr/lib64".

To me this sounds as if the VM balloons out of memory. How much RAM is allocated to the VMs?

clemens
On Thu, Dec 22, 2011 at 4:01 PM, clemens fischer <ino-news@spotteswoode.dnsalias.org> wrote:
>>     # ldconfig -v
>>     ldconfig: Can't stat /usr/lib64: No such file or directory
>>     /usr/lib/libfakeroot:
>>             libfakeroot-0.so -> libfakeroot.so
>>     /usr/lib/perl5/core_perl/CORE:
>>             libperl.so -> libperl.so
>>     /lib:
>>     Aborted
>
> I think there's no harm in "mkdir /usr/lib64".
>
> To me this sounds as if the VM balloons out of memory. How much RAM is allocated to the VM's?
(fair amount of debug output ... summary at end)

yeah i originally tried upping the mem to 1024M+, preventing the balloon module from loading (since it's an opt-in kernel module), and not even using mem ballooning -- no changes at all. the /usr/lib64 stuff isn't a prob, my guess is everyone's machine does that (/lib64 is created by glibc for compat reasons only, not in filesystem package) ...

... though, after rebuilding glibc with debug syms, i was able to trace the issue. `ldconfig` is consistently receiving the correct, then incorrect(?) inode, twice(!), for an arbitrary library; `ldconfig` detects this anomaly just before adding the entry to its aux-cache, then explicitly calls abort(). while the problem library is `libgcrypt.so.11`, it's not specific to that lib (if i remove that library, it just fails on a different one) ... possibly a pattern here but not yet sure.

i ran `gdb --args ldconfig -v` (breakpoint, output, backtrace, and source context provided below):

----------------------------------------------------------------------------
Reading symbols from /sbin/ldconfig...done.
(gdb) break cache.c:620 if (soname!=0x0 && strcmp(soname, "libgcrypt.so.11")==0)
Breakpoint 1 at 0x402e14: file cache.c, line 620.
(gdb) commands
Type commands for breakpoint(s) 1, one per line.
End with a line saying just "end".
silent
printf "\n---- soname: %s\n---- inode: %i\n---- hash: %i\n\n", soname, id->ino, hash
continue
end
(gdb) run
Starting program: /sbin/ldconfig -v
/sbin/ldconfig: Can't stat /usr/lib64: No such file or directory

---- soname: libgcrypt.so.11
---- inode: 15344348
---- hash: 722

/usr/lib/libfakeroot:
        libfakeroot-0.so -> libfakeroot.so
/usr/lib/perl5/core_perl/CORE:
        libperl.so -> libperl.so
/lib:

---- soname: libgcrypt.so.11
---- inode: 15344350
---- hash: 834

---- soname: libgcrypt.so.11
---- inode: 15344350
---- hash: 834

Program received signal SIGABRT, Aborted.
0x000000000044f4fc in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
64        return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
(gdb) bt
#0  0x000000000044f4fc in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x000000000040c20e in abort () at abort.c:93
#2  0x0000000000402e57 in insert_to_aux_cache (id=0x7fffffffd1b0, flags=771, osversion=0, soname=0x6db360 "libgcrypt.so.11", used=1) at cache.c:625
#3  0x0000000000403dea in add_to_aux_cache (stat_buf=<optimized out>, flags=<optimized out>, osversion=<optimized out>, soname=<optimized out>) at cache.c:650
#4  0x00000000004023cd in search_dir (entry=0x6d09d0) at ldconfig.c:880
#5  0x0000000000402d09 in search_dirs () at ldconfig.c:1023
#6  main (argc=2, argv=<optimized out>) at ldconfig.c:1372
(gdb) list cache.c:620,625
620       for (entry = aux_hash[hash]; entry; entry = entry->next)
621         if (id->ino == entry->id.ino
622             && id->ctime == entry->id.ctime
623             && id->size == entry->id.size
624             && id->dev == entry->id.dev)
625           abort ();
----------------------------------------------------------------------------

... before adding a new entry to the cache, `ldconfig` loops thru existing entries and aborts if an *exact* match is found ... and in this case there appear to somehow be 2 entries for the same library (with different inodes); the first is bogus (from VM perspective anyway) and the second is added twice, triggering the abort.
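The check at cache.c:620-625 can be modeled in a few lines to make the failure sequence easier to follow. This is a simplified Python sketch and not the glibc implementation: a plain dict replaces the `aux_hash` bucket chains, an exception stands in for `abort ()`, and the `ctime`, `size` and `dev` values are placeholders because the gdb output only shows the inode numbers.

```python
from collections import namedtuple

# The four identity fields compared by the loop at cache.c:620-625.
FileId = namedtuple("FileId", "ino ctime size dev")


class DuplicateEntry(Exception):
    """Stands in for the abort() call in the C source."""


def insert_to_aux_cache(cache, file_id, soname):
    """Record file_id -> soname, refusing an identity that is already cached."""
    if file_id in cache:  # same ino, ctime, size and dev as an existing entry
        raise DuplicateEntry(f"{soname}: identity {file_id} already cached")
    cache[file_id] = soname


cache = {}
# Inode numbers are taken from the gdb session; the other fields are placeholders.
first = FileId(ino=15344348, ctime=0, size=0, dev=0)
second = FileId(ino=15344350, ctime=0, size=0, dev=0)

insert_to_aux_cache(cache, first, "libgcrypt.so.11")   # accepted
insert_to_aux_cache(cache, second, "libgcrypt.so.11")  # accepted, inode differs
try:
    insert_to_aux_cache(cache, second, "libgcrypt.so.11")  # exact repeat
except DuplicateEntry as exc:
    print("would abort:", exc)
```

The point of the model is that the soname plays no part in the comparison: two different inodes under one soname are accepted, and only a repeated (ino, ctime, size, dev) tuple triggers the abort.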
i don't know if v9fs or QEMU is supposed to be changing the inode, but every file i test is "off by two", example (host/VM, resp):

    # stat --format="%i %n" ./lib/libgcrypt.so.11.7.0
    15344348 ./lib/libgcrypt.so.11.7.0

    # stat --format="%i %n" /lib/libgcrypt.so.11.7.0
    15344350 /lib/libgcrypt.so.11.7.0

... `ldconfig` is attempting to add EACH as `libgcrypt.so.11.7.0` (see gdb output)! very suspicious. the first (host?) version is somehow detected before anything in ld.so.conf.d/* is tried (and gdb confirms all other libs are found during this period as well) ...

... something is definitely wonky though, because `ldconfig` tries to add inode `15344348` as `libgcrypt.so.11.7.0`, but that is totally wrong from guest perspective:

    stat --format="%i %n" /lib/l* | grep 15344348
    15344348 /lib/libext2fs.so.2.4

... i don't know how the !@#$ it's getting that, but i suspect some kind of bad interaction between the host/VM page caches, or a bug in ldconfig, the v9fs kernel module, the "virtfs" server implemented within QEMU, or possibly something *very* odd about my setup. i'm also using the "mapped" virtfs option (guest perms/etc are stored in xattrs on the host) allowing QEMU to run as nobody:kvm instead of root ... could be part of the problem ... i thought this was the recommended way, perhaps not.

in conclusion ... the issue is very unlikely to be Arch-specific. i'll debug a bit more, and take the information to the proper sources, but i figured i'd do a final update here for closure/interest. i will of course still gladly accept any further advice or suggestions.

thanks!

-- 
C Anthony
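The `stat ... | grep` lookup used in the message above can be generalized to index a whole directory by inode, which makes it easier to see which name owns a given inode and whether several names report the same one. A minimal Python sketch follows; the helper names `inode_index` and `shared_identities` are hypothetical, and hard links legitimately share an inode, so a hit is only something to inspect by hand.

```python
import os
from collections import defaultdict


def inode_index(directory):
    """Map (st_dev, st_ino) -> sorted entry names in `directory`.

    Uses lstat() so symlinks are reported as themselves, not their targets.
    """
    index = defaultdict(list)
    for name in sorted(os.listdir(directory)):
        st = os.lstat(os.path.join(directory, name))
        index[(st.st_dev, st.st_ino)].append(name)
    return dict(index)


def shared_identities(directory):
    """Return only the identities reported by more than one name."""
    return {key: names
            for key, names in inode_index(directory).items()
            if len(names) > 1}
```

Comparing the output of `inode_index("/lib")` in the guest with the same call on the exported tree on the host would show whether the inode offset is uniform across files, and `shared_identities("/lib")` in the guest would list any names that collide there.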
participants (4)
- C Anthony Risinger
- clemens fischer
- Leonid Isaev
- Ralf Mardorf