[pacman-dev] [BUG] segfault when the download fails
I believe my mirror gave me a way to reproduce easily the issue Nagy described there : http://bugs.archlinux.org/task/8350#comment20223 However, it doesn't look at all like the output in the original bug report, where the progress bar is at 64%. So I wasn't sure where to report this information, so I'm reporting it here for now before investigating further. Here is the --debug output: Targets: hunspell-1.2.1-2 openoffice-fr-2.3.1-1 Total Download Size: 15.08 MB [00:27:01] debug: using cachedir: /var/cache/pacman/pkg/ :: Retrieving packages from extra... error: Internal pacman error: Segmentation fault. Please submit a full bug report with --debug if appropriate. [00:27:01] debug: using 'hunspell-1.2.1-2-i686.pkg.tar.gz' for download progress [00:27:01] debug: connected to mir1.archlinuxfr.org successfully [00:27:02] debug: using 'openoffice-fr-2.3.1-1-i686.pkg.tar.gz' for download progress [00:27:02] error: failed retrieving file 'openoffice-fr-2.3.1-1-i686.pkg.tar.gz' from mir1.archlinuxfr.org : Not Found [00:27:02] debug: using 'hunspell-1.2.1-2-i686.pkg.tar.gz' for download progress [00:27:02] error: segmentation fault So pacman first downloads hunspell-1.2.1-2-i686.pkg.tar.gz and succeeds, then it tries to download openoffice-fr-2.3.1-1-i686.pkg.tar.gz and fails. So after that, it tries using the second server in the list. It starts with hunspell-1.2.1-2-i686.pkg.tar.gz again, and then bam, segfault. While I'm finishing this mail, the bug is already gone, since that package is now available on my first mirror. But when it wasn't available, the issue was 100% reproducible, I made it happen 10 times in a row.
On Dec 6, 2007 5:46 PM, Xavier <shiningxc@gmail.com> wrote:
I believe my mirror gave me a way to reproduce easily the issue Nagy described there : http://bugs.archlinux.org/task/8350#comment20223
However, it doesn't look at all like the output in the original bug report, where the progress bar is at 64%. So I wasn't sure where to report this information, so I'm reporting it here for now before investigating further.
Here is the --debug output:
Targets: hunspell-1.2.1-2 openoffice-fr-2.3.1-1
Total Download Size: 15.08 MB
[00:27:01] debug: using cachedir: /var/cache/pacman/pkg/ :: Retrieving packages from extra... error: Internal pacman error: Segmentation fault. Please submit a full bug report with --debug if appropriate. [00:27:01] debug: using 'hunspell-1.2.1-2-i686.pkg.tar.gz' for download progress [00:27:01] debug: connected to mir1.archlinuxfr.org successfully [00:27:02] debug: using 'openoffice-fr-2.3.1-1-i686.pkg.tar.gz' for download progress [00:27:02] error: failed retrieving file 'openoffice-fr-2.3.1-1-i686.pkg.tar.gz' from mir1.archlinuxfr.org : Not Found [00:27:02] debug: using 'hunspell-1.2.1-2-i686.pkg.tar.gz' for download progress [00:27:02] error: segmentation fault
So pacman first downloads hunspell-1.2.1-2-i686.pkg.tar.gz and succeeds, then it tries to download openoffice-fr-2.3.1-1-i686.pkg.tar.gz and fails. So after that, it tries using the second server in the list. It starts with hunspell-1.2.1-2-i686.pkg.tar.gz again, and then bam, segfault.
While I'm finishing this mail, the bug is already gone, since that package is now available on my first mirror. But when it wasn't available, the issue was 100% reproducible, I made it happen 10 times in a row.
I caught a segfault the other day during a big upgrade when one of the xfce4 packages was missing on my first mirror. It segfaulted after downloading everything else, it appeared. This is an interesting bug and we might just get to the bottom of this now that we at least have our conditions narrowed down. -Dan
On Thu, Dec 06, 2007 at 06:14:05PM -0600, Dan McGee wrote:
I caught a segfault the other day during a big upgrade when one of the xfce4 packages was missing on my first mirror. It segfaulted after downloading everything else, it appeared.
This is an interesting bug and we might just get to the bottom of this now that we at least have our conditions narrowed down.
Actually, it's very easy to reproduce. Just pick two packages not yet downloaded (just remove from cache if necessary). Then run pacman -S on them a first time to see in which order they will be installed. Then for the first one, edit the sync db, and make a typo in the filename, so that it can't be found on the mirror. Then install the 2 packages. Here is the top of the backtrace, just before the segfault: #0 alpm_list_find_str (haystack=0x88f5bf0, needle=0x894a020 "hunspell-1.2.1-2-i686.pkg.tar.gz") at alpm_list.c:628 No locals. #1 0xb7edb71b in _alpm_downloadfiles_forreal (servers=0x807f318, localpath=0x8059dc0 "/var/cache/pacman/pkg/", files=0x88f53b8, mtime1=0, mtime2=0x0, dl_total=0xbf8966f4, totalsize=15813133) at server.c:207 fileurl = (struct url *) 0x88f53e8 realfile = "/var/cache/pacman/pkg/hunspell-1.2.1-2-i686.pkg.tar.gz<snip junk> fn = 0x894a020 "hunspell-1.2.1-2-i686.pkg.tar.gz" pkgname = "hunspell-1.2.1-2-i686.pkg.tar.gz", '\0' <repeats 223 times> output = "/var/cache/pacman/pkg/hunspell-1.2.1-2-i686.pkg.tar.gz.part<snip junk> server = (pmserver_t *) 0x807f360 dl_thisfile = 0 lp = (alpm_list_t *) 0x88f53b8 complete = (alpm_list_t *) 0x88f5bf0 i = (alpm_list_t *) 0x807ccf0 __func__ = "_alpm_downloadfiles_forreal" #2 0xb7edc2f8 in _alpm_downloadfiles (servers=0x807f318, localpath=0x8059dc0 "/var/cache/pacman/pkg/", files=0x88f53b8, dl_total=0xbf8966f4, totalsize=15813133) at server.c:147 No locals. #3 0xb7edd053 in _alpm_sync_commit (trans=0x8059b50, db_local=0x8082e48, data=0xbf8989e0) at sync.c:1055 spkg = (pmpkg_t *) 0x85463d0 current = (pmdb_t *) 0x807d020 i = (alpm_list_t *) 0x8079dd0 j = (alpm_list_t *) 0x0 files = (alpm_list_t *) 0x88f53b8 patches = (alpm_list_t *) 0x0 deltas = (alpm_list_t *) 0x0 tr = <value optimized out> replaces = <value optimized out> retval = <value optimized out> cachedir = 0x8059dc0 "/var/cache/pacman/pkg/" dltotal = 15813133 dl = 190656 __func__ = "_alpm_sync_commit" At this point, hunspell has been downloaded, and added to the complete list. It tried to download openoffice but failed. Then it's trying a second mirror, and wants to check if hunspell has already been downloaded, so check if it finds it in the complete list. That's line 207 of server.c. At this point, the complete list is completly (ahah) broken. It looks like it has been erased by a buffer overflow or something. I put a breakpoint in alpm_list_find_str, and here is it : x/s haystack 0x88f5bf0: "2-i686.pkg.tar.gz" x/s (char *)haystack-40 0x88f5bc8: "/archlinux/extra/os/i686/hunspell-1.2.1-2-i686.pkg.tar.gz" Instead of containing an actual list structure, complete contains the end of this string :P When it tries to read complete->data or complete->next, it explodes.
On Dec 6, 2007 5:46 PM, Xavier <shiningxc@gmail.com> wrote:
[00:27:01] debug: using 'hunspell-1.2.1-2-i686.pkg.tar.gz' [00:27:02] debug: using 'openoffice-fr-2.3.1-1-i686.pkg.tar.gz' [00:27:02] debug: using 'hunspell-1.2.1-2-i686.pkg.tar.gz'
A cursory look here shows something important. We have the following logic: for s in servers: for f in files: if cant_get_file: continue Well that's just not right - we want to pop out to the parent loop, not move to the next file While that's wrong and all, it's actually not really the error... I mean, just looking at it... it should still WORK in this case, just much slower due to extra loop iterations. My guess? There's an alpm_list_free inside the "for s in servers" loop that should probably be OUTSIDE it, unless that's a totally different bug, but I'd expect the list_free that DOESN'T null out the pointer is going to re-add without reallocating and blow up like this.
On Dec 6, 2007 6:24 PM, Aaron Griffin <aaronmgriffin@gmail.com> wrote:
On Dec 6, 2007 5:46 PM, Xavier <shiningxc@gmail.com> wrote:
[00:27:01] debug: using 'hunspell-1.2.1-2-i686.pkg.tar.gz' [00:27:02] debug: using 'openoffice-fr-2.3.1-1-i686.pkg.tar.gz' [00:27:02] debug: using 'hunspell-1.2.1-2-i686.pkg.tar.gz'
A cursory look here shows something important. We have the following logic:
for s in servers: for f in files: if cant_get_file: continue
Well that's just not right - we want to pop out to the parent loop, not move to the next file
While that's wrong and all, it's actually not really the error... I mean, just looking at it... it should still WORK in this case, just much slower due to extra loop iterations.
It is terrible logic and just screams for a rewrite. I might do this tonight if the bar doesn't call my name.
My guess? There's an alpm_list_free inside the "for s in servers" loop that should probably be OUTSIDE it, unless that's a totally different bug, but I'd expect the list_free that DOESN'T null out the pointer is going to re-add without reallocating and blow up like this.
Yeah, worth looking for. -Dan
On Thu, Dec 06, 2007 at 06:24:55PM -0600, Aaron Griffin wrote:
On Dec 6, 2007 5:46 PM, Xavier <shiningxc@gmail.com> wrote:
[00:27:01] debug: using 'hunspell-1.2.1-2-i686.pkg.tar.gz' [00:27:02] debug: using 'openoffice-fr-2.3.1-1-i686.pkg.tar.gz' [00:27:02] debug: using 'hunspell-1.2.1-2-i686.pkg.tar.gz'
A cursory look here shows something important. We have the following logic:
for s in servers: for f in files: if cant_get_file: continue
Well that's just not right - we want to pop out to the parent loop, not move to the next file
While that's wrong and all, it's actually not really the error... I mean, just looking at it... it should still WORK in this case, just much slower due to extra loop iterations.
My guess? There's an alpm_list_free inside the "for s in servers" loop that should probably be OUTSIDE it, unless that's a totally different bug, but I'd expect the list_free that DOESN'T null out the pointer is going to re-add without reallocating and blow up like this.
Oh of course, you are perfectly right. I saw that free, but didn't pay attention that it was inside that loop ;)
From c8dc427e07cbdfcba496f69de4ec524632295801 Mon Sep 17 00:00:00 2001 From: Chantry Xavier <shiningxc@gmail.com> Date: Fri, 7 Dec 2007 01:51:14 +0100 Subject: [PATCH] libalpm/server.c : fix segfault when downloading failed.
The alpm_list_free(complete) needs to be done OUTSIDE the loop walking through the server list. Besides, when downloading fails, we might as well skip directly to the next server, instead of trying to download other files from it. So s/continue/break. Signed-off-by: Chantry Xavier <shiningxc@gmail.com> --- lib/libalpm/server.c | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/lib/libalpm/server.c b/lib/libalpm/server.c index 8aa5a45..bd7752b 100644 --- a/lib/libalpm/server.c +++ b/lib/libalpm/server.c @@ -249,7 +249,7 @@ int _alpm_downloadfiles_forreal(alpm_list_t *servers, const char *localpath, } /* try the next server */ downloadFreeURL(fileurl); - continue; + break; } else { _alpm_log(PM_LOG_DEBUG, "connected to %s successfully\n", fileurl->host); } @@ -410,8 +410,8 @@ int _alpm_downloadfiles_forreal(alpm_list_t *servers, const char *localpath, if(alpm_list_count(complete) == alpm_list_count(files)) { done = 1; } - alpm_list_free(complete); } + alpm_list_free(complete); return(done ? 0 : -1); } -- 1.5.3.7
On Thu, Dec 06, 2007 at 06:24:55PM -0600, Aaron Griffin wrote:
On Dec 6, 2007 5:46 PM, Xavier <shiningxc@gmail.com> wrote:
[00:27:01] debug: using 'hunspell-1.2.1-2-i686.pkg.tar.gz' [00:27:02] debug: using 'openoffice-fr-2.3.1-1-i686.pkg.tar.gz' [00:27:02] debug: using 'hunspell-1.2.1-2-i686.pkg.tar.gz'
A cursory look here shows something important. We have the following logic:
for s in servers: for f in files: if cant_get_file: continue
Well that's just not right - we want to pop out to the parent loop, not move to the next file
While that's wrong and all, it's actually not really the error... I mean, just looking at it... it should still WORK in this case, just much slower due to extra loop iterations.
This looked sane to me yesterday, so I included the s/continue/break change in my patch, but I'm not sure anymore. I think Nagy has a very valid point : On Fri, Dec 07, 2007 at 02:25:28PM +0100, Nagy Gabor wrote:
1st server: fastest server 2nd server: "fallback1" server: slow(er), but contains old packages too 3nd server: "fallback2": ftp://ftp.archlinux.org <- this should be the most up2date
So I would like you to use the fastest server as much as possible ;-) <- and I've synced my repo from there, so that "fits" better to my actual sync db...
And actually, in my example above, the mirror didn't have openoffice-fr, but it did have hunspell. And there were probably many other packages available from it, it's just that the syncing wasn't fully done yet. Though, there is also the case where the mirror is totally unavailable, and you want to download/install many packages. But in this case, the current behavior still works, it just will just be slower and print a lot of warnings. With the single mirror list file, it's also easy to temporarily disable that dead mirror by commenting it.
On Dec 7, 2007 8:12 AM, Xavier <shiningxc@gmail.com> wrote:
And actually, in my example above, the mirror didn't have openoffice-fr, but it did have hunspell. And there were probably many other packages available from it, it's just that the syncing wasn't fully done yet.
Though, there is also the case where the mirror is totally unavailable, and you want to download/install many packages. But in this case, the current behavior still works, it just will just be slower and print a lot of warnings. With the single mirror list file, it's also easy to temporarily disable that dead mirror by commenting it.
Yeah this is what I was saying. It... "works" but it's a little slow. I guess, in actuality, it's not a huge problem, and may actually be advantageous for a download of 20 packages where only 1 is missing on the super-fast mirror. So Nagy is probably right that the existing behavior (with the "continue") is better if we assume a sorted mirror list
On Dec 7, 2007 10:38 AM, Aaron Griffin <aaronmgriffin@gmail.com> wrote:
On Dec 7, 2007 8:12 AM, Xavier <shiningxc@gmail.com> wrote:
And actually, in my example above, the mirror didn't have openoffice-fr, but it did have hunspell. And there were probably many other packages available from it, it's just that the syncing wasn't fully done yet.
Though, there is also the case where the mirror is totally unavailable, and you want to download/install many packages. But in this case, the current behavior still works, it just will just be slower and print a lot of warnings. With the single mirror list file, it's also easy to temporarily disable that dead mirror by commenting it.
Yeah this is what I was saying. It... "works" but it's a little slow. I guess, in actuality, it's not a huge problem, and may actually be advantageous for a download of 20 packages where only 1 is missing on the super-fast mirror.
So Nagy is probably right that the existing behavior (with the "continue") is better if we assume a sorted mirror list
I pushed just the alpm_list_free part to my working branch, so our segfault is squashed. The code still needs a cleansing though. :) -Dan
On Fri, Dec 07, 2007 at 10:43:19AM -0600, Dan McGee wrote:
I pushed just the alpm_list_free part to my working branch, so our segfault is squashed. The code still needs a cleansing though. :)
It looks good, thanks. Aaron agreed as well, so everything is fine.
participants (3)
-
Aaron Griffin
-
Dan McGee
-
Xavier