[pacman-dev] [BUG] segfault when the download fails
I believe my mirror gave me a way to reproduce easily the issue Nagy described there : http://bugs.archlinux.org/task/8350#comment20223 However, it doesn't look at all like the output in the original bug report, where the progress bar is at 64%. So I wasn't sure where to report this information, so I'm reporting it here for now before investigating further. Here is the --debug output: Targets: hunspell-1.2.1-2 openoffice-fr-2.3.1-1 Total Download Size: 15.08 MB [00:27:01] debug: using cachedir: /var/cache/pacman/pkg/ :: Retrieving packages from extra... error: Internal pacman error: Segmentation fault. Please submit a full bug report with --debug if appropriate. [00:27:01] debug: using 'hunspell-1.2.1-2-i686.pkg.tar.gz' for download progress [00:27:01] debug: connected to mir1.archlinuxfr.org successfully [00:27:02] debug: using 'openoffice-fr-2.3.1-1-i686.pkg.tar.gz' for download progress [00:27:02] error: failed retrieving file 'openoffice-fr-2.3.1-1-i686.pkg.tar.gz' from mir1.archlinuxfr.org : Not Found [00:27:02] debug: using 'hunspell-1.2.1-2-i686.pkg.tar.gz' for download progress [00:27:02] error: segmentation fault So pacman first downloads hunspell-1.2.1-2-i686.pkg.tar.gz and succeeds, then it tries to download openoffice-fr-2.3.1-1-i686.pkg.tar.gz and fails. So after that, it tries using the second server in the list. It starts with hunspell-1.2.1-2-i686.pkg.tar.gz again, and then bam, segfault. While I'm finishing this mail, the bug is already gone, since that package is now available on my first mirror. But when it wasn't available, the issue was 100% reproducible, I made it happen 10 times in a row.
On Dec 6, 2007 5:46 PM, Xavier
I believe my mirror gave me a way to reproduce easily the issue Nagy described there : http://bugs.archlinux.org/task/8350#comment20223
However, it doesn't look at all like the output in the original bug report, where the progress bar is at 64%. So I wasn't sure where to report this information, so I'm reporting it here for now before investigating further.
Here is the --debug output:
Targets: hunspell-1.2.1-2 openoffice-fr-2.3.1-1
Total Download Size: 15.08 MB
[00:27:01] debug: using cachedir: /var/cache/pacman/pkg/ :: Retrieving packages from extra... error: Internal pacman error: Segmentation fault. Please submit a full bug report with --debug if appropriate. [00:27:01] debug: using 'hunspell-1.2.1-2-i686.pkg.tar.gz' for download progress [00:27:01] debug: connected to mir1.archlinuxfr.org successfully [00:27:02] debug: using 'openoffice-fr-2.3.1-1-i686.pkg.tar.gz' for download progress [00:27:02] error: failed retrieving file 'openoffice-fr-2.3.1-1-i686.pkg.tar.gz' from mir1.archlinuxfr.org : Not Found [00:27:02] debug: using 'hunspell-1.2.1-2-i686.pkg.tar.gz' for download progress [00:27:02] error: segmentation fault
So pacman first downloads hunspell-1.2.1-2-i686.pkg.tar.gz and succeeds, then it tries to download openoffice-fr-2.3.1-1-i686.pkg.tar.gz and fails. So after that, it tries using the second server in the list. It starts with hunspell-1.2.1-2-i686.pkg.tar.gz again, and then bam, segfault.
While I'm finishing this mail, the bug is already gone, since that package is now available on my first mirror. But when it wasn't available, the issue was 100% reproducible, I made it happen 10 times in a row.
I caught a segfault the other day during a big upgrade when one of the xfce4 packages was missing on my first mirror. It segfaulted after downloading everything else, it appeared. This is an interesting bug and we might just get to the bottom of this now that we at least have our conditions narrowed down. -Dan
On Thu, Dec 06, 2007 at 06:14:05PM -0600, Dan McGee wrote:
I caught a segfault the other day during a big upgrade when one of the xfce4 packages was missing on my first mirror. It segfaulted after downloading everything else, it appeared.
This is an interesting bug and we might just get to the bottom of this now that we at least have our conditions narrowed down.
Actually, it's very easy to reproduce. Just pick two packages not yet
downloaded (just remove from cache if necessary). Then run pacman -S on them
a first time to see in which order they will be installed. Then for the first
one, edit the sync db, and make a typo in the filename, so that it can't be
found on the mirror. Then install the 2 packages.
Here is the top of the backtrace, just before the segfault:
#0 alpm_list_find_str (haystack=0x88f5bf0,
needle=0x894a020 "hunspell-1.2.1-2-i686.pkg.tar.gz")
at alpm_list.c:628
No locals.
#1 0xb7edb71b in _alpm_downloadfiles_forreal (servers=0x807f318,
localpath=0x8059dc0 "/var/cache/pacman/pkg/", files=0x88f53b8,
mtime1=0, mtime2=0x0, dl_total=0xbf8966f4, totalsize=15813133)
at server.c:207
fileurl = (struct url *) 0x88f53e8
realfile = "/var/cache/pacman/pkg/hunspell-1.2.1-2-i686.pkg.tar.gz<snip junk>
fn = 0x894a020 "hunspell-1.2.1-2-i686.pkg.tar.gz"
pkgname = "hunspell-1.2.1-2-i686.pkg.tar.gz", '\0'
On Dec 6, 2007 5:46 PM, Xavier
[00:27:01] debug: using 'hunspell-1.2.1-2-i686.pkg.tar.gz' [00:27:02] debug: using 'openoffice-fr-2.3.1-1-i686.pkg.tar.gz' [00:27:02] debug: using 'hunspell-1.2.1-2-i686.pkg.tar.gz'
A cursory look here shows something important. We have the following logic: for s in servers: for f in files: if cant_get_file: continue Well that's just not right - we want to pop out to the parent loop, not move to the next file While that's wrong and all, it's actually not really the error... I mean, just looking at it... it should still WORK in this case, just much slower due to extra loop iterations. My guess? There's an alpm_list_free inside the "for s in servers" loop that should probably be OUTSIDE it, unless that's a totally different bug, but I'd expect the list_free that DOESN'T null out the pointer is going to re-add without reallocating and blow up like this.
On Dec 6, 2007 6:24 PM, Aaron Griffin
On Dec 6, 2007 5:46 PM, Xavier
wrote: [00:27:01] debug: using 'hunspell-1.2.1-2-i686.pkg.tar.gz' [00:27:02] debug: using 'openoffice-fr-2.3.1-1-i686.pkg.tar.gz' [00:27:02] debug: using 'hunspell-1.2.1-2-i686.pkg.tar.gz'
A cursory look here shows something important. We have the following logic:
for s in servers: for f in files: if cant_get_file: continue
Well that's just not right - we want to pop out to the parent loop, not move to the next file
While that's wrong and all, it's actually not really the error... I mean, just looking at it... it should still WORK in this case, just much slower due to extra loop iterations.
It is terrible logic and just screams for a rewrite. I might do this tonight if the bar doesn't call my name.
My guess? There's an alpm_list_free inside the "for s in servers" loop that should probably be OUTSIDE it, unless that's a totally different bug, but I'd expect the list_free that DOESN'T null out the pointer is going to re-add without reallocating and blow up like this.
Yeah, worth looking for. -Dan
On Thu, Dec 06, 2007 at 06:24:55PM -0600, Aaron Griffin wrote:
On Dec 6, 2007 5:46 PM, Xavier
wrote: [00:27:01] debug: using 'hunspell-1.2.1-2-i686.pkg.tar.gz' [00:27:02] debug: using 'openoffice-fr-2.3.1-1-i686.pkg.tar.gz' [00:27:02] debug: using 'hunspell-1.2.1-2-i686.pkg.tar.gz'
A cursory look here shows something important. We have the following logic:
for s in servers: for f in files: if cant_get_file: continue
Well that's just not right - we want to pop out to the parent loop, not move to the next file
While that's wrong and all, it's actually not really the error... I mean, just looking at it... it should still WORK in this case, just much slower due to extra loop iterations.
My guess? There's an alpm_list_free inside the "for s in servers" loop that should probably be OUTSIDE it, unless that's a totally different bug, but I'd expect the list_free that DOESN'T null out the pointer is going to re-add without reallocating and blow up like this.
Oh of course, you are perfectly right. I saw that free, but didn't pay attention that it was inside that loop ;)
From c8dc427e07cbdfcba496f69de4ec524632295801 Mon Sep 17 00:00:00 2001 From: Chantry Xavier
Date: Fri, 7 Dec 2007 01:51:14 +0100 Subject: [PATCH] libalpm/server.c : fix segfault when downloading failed.
The alpm_list_free(complete) needs to be done OUTSIDE the loop walking
through the server list.
Besides, when downloading fails, we might as well skip directly to the next
server, instead of trying to download other files from it. So
s/continue/break.
Signed-off-by: Chantry Xavier
On Thu, Dec 06, 2007 at 06:24:55PM -0600, Aaron Griffin wrote:
On Dec 6, 2007 5:46 PM, Xavier
wrote: [00:27:01] debug: using 'hunspell-1.2.1-2-i686.pkg.tar.gz' [00:27:02] debug: using 'openoffice-fr-2.3.1-1-i686.pkg.tar.gz' [00:27:02] debug: using 'hunspell-1.2.1-2-i686.pkg.tar.gz'
A cursory look here shows something important. We have the following logic:
for s in servers: for f in files: if cant_get_file: continue
Well that's just not right - we want to pop out to the parent loop, not move to the next file
While that's wrong and all, it's actually not really the error... I mean, just looking at it... it should still WORK in this case, just much slower due to extra loop iterations.
This looked sane to me yesterday, so I included the s/continue/break change in my patch, but I'm not sure anymore. I think Nagy has a very valid point : On Fri, Dec 07, 2007 at 02:25:28PM +0100, Nagy Gabor wrote:
1st server: fastest server 2nd server: "fallback1" server: slow(er), but contains old packages too 3nd server: "fallback2": ftp://ftp.archlinux.org <- this should be the most up2date
So I would like you to use the fastest server as much as possible ;-) <- and I've synced my repo from there, so that "fits" better to my actual sync db...
And actually, in my example above, the mirror didn't have openoffice-fr, but it did have hunspell. And there were probably many other packages available from it, it's just that the syncing wasn't fully done yet. Though, there is also the case where the mirror is totally unavailable, and you want to download/install many packages. But in this case, the current behavior still works, it just will just be slower and print a lot of warnings. With the single mirror list file, it's also easy to temporarily disable that dead mirror by commenting it.
On Dec 7, 2007 8:12 AM, Xavier
And actually, in my example above, the mirror didn't have openoffice-fr, but it did have hunspell. And there were probably many other packages available from it, it's just that the syncing wasn't fully done yet.
Though, there is also the case where the mirror is totally unavailable, and you want to download/install many packages. But in this case, the current behavior still works, it just will just be slower and print a lot of warnings. With the single mirror list file, it's also easy to temporarily disable that dead mirror by commenting it.
Yeah this is what I was saying. It... "works" but it's a little slow. I guess, in actuality, it's not a huge problem, and may actually be advantageous for a download of 20 packages where only 1 is missing on the super-fast mirror. So Nagy is probably right that the existing behavior (with the "continue") is better if we assume a sorted mirror list
On Dec 7, 2007 10:38 AM, Aaron Griffin
On Dec 7, 2007 8:12 AM, Xavier
wrote: And actually, in my example above, the mirror didn't have openoffice-fr, but it did have hunspell. And there were probably many other packages available from it, it's just that the syncing wasn't fully done yet.
Though, there is also the case where the mirror is totally unavailable, and you want to download/install many packages. But in this case, the current behavior still works, it just will just be slower and print a lot of warnings. With the single mirror list file, it's also easy to temporarily disable that dead mirror by commenting it.
Yeah this is what I was saying. It... "works" but it's a little slow. I guess, in actuality, it's not a huge problem, and may actually be advantageous for a download of 20 packages where only 1 is missing on the super-fast mirror.
So Nagy is probably right that the existing behavior (with the "continue") is better if we assume a sorted mirror list
I pushed just the alpm_list_free part to my working branch, so our segfault is squashed. The code still needs a cleansing though. :) -Dan
On Fri, Dec 07, 2007 at 10:43:19AM -0600, Dan McGee wrote:
I pushed just the alpm_list_free part to my working branch, so our segfault is squashed. The code still needs a cleansing though. :)
It looks good, thanks. Aaron agreed as well, so everything is fine.
participants (3)
-
Aaron Griffin
-
Dan McGee
-
Xavier