[pacman-dev] [PATCH] download: major refactor to address lingering issues
dan at archlinux.org
Thu Nov 12 03:13:54 EST 2009
On Thu, Nov 12, 2009 at 2:09 AM, Dan McGee <dan at archlinux.org> wrote:
> Sorry for this being such a huge patch, but I believe it is necessary for
> quite a few reasons which I will attempt to explain herein. I've been
> mulling this over for a while, but wasn't super happy with making the
> download interface more complex. Instead, if we carefully order things in
> the internal download code, we can actually make the interface simpler.
> 1. FS#15657 - This involves `name.db.tar.gz.part` files being left around the
> filesystem, and then causing all sorts of issues when someone attempts to
> rerun the operation they canceled. We need to ensure that if we resume a
> download, we are resuming it on exactly the same file; if we cannot be
> almost postive of that then we need to start over.
> 2. http://firstname.lastname@example.org/msg03536.html - Here
> we have a lighttpd bug to ruin the day. If we send both a Range: header and
> If-Modified-Since: header across the wire in a GET request, lighttpd doesn't
> do what we want in several cases. If the file hadn't been modified, it
> returns a '304 Not Modified' instead of a '206 Partial Content'. We need to
> do a stat (e.g. HEAD in HTTP terms) operation here, and the proceed
> accordingly based off the values we get back from it.
> 3. The mtime stuff was rather ugly, and relied on the called function to
> write back to a passed in reference, which isn't the greatest. Instead, use
> the power of the filesystem to contain this info. Every file downloaded
> internally is now carefully timestamped with the remote file time. This
> should allow the resume logic to work. In order to guarantee this, we need
> to implement a signal handler that catches interrupts, notifies the running
> code, and causes it to set the mtimes on the file. It then rethrows the
> signal so the pacman signal handler (or any frontend) works as expected.
> 4. We did a lot of funky stuff in trying to track the DB last modified time.
> It is a lot easier to just keep the downloaded DB file around and track the
> time on that rather than in a funky dot file. It also kills a lot of code.
> 5. For GPG verification of the databases down the road, we are going to need
> the DB file around for at least a short bit of time anyway, so this gets us
> closer to that.
> Signed-off-by: Dan McGee <dan at archlinux.org>
> lib/libalpm/alpm.h | 8 +--
> lib/libalpm/be_files.c | 96 ++-------------------------
> lib/libalpm/dload.c | 177 +++++++++++++++++++++++++++++++++---------------
> lib/libalpm/dload.h | 4 +-
> src/pacman/pacman.c | 11 +++-
> 5 files changed, 141 insertions(+), 155 deletions(-)
So yes, I know this patch is huge, but I'm really really *really*
hoping someone (or a few people!) would like to review it. Xavier, I
know you asked me about this stuff a while back, and I finally got
back around to it and it is in a much more working form. It is
hopefully well commented too.
I should note this went through about 3 distinct iterations locally
tonight before this final version came about. Function arguments were
dropped, added, and dropped again; calls into libfetch were reordered
and reworked and changed, etc. It took a good amount of time to get it
to work for all of the various partial download cases (both DB *and*
packages, yes, it actually works!) as well as keeping the force flag
working and just downloading in the most straightforward start-to-end
Anyway, it may be a tad easier to look at it here:
or in a side by side diff format:
Thanks guys! And don't ask why I am up at 2 AM my time working on
this, it feels like the old days of hacking again. :)
More information about the pacman-dev