[arch-devops] [arch-dev-public] Uploading old packages on archive.org (Was: Archive cleanup)

Florian Pritz bluewind at xinu.at
Thu Jan 24 09:48:27 UTC 2019

On Thu, Jan 24, 2019 at 09:27:23AM +0100, Baptiste Jonglez <baptiste at bitsofnetworks.org> wrote:
> I have just pushed the script I wrote last time:
>   https://github.com/zorun/arch-historical-archive
> It's a bit hackish and requires some manual work to correctly upload all
> packages for a given year, because archive.org rate-limits quite
> aggressively when they are overloaded.


> > We also have an archive cleanup script here[1]. Maybe the uploader can
> > be integrated there? I don't know how complicated it is.
> > 
> > [1] https://github.com/archlinux/archivetools/blob/master/archive-cleaner
> What about uploading to archive.org as soon as we archive packages on orion?
>   https://github.com/archlinux/archivetools/blob/master/archive.sh

While we still use this archive.sh script, dbscripts has recently also
been extended to populate the archive continuously. So uploading could be
integrated there with a queue file and a background job that performs
the upload.
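
To make the queue idea a bit more concrete, here is a rough sketch in
Python. Nothing like this exists in dbscripts today: the function names,
the one-path-per-line format and the ".processing" suffix are all made up
for illustration. The archiving side would call enqueue(), and the
background job would call take_batch(), upload the entries, and call
finish_batch() only once it is done.

```python
import os


def enqueue(queue_path, package_path):
    """Append one package path to the queue file (hypothetical helper)."""
    # O_APPEND keeps short single-line writes from clobbering each other.
    fd = os.open(queue_path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        os.write(fd, (package_path + "\n").encode())
    finally:
        os.close(fd)


def take_batch(queue_path):
    """Return the entries the background job should upload next."""
    work_path = queue_path + ".processing"
    # If an earlier run crashed, its unfinished batch is handed out again.
    if not os.path.exists(work_path):
        try:
            # rename is atomic on POSIX; later enqueue() calls start a new file.
            os.rename(queue_path, work_path)
        except FileNotFoundError:
            return []  # nothing queued
    with open(work_path) as f:
        return [line.rstrip("\n") for line in f if line.strip()]


def finish_batch(queue_path):
    """Drop the batch file once every entry in it has been handled."""
    os.remove(queue_path + ".processing")
```

This assumes a single background job. With more than one consumer it would
need real locking.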

Alternatively the uploader could be kept standalone and just adapted to
run more often and to maintain its own database/list to know which
packages have already been successfully uploaded and which haven't. I'll
call this "state database". Then we could run it every hour or so via a
systemd timer and it could upload all new and all failed packages. One
thing I'd want to have in this context is that the uploader should exit
with an error to let the systemd service fail if a package fails to
upload multiple times. I think I'd actually prefer this to be standalone
for simplicity.
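
A minimal sketch of one such run, in Python, could look like this. It is
only meant to show the exit-code behaviour: the state is a plain dict
here, upload() is a placeholder for whatever actually talks to
archive.org, and the threshold of 3 is an arbitrary assumption for
"multiple times".

```python
MAX_FAILURES = 3  # assumption: failed attempts tolerated before the run fails


def run_once(state, new_packages, upload):
    """Do one timer-triggered run and return the process exit code.

    `state` maps package name -> failure count, or None once uploaded.
    `upload` is a placeholder callable that raises on failure.
    """
    for pkg in new_packages:
        state.setdefault(pkg, 0)

    exit_code = 0
    for pkg, failures in list(state.items()):
        if failures is None:
            continue  # already uploaded in an earlier run
        try:
            upload(pkg)
        except Exception:
            state[pkg] = failures + 1
            if state[pkg] >= MAX_FAILURES:
                exit_code = 1  # makes the systemd service fail
        else:
            state[pkg] = None
    return exit_code
```

The script would then pass the return value to sys.exit(), so that a
package which keeps failing shows up as a failed unit in monitoring while
everything else still gets uploaded.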

> It would avoid hammering the archive.org server, because we would only
> send one package at a time.

Avoiding load spikes for archive.org certainly sounds like a good idea
and for us it's easier to monitor and maintain services that run more
often too.

> In any case, we need a retry mechanism to cope with the case where the
> upload fails.

This could use the state database I mentioned above. As for the
implementation of such a database, I'd suggest sqlite instead of rolling
your own text-based list or whatever. It's fast and simple, but you get
all the fancy stuff, like transactions, for free. You also don't have to
deal with recovering the database if the script crashes. sqlite just
rolls back uncommitted transactions for you.
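
As a rough illustration of what such a state database could look like
with Python's sqlite3 module: the table name, the columns and the helper
functions below are assumptions, not something the uploader has today.

```python
import sqlite3


def open_state_db(path):
    """Open (and if needed create) the hypothetical state database."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS uploads ("
        " package TEXT PRIMARY KEY,"
        " uploaded INTEGER NOT NULL DEFAULT 0,"
        " failures INTEGER NOT NULL DEFAULT 0)"
    )
    db.commit()
    return db


def record_result(db, package, ok):
    """Store the outcome of one upload attempt in a single transaction."""
    # "with db" commits if the block succeeds and rolls back if it raises.
    with db:
        db.execute(
            "INSERT OR IGNORE INTO uploads (package) VALUES (?)", (package,)
        )
        if ok:
            db.execute(
                "UPDATE uploads SET uploaded = 1 WHERE package = ?", (package,)
            )
        else:
            db.execute(
                "UPDATE uploads SET failures = failures + 1 WHERE package = ?",
                (package,),
            )


def pending(db):
    """Return packages that still need to be uploaded."""
    rows = db.execute(
        "SELECT package FROM uploads WHERE uploaded = 0 ORDER BY package"
    )
    return [row[0] for row in rows]
```

Each run would call pending() to find what is left and record_result()
after every attempt. A crash in the middle of record_result() leaves the
previous committed state untouched.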

Would you be interested in adapting the uploader like this and making it
an automated service? If you're interested I can help with the
deployment part and provide feedback on the scripting side. If you want,
we can also discuss this on IRC.

PS: I've whitelisted you on the arch-devops ML so that your replies also
get archived.
