On Thu, Jan 24, 2019 at 09:27:23AM +0100, Baptiste Jonglez <baptiste@bitsofnetworks.org> wrote:
> I have just pushed the script I wrote last time:
> https://github.com/zorun/arch-historical-archive
> It's a bit hackish and requires some manual work to correctly upload all packages for a given year, because archive.org rate-limits quite aggressively when its servers are overloaded.
Thanks!
We also have an archive cleanup script here[1]. Maybe the uploader could be integrated there? I don't know how complicated that would be.
[1] https://github.com/archlinux/archivetools/blob/master/archive-cleaner
> What about uploading to archive.org as soon as we archive packages on orion?
> https://github.com/archlinux/archivetools/blob/master/archive.sh
While we still use this archive.sh script, dbscripts has recently also been extended to populate the archive continuously. So uploading could be integrated there, with a queue file and a background job that performs the upload.

Alternatively, the uploader could be kept standalone and adapted to run more often, maintaining its own database/list of which packages have already been uploaded successfully and which haven't; I'll call this the "state database". We could then run it every hour or so via a systemd timer (see the sketch below), and each run would upload all new packages as well as all previously failed ones. One thing I'd want in this setup is that the uploader exits with an error when a package fails to upload multiple times, so that the systemd service is marked as failed. Overall I think I'd prefer the standalone approach for simplicity.
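As a rough sketch of the timer part (the unit and script names are made up here, and /usr/local/bin/archive-uploader stands in for the adapted uploader):

    # archive-uploader.service (hypothetical name)
    [Unit]
    Description=Upload archived packages to archive.org

    [Service]
    Type=oneshot
    # The uploader exits non-zero if a package failed too often,
    # which marks this service as failed.
    ExecStart=/usr/local/bin/archive-uploader

    # archive-uploader.timer
    [Unit]
    Description=Run the archive.org uploader hourly

    [Timer]
    OnCalendar=hourly
    Persistent=true

    [Install]
    WantedBy=timers.target

Enabling the timer with "systemctl enable --now archive-uploader.timer" would then trigger an hourly run, and a failed run shows up in systemctl status and our monitoring.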
> It would avoid hammering the archive.org server, because we would only send one package at a time.
Avoiding load spikes for archive.org certainly sounds like a good idea, and services that run more often are also easier for us to monitor and maintain.
> In any case, we need a retry mechanism to cope with the case where the upload fails.
This could use the state database I mentioned above. As for the implementation of such a database, I'd suggest sqlite instead of rolling your own text-based list or whatever. It's fast and simple, but you get all the fancy stuff, like transactions, for free. You also don't have to deal with recovering the database if the script crashes; sqlite just rolls back uncommitted transactions for you. (A sketch of what this could look like follows at the end of this mail.)

Would you be interested in adapting the uploader along these lines and making it an automated service? If you're interested, I can help with the deployment part and provide feedback on the scripting side. If you want, we can also discuss this on IRC.

PS: I've whitelisted you on the arch-devops ML so that your replies also get archived.

Florian
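For illustration, here is the state-database sketch mentioned above. The schema, paths, MAX_ATTEMPTS threshold and the upload_package() helper are all made up; the real uploader would plug in its actual archive.org upload call:

    #!/usr/bin/env python3
    # Sketch only: schema, paths and upload_package() are hypothetical.
    import sqlite3
    import sys

    MAX_ATTEMPTS = 3  # assumed threshold before we let the service fail


    def upload_package(package):
        # Placeholder for the actual upload (e.g. via the internetarchive
        # library); must return True on success, False on failure.
        raise NotImplementedError


    def main():
        db = sqlite3.connect("/var/lib/archive-uploader/state.sqlite")
        db.execute("""CREATE TABLE IF NOT EXISTS uploads (
                          package  TEXT PRIMARY KEY,
                          uploaded INTEGER NOT NULL DEFAULT 0,
                          attempts INTEGER NOT NULL DEFAULT 0)""")
        # New packages would be registered elsewhere (or from a directory
        # scan) with: INSERT OR IGNORE INTO uploads (package) VALUES (?)

        gave_up = False
        pending = db.execute(
            "SELECT package, attempts FROM uploads WHERE uploaded = 0"
        ).fetchall()
        for package, attempts in pending:
            try:
                ok = upload_package(package)
            except Exception:
                ok = False
            # "with db" wraps the update in a transaction; if the script
            # crashes, sqlite rolls the uncommitted change back for us.
            with db:
                if ok:
                    db.execute("UPDATE uploads SET uploaded = 1"
                               " WHERE package = ?", (package,))
                else:
                    db.execute("UPDATE uploads SET attempts = attempts + 1"
                               " WHERE package = ?", (package,))
                    if attempts + 1 >= MAX_ATTEMPTS:
                        gave_up = True

        # A non-zero exit makes the systemd service fail, so repeated
        # upload failures become visible in monitoring.
        sys.exit(1 if gave_up else 0)


    if __name__ == "__main__":
        main()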