On 24-01-19, Florian Pritz wrote:
What about uploading to archive.org as soon as we archive packages on orion?
https://github.com/archlinux/archivetools/blob/master/archive.sh
While we still use this archive.sh script, dbscripts has recently also been extended to populate the archive continuously. So uploading could be integrated there with a queue file and a background job that performs the upload.
Alternatively, the uploader could be kept standalone and just adapted to run more often and to maintain its own database/list of which packages have already been uploaded successfully and which haven't. I'll call this the "state database". Then we could run it every hour or so via a systemd timer and it could upload all new and all previously failed packages. One thing I'd want in this context is for the uploader to exit with an error, so that the systemd service fails, if a package fails to upload multiple times. I think I'd actually prefer this to be standalone for simplicity.
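To make the "fail the service after repeated failures" idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the name `run_once`, the threshold `MAX_FAILURES`, and the plain set/dict standing in for the state database. The `upload` argument is a caller-supplied function, not the real uploader.

```python
MAX_FAILURES = 3  # hypothetical threshold, not taken from the thread


def run_once(packages, uploaded, failures, upload, max_failures=MAX_FAILURES):
    """Try to upload every package that is not yet marked as uploaded.

    `uploaded` (a set) and `failures` (a dict of counters) stand in for the
    state database. Returns the exit status the process should use:
    0 normally, 1 once any package has failed `max_failures` times.
    """
    exit_code = 0
    for pkg in packages:
        if pkg in uploaded:
            continue  # already done in an earlier run
        try:
            upload(pkg)
        except Exception:
            # Remember the failure so the next run retries this package.
            failures[pkg] = failures.get(pkg, 0) + 1
            if failures[pkg] >= max_failures:
                exit_code = 1
        else:
            uploaded.add(pkg)
            failures.pop(pkg, None)
    return exit_code
```

If the script ends with `sys.exit(run_once(...))`, a non-zero status makes systemd mark the service run as failed, which is what would surface a persistently broken upload in monitoring.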
There is one argument against a standalone tool: each time it runs, it will need to scan the whole filesystem hierarchy to detect new packages, which can be quite slow. One solution is to have dbscripts build a queue of new packages to upload, but then the upload tool would not be completely standalone (it's basically your first solution above). A simpler but less robust way would be to scan only the current year (along with the previous year for a while). Other than this issue, it indeed looks like a good idea to clearly separate this tool from the dbscripts.
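The "scan only the current and previous year" idea could look roughly like the sketch below. It assumes a hypothetical layout in which the scanned root contains one directory per year (`<root>/YYYY/...`) and that package files match `*.pkg.tar.*`; neither the layout nor the function names come from the thread, so adjust them to the real archive structure.

```python
import datetime
from pathlib import Path


def candidate_year_dirs(root, today=None):
    """Return the existing per-year directories for last year and this year."""
    today = today or datetime.date.today()
    years = (today.year - 1, today.year)
    return [root / str(y) for y in years if (root / str(y)).is_dir()]


def find_packages(root, today=None):
    """List package files under the two most recent year directories only."""
    found = []
    for year_dir in candidate_year_dirs(Path(root), today):
        for path in year_dir.rglob("*.pkg.tar.*"):
            if not path.name.endswith(".sig"):  # skip detached signatures
                found.append(path)
    return sorted(found)
```

This keeps each run cheap, at the cost mentioned above: anything that only appears under an older year directory is never seen.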
In any case, we need a retry mechanism to cope with the case where the upload fails.
This could use the state database I mentioned above. As for the implementation of such a database, I'd suggest sqlite instead of rolling your own text-based list or whatever. It's fast and simple, but you get all the fancy stuff, like transactions, for free. You also don't have to deal with recovering the database if the script crashes. sqlite just rolls back uncommitted transactions for you.
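As a rough illustration of the rollback behaviour, here is a small sketch using Python's standard `sqlite3` module. The table name `uploads`, its columns, and the helper names are made up for this example and are not an agreed schema.

```python
import sqlite3


def open_state(path):
    """Open (and if needed create) the hypothetical state database."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS uploads ("
        " filename TEXT PRIMARY KEY,"
        " uploaded INTEGER NOT NULL DEFAULT 0)"
    )
    conn.commit()
    return conn


def mark_uploaded(conn, filenames):
    """Mark several files as uploaded in a single transaction."""
    # Used as a context manager, the connection commits if the block
    # finishes and rolls back if an exception escapes from it.
    with conn:
        for name in filenames:
            if not name:
                raise ValueError("empty filename")
            conn.execute(
                "INSERT OR REPLACE INTO uploads (filename, uploaded)"
                " VALUES (?, 1)",
                (name,),
            )
```

If `mark_uploaded` raises halfway through a batch, none of that batch's rows are kept, while rows committed by earlier calls stay intact. The same applies if the process is killed before the commit.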
Would you be interested in adapting the uploader like this and making it an automated service? If you're interested I can help with the deployment part and provide feedback on the scripting side. If you want, we can also discuss this on IRC.
I don't have a lot of time to work on this at the moment, but I'll see what I can do. How urgent is the cleanup on orion? Is it ok to do it in a few weeks/months?
PS: I've whitelisted you on the arch-devops ML so that your replies also get archived.
Ok, thanks!
Baptiste