On 19/1/20 10:36 am, Eli Schwartz wrote:
On 1/18/20 6:42 PM, Allan McRae wrote:
We previously had the maximum database size capped at 25MB. This limit was set in the days before repos had as many packages as they do now, and before we started distributing files databases. Increase this limit to 128MB.
Whatever happened to that long-ago idea to make .sig files be downloaded on demand rather than embedding them in the .db? This would have the added bonus of making the default downloads noticeably smaller...
Well... it makes a difference for sync dbs, but barely any for files dbs, which is where we are hitting the limit. Also, our download output when downloading .sig files is rather bad...
Another potential optimization ISTR us discussing is making community.files not include the content from community.db, and providing e.g. community.alldb for anyone who needs the combined form.
Many things have been discussed.
Aside from that... we currently use .gz for databases on our official infrastructure. We could get much better compression than that, I'm sure, e.g. by using xz. We could even use xz -9: the databases tend to be fairly conservative in size, so optimizing decompression speed by switching to zstd is not really so important IMO, and xz at level -9 can beat zstd -20 in both compressed size and compression speed.
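For concreteness, the kind of comparison I mean would look something like this (a sketch; the filename and exact flags are placeholders, and note that zstd only accepts levels above 19 with --ultra):

$ time xz -9 -c community.files.tar > community.files.tar.xz
$ time zstd --ultra -20 -c community.files.tar > community.files.tar.zst

where community.files.tar stands in for the uncompressed database archive.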
That is all Arch Linux territory, and those decisions don't happen on this list. But the speed of reading from a zstd file wins greatly over xz, and that should be a consideration.
community.files.gz when recompressed with either xz or zstd drops from 20MB to 15MB; exact numbers look like this:
$ du -b /var/lib/pacman/sync/community.files /tmp/community.files.*
20769830    /var/lib/pacman/sync/community.files
14969268    /tmp/community.files.xz
15090081    /tmp/community.files.zst
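For anyone wanting to reproduce this, the recompressed files can be generated with something along these lines (assuming the sync db on disk is the gzip archive; the compression levels are examples, as the exact ones behind the numbers above are not stated):

$ zcat /var/lib/pacman/sync/community.files | xz -9 > /tmp/community.files.xz
$ zcat /var/lib/pacman/sync/community.files | zstd -19 -c > /tmp/community.files.zst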
...
I'm not really a fan of just bumping the size forever, because it seems to me that people who are running into this issue are indeed doing something they shouldn't. A 128MB repository that consumes 128MB of bandwidth on every pacman -Syu just because a single package has been updated is really not nice... I feel the proper solution is more aggressive compression, plus figuring out why these .files databases are actually so huge (nodejs packages are probably a really annoying problem here, because that completely ridiculous language will ship an application composed of several hundred thousand micro-files, and the files database needs to record every single path, so I'd quite like nodejs packaging to die a horrible death). If databases are still running into size limits after that, packages should be shipped in split repositories.
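As a rough illustration, once the files databases are synced (pacman -Fy) you can count how many paths a single package contributes to them; the package name here is just an example:

$ pacman -Fl nodejs | wc -l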
Splitting out more repositories is not just about "fooling" pacman into spreading the limit across multiple repos. It's about making a single-package update only trigger an update to one of the split repos. This strikes me as exactly the purpose of instituting a size limit to begin with!
None of this is the role of pacman to decide. If a distribution (or custom repo) wants to ship a database file of greater than 25MB in size, I see no reason not to support that in pacman. In fact, I'm of the opinion that the upper limit should be removed altogether. From memory, it was added to prevent infinite-download DoS type attacks, but we can stop those with a Ctrl+C anyway... With the patchset I'm working on that verifies repos before overwriting the old ones, this becomes even less of an issue.

Allan