[pacman-dev] Pacman database size study
Anatol Pomozov
anatol.pomozov at gmail.com
Wed Jan 22 16:03:44 UTC 2020
Hello
On Wed, Jan 22, 2020 at 2:23 AM Allan McRae <allan at archlinux.org> wrote:
>
> On 22/1/20 6:54 pm, Anatol Pomozov wrote:
> > The first experiment is to parse db tarfile using the script and then
> > write it back to a file:
> > uncompressed size is 17757184 that is equal to original sample
> > 'zstd -19' compressed size is 4366994 that is 1.0084540990896713
> > times better than original sample
> >
> > Tar *entries* content is identical to the original file. Uncompressed
> > size is exactly the same. Compressed (zstd -19) size is 0.8% better.
> > It comes from the fact that my script does not set entries user/group
> > value and neither sets tar entries modification time. I am not sure if
> > this information is actually used by pacman. Modification time
> > contains a lot of entropy that compressor does not like.
>
> tl;dr
>
> "original" 4366994
> no md5 4188019
> no pgp 1160912
> no md5+pgp 1021667
>
>
> But do any of these numbers stand if you keep the tar file?
I do not fully understand your question here. plainXXX+uncompressed is
a TAR file that matches the current db format.
Uncompressed sizes (bytes):
original     17757184
no md5       17536365
no pgp       14085120
no md5/pgp   13248000
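For reference, the re-packing experiment quoted above (parse the db
tarfile, write it back without owner and modification time) could look
roughly like the sketch below. It is not the script used for the study:
the function name `repack` is hypothetical, and it assumes the archive
holds only directories and regular files.

```python
import io
import tarfile


def repack(src_bytes: bytes) -> bytes:
    """Copy every tar entry, keeping name, type, mode and content,
    but leaving owner and mtime at the TarInfo defaults."""
    out = io.BytesIO()
    with tarfile.open(fileobj=io.BytesIO(src_bytes), mode="r:") as src, \
         tarfile.open(fileobj=out, mode="w") as dst:
        for member in src:
            clean = tarfile.TarInfo(name=member.name)
            clean.type = member.type
            clean.mode = member.mode
            clean.size = member.size
            # uid/gid stay 0, uname/gname stay empty, mtime stays 0,
            # so these header fields are constant across all entries.
            data = src.extractfile(member) if member.isfile() else None
            dst.addfile(clean, data)
    return out.getvalue()
```

The entry contents are untouched; only the header fields that vary per
entry become constant, which is what helps the compressor.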
But compressed size is what really matters for users. Dropping the PGP
signatures from the db file gives the biggest benefit for compressed
data (files about 3.8 times smaller).
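As a quick check, the ratios can be recomputed from the byte counts
given in this thread. The sketch below only does the arithmetic; the
dictionary and function names are illustrative.

```python
# Sizes in bytes, copied from the figures quoted in this thread.
compressed = {
    "original": 4366994,
    "no md5": 4188019,
    "no pgp": 1160912,
    "no md5+pgp": 1021667,
}
uncompressed = {
    "original": 17757184,
    "no md5": 17536365,
    "no pgp": 14085120,
    "no md5+pgp": 13248000,
}


def ratios(sizes):
    """Return how many times smaller each variant is than the original."""
    base = sizes["original"]
    return {name: base / size for name, size in sizes.items()}


for label, table in (("zstd -19", compressed), ("uncompressed", uncompressed)):
    for name, ratio in ratios(table).items():
        print(f"{label:12s} {name:12s} {ratio:.2f}x")
```

Dropping the signatures shrinks the compressed db by a factor of about
3.76, while the uncompressed tar shrinks only by a factor of about 1.26.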
>
> Also, I find downloading signature files causes a big pause in
> processing the downloads. Is that just a slow connection to the world
> at my end?
*.sig files are small, so bandwidth should not be a problem.
My guess is that latency to your Arch mirror is too high, and setting
up twice as many SSL connections gives a noticeable slowdown. Check
whether you are using a local Australian mirror; it will help reduce
the connection setup time. Using plain HTTP instead of HTTPS might
help a bit as well.
But the best solution for your problem is proper parallel download
support in pacman. In that case the connection setups run in parallel,
so their latencies overlap. It would also require fewer HTTP/HTTPS
connections, as HTTP/2 supports multiplexing: multiple downloads from
the same server would share a single connection.
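
To illustrate the latency argument, here is a toy simulation rather
than real network code. Each "download" only sleeps for an assumed
connection setup time. The 50 ms figure, the file list and the function
names are all made up for the example, and HTTP/2 multiplexing is not
modelled.

```python
import time
from concurrent.futures import ThreadPoolExecutor

SETUP_LATENCY = 0.05  # assumed per-connection setup cost in seconds

# Illustrative list: each db file comes with its own .sig file.
files = ["core.db", "core.db.sig", "extra.db", "extra.db.sig"]


def fake_fetch(name: str) -> str:
    """Stand-in for one download: only the setup latency is simulated."""
    time.sleep(SETUP_LATENCY)
    return name


def timed(run) -> float:
    """Return the wall-clock time taken by calling run()."""
    start = time.perf_counter()
    run()
    return time.perf_counter() - start


def fetch_sequential():
    for name in files:
        fake_fetch(name)


def fetch_parallel():
    with ThreadPoolExecutor(max_workers=len(files)) as pool:
        list(pool.map(fake_fetch, files))


sequential = timed(fetch_sequential)
parallel = timed(fetch_parallel)
print(f"sequential: {sequential:.3f}s  parallel: {parallel:.3f}s")
```

Run sequentially, the setup latencies add up. Run in parallel, they
overlap, so the total stays close to a single setup time.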