[arch-dev-public] Use detached package signatures by default
TLDR; let’s start using detached package signatures to make system updates faster.
Hi folks,
Some time ago there was a discussion at IRC where someone (Allan maybe?) proposed to stop using embedded PGP signatures in favor of detached signature files. I would like to bring this idea here and quantify it with some numbers.
Here is a bit of technical details on this topic. Pacman has the ability to verify authenticity of package files with PGP signatures. PGP signatures add protection against undesired package modifications by a third-party and it improves security aspects of the package management. This feature can be configured per repository and the official Arch Linux repos have it enabled. Package signatures have been used by Arch Linux successfully for a couple of years now.
Package signatures are stored as a part of a pacman database file (it is called “embedded signatures”). One issue with embedded signatures is that they represent a quite large chunk of database file. What is worse, a PGP signature is high-entropy data and does not compress well. I was mildly shocked to learn how much of the *.db files signatures consume.
I ran experiments and repackaged extra, community databases without PGP data. For uncompressed “extra” repository size drops to 83% of its original size (though uncompressed size is not that interesting). Arch uses GZIP compressed database and in this case removing signatures reduces the “extra” database to 36.8% of its original size. To emphasize it one more time - removing PGP signatures makes this repo only 1/3 of its original size. The change is even more dramatic in case of “zstd -19” compression where the final database file is only 31% of its original size.
For community.db the numbers are: uncompressed file gets 79.8% of its original size, “gzip -9” gets 33.4%, and with “zstd -19” it gets 27.51% of its original size.
A database gets modified with every package update. Users need to re-download the databases where 2/3 of it are package signatures that are used only when a specific package is installed.
An alternative to embedded signatures are detached signatures. These are signatures stored in a separate file next to the package itself (in a <pkg>.sig file to be specific). Instead of downloading *all* signatures every time a database is updated, detached signatures are downloaded only when a specific package is installed/updated. If Arch could switch to this model then database files become 3 times smaller that saves users bandwidth and system update time.
I looked through pacman code and most components have detached signatures support already. Most of the places have a logic like this:
    if(pkg->embedded_sig) {
        use(pkg->embedded_sig)
    } else {
        sig = load_detached_sig(pkg)
        use(sig)
    }
I found only 2 places where pacman does not fallback to a detached signature:
1) Keyring key check. Pacman was using embedded signatures only. This has been fixed in pacman’s commit b01bcc7d3d680 and it will be available in pacman version 6.x
2) dump_pkg_full() that dump package information. If a package uses detached signatures only then it prints “None”. I think this is fine as this function displays database entries and it does not affect the package verification process.
I disabled the embedded signatures at my testing machine to use detached signatures only and things look great so far. ‘pacman --debug’ confirms that detached signatures are correctly downloaded and used to verify the package content.
Given this information I would like to propose to stop using embedded signatures and move to detached signatures by default. This will require pacman 6.x or as alternative backport the fix(es) to 5.x branch. It will help to make system updates even faster, something that me and many other Arch users really love.
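The repackaging experiment described above could be reproduced roughly as follows. This is only a sketch, not the script used for the numbers in this message: it assumes a sync database is a (possibly compressed) tar archive whose per-package `desc` files consist of `%FIELD%` blocks separated by blank lines, and the function names `strip_field`, `repack` and `gzip_size_ratio` are made up for illustration.

```python
import gzip
import io
import tarfile


def strip_field(desc_text, field="PGPSIG"):
    """Drop one %FIELD% block from a pacman-style desc entry."""
    blocks = [b for b in desc_text.split("\n\n") if b.strip()]
    header = "%" + field + "%"
    kept = [b for b in blocks if b.strip().split("\n", 1)[0] != header]
    return "\n\n".join(b.strip("\n") for b in kept) + "\n\n"


def repack(db_bytes, strip_signatures):
    """Return the database as an uncompressed tar, optionally without signatures."""
    out = io.BytesIO()
    with tarfile.open(fileobj=io.BytesIO(db_bytes), mode="r:*") as src, \
            tarfile.open(fileobj=out, mode="w") as dst:
        for member in src.getmembers():
            if not member.isfile():
                dst.addfile(member)  # directories and other entries carry no data
                continue
            data = src.extractfile(member).read()
            if strip_signatures and member.name.endswith("/desc"):
                data = strip_field(data.decode("utf-8")).encode("utf-8")
                member.size = len(data)  # header must match the new payload
            dst.addfile(member, io.BytesIO(data))
    return out.getvalue()


def gzip_size_ratio(db_bytes):
    """Size of the gzip -9 database without signatures, relative to with them."""
    with_sigs = gzip.compress(repack(db_bytes, False), 9)
    without_sigs = gzip.compress(repack(db_bytes, True), 9)
    return len(without_sigs) / len(with_sigs)
```

Calling `gzip_size_ratio()` on the bytes of a downloaded database file gives a figure comparable to the gzip percentages quoted above; a zstd comparison would need a third-party module and is left out here.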
On 7/8/20 11:05 PM, Anatol Pomozov via arch-dev-public wrote:
TLDR; let’s start using detached package signatures to make system updates faster.
That all sounds great, but it's really down to how repo-add does its thing. So maybe this belongs on pacman-dev?
--
Eli Schwartz
Bug Wrangler and Trusted User
On 9/7/20 1:05 pm, Anatol Pomozov wrote:
Given this information I would like to propose to stop using embedded signatures and move to detached signatures by default. This will require pacman 6.x or as alternative backport the fix(es) to 5.x branch. It will help to make system updates even faster, something that me and many other Arch users really love.
There are several steps we need to complete:
1) backport the patch (or wait for pacman-6.0, which may be a while yet). I'll leave that to the distro packagers to decide!
2) adjust repo-add to optionally add signatures.
3) make a time line that all users need to have the patched/released pacman installed - we usually require at least 6 months.
4) turn off signature inclusion in repo dbs.
Allan
Hi
On Wed, Jul 8, 2020 at 8:22 PM Allan McRae via arch-dev-public <arch-dev-public@archlinux.org> wrote:
On 9/7/20 1:05 pm, Anatol Pomozov wrote:
Given this information I would like to propose to stop using embedded signatures and move to detached signatures by default. This will require pacman 6.x or as alternative backport the fix(es) to 5.x branch. It will help to make system updates even faster, something that me and many other Arch users really love.
There are several steps we need to complete:
1) backport the patch (or wait for pacman-6.0, which may be a while yet). I'll leave that to the distro packagers to decide!
2) adjust repo-add to optionally add signatures.
3) make a time line that all users need to have the patched/released pacman installed - we usually require at least 6 months.
4) turn off signature inclusion in repo dbs.
It sounds great. If we go this route for pacman 6.0 then it will take about 1 year to switch to the detached signatures.
As it is quite an important change I would love to see its codepath tested as much as possible before we remove the embedded signatures from pacman database files. It will help to catch issues like https://bugs.archlinux.org/task/67232.
What do you think about starting to use detached signatures by default *and* having embedded signatures as a backup option for time being? i.e. pacman database will have the signatures (the same as now) but it will be ignored. Instead pacman will use the detached *.sig files. And in case if there is a major issue with this implementation then a user would be able to switch back to embedded signatures using a pacman.conf option (e.g. "UseEmbeddedSignatures"). If folks are fine with it I can implement a patch for it.
Em julho 28, 2020 16:26 Anatol Pomozov via arch-dev-public escreveu:
It sounds great. If we go this route for pacman 6.0 then it will take about 1 year to switch to the detached signatures.
As it is quite an important change I would love to see its codepath tested as much as possible before we remove the embedded signatures from pacman database files. It will help to catch issues like https://bugs.archlinux.org/task/67232.
What do you think about starting to use detached signatures by default *and* having embedded signatures as a backup option for time being? i.e. pacman database will have the signatures (the same as now) but it will be ignored. Instead pacman will use the detached *.sig files. And in case if there is a major issue with this implementation then a user would be able to switch back to embedded signatures using a pacman.conf option (e.g. "UseEmbeddedSignatures"). If folks are fine with it I can implement a patch for it.
Hi Anatol,
Can't we go with a different option here? Instead of an option the user sets on their end, we make pacman fallback to embedded db sigs, if there are no detached *or* if the signature check fails for some reason.
This could be maintained as a patch on the package, it doesn't necessarily have to be on pacman's code itself. Just so we make this transition as painless as possible to users.
Regards,
Giancarlo Razzolini
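The two transition strategies discussed so far (a user-set "UseEmbeddedSignatures" option versus an automatic fallback) can be contrasted with a small sketch. It is not pacman code: the `Package` class, the `load_detached_sig` and `verify` callbacks, and both function names are hypothetical stand-ins used only to show the control flow.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Package:
    filename: str
    embedded_sig: Optional[bytes] = None  # signature from the database, if any


SigLoader = Callable[[Package], Optional[bytes]]
Verifier = Callable[[Package, bytes], bool]


def pick_signature(pkg: Package, load_detached_sig: SigLoader,
                   use_embedded_signatures: bool = False) -> Optional[bytes]:
    """User-option variant: detached by default, embedded only when asked."""
    if use_embedded_signatures:
        return pkg.embedded_sig
    return load_detached_sig(pkg)


def verify_with_fallback(pkg: Package, load_detached_sig: SigLoader,
                         verify: Verifier) -> bool:
    """Automatic variant: try the detached signature, then the embedded one."""
    detached = load_detached_sig(pkg)
    if detached is not None and verify(pkg, detached):
        return True
    # No detached signature, or its check failed: fall back to the database copy.
    if pkg.embedded_sig is not None:
        return verify(pkg, pkg.embedded_sig)
    return False
```

The first variant keeps the decision with the user, while the second needs no configuration but only helps for as long as the database still carries embedded signatures.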
Hi Giancarlo
On Tue, Jul 28, 2020 at 12:35 PM Giancarlo Razzolini <grazzolini@archlinux.org> wrote:
This could be maintained as a patch on the package, it doesn't necessarily have to be on pacman's code itself. Just so we make this transition as painless as possible to users.
Having a seamless transition to the new technology is definitely a top priority here.
Can't we go with a different option here? Instead of an option the user sets on their end, we make pacman fallback to embedded db sigs, if there are no detached *or* if the signature check fails for some reason.
The detached signatures are generated by makepkg toolset since a long time ago. *.sig files are already in the Arch standard repository. I also looked through a dozen of random repos at https://wiki.archlinux.org/index.php/Unofficial_user_repositories and all of them have *.sig files for the packages. At this point we are trying to enable the detached signatures handling at the client side while having a backup option to disable it. Let me know about a specific situation when detached signatures cause an issue.
On 09/07/2020 05:05, Anatol Pomozov via arch-dev-public wrote:
TLDR; let’s start using detached package signatures to make system updates faster.
Hi folks,
Some time ago there was a discussion at IRC where someone (Allan maybe?) proposed to stop using embedded PGP signatures in favor of detached signature files. I would like to bring this idea here and quantify it with some numbers.
The downside of not having the package signatures in the database is that consumers can not easily obtain this information. For archweb that's showing who signed the package on the package details page.
How would I implement an efficient alternative without fetching package files or all the sig files? A separate sig database? :P
As far now I'll have to adjust the code not to break because of a missing PGPSIG entry.
Here is a bit of technical details on this topic. Pacman has the ability to verify authenticity of package files with PGP signatures. PGP signatures add protection against undesired package modifications by a third-party and it improves security aspects of the package management. This feature can be configured per repository and the official Arch Linux repos have it enabled. Package signatures have been used by Arch Linux successfully for a couple of years now.
<snip>
An alternative to embedded signatures are detached signatures. These are signatures stored in a separate file next to the package itself (in a <pkg>.sig file to be specific). Instead of downloading *all* signatures every time a database is updated, detached signatures are downloaded only when a specific package is installed/updated. If Arch could switch to this model then database files become 3 times smaller that saves users bandwidth and system update time.
It would be insightful to provide the database numbers, because one could argue 30% of 1MB is nothing, as 30% of 100M is nice improvement.
Our biggest database should be community (5M atm), and with all the savings that would now be ~ 2 MB? Would be nice to have an overview of the real life numbers :)
Greetings,
Jelle van der Waa
Hi Jelle
On Thu, Jul 9, 2020 at 2:00 AM Jelle van der Waa <jelle@vdwaa.nl> wrote:
On 09/07/2020 05:05, Anatol Pomozov via arch-dev-public wrote:
TLDR; let’s start using detached package signatures to make system updates faster.
Hi folks,
Some time ago there was a discussion at IRC where someone (Allan maybe?) proposed to stop using embedded PGP signatures in favor of detached signature files. I would like to bring this idea here and quantify it with some numbers.
The downside of not having the package signatures in the database is that consumers can not easily obtain this information. For archweb that's showing who signed the package on the package details page.
How would I implement an efficient alternative without fetching package files or all the sig files? A separate sig database? :P
The best option is to download and parse the signature file directly. Its filename is going to be <pkgfilename>.sig where <pkgfilename> is available in a package description as %FILENAME% entry.
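As an illustration of the lookup described above, a consumer could derive the signature location from a package description along these lines. This is a sketch under assumptions: the description is taken to be a sequence of `%FIELD%` blocks separated by blank lines, the mirror URL in the usage note is a placeholder, and `parse_desc` and `signature_url` are illustrative names rather than archweb or pacman functions.

```python
from typing import Dict, List


def parse_desc(text: str) -> Dict[str, List[str]]:
    """Parse %FIELD% blocks of a package description into a dict of value lists."""
    entries: Dict[str, List[str]] = {}
    key = None
    for line in text.splitlines():
        if len(line) > 2 and line.startswith("%") and line.endswith("%"):
            key = line[1:-1]
            entries[key] = []
        elif line and key is not None:
            entries[key].append(line)
        else:
            key = None  # a blank line ends the current block
    return entries


def signature_url(desc_text: str, repo_url: str) -> str:
    """Build the URL of <pkgfilename>.sig next to the package file."""
    filename = parse_desc(desc_text)["FILENAME"][0]
    return repo_url.rstrip("/") + "/" + filename + ".sig"
```

For example, `signature_url(desc, "https://mirror.example.org/extra/os/x86_64")` returns the package file's URL with `.sig` appended; fetching that file and reading the signer from it is a separate step not shown here.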
As far now I'll have to adjust the code not to break because of a missing PGPSIG entry.
Here is a bit of technical details on this topic. Pacman has the ability to verify authenticity of package files with PGP signatures. PGP signatures add protection against undesired package modifications by a third-party and it improves security aspects of the package management. This feature can be configured per repository and the official Arch Linux repos have it enabled. Package signatures have been used by Arch Linux successfully for a couple of years now.
<snip>
An alternative to embedded signatures are detached signatures. These are signatures stored in a separate file next to the package itself (in a <pkg>.sig file to be specific). Instead of downloading *all* signatures every time a database is updated, detached signatures are downloaded only when a specific package is installed/updated. If Arch could switch to this model then database files become 3 times smaller that saves users bandwidth and system update time.
It would be insightful to provide the database numbers, because one could argue 30% of 1MB is nothing, as 30% of 100M is nice improvement.
Our biggest database should be community (5M atm), and with all the savings that would now be ~ 2 MB? Would be nice to have an overview of the real life numbers :)
For compressed "community" database the savings are going to be 5.2M -> 1.73M (gzip) or 1.26M (zstd -19). With other dbs I would say that for an average user we are looking at 7M->2.2M total savings in the database size. Keep in mind that database downloading/parsing is located at the critical path. Every user downloads these db files pretty much every time "pacman -Sy" is run. Detached signatures make this step faster by reducing the workload and downloading signatures on-demand later.
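The savings quoted above can be turned into absolute and relative figures with a short calculation. The sizes are the ones given in this message; the helper name `savings` is made up for the example.

```python
def savings(before_mb, after_mb):
    """Return (megabytes saved, fraction of the original size saved)."""
    return before_mb - after_mb, 1 - after_mb / before_mb


# Sizes in MB as quoted in this thread.
community_gzip = savings(5.2, 1.73)  # "community" database, gzip
all_dbs = savings(7.0, 2.2)          # rough total for an average user
```

With these inputs the gzip-compressed "community" database shrinks by about 3.47 MB (roughly 67%), and the total across databases by about 4.8 MB (roughly 69%) per refresh.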
participants (5)
- Allan McRae
- Anatol Pomozov
- Eli Schwartz
- Giancarlo Razzolini
- Jelle van der Waa