[arch-dev-public] git packages and checksums
Hi,

As more of our official packages use git sources, I'd like to suggest we always enforce some kind of checksum verification. More specifically, I'd like us to avoid using straightforward source arrays such as:

    source=("git://github.com/systemd/systemd.git#tag=v$pkgver")
    md5sums=('SKIP')

Instead I suggest we use the full commit hash. In the example above, that'd become something like:

    _commit=9a50ce20ef60263a6c88c29470ce761fcc424f2d
    source=("git://github.com/systemd/systemd.git#commit=$_commit")
    md5sums=('SKIP')

Does that sound like a good idea?

--
Gaetan
Hi On Sat, Jul 18, 2015 at 1:04 PM, Gaetan Bisson <bisson@archlinux.org> wrote:
Hi,
As more of our official packages use git sources, I'd like to suggest we always enforce some kind of checksum verification. More specifically, I'd like us to avoid using straightforward source arrays such as:
    source=("git://github.com/systemd/systemd.git#tag=v$pkgver")
    md5sums=('SKIP')
Instead I suggest we use the full commit hash. In the example above, that'd become something like:
    _commit=9a50ce20ef60263a6c88c29470ce761fcc424f2d
    source=("git://github.com/systemd/systemd.git#commit=$_commit")
    md5sums=('SKIP')

Would it be better to improve *sums=() function to work with directories? This will also help svn/hg based packages.

A simple solution is to tar whole directory and then calculate the checksum:

    tar -c $DIR | md5sum
Does that sound like a good idea?
-- Gaetan
[2015-07-18 15:13:43 -0700] Anatol Pomozov:
On Sat, Jul 18, 2015 at 1:04 PM, Gaetan Bisson <bisson@archlinux.org> wrote:
Instead I suggest we use the full commit hash. In the example above, that'd become something like:
    _commit=9a50ce20ef60263a6c88c29470ce761fcc424f2d
    source=("git://github.com/systemd/systemd.git#commit=$_commit")
    md5sums=('SKIP')
Would it be better to improve *sums=() function to work with directories? This will also help svn/hg based packages.
A simple solution is to tar whole directory and then calculate the checksum:
tar -c $DIR | md5sum
This involves file attributes, so it seems the md5sum would change any time you do a new `git clone` even if no actual content has changed.

Also I think the commit hash is an intrinsically better value because it is explicitly published by upstream. Just as checksums are (or should be) published next to release tarballs.

Cheers.

--
Gaetan
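To illustrate the point about file attributes, here is a minimal sketch, assuming GNU tar, GNU touch and md5sum are available. The directory layout and file contents are made up for the example. It archives the same tree twice, changing only the modification times in between:

```shell
# Throwaway tree; the path and contents are hypothetical.
work=$(mktemp -d)
mkdir "$work/src"
printf 'int main(void) { return 0; }\n' > "$work/src/main.c"

# First "clone": every entry carries one timestamp.
touch -d '2015-07-18 00:00:00 UTC' "$work/src/main.c" "$work/src"
sum_a=$(tar -C "$work" -cf - src | md5sum | cut -d' ' -f1)

# Second "clone": identical content, different timestamps.
touch -d '2015-07-19 00:00:00 UTC' "$work/src/main.c" "$work/src"
sum_b=$(tar -C "$work" -cf - src | md5sum | cut -d' ' -f1)

echo "first:  $sum_a"
echo "second: $sum_b"
```

The two sums differ although the file content is byte-for-byte the same, because tar records each entry's modification time in its header.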
On Sat, Jul 18, 2015 at 01:10:29PM -1000, Gaetan Bisson wrote:
[2015-07-18 15:13:43 -0700] Anatol Pomozov:
On Sat, Jul 18, 2015 at 1:04 PM, Gaetan Bisson <bisson@archlinux.org> wrote:
Instead I suggest we use the full commit hash. In the example above, that'd become something like:
    _commit=9a50ce20ef60263a6c88c29470ce761fcc424f2d
    source=("git://github.com/systemd/systemd.git#commit=$_commit")
    md5sums=('SKIP')
Would it be better to improve *sums=() function to work with directories? This will also help svn/hg based packages.
A simple solution is to tar whole directory and then calculate the checksum:
tar -c $DIR | md5sum
This involves file attributes, so it seems the md5sum would change any time you do a new `git clone` even if no actual content has changed.
Also I think the commit hash is an intrinsically better value because it is explicitly published by upstream. Just as checksums are (or should be) published next to release tarballs.
Tags are more explicitly published by upstreams than commit hashes. I'm not sure I understand the benefit of switching. Why is it preferable to use the "value" rather than the "pointer"? What makes it better?

dR
[2015-07-18 22:32:47 -0400] Dave Reisner:
Tags are more explicitly published by upstreams than commit hashes. I'm not sure I understand the benefit of switching. Why is it preferable to use the "value" rather than the "pointer"? What makes it better?
The commit hash is a checksum that ensures the integrity of the particular source tree you want. The tag, however, provides no information to verify the integrity.

In other words, if someone hijacks your DNS resolver, github.com, or any other part of your connection to the git server, they can feed you malicious data and #tag=$version will never notice, while #commit=hash will.

--
Gaetan
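As a rough sketch of why a git object name works as a checksum: git names a blob by the SHA-1 of a short header followed by the file content, and tree and commit objects are named the same way over the names of the objects they reference. The helper below (`git_blob_id` is a hypothetical name, and sha1sum from coreutils is assumed) reproduces the blob-level computation without calling git:

```shell
# Hypothetical helper: compute the git blob id of a file by hand.
git_blob_id() {
    # $1: path to a regular file
    size=$(wc -c < "$1" | tr -d ' ')
    # Header is "blob <size>" followed by a NUL byte, then the raw content.
    { printf 'blob %s\000' "$size"; cat "$1"; } | sha1sum | cut -d' ' -f1
}

sample=$(mktemp)
printf 'hello\n' > "$sample"
git_blob_id "$sample"
```

On a machine with git installed, the printed value should match `git hash-object` run on the same file. Changing a single byte of the content yields a different id, which is what lets #commit= detect tampered data.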
On 19 July 2015 at 05:43, Gaetan Bisson <bisson@archlinux.org> wrote:
[2015-07-18 22:32:47 -0400] Dave Reisner:
Tags are more explicitly published by upstreams than commit hashes. I'm not sure I understand the benefit of switching. Why is it preferable to use the "value" rather than the "pointer"? What makes it better?
The commit hash is a checksum that ensures the integrity of the particular source tree you want. The tag, however, provides no information to verify the integrity.
In other words, if someone hijacks your DNS resolver, github.com, or any other part of your connection to the git server, they can feed you malicious data and #tag=$version will never notice, while #commit=hash will.
-- Gaetan
git tags can and should be pgp-signed, especially if the upstream is relying purely on git for releases. Is any package not covered by that?

J. Leclanche
[2015-07-19 06:52:39 +0200] Jerome Leclanche:
git tags can and should be pgp-signed, especially if the upstream is relying purely on git for releases. Is any package not covered by that?
That would certainly be the ideal way of doing things but I don't believe pacman currently knows how to verify these.

Cheers.

--
Gaetan
On 19/07/15 15:29, Gaetan Bisson wrote:
[2015-07-19 06:52:39 +0200] Jerome Leclanche:
git tags can and should be pgp-signed, especially if the upstream is relying purely on git for releases. Is any package not covered by that?
That would certainly be the ideal way of doing things but I don't believe pacman currently knows how to verify these.
I guess that would be easy to add into makepkg. Look at scripts/libmakepkg/source/git.sh in the pacman.git tree...

A
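For reference, a check of this kind could be sketched around `git verify-tag`. This is only an illustration and not code from makepkg: the helper name `verify_git_tag` is hypothetical, and it assumes the signer's public key is already in the local GnuPG keyring.

```shell
# Hypothetical helper, not part of makepkg.
verify_git_tag() {
    # $1: path to a local clone, $2: tag name
    # Succeeds only if the tag exists and carries a signature that
    # verifies against a key in the local keyring.
    git -C "$1" verify-tag "$2" >/dev/null 2>&1
}

# Possible use (paths are assumptions):
#   verify_git_tag "$srcdir/systemd" "v$pkgver" || echo "tag signature check failed"
```

A lightweight or unsigned tag makes the helper fail, so a package relying on it would refuse to build from such a tag.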
On 18/07, Gaetan Bisson wrote:
[2015-07-18 22:32:47 -0400] Dave Reisner:
Tags are more explicitly published by upstreams than commit hashes. I'm not sure I understand the benefit of switching. Why is it preferable to use the "value" rather than the "pointer"? What makes it better?
The commit hash is a checksum that ensures the integrity of the particular source tree you want. The tag, however, provides no information to verify the integrity.
In other words, if someone hijacks your DNS resolver, github.com, or any other part of your connection to the git server, they can feed you malicious data and #tag=$version will never notice, while #commit=hash will.
Not to mention that it also prevents upstream from silently changing a tag, so that the package built will no longer be the same.

--
Sincerely,
Johannes Löthberg
PGP Key ID: 0x50FB9B273A9D0BB5
https://theos.kyriasis.com/~kyrias/
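The tag-moving scenario can be reproduced locally. The following is a minimal sketch in a throwaway repository, assuming git is installed. The repository, file name and identity are made up for the example.

```shell
# Throwaway repository; all names and contents are hypothetical.
repo=$(mktemp -d)
git init -q "$repo"
git -C "$repo" config user.name "Example"
git -C "$repo" config user.email "example@example.org"

echo "release 1.0" > "$repo/NEWS"
git -C "$repo" add NEWS
git -C "$repo" commit -q -m "Release 1.0"
git -C "$repo" tag v1.0
pinned=$(git -C "$repo" rev-parse 'v1.0^{commit}')

# Upstream later commits a change and moves the tag to it.
echo "something else" >> "$repo/NEWS"
git -C "$repo" commit -q -a -m "Quiet change"
git -C "$repo" tag -f v1.0 >/dev/null
moved=$(git -C "$repo" rev-parse 'v1.0^{commit}')

echo "v1.0 pointed at $pinned"
echo "v1.0 now points at $moved"
```

A source entry using #tag=v1.0 would follow the tag to the new commit, while one pinned with #commit=$pinned keeps building the original tree.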
Hi On Sat, Jul 18, 2015 at 4:10 PM, Gaetan Bisson <bisson@archlinux.org> wrote:
[2015-07-18 15:13:43 -0700] Anatol Pomozov:
On Sat, Jul 18, 2015 at 1:04 PM, Gaetan Bisson <bisson@archlinux.org> wrote:
Instead I suggest we use the full commit hash. In the example above, that'd become something like:
    _commit=9a50ce20ef60263a6c88c29470ce761fcc424f2d
    source=("git://github.com/systemd/systemd.git#commit=$_commit")
    md5sums=('SKIP')
Would it be better to improve *sums=() function to work with directories? This will also help svn/hg based packages.
A simple solution is to tar whole directory and then calculate the checksum:
tar -c $DIR | md5sum
This involves file attributes, so it seems the md5sum would change any time you do a new `git clone` even if no actual content has changed.
tar has options to control file attributes added to the archive. For your case this will be '--mtime=0'. Instead of tar it is possible to use something like hashdeep [1] or just plain 'find' + {md5,sha1}sums.

The point is that we already have a way to describe checksums for sources. It would be great to extend it to cases like VCS based releases (git, svn, hg, ...).

[1] https://github.com/jessek/hashdeep/
Also I think the commit hash is an intrinsically better value because it is explicitly published by upstream. Just as checksums are (or should be) published next to release tarballs.
Cheers.
-- Gaetan
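The 'find' plus checksum idea can be sketched as follows. This is one possible shape, not an existing makepkg feature: the function name `dir_checksum` is hypothetical, sha256sum is swapped in for md5/sha1, and GNU find, sort and xargs are assumed. It hashes only file paths and contents, so timestamps, ownership and permissions do not affect the result, and a `.git` directory is skipped.

```shell
# Hypothetical helper: checksum of a directory's file names and contents.
dir_checksum() {
    # $1: directory to hash
    (
        cd "$1" || exit 1
        # Skip VCS metadata, list regular files, sort for a stable order,
        # hash each file, then hash the resulting "sum  path" listing.
        find . -path ./.git -prune -o -type f -print0 \
            | LC_ALL=C sort -z \
            | xargs -0 -r sha256sum \
            | sha256sum \
            | cut -d' ' -f1
    )
}

# Possible use:
#   dir_checksum "$srcdir/systemd"
```

Two fresh checkouts of the same revision give the same value under this scheme, whereas a plain tar stream would not unless its recorded attributes were normalised.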
participants (6)
- Allan McRae
- Anatol Pomozov
- Dave Reisner
- Gaetan Bisson
- Jerome Leclanche
- Johannes Löthberg