[pacman-dev] [PATCH] makepkg: introduce SOURCE_DATE_EPOCH
This patch introduces the SOURCE_DATE_EPOCH environmental variable. All
files in a package are adjusted to have their modification dates set to
the value of SOURCE_DATE_EPOCH, which defaults to "date +%s".

Setting this variable allows a package that is built twice in the same
environment to be (potentially) reproducible in that the checksum of the
generated package file will be the same.

Signed-off-by: Allan McRae <allan@archlinux.org>
---
 scripts/makepkg.sh.in | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/scripts/makepkg.sh.in b/scripts/makepkg.sh.in
index c019ae3b..529b51f7 100644
--- a/scripts/makepkg.sh.in
+++ b/scripts/makepkg.sh.in
@@ -87,6 +87,8 @@ SPLITPKG=0
 SOURCEONLY=0
 VERIFYSOURCE=0
 
+SOURCE_DATE_EPOCH=${SOURCE_DATE_EPOCH:-$(date +%s)}
+
 PACMAN_OPTS=()
 
 shopt -s extglob
@@ -620,7 +622,6 @@ write_kv_pair() {
 }
 
 write_pkginfo() {
-	local builddate=$(date -u "+%s")
 	if [[ -n $PACKAGER ]]; then
 		local packager="$PACKAGER"
 	else
@@ -654,7 +655,7 @@ write_pkginfo() {
 
 	write_kv_pair "pkgdesc" "$spd"
 	write_kv_pair "url" "$url"
-	write_kv_pair "builddate" "$builddate"
+	write_kv_pair "builddate" "$SOURCE_DATE_EPOCH"
 	write_kv_pair "packager" "$packager"
 	write_kv_pair "size" "$size"
 	write_kv_pair "arch" "$pkgarch"
@@ -738,10 +739,14 @@ create_package() {
 	[[ -f $pkg_file ]] && rm -f "$pkg_file"
 	[[ -f $pkg_file.sig ]] && rm -f "$pkg_file.sig"
 
+	# ensure all elements of the package have the same mtime
+	find . -exec touch -d @$SOURCE_DATE_EPOCH {} \;
+
 	msg2 "$(gettext "Generating .MTREE file...")"
-	list_package_files | LANG=C bsdtar -cnzf .MTREE --format=mtree \
+	list_package_files | LANG=C bsdtar -cnf - --format=mtree \
 		--options='!all,use-set,type,uid,gid,mode,time,size,md5,sha256,link' \
-		--null --files-from - --exclude .MTREE
+		--null --files-from - --exclude .MTREE | gzip -c -f -n > .MTREE
+	touch -d @$SOURCE_DATE_EPOCH .MTREE
 
 	msg2 "$(gettext "Compressing package...")"
 	# TODO: Maybe this can be set globally for robustness
-- 
2.12.0
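For illustration, a minimal sketch of the defaulting behaviour the patch relies on, assuming a POSIX-like shell. The helper name resolve_epoch and the value 1492425660 are made up for this example and are not part of makepkg. A value supplied by the caller is kept; otherwise the current time from "date +%s" is used.

```shell
#!/bin/sh
# Hypothetical helper (not in makepkg): prints SOURCE_DATE_EPOCH if it is
# set and non-empty, otherwise the current time in seconds since the epoch.
resolve_epoch() {
	printf '%s\n' "${SOURCE_DATE_EPOCH:-$(date +%s)}"
}

# Caller pins the timestamp (arbitrary example value).
pinned=$(SOURCE_DATE_EPOCH=1492425660 resolve_epoch)

# Caller leaves the variable unset, so the default applies.
fallback=$(unset SOURCE_DATE_EPOCH; resolve_epoch)

echo "pinned:   $pinned"
echo "fallback: $fallback"
```

Two builds that export the same SOURCE_DATE_EPOCH would therefore record the same builddate, while two builds that rely on the default generally would not.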
On 17/04/17 20:41, Allan McRae wrote:
> +	# ensure all elements of the package have the same mtime
> +	find . -exec touch -d @$SOURCE_DATE_EPOCH {} \;
> +
>  	msg2 "$(gettext "Generating .MTREE file...")"
> -	list_package_files | LANG=C bsdtar -cnzf .MTREE --format=mtree \
> +	list_package_files | LANG=C bsdtar -cnf - --format=mtree \
>  		--options='!all,use-set,type,uid,gid,mode,time,size,md5,sha256,link' \
> -		--null --files-from - --exclude .MTREE
> +		--null --files-from - --exclude .MTREE | gzip -c -f -n > .MTREE
> +	touch -d @$SOURCE_DATE_EPOCH .MTREE
>
>  	msg2 "$(gettext "Compressing package...")"
>  	# TODO: Maybe this can be set globally for robustness

These touch commands have had a -h added.

A
On 04/17/17 at 08:41pm, Allan McRae wrote:
> This patch introduces the SOURCE_DATE_EPOCH environmental variable. All files in a package are adjusted to have their modification dates set to the value of SOURCE_DATE_EPOCH, which defaults to "date +%s".
>
> Setting this variable allows a package that is built twice in the same environment to be (potentially) reproducible in that the checksum of the generated package file will be the same.
>
> Signed-off-by: Allan McRae <allan@archlinux.org>

I'm of the opinion that makepkg is the wrong place to work on reproducible builds. We could probably take care of the low-hanging fruit directly in makepkg, but a number of packages are going to require more fine-grained control over the environment than I think we should be putting in makepkg. If you look at `perl -V`, for instance, it embeds the output of `uname -a` and a timestamp directly in the executable. I suspect that any effort we put into reproducible builds with makepkg would eventually have to be duplicated with a more powerful wrapper script in order to handle packages like perl that record more of their environment than we should be manipulating in makepkg.

apg
On 04/17/17 at 10:04pm, Allan McRae wrote:
> On 17/04/17 20:41, Allan McRae wrote:
> > +	# ensure all elements of the package have the same mtime
> > +	find . -exec touch -d @$SOURCE_DATE_EPOCH {} \;
> > +
> >  	msg2 "$(gettext "Generating .MTREE file...")"
> > -	list_package_files | LANG=C bsdtar -cnzf .MTREE --format=mtree \
> > +	list_package_files | LANG=C bsdtar -cnf - --format=mtree \
> >  		--options='!all,use-set,type,uid,gid,mode,time,size,md5,sha256,link' \
> > -		--null --files-from - --exclude .MTREE
> > +		--null --files-from - --exclude .MTREE | gzip -c -f -n > .MTREE
> > +	touch -d @$SOURCE_DATE_EPOCH .MTREE
> >
> >  	msg2 "$(gettext "Compressing package...")"
> >  	# TODO: Maybe this can be set globally for robustness
>
> These touch commands have had a -h added.

touch -h and date +%s are not POSIX. Are they available everywhere we support?

Why the change to gzip for .MTREE?

apg
On 04/17/2017 03:34 PM, Andrew Gregory wrote:
> On 04/17/17 at 08:41pm, Allan McRae wrote:
> > This patch introduces the SOURCE_DATE_EPOCH environmental variable. All files in a package are adjusted to have their modification dates set to the value of SOURCE_DATE_EPOCH, which defaults to "date +%s".
> >
> > Setting this variable allows a package that is built twice in the same environment to be (potentially) reproducible in that the checksum of the generated package file will be the same.
> >
> > Signed-off-by: Allan McRae <allan@archlinux.org>
>
> I'm of the opinion that makepkg is the wrong place to work on reproducible builds. We could probably take care of the low-hanging fruit directly in makepkg, but a number of packages are going to require more fine-grained control over the environment than I think we should be putting in makepkg. If you look at `perl -V`, for instance, it embeds the output of `uname -a` and a timestamp directly in the executable. I suspect that any effort we put into reproducible builds with makepkg would eventually have to be duplicated with a more powerful wrapper script in order to handle packages like perl that record more of their environment than we should be manipulating in makepkg.
>
> apg

Makepkg is the place that we control, and it is where we need to work to make packages created by makepkg reproducible. Currently they are not, precisely because of the issues these patches address, and there is literally no way to get reproducible package artifacts without these patches. In particular, a deterministic way to pass in SOURCE_DATE_EPOCH is a requirement for the cases you mentioned, and downstream projects that use dates in any produced artifacts should implement SOURCE_DATE_EPOCH. An incredibly high number of projects already do so, and more and more are adopting it as it becomes a de facto standard (actually it already is one). No complex wrapper scripts should be needed anywhere to achieve reproducibility.

cheers,
Levente
On 04/17/17 at 03:53pm, Levente Polyak wrote:
> On 04/17/2017 03:34 PM, Andrew Gregory wrote:
> > On 04/17/17 at 08:41pm, Allan McRae wrote:
> > > This patch introduces the SOURCE_DATE_EPOCH environmental variable. All files in a package are adjusted to have their modification dates set to the value of SOURCE_DATE_EPOCH, which defaults to "date +%s".
> > >
> > > Setting this variable allows a package that is built twice in the same environment to be (potentially) reproducible in that the checksum of the generated package file will be the same.
> > >
> > > Signed-off-by: Allan McRae <allan@archlinux.org>
> >
> > I'm of the opinion that makepkg is the wrong place to work on reproducible builds. We could probably take care of the low-hanging fruit directly in makepkg, but a number of packages are going to require more fine-grained control over the environment than I think we should be putting in makepkg. If you look at `perl -V`, for instance, it embeds the output of `uname -a` and a timestamp directly in the executable. I suspect that any effort we put into reproducible builds with makepkg would eventually have to be duplicated with a more powerful wrapper script in order to handle packages like perl that record more of their environment than we should be manipulating in makepkg.
> >
> > apg
>
> Makepkg is the place that we control, and it is where we need to work to make packages created by makepkg reproducible. Currently they are not, precisely because of the issues these patches address, and there is literally no way to get reproducible package artifacts without these patches. In particular, a deterministic way to pass in SOURCE_DATE_EPOCH is a requirement for the cases you mentioned, and downstream projects that use dates in any produced artifacts should implement SOURCE_DATE_EPOCH. An incredibly high number of projects already do so, and more and more are adopting it as it becomes a de facto standard (actually it already is one). No complex wrapper scripts should be needed anywhere to achieve reproducibility.
>
> cheers,
> Levente

I have no problem with making makepkg's own output more controllable (e.g. allowing builddate to be set rather than using the current time). But, a lot of the time, reproducing an identical package is going to require a very precise environment, especially for compiled software. The environmental factors that influence the built software vary from project to project and can get their values from a variety of locations. I think that trying to manage all of that from makepkg would be a mistake, if it is even possible. Some things, like building in a chroot for software that embeds the build directory, would almost certainly be easier from a script that wraps makepkg. I would prefer to see effort be put toward such a script rather than have it go into makepkg only to have to be moved to a separate script later.

apg
On 04/17/2017 08:42 PM, Andrew Gregory wrote:
> I have no problem with making makepkg's own output more controllable (e.g. allowing builddate to be set rather than using the current time). But, a lot of the time, reproducing an identical package is going to require a very precise environment, especially for compiled software. The environmental factors that influence the built software vary from project to project and can get their values from a variety of locations. I think that trying to manage all of that from makepkg would be a mistake, if it is even possible. Some things, like building in a chroot for software that embeds the build directory, would almost certainly be easier from a script that wraps makepkg. I would prefer to see effort be put toward such a script rather than have it go into makepkg only to have to be moved to a separate script later.
>
> apg

I fully agree with your points... actually that is exactly the plan and the reason the .BUILDINFO file exists: to be able to recreate the very precise environment that was used to build a package. This is of course needed, as you mentioned, for things like some binary software (gcc version)... but we actually include the .BUILDINFO file in the package itself. This has IMO a lot of advantages, but it already implies the requirement of an exactly identical environment for a package to be reproducible.

The current set of adjustments is needed for makepkg itself. I'm sure nobody intends to go a lot further and include environment recreation or explicitly software-dependent things (like PERL_BUILD_DATE). makechrootpkg and things like that are project (like Arch) specific. Surely there will be a need for a wrapper around it to recreate an identical environment from the .BUILDINFO file, to be able to reproduce a package beyond invoking makepkg twice (something like makerepropkg).

On top of that, there will always be some need to add things to PKGBUILD files that are software dependent. An example would be to define PERL_BUILD_DATE="${SOURCE_DATE_EPOCH}", and I agree that something like PERL_BUILD_DATE is not to be included in makepkg itself.

I hope I could settle some of your concerns :)

cheers,
Levente
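To make the PKGBUILD example above concrete, here is a rough sketch of what such a software-specific setting could look like. It assumes SOURCE_DATE_EPOCH has already been set by makepkg, as the patch arranges. The build() body is a hypothetical fragment, and whether a particular perl build honours PERL_BUILD_DATE is taken from the message above rather than verified here.

```shell
#!/bin/sh
# Stand-in for the value makepkg would provide (arbitrary example value).
SOURCE_DATE_EPOCH=1492425660

# Hypothetical PKGBUILD fragment: pass the package-wide timestamp on to a
# variable that only this software's build system looks at.
build() {
	export PERL_BUILD_DATE="${SOURCE_DATE_EPOCH}"
	# The real configure and make steps would follow here.
}

build
echo "PERL_BUILD_DATE=$PERL_BUILD_DATE"
```

Keeping this in the PKGBUILD rather than in makepkg matches the split described above: makepkg only supplies the generic timestamp, and each package maps it onto whatever its build system expects.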
On 17/04/17 23:37, Andrew Gregory wrote:
> On 04/17/17 at 10:04pm, Allan McRae wrote:
> > On 17/04/17 20:41, Allan McRae wrote:
> > > +	# ensure all elements of the package have the same mtime
> > > +	find . -exec touch -d @$SOURCE_DATE_EPOCH {} \;
> > > +
> > >  	msg2 "$(gettext "Generating .MTREE file...")"
> > > -	list_package_files | LANG=C bsdtar -cnzf .MTREE --format=mtree \
> > > +	list_package_files | LANG=C bsdtar -cnf - --format=mtree \
> > >  		--options='!all,use-set,type,uid,gid,mode,time,size,md5,sha256,link' \
> > > -		--null --files-from - --exclude .MTREE
> > > +		--null --files-from - --exclude .MTREE | gzip -c -f -n > .MTREE
> > > +	touch -d @$SOURCE_DATE_EPOCH .MTREE
> > >
> > >  	msg2 "$(gettext "Compressing package...")"
> > >  	# TODO: Maybe this can be set globally for robustness
> >
> > These touch commands have had a -h added.
>
> touch -h and date +%s are not POSIX. Are they available everywhere we support?
>
> Why the change to gzip for .MTREE?

A timestamp is embedded in a gz file unless gzip -n is used.

A
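A small sketch of the effect of -n, assuming GNU gzip and GNU coreutils (touch -d @N, sha256sum). The file name and timestamps are arbitrary. The same content is compressed twice with different file mtimes; with -n the compressed bytes should not depend on the mtime.

```shell
#!/bin/sh
workdir=$(mktemp -d)
printf 'same content\n' > "$workdir/mtree"

# Compress with -n at two different file mtimes.
touch -d @1000000000 "$workdir/mtree"
sum_a=$(gzip -c -f -n "$workdir/mtree" | sha256sum | cut -d' ' -f1)
touch -d @1500000000 "$workdir/mtree"
sum_b=$(gzip -c -f -n "$workdir/mtree" | sha256sum | cut -d' ' -f1)

# Same again without -n; with GNU gzip the header then carries the file's
# name and mtime, so these two checksums are expected to differ.
touch -d @1000000000 "$workdir/mtree"
raw_a=$(gzip -c -f "$workdir/mtree" | sha256sum | cut -d' ' -f1)
touch -d @1500000000 "$workdir/mtree"
raw_b=$(gzip -c -f "$workdir/mtree" | sha256sum | cut -d' ' -f1)

rm -rf "$workdir"

echo "with -n:    $sum_a $sum_b"
echo "without -n: $raw_a $raw_b"
```

In the patch the data reaches gzip through a pipe rather than from a named file, so this sketch only illustrates the header behaviour, not the exact makepkg invocation.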
On 17/04/17 23:37, Andrew Gregory wrote:
> On 04/17/17 at 10:04pm, Allan McRae wrote:
> > On 17/04/17 20:41, Allan McRae wrote:
> > > +	# ensure all elements of the package have the same mtime
> > > +	find . -exec touch -d @$SOURCE_DATE_EPOCH {} \;
> > > +
> > >  	msg2 "$(gettext "Generating .MTREE file...")"
> > > -	list_package_files | LANG=C bsdtar -cnzf .MTREE --format=mtree \
> > > +	list_package_files | LANG=C bsdtar -cnf - --format=mtree \
> > >  		--options='!all,use-set,type,uid,gid,mode,time,size,md5,sha256,link' \
> > > -		--null --files-from - --exclude .MTREE
> > > +		--null --files-from - --exclude .MTREE | gzip -c -f -n > .MTREE
> > > +	touch -d @$SOURCE_DATE_EPOCH .MTREE
> > >
> > >  	msg2 "$(gettext "Compressing package...")"
> > >  	# TODO: Maybe this can be set globally for robustness
> >
> > These touch commands have had a -h added.
>
> touch -h and date +%s are not POSIX. Are they available everywhere we support?

touch -h is available on the BSDs. date +%s is mentioned in the FreeBSD man page, so I assume it works.

A
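For context on why -h matters, a minimal sketch assuming GNU coreutils (touch -h -d @N and stat -c %Y are the GNU spellings; other platforms may need different options). With -h, touch changes the timestamp of a symlink itself instead of following it to the target. The paths and the epoch value are arbitrary.

```shell
#!/bin/sh
workdir=$(mktemp -d)
epoch=1492425660

printf 'data\n' > "$workdir/file"
ln -s file "$workdir/link"

# Set every entry, including the symlink itself, to the same mtime.
# Batching with "+" is a choice of this sketch; the patch uses "\;".
find "$workdir" -exec touch -h -d "@$epoch" {} +

# GNU stat does not follow symlinks unless -L is given, so the second
# call reports the mtime of the link, not of its target.
file_mtime=$(stat -c %Y "$workdir/file")
link_mtime=$(stat -c %Y "$workdir/link")

rm -rf "$workdir"

echo "file: $file_mtime"
echo "link: $link_mtime"
```

Without -h the symlink would keep its creation-time mtime, which is one more value that could differ between two otherwise identical builds.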
On 17/04/17 23:34, Andrew Gregory wrote:
> On 04/17/17 at 08:41pm, Allan McRae wrote:
> > This patch introduces the SOURCE_DATE_EPOCH environmental variable. All files in a package are adjusted to have their modification dates set to the value of SOURCE_DATE_EPOCH, which defaults to "date +%s".
> >
> > Setting this variable allows a package that is built twice in the same environment to be (potentially) reproducible in that the checksum of the generated package file will be the same.
> >
> > Signed-off-by: Allan McRae <allan@archlinux.org>
>
> I'm of the opinion that makepkg is the wrong place to work on reproducible builds. We could probably take care of the low-hanging fruit directly in makepkg, but a number of packages are going to require more fine-grained control over the environment than I think we should be putting in makepkg. If you look at `perl -V`, for instance, it embeds the output of `uname -a` and a timestamp directly in the executable. I suspect that any effort we put into reproducible builds with makepkg would eventually have to be duplicated with a more powerful wrapper script in order to handle packages like perl that record more of their environment than we should be manipulating in makepkg.

I agree that makepkg is not the place for much of this. However, the SOURCE_DATE_EPOCH variable is a standard, and we need makepkg to understand it, along with a few other minor changes, for any tool to have a chance of recreating a package from its PKGBUILD and .BUILDINFO file. I am not looking to extend the changes beyond this initial patchset.

Allan