[pacman-dev] [PATCH] Add a timestamp file into repo tarballs
When creating or modifying repo tarballs, place a .TIMESTAMP file with seconds since epoch in it. This will be used in the future to enable rejecting databases older that a given threshold. Also skip reading the .TIMESTAMP file in sync_db_populate(). Signed-off-by: Allan McRae <allan@archlinux.org> --- Anyone want to check my logic in the sync_db_populate() populate change? Repo-add puts the .TIMESTAMP file first when calling bsdtar, so if present it will be first in the repo db file. Otherwise the first item read from the tarball will be a directory, which we skip reading anyway. So I just read the header for the first item and discard it. lib/libalpm/be_sync.c | 9 +++++++++ scripts/repo-add.sh.in | 8 +++++--- 2 files changed, 14 insertions(+), 3 deletions(-) diff --git a/lib/libalpm/be_sync.c b/lib/libalpm/be_sync.c index 2c76fe83..041b2266 100644 --- a/lib/libalpm/be_sync.c +++ b/lib/libalpm/be_sync.c @@ -506,6 +506,13 @@ static int sync_db_populate(alpm_db_t *db) goto cleanup; } + /* the .TIMESTAMP file will be first entry in the repo archive if present. + * If not, the first entry will be a directory and can be skipped too */ + if((archive_ret = archive_read_next_header(archive, &entry)) != ARCHIVE_OK) { + ret = -1; + goto readfail; + } + while((archive_ret = archive_read_next_header(archive, &entry)) == ARCHIVE_OK) { mode_t mode = archive_entry_mode(entry); if(!S_ISDIR(mode)) { @@ -518,6 +525,8 @@ static int sync_db_populate(alpm_db_t *db) } } } + +readfail: if(archive_ret != ARCHIVE_EOF) { _alpm_log(db->handle, ALPM_LOG_ERROR, _("could not read db '%s' (%s)\n"), db->treename, archive_error_string(archive)); diff --git a/scripts/repo-add.sh.in b/scripts/repo-add.sh.in index caf1232d..c87409f1 100644 --- a/scripts/repo-add.sh.in +++ b/scripts/repo-add.sh.in @@ -526,6 +526,7 @@ create_db() { TAR_OPT=$(verify_repo_extension "$REPO_DB_FILE") # $LOCKFILE is already guaranteed to be absolute so this is safe dirname=${LOCKFILE%/*} + timestamp=$(date +%s) for repo in "db" "files"; do filename=${REPO_DB_PREFIX}.${repo}.${REPO_DB_SUFFIX} @@ -533,12 +534,13 @@ create_db() { tempname=$dirname/.tmp.$filename pushd "$tmpdir/$repo" >/dev/null + echo $timestamp > .TIMESTAMP if ( shopt -s nullglob; files=(*); (( ${#files[*]} )) ); then - bsdtar -c ${TAR_OPT} -f "$tempname" * + bsdtar -c ${TAR_OPT} -f "$tempname" .TIMESTAMP * else - # we have no packages remaining? zip up some emptyness + # we have no packages remaining warning "$(gettext "No packages remain, creating empty database.")" - bsdtar -c ${TAR_OPT} -f "$tempname" -T /dev/null + bsdtar -c ${TAR_OPT} -f "$tempname" .TIMESTAMP fi popd >/dev/null -- 2.23.0
On Tue, Nov 05, 2019 at 11:54:34PM +1000, Allan McRae wrote:
When creating or modifying repo tarballs, place a .TIMESTAMP file with seconds since epoch in it. This will be used in the future to enable rejecting databases older that a given threshold.
Also skip reading the .TIMESTAMP file in sync_db_populate().
Signed-off-by: Allan McRae <allan@archlinux.org> ---
Anyone want to check my logic in the sync_db_populate() populate change? Repo-add puts the .TIMESTAMP file first when calling bsdtar, so if present it will be first in the repo db file. Otherwise the first item read from the tarball will be a directory, which we skip reading anyway. So I just read the header for the first item and discard it.
lib/libalpm/be_sync.c | 9 +++++++++ scripts/repo-add.sh.in | 8 +++++--- 2 files changed, 14 insertions(+), 3 deletions(-)
diff --git a/lib/libalpm/be_sync.c b/lib/libalpm/be_sync.c index 2c76fe83..041b2266 100644 --- a/lib/libalpm/be_sync.c +++ b/lib/libalpm/be_sync.c @@ -506,6 +506,13 @@ static int sync_db_populate(alpm_db_t *db) goto cleanup; }
+ /* the .TIMESTAMP file will be first entry in the repo archive if present. + * If not, the first entry will be a directory and can be skipped too */ + if((archive_ret = archive_read_next_header(archive, &entry)) != ARCHIVE_OK) { + ret = -1; + goto readfail; + } + while((archive_ret = archive_read_next_header(archive, &entry)) == ARCHIVE_OK) { mode_t mode = archive_entry_mode(entry); if(!S_ISDIR(mode)) { @@ -518,6 +525,8 @@ static int sync_db_populate(alpm_db_t *db) } } } + +readfail: if(archive_ret != ARCHIVE_EOF) { _alpm_log(db->handle, ALPM_LOG_ERROR, _("could not read db '%s' (%s)\n"), db->treename, archive_error_string(archive)); diff --git a/scripts/repo-add.sh.in b/scripts/repo-add.sh.in index caf1232d..c87409f1 100644 --- a/scripts/repo-add.sh.in +++ b/scripts/repo-add.sh.in @@ -526,6 +526,7 @@ create_db() { TAR_OPT=$(verify_repo_extension "$REPO_DB_FILE") # $LOCKFILE is already guaranteed to be absolute so this is safe dirname=${LOCKFILE%/*} + timestamp=$(date +%s)
This should probably utilize SOURCE_DATE_EPOCH or something equivalent? timestamp=$(date --date="@${SOURCE_DATE_EPOCH:-$(date +%s)}" +%s))
for repo in "db" "files"; do filename=${REPO_DB_PREFIX}.${repo}.${REPO_DB_SUFFIX} @@ -533,12 +534,13 @@ create_db() { tempname=$dirname/.tmp.$filename
pushd "$tmpdir/$repo" >/dev/null + echo $timestamp > .TIMESTAMP if ( shopt -s nullglob; files=(*); (( ${#files[*]} )) ); then - bsdtar -c ${TAR_OPT} -f "$tempname" * + bsdtar -c ${TAR_OPT} -f "$tempname" .TIMESTAMP * else - # we have no packages remaining? zip up some emptyness + # we have no packages remaining warning "$(gettext "No packages remain, creating empty database.")" - bsdtar -c ${TAR_OPT} -f "$tempname" -T /dev/null + bsdtar -c ${TAR_OPT} -f "$tempname" .TIMESTAMP fi popd >/dev/null
-- 2.23.0
-- Morten Linderud PGP: 9C02FF419FECBE16
On 5/11/19 11:58 pm, Morten Linderud wrote:
On Tue, Nov 05, 2019 at 11:54:34PM +1000, Allan McRae wrote:
When creating or modifying repo tarballs, place a .TIMESTAMP file with seconds since epoch in it. This will be used in the future to enable rejecting databases older that a given threshold.
Also skip reading the .TIMESTAMP file in sync_db_populate().
Signed-off-by: Allan McRae <allan@archlinux.org> ---
<snip>
diff --git a/scripts/repo-add.sh.in b/scripts/repo-add.sh.in index caf1232d..c87409f1 100644 --- a/scripts/repo-add.sh.in +++ b/scripts/repo-add.sh.in @@ -526,6 +526,7 @@ create_db() { TAR_OPT=$(verify_repo_extension "$REPO_DB_FILE") # $LOCKFILE is already guaranteed to be absolute so this is safe dirname=${LOCKFILE%/*} + timestamp=$(date +%s)
This should probably utilize SOURCE_DATE_EPOCH or something equivalent?
timestamp=$(date --date="@${SOURCE_DATE_EPOCH:-$(date +%s)}" +%s))
Why? I can see no reason why it should...
On 11/5/19 9:03 AM, Allan McRae wrote:
On 5/11/19 11:58 pm, Morten Linderud wrote:
On Tue, Nov 05, 2019 at 11:54:34PM +1000, Allan McRae wrote:
When creating or modifying repo tarballs, place a .TIMESTAMP file with seconds since epoch in it. This will be used in the future to enable rejecting databases older that a given threshold.
Also skip reading the .TIMESTAMP file in sync_db_populate().
Signed-off-by: Allan McRae <allan@archlinux.org> ---
<snip>
diff --git a/scripts/repo-add.sh.in b/scripts/repo-add.sh.in index caf1232d..c87409f1 100644 --- a/scripts/repo-add.sh.in +++ b/scripts/repo-add.sh.in @@ -526,6 +526,7 @@ create_db() { TAR_OPT=$(verify_repo_extension "$REPO_DB_FILE") # $LOCKFILE is already guaranteed to be absolute so this is safe dirname=${LOCKFILE%/*} + timestamp=$(date +%s)
This should probably utilize SOURCE_DATE_EPOCH or something equivalent?
timestamp=$(date --date="@${SOURCE_DATE_EPOCH:-$(date +%s)}" +%s))
Why? I can see no reason why it should...
I don't either see value in "reproducible builds" for the actual state of the database. It's just a series of plaintext pointers to some other (hopefully reproducible) packages. If we actually did want to respect SOURCE_DATE_EPOCH, we'd need to do a lot more, like doing that for the bsdtar metadata (both file timestamps and file owners, probably sort files too, etc.) but again, I don't see how this protects the supply chain. -- Eli Schwartz Bug Wrangler and Trusted User
On Tue, Nov 05, 2019 at 09:12:33AM -0500, Eli Schwartz wrote:
On 11/5/19 9:03 AM, Allan McRae wrote:
On 5/11/19 11:58 pm, Morten Linderud wrote:
On Tue, Nov 05, 2019 at 11:54:34PM +1000, Allan McRae wrote:
When creating or modifying repo tarballs, place a .TIMESTAMP file with seconds since epoch in it. This will be used in the future to enable rejecting databases older that a given threshold.
Also skip reading the .TIMESTAMP file in sync_db_populate().
Signed-off-by: Allan McRae <allan@archlinux.org> ---
<snip>
diff --git a/scripts/repo-add.sh.in b/scripts/repo-add.sh.in index caf1232d..c87409f1 100644 --- a/scripts/repo-add.sh.in +++ b/scripts/repo-add.sh.in @@ -526,6 +526,7 @@ create_db() { TAR_OPT=$(verify_repo_extension "$REPO_DB_FILE") # $LOCKFILE is already guaranteed to be absolute so this is safe dirname=${LOCKFILE%/*} + timestamp=$(date +%s)
This should probably utilize SOURCE_DATE_EPOCH or something equivalent?
timestamp=$(date --date="@${SOURCE_DATE_EPOCH:-$(date +%s)}" +%s))
Why? I can see no reason why it should...
I don't either see value in "reproducible builds" for the actual state of the database. It's just a series of plaintext pointers to some other (hopefully reproducible) packages.
If we actually did want to respect SOURCE_DATE_EPOCH, we'd need to do a lot more, like doing that for the bsdtar metadata (both file timestamps and file owners, probably sort files too, etc.) but again, I don't see how this protects the supply chain.
Hmm, should probably discuss the threat model or attack vectors in #archlinux-reproducible. -- Morten Linderud PGP: 9C02FF419FECBE16
On Wed, Nov 06, 2019 at 12:03:17AM +1000, Allan McRae wrote:
On 5/11/19 11:58 pm, Morten Linderud wrote:
On Tue, Nov 05, 2019 at 11:54:34PM +1000, Allan McRae wrote:
When creating or modifying repo tarballs, place a .TIMESTAMP file with seconds since epoch in it. This will be used in the future to enable rejecting databases older that a given threshold.
Also skip reading the .TIMESTAMP file in sync_db_populate().
Signed-off-by: Allan McRae <allan@archlinux.org> ---
<snip>
diff --git a/scripts/repo-add.sh.in b/scripts/repo-add.sh.in index caf1232d..c87409f1 100644 --- a/scripts/repo-add.sh.in +++ b/scripts/repo-add.sh.in @@ -526,6 +526,7 @@ create_db() { TAR_OPT=$(verify_repo_extension "$REPO_DB_FILE") # $LOCKFILE is already guaranteed to be absolute so this is safe dirname=${LOCKFILE%/*} + timestamp=$(date +%s)
This should probably utilize SOURCE_DATE_EPOCH or something equivalent?
timestamp=$(date --date="@${SOURCE_DATE_EPOCH:-$(date +%s)}" +%s))
Why? I can see no reason why it should...
If you wan't to write tests for `repo-add` in the future, I think it will be beneficial to be able to create consistent databases. Outside of pacman I believe being able to reproduce any artifact produced is desireable. Enables us to not run through hoops recreating past database files given the correct packages. -- Morten Linderud PGP: 9C02FF419FECBE16
participants (3)
-
Allan McRae
-
Eli Schwartz
-
Morten Linderud