[pacman-dev] [Git][pacman/pacman][master] 7 commits: libmakepkg: reproducibility for python packages

Allan McRae (@allan) allan at archlinux.org
Sat Sep 4 00:38:49 UTC 2021



Allan McRae pushed to branch master at Pacman / Pacman


Commits:
1c5a5688 by Allan McRae at 2021-08-08T22:49:32+10:00
libmakepkg: reproducibility for python packages

Arch Linux has been setting PYTHONHASHSEED=0 to create deterministic
.pyc files.  After a thorough review by the Arch Security Team, setting
this variable was determined not to generate vulnerable .pyc files:
when the loader loads a .pyc file and unmarshals it, the runtime simply
populates the unordered data structures and uses a new runtime hash for
them.

Signed-off-by: Allan McRae <allan at archlinux.org>
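
A minimal sketch of the effect this commit relies on, not part of the
patch itself. It assumes only that string hashes can influence the order
in which unordered containers are written out at build time; the helper
name string_hash_in_fresh_interpreter is hypothetical. With the seed
pinned to 0, two independent interpreter processes compute the same hash
for the same string.

```python
import os
import subprocess
import sys


def string_hash_in_fresh_interpreter(text, seed):
    """Return hash(text) as computed by a new Python process.

    seed is the value for PYTHONHASHSEED; None leaves it unset, which
    lets the interpreter pick a random seed on startup.
    """
    env = dict(os.environ)
    env.pop("PYTHONHASHSEED", None)
    if seed is not None:
        env["PYTHONHASHSEED"] = str(seed)
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        env=env, capture_output=True, text=True, check=True,
    )
    return int(result.stdout)


# With the seed pinned to 0, two separate interpreters agree on the hash.
first = string_hash_in_fresh_interpreter("pacman", seed=0)
second = string_hash_in_fresh_interpreter("pacman", seed=0)
print(first == second)  # True
```

Passing seed=None instead will usually give a different value on each
run, which is the source of the non-determinism the variable removes.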

- - - - -
c0026caa by morganamilo at 2021-09-04T10:33:51+10:00
libalpm: Give -U downloads a random .part name if needed

archweb's download links all end in /download. This causes all the temp
files to be named download.part. With parallel downloads, multiple
downloads then write to the same temp file, which breaks the transaction.

Assign random temporary filenames to downloads from URLs that are either
missing a filename or whose filename does not contain at least three
hyphens (as a well-formed package filename does).

While this approach to determining when to use a temporary filename is
not 100% foolproof, it does keep nice-looking download progress bar names
when a proper package filename is given. The only downside of not using
temporary files when provided with a filename with three or more hyphens
is that URLs created specifically to bypass temporary filename usage
cannot be downloaded in parallel. We probably do not want to download
packages from such URLs anyway.

Fixes FS#71464

Modified-by: Allan McRae (do not use temporary files for realish URLs)
Signed-off-by: Allan McRae <allan at archlinux.org>
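
The hyphen rule described in this commit message can be sketched as
follows. This is an illustration only: needs_random_partfile and the
mirror.example URL are hypothetical, and the dload.c hunk in this mail
tests for ".pkg" in the last path component rather than counting
hyphens.

```python
import posixpath
from urllib.parse import urlsplit


def needs_random_partfile(url):
    """Hypothetical helper: True if the download should get a random .part name."""
    filename = posixpath.basename(urlsplit(url).path)
    # Package filenames look like name-version-release-arch.pkg.tar.*,
    # so a well-formed one contains at least three hyphens. A missing
    # filename is the empty string and therefore has zero hyphens.
    return filename.count("-") < 3


print(needs_random_partfile(
    "https://archlinux.org/packages/community/x86_64/0ad/download/"))  # True
print(needs_random_partfile(
    "https://mirror.example/pacman-6.0.0-5-x86_64.pkg.tar.zst"))       # False
```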

- - - - -
2ec6de96 by morganamilo at 2021-09-04T10:34:00+10:00
only use effective url for urls containing .db or .pkg

GitHub and other sites redirect their downloads to a CDN. So the
download http://foo.org/myrepo.db may redirect to something like
https://cdn.foo.org/83749327439.

This then causes pacman to try and download the sig as
https://cdn.foo.org/83749327439.sig which is incorrect. In this case
pacman should append .sig to the original url.

However, URLs like https://archlinux.org/packages/community/x86_64/0ad/download/
redirect to the mirror, so .sig has to be appended after the redirects and
not before.

So we decide whether to append .sig to the original or the effective URL
based on whether the effective URL (minus the query part) has .db or .pkg
in it.

Fixes FS#71148
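
A rough Python rendering of the decision described above, for
illustration only. signature_url is a hypothetical name, the dbext
default of ".db" is an assumption, and the mirror.example URL is made
up; the real logic is the C code in lib/libalpm/dload.c.

```python
def signature_url(original_url, effective_url, dbext=".db"):
    """Hypothetical helper: pick the URL that ".sig" is appended to."""
    # Last path component of the post-redirect URL, with any query removed.
    filename = effective_url.rsplit("/", 1)[-1]
    filename = filename.rsplit("?", 1)[0]
    if dbext in filename or ".pkg" in filename:
        base = effective_url   # redirect landed on a real db/package file
    else:
        base = original_url    # opaque CDN name, fall back to the request URL
    return base + ".sig"


# CDN case: the effective URL carries no usable filename.
print(signature_url("http://foo.org/myrepo.db",
                    "https://cdn.foo.org/83749327439"))
# -> http://foo.org/myrepo.db.sig

# archweb case: the redirect target is the actual package on a mirror.
print(signature_url(
    "https://archlinux.org/packages/community/x86_64/0ad/download/",
    "https://mirror.example/community/os/x86_64/0ad-a24-1-x86_64.pkg.tar.zst"))
# -> https://mirror.example/community/os/x86_64/0ad-a24-1-x86_64.pkg.tar.zst.sig
```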

---

v2: move variable declaration to start of block
v3: use dbext instead of db

- - - - -
f951282b by morganamilo at 2021-09-04T10:34:00+10:00
pactest: add tests for downloading packages from a cdn

Test for downloads that redirect to some sort of cdn where the
redirected url does not relate to the original filename.

Signed-off-by: Allan McRae <allan at archlinux.org>

- - - - -
efb714b3 by Charlie Sale at 2021-09-04T10:34:00+10:00
Order downloads by descending max_size

When downloading in parallel, sort by package size so that the larger
packages are queued first to fully leverage parallelism.
Addresses FS#70172

Signed-off-by: Charlie Sale <softwaresale01 at gmail.com>
Signed-off-by: Allan McRae <allan at archlinux.org>
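
To illustrate why queueing the largest packages first helps, here is a
toy simulation. It assumes every stream runs at the same constant speed,
so a download takes time proportional to its size, and that a free
stream always takes the next queued download; total_download_time and
the sizes are made up for the example.

```python
import heapq


def total_download_time(sizes, streams):
    """Finish time when each free stream takes the next queued download.

    Assumes every stream has the same constant speed, so a download
    takes time proportional to its size.
    """
    finish_times = [0] * streams
    heapq.heapify(finish_times)
    for size in sizes:
        free_at = heapq.heappop(finish_times)  # earliest free stream
        heapq.heappush(finish_times, free_at + size)
    return max(finish_times)


sizes = [1, 1, 1, 1, 8]
print(total_download_time(sorted(sizes), streams=2))                # 10
print(total_download_time(sorted(sizes, reverse=True), streams=2))  # 8
```

With the large package queued last, one stream sits idle while the other
finishes it alone; queued first, the small packages fill the second
stream in the meantime.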

- - - - -
cf923e73 by Hugo Osvaldo Barrera at 2021-09-04T10:34:00+10:00
Update broken links pointing to git.archlinux.org

All of these links are broken since the recent move to
gitlab.archlinux.org.

A few projects are, apparently, only available on GitHub, so I've linked
to that source (hopefully that's only temporary).

For git-clone URLs, I've opted for the https URLs since those can be
used by anyone -- whereas the ssh URLs require the user to be registered
on the gitlab instance which is not open to the public yet.

Signed-off-by: Hugo Osvaldo Barrera <hugo at barrera.io>
Signed-off-by: Allan McRae <allan at archlinux.org>

- - - - -
5da4af2b by Hugo Osvaldo Barrera at 2021-09-04T10:34:00+10:00
Delete the "Other Utilities" section

Signed-off-by: Hugo Osvaldo Barrera <hugo at barrera.io>
Signed-off-by: Allan McRae <allan at archlinux.org>

- - - - -


10 changed files:

- doc/index.asciidoc
- doc/submitting-patches.asciidoc
- doc/translation-help.asciidoc
- lib/libalpm/dload.c
- lib/libalpm/dload.h
- scripts/libmakepkg/meson.build
- + scripts/libmakepkg/reproducible.sh.in
- + scripts/libmakepkg/reproducible/meson.build
- + scripts/libmakepkg/reproducible/python.sh.in
- test/pacman/tests/upgrade-download-pkg-and-sig-with-filename.py


Changes:

=====================================
doc/index.asciidoc
=====================================
@@ -59,11 +59,11 @@ configuration files dealing with pacman.
 Changelog
 ~~~~~~~~~
 For a good idea of what is going on in pacman development, take a look at the
-link:https://git.archlinux.org/pacman.git/[Git summary page] for the
+link:https://gitlab.archlinux.org/pacman/pacman[Git summary page] for the
 project.
 
 See the most recent
-link:https://git.archlinux.org/pacman.git/tree/NEWS[NEWS]
+link:https://gitlab.archlinux.org/pacman/pacman/-/blob/master/NEWS[NEWS]
 file for a not-as-frequently-updated list of changes. However, this should
 contain the biggest changes in a format more concise than the commit log.
 
@@ -220,12 +220,11 @@ these trees).
 
 The current development tree can be fetched with the following command:
 
-	git clone git://git.archlinux.org/pacman.git pacman
+	git clone https://gitlab.archlinux.org/pacman/pacman.git
 
 which will fetch the full development history into a directory named pacman.
 You can browse the source as well using
-link:https://git.archlinux.org/pacman.git/[cgit]. HTTP/HTTPS URLs are also
-available for cloning purposes; these URLs are listed at the above page.
+link:https://gitlab.archlinux.org/pacman/pacman/[gitlab].
 
 If you are interested in hacking on pacman, it is highly recommended you join
 the mailing list mentioned above, as well as take a quick glance at our
@@ -237,20 +236,6 @@ you speak a foreign language, you can help by either creating or updating a
 translation file for your native language. Instructions can be found in
 link:translation-help.html[translation-help].
 
-Other Utilities
-~~~~~~~~~~~~~~~
-Although the package manager itself is quite simple, many scripts have been
-developed that help automate building and installing packages. These are used
-extensively in link:https://archlinux.org/[Arch Linux]. Most of these utilities
-are available in the Arch Linux projects
-link:https://git.archlinux.org/[code browser].
-
-Utilities available:
-
-* link:https://git.archlinux.org/dbscripts.git/[dbscripts] - scripts used by Arch Linux to manage the main package repositories
-* link:https://git.archlinux.org/devtools.git/[devtools] - tools to assist in packaging and dependency checking
-* link:https://git.archlinux.org/namcap.git/[namcap] - a package analysis utility written in python
-
 Bugs
 ----
 If you find bugs (which is quite likely), please email them to the pacman-dev


=====================================
doc/submitting-patches.asciidoc
=====================================
@@ -20,7 +20,7 @@ started with GIT if you have not worked with it before.
 
 The pacman code can be fetched using the following command:
 
-	git clone git://git.archlinux.org/pacman.git
+	git clone https://gitlab.archlinux.org/pacman/pacman.git
 
 
 Creating your patch


=====================================
doc/translation-help.asciidoc
=====================================
@@ -78,7 +78,7 @@ Incremental Updates
 If you have more advanced needs you will have to get a copy of the pacman
 repository.
 
-	git clone git://git.archlinux.org/pacman.git pacman
+	git clone https://gitlab.archlinux.org/pacman/pacman.git
 
 Next, you will need to run `./autogen.sh` and `./configure` in the base
 directory to generate the correct Makefiles. At this point, all necessary


=====================================
lib/libalpm/dload.c
=====================================
@@ -613,12 +613,33 @@ static int curl_check_finished_download(CURLM *curlm, CURLMsg *msg,
 	/* Let's check if client requested downloading accompanion *.sig file */
 	if(!payload->signature && payload->download_signature && curlerr == CURLE_OK && payload->respcode < 400) {
 		struct dload_payload *sig = NULL;
-
+		char *url = payload->fileurl;
+		char *_effective_filename;
+		const char *effective_filename;
+		char *query;
+		const char *dbext = alpm_option_get_dbext(handle);
 		const char* realname = payload->destfile_name ? payload->destfile_name : payload->tempfile_name;
-		int len = strlen(effective_url) + 5;
+		int len;
+
+		STRDUP(_effective_filename, effective_url, GOTO_ERR(handle, ALPM_ERR_MEMORY, cleanup));
+		effective_filename = get_filename(_effective_filename);
+		query = strrchr(effective_filename, '?');
+
+		if(query) {
+			query[0] = '\0';
+		}
+
+		/* Only use the effective url for sig downloads if the effective_url contains .dbext or .pkg */
+		if(strstr(effective_filename, dbext) || strstr(effective_filename, ".pkg")) {
+			url = effective_url;
+		}
+
+		free(_effective_filename);
+
+		len = strlen(url) + 5;
 		CALLOC(sig, 1, sizeof(*sig), GOTO_ERR(handle, ALPM_ERR_MEMORY, cleanup));
 		MALLOC(sig->fileurl, len, FREE(sig); GOTO_ERR(handle, ALPM_ERR_MEMORY, cleanup));
-		snprintf(sig->fileurl, len, "%s.sig", effective_url);
+		snprintf(sig->fileurl, len, "%s.sig", url);
 
 		if(payload->trust_remote_name) {
 			/* In this case server might provide a new name for the main payload.
@@ -767,7 +788,7 @@ static int curl_add_payload(alpm_handle_t *handle, CURLM *curlm,
 		GOTO_ERR(handle, ALPM_ERR_SERVER_BAD_URL, cleanup);
 	}
 
-	if(payload->remote_name && strlen(payload->remote_name) > 0) {
+	if(!payload->random_partfile && payload->remote_name && strlen(payload->remote_name) > 0) {
 		if(!payload->destfile_name) {
 			payload->destfile_name = get_fullpath(localpath, payload->remote_name, "");
 		}
@@ -776,8 +797,9 @@ static int curl_add_payload(alpm_handle_t *handle, CURLM *curlm,
 			goto cleanup;
 		}
 	} else {
-		/* URL doesn't contain a filename, so make a tempfile. We can't support
-		 * resuming this kind of download; partial transfers will be destroyed */
+		/* We want a random filename or the URL does not contain a filename, so download to a
+		 * temporary location. We can not support resuming this kind of download; any partial
+		 * transfers will be destroyed */
 		payload->unlink_on_fail = 1;
 
 		payload->localf = create_tempfile(payload, localpath);
@@ -825,6 +847,19 @@ cleanup:
 	return ret;
 }
 
+/*
+ * Used to sort payloads by max size in descending order (largest -> smallest)
+ */
+static int compare_dload_payload_sizes(const void *left_ptr, const void *right_ptr)
+{
+	struct dload_payload *left, *right;
+
+	left = (struct dload_payload *) left_ptr;
+	right = (struct dload_payload *) right_ptr;
+
+	return right->max_size - left->max_size;
+}
+
 /* Returns -1 if an error happened for a required file
  * Returns 0 if a payload was actually downloaded
  * Returns 1 if no files were downloaded and all errors were non-fatal
@@ -838,6 +873,10 @@ static int curl_download_internal(alpm_handle_t *handle,
 	int max_streams = handle->parallel_downloads;
 	int updated = 0; /* was a file actually updated */
 	CURLM *curlm = handle->curlm;
+	size_t payloads_size = alpm_list_count(payloads);
+
+	/* Sort payloads by package size */
+	payloads = alpm_list_msort(payloads, payloads_size, &compare_dload_payload_sizes);
 
 	while(active_downloads_num > 0 || payloads) {
 		CURLMcode mc;
@@ -986,11 +1025,20 @@ int SYMEXPORT alpm_fetch_pkgurl(alpm_handle_t *handle, const alpm_list_t *urls,
 			alpm_list_append(fetched, filepath);
 		} else {
 			struct dload_payload *payload = NULL;
+			char *c;
 
 			ASSERT(url, GOTO_ERR(handle, ALPM_ERR_WRONG_ARGS, err));
 			CALLOC(payload, 1, sizeof(*payload), GOTO_ERR(handle, ALPM_ERR_MEMORY, err));
 			STRDUP(payload->fileurl, url, FREE(payload); GOTO_ERR(handle, ALPM_ERR_MEMORY, err));
-			payload->allow_resume = 1;
+
+			c = strrchr(url, '/');
+			if(strstr(c, ".pkg")) {
+				/* we probably have a usable package filename to download to */
+				payload->allow_resume = 1;
+			} else {
+				payload->random_partfile = 1;
+			}
+
 			payload->handle = handle;
 			payload->trust_remote_name = 1;
 			payload->download_signature = (handle->siglevel & ALPM_SIG_PACKAGE);


=====================================
lib/libalpm/dload.h
=====================================
@@ -44,6 +44,7 @@ struct dload_payload {
 	off_t prevprogress;
 	int force;
 	int allow_resume;
+	int random_partfile;
 	int errors_ok;
 	int unlink_on_fail;
 	int trust_remote_name;


=====================================
scripts/libmakepkg/meson.build
=====================================
@@ -5,6 +5,7 @@ libmakepkg_modules = [
   { 'name' : 'lint_config',   'has_subdir' : true },
   { 'name' : 'lint_package',  'has_subdir' : true },
   { 'name' : 'lint_pkgbuild', 'has_subdir' : true },
+  { 'name' : 'reproducible',  'has_subdir' : true },
   { 'name' : 'source',        'has_subdir' : true },
   { 'name' : 'srcinfo',                           },
   { 'name' : 'tidy',          'has_subdir' : true },


=====================================
scripts/libmakepkg/reproducible.sh.in
=====================================
@@ -0,0 +1,29 @@
+#!/bin/bash
+#
+#   reproducible.sh - utilities for improving package reproducibility
+#
+#   Copyright (c) 2021 Pacman Development Team <pacman-dev at archlinux.org>
+#
+#   This program is free software; you can redistribute it and/or modify
+#   it under the terms of the GNU General Public License as published by
+#   the Free Software Foundation; either version 2 of the License, or
+#   (at your option) any later version.
+#
+#   This program is distributed in the hope that it will be useful,
+#   but WITHOUT ANY WARRANTY; without even the implied warranty of
+#   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+#   GNU General Public License for more details.
+#
+#   You should have received a copy of the GNU General Public License
+#   along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+[[ -n "$LIBMAKEPKG_REPRODUCIBLE_SH" ]] && return
+LIBMAKEPKG_REPRODUCIBLE_SH=1
+
+LIBRARY=${LIBRARY:-'@libmakepkgdir@'}
+
+
+for lib in "$LIBRARY/reproducible/"*.sh; do
+	source "$lib"
+done


=====================================
scripts/libmakepkg/reproducible/meson.build
=====================================
@@ -0,0 +1,17 @@
+libmakepkg_module = 'reproducible'
+
+sources = [
+  'python.sh.in',
+]
+
+foreach src : sources
+  output_dir = join_paths(get_option('datadir'), 'makepkg', libmakepkg_module)
+
+  custom_target(
+    libmakepkg_module + '_' + src.underscorify(),
+    command : [ SCRIPT_EDITOR, '@INPUT@', '@OUTPUT@' ],
+    input : src,
+    output : '@BASENAME@',
+    install : true,
+    install_dir : output_dir)
+endforeach


=====================================
scripts/libmakepkg/reproducible/python.sh.in
=====================================
@@ -0,0 +1,29 @@
+#!/bin/bash
+#
+#   python.sh - creating reproducible python packages
+#
+#   Copyright (c) 2021 Pacman Development Team <pacman-dev at archlinux.org>
+#
+#   This program is free software; you can redistribute it and/or modify
+#   it under the terms of the GNU General Public License as published by
+#   the Free Software Foundation; either version 2 of the License, or
+#   (at your option) any later version.
+#
+#   This program is distributed in the hope that it will be useful,
+#   but WITHOUT ANY WARRANTY; without even the implied warranty of
+#   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+#   GNU General Public License for more details.
+#
+#   You should have received a copy of the GNU General Public License
+#   along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+[[ -n "$LIBMAKEPKG_REPRODUCIBLE_PYTHON_SH" ]] && return
+LIBMAKEPKG_REPRODUCIBLE_PYTHON_SH=1
+
+
+LIBRARY=${LIBRARY:-'@libmakepkgdir@'}
+
+
+# disable hash randomization when creating .pyc files
+export PYTHONHASHSEED=0


=====================================
test/pacman/tests/upgrade-download-pkg-and-sig-with-filename.py
=====================================
@@ -22,6 +22,12 @@
     '/redir-dest.pkg': 'redir-dest',
     '/redir-dest.pkg.sig': 'redir-dest.sig',
 
+    # redirect cdn
+    '/redir-cdn.pkg': { 'code': 303, 'headers': { 'Location': '/cdn-1' } },
+    '/redir-cdn.pkg.sig': { 'code': 303, 'headers': { 'Location': '/cdn-2' } },
+    '/cdn-1': 'redir-dest',
+    '/cdn-2': 'redir-dest.sig',
+
     # content-disposition and redirect
     '/cd-redir.pkg': { 'code': 303, 'headers': { 'Location': '/cd-redir-dest.pkg' } },
     '/cd-redir-dest.pkg': {
@@ -30,6 +36,18 @@
     },
     '/cd-redir-dest.pkg.sig': 'cd-redir-dest.sig',
 
+    # content-disposition and redirect to cdn
+    '/cd-redir-cdn.pkg': { 'code': 303, 'headers': { 'Location': '/cdn-3' } },
+    '/cd-redir-cdn.pkg.sig': { 'code': 303, 'headers': { 'Location': '/cdn-4' } },
+    '/cdn-3': {
+        'headers': { 'Content-Disposition': 'attachment; filename="cdn-alt.pkg"' },
+        'body': 'cdn-alt'
+    },
+    '/cdn-4': {
+        'headers': { 'Content-Disposition': 'attachment; filename="cdn-alt.pkg.sig"' },
+        'body': 'cdn-alt.sig'
+    },
+
     # TODO: absolutely terrible hack to prevent pacman from attempting to
     # validate packages, which causes failure under --valgrind thanks to
     # a memory leak in gpgme that is too general for inclusion in valgrind.supp
@@ -38,7 +56,7 @@
     '': 'fallback',
 })
 
-self.args = '-Uw {url}/simple.pkg {url}/cd.pkg {url}/redir.pkg {url}/cd-redir.pkg {url}/404'.format(url=url)
+self.args = '-Uw {url}/simple.pkg {url}/cd.pkg {url}/redir.pkg {url}/redir-cdn.pkg {url}/cd-redir.pkg {url}/cd-redir-cdn.pkg {url}/404'.format(url=url)
 
 # packages/sigs are not valid, error is expected
 self.addrule('!PACMAN_RETCODE=0')
@@ -59,3 +77,6 @@
 self.addrule('!CACHE_FEXISTS=cd-redir-dest.pkg')
 self.addrule('CACHE_FCONTENTS=cd-redir-dest-alt.pkg|cd-redir-dest')
 self.addrule('CACHE_FCONTENTS=cd-redir-dest-alt.pkg.sig|cd-redir-dest.sig')
+
+self.addrule('CACHE_FCONTENTS=cdn-alt.pkg|cdn-alt')
+self.addrule('CACHE_FCONTENTS=cdn-alt.pkg.sig|cdn-alt.sig')



View it on GitLab: https://gitlab.archlinux.org/pacman/pacman/-/compare/fc7986485cad9c93df7e59979a7e950fffdc4271...5da4af2b5dcb6f214a93fd5cabf228df08d006f5
