[pacman-dev] [Git][pacman/pacman][master] 7 commits: libmakepkg: reproducibilty for python packages
Allan McRae pushed to branch master at Pacman / Pacman

Commits:

1c5a5688 by Allan McRae at 2021-08-08T22:49:32+10:00
libmakepkg: reproducibilty for python packages

Arch Linux has been setting PYTHONHASHSEED=0 to create deterministic
.pyc files. After a thorough review by the Arch Security Team, setting
this variable was determined not to generate vulnerable .pyc files, as
when the loader loads the .pyc file and unmarshalls it, the internal
runtime will just populate the unordered data structures and use a new
runtime hash for them.

Signed-off-by: Allan McRae <allan@archlinux.org>

- - - - -
c0026caa by morganamilo at 2021-09-04T10:33:51+10:00
libalpm: Give -U downloads a random .part name if needed

archweb's download links all ended in /download. This caused all the
temp files to be named download.part. With parallel downloads this
results in multiple downloads going to the same temp file and breaks
the transaction.

Assign random temporary filenames to downloads from URLs that are either
missing a filename, or if the filename does not contain at least three
hyphens (as a well formed package filename does).

While this approach to determining when to use a temporary filename is
not 100% foolproof, it does keep nice looking download progress bar
names when a proper package filename is given. The only downside of not
using temporary files when provided with a filename with three or more
hyphens is URLs created specifically to bypass temporary filename usage
can not be downloaded in parallel. We probably do not want to download
packages from such URLs anyway.

Fixes FS#71464

Modified-by: Allan McRae (do not use temporary files for realish URLs)
Signed-off-by: Allan McRae <allan@archlinux.org>

- - - - -
2ec6de96 by morganamilo at 2021-09-04T10:34:00+10:00
only use effective url for urls containing .db or .pkg

GitHub and other sites redirect their downloads to a cdn. So the
download http://foo.org/myrepo.db may redirect to something like
https://cdn.foo.org/83749327439.

This then causes pacman to try and download the sig as
https://cdn.foo.org/83749327439.sig which is incorrect. In this case
pacman should append .sig to the original url.

However urls like
https://archlinux.org/packages/community/x86_64/0ad/download/
redirect to the mirror, so .sig has to be appended after the redirects
and not before.

So we decide if we should append .sig on the original or effective url
based on if the effective url (minus the query part) has .db or .pkg in
it.

Fixes FS#71148

---

v2: move variable declaration to start of block
v3: use dbext instead of db

- - - - -
f951282b by morganamilo at 2021-09-04T10:34:00+10:00
pactest: add tests for downloading packages from a cdn

Test for downloads that redirect to some sort of cdn where the
redirected url does not relate to the original filename.

Signed-off-by: Allan McRae <allan@archlinux.org>

- - - - -
efb714b3 by Charlie Sale at 2021-09-04T10:34:00+10:00
Order downloads by descending max_size

When downloading in parallel, sort by package size so that the larger
packages are queued first to fully leverage parallelism.

Addresses FS#70172

Signed-off-by: Charlie Sale <softwaresale01@gmail.com>
Signed-off-by: Allan McRae <allan@archlinux.org>

- - - - -
cf923e73 by Hugo Osvaldo Barrera at 2021-09-04T10:34:00+10:00
Update broken links pointing to git.archlinux.org

All of these links are broken since the recent move to
gitlab.archlinux.org.

A few projects are, apparently, only available on GitHub, so I've linked
to that source (hopefully that's only temporary).

For git-clone URLs, I've opted for the https URLs since those can be
used by anyone -- whereas the ssh URLs require the user to be registered
on the gitlab instance which is not open to the public yet.

Signed-off-by: Hugo Osvaldo Barrera <hugo@barrera.io>
Signed-off-by: Allan McRae <allan@archlinux.org>

- - - - -
5da4af2b by Hugo Osvaldo Barrera at 2021-09-04T10:34:00+10:00
Delete the "Other Utilities" section

Signed-off-by: Hugo Osvaldo Barrera <hugo@barrera.io>
Signed-off-by: Allan McRae <allan@archlinux.org>

- - - - -

10 changed files:

- doc/index.asciidoc
- doc/submitting-patches.asciidoc
- doc/translation-help.asciidoc
- lib/libalpm/dload.c
- lib/libalpm/dload.h
- scripts/libmakepkg/meson.build
+ scripts/libmakepkg/reproducible.sh.in
+ scripts/libmakepkg/reproducible/meson.build
+ scripts/libmakepkg/reproducible/python.sh.in
- test/pacman/tests/upgrade-download-pkg-and-sig-with-filename.py

Changes:

=====================================
doc/index.asciidoc
=====================================
@@ -59,11 +59,11 @@ configuration files dealing with pacman.
 Changelog
 ~~~~~~~~~
 For a good idea of what is going on in pacman development, take a look at the
-link:https://git.archlinux.org/pacman.git/[Git summary page] for the
+link:https://gitlab.archlinux.org/pacman/pacman[Git summary page] for the
 project.

 See the most recent
-link:https://git.archlinux.org/pacman.git/tree/NEWS[NEWS]
+link:https://gitlab.archlinux.org/pacman/pacman/-/blob/master/NEWS[NEWS]
 file for a not-as-frequently-updated list of changes. However, this should
 contain the biggest changes in a format more concise than the commit log.

@@ -220,12 +220,11 @@ these trees).

 The current development tree can be fetched with the following command:

-    git clone git://git.archlinux.org/pacman.git pacman
+    git clone https://gitlab.archlinux.org/pacman/pacman.git

 which will fetch the full development history into a directory named pacman.
 You can browse the source as well using
-link:https://git.archlinux.org/pacman.git/[cgit]. HTTP/HTTPS URLs are also
-available for cloning purposes; these URLs are listed at the above page.
+link:https://gitlab.archlinux.org/pacman/pacman/[gitlab].

 If you are interested in hacking on pacman, it is highly recommended you join the
 mailing list mentioned above, as well as take a quick glance at our
@@ -237,20 +236,6 @@ you speak a foreign language, you can help by either creating or updating a
 translation file for your native language. Instructions can be found in
 link:translation-help.html[translation-help].

-Other Utilities
-~~~~~~~~~~~~~~~
-Although the package manager itself is quite simple, many scripts have been
-developed that help automate building and installing packages. These are used
-extensively in link:https://archlinux.org/[Arch Linux]. Most of these utilities
-are available in the Arch Linux projects
-link:https://git.archlinux.org/[code browser].
-
-Utilities available:
-
-* link:https://git.archlinux.org/dbscripts.git/[dbscripts] - scripts used by Arch Linux to manage the main package repositories
-* link:https://git.archlinux.org/devtools.git/[devtools] - tools to assist in packaging and dependency checking
-* link:https://git.archlinux.org/namcap.git/[namcap] - a package analysis utility written in python
-
 Bugs
 ----
 If you find bugs (which is quite likely), please email them to the pacman-dev

=====================================
doc/submitting-patches.asciidoc
=====================================
@@ -20,7 +20,7 @@ started with GIT if you have not worked with it before.

 The pacman code can be fetched using the following command:

-    git clone git://git.archlinux.org/pacman.git
+    git clone https://gitlab.archlinux.org/pacman/pacman.git

 Creating your patch

=====================================
doc/translation-help.asciidoc
=====================================
@@ -78,7 +78,7 @@ Incremental Updates
 If you have more advanced needs you will have to get a copy of the pacman
 repository.

-    git clone git://git.archlinux.org/pacman.git pacman
+    git clone https://gitlab.archlinux.org/pacman/pacman.git

 Next, you will need to run `./autogen.sh` and `./configure` in the base
 directory to generate the correct Makefiles. At this point, all necessary

=====================================
lib/libalpm/dload.c
=====================================
@@ -613,12 +613,33 @@ static int curl_check_finished_download(CURLM *curlm, CURLMsg *msg,
 	/* Let's check if client requested downloading accompanion *.sig file */
 	if(!payload->signature && payload->download_signature && curlerr == CURLE_OK && payload->respcode < 400) {
 		struct dload_payload *sig = NULL;
-
+		char *url = payload->fileurl;
+		char *_effective_filename;
+		const char *effective_filename;
+		char *query;
+		const char *dbext = alpm_option_get_dbext(handle);
 		const char* realname = payload->destfile_name ? payload->destfile_name : payload->tempfile_name;
-		int len = strlen(effective_url) + 5;
+		int len;
+
+		STRDUP(_effective_filename, effective_url, GOTO_ERR(handle, ALPM_ERR_MEMORY, cleanup));
+		effective_filename = get_filename(_effective_filename);
+		query = strrchr(effective_filename, '?');
+
+		if(query) {
+			query[0] = '\0';
+		}
+
+		/* Only use the effective url for sig downloads if the effective_url contains .dbext or .pkg */
+		if(strstr(effective_filename, dbext) || strstr(effective_filename, ".pkg")) {
+			url = effective_url;
+		}
+
+		free(_effective_filename);
+
+		len = strlen(url) + 5;
 		CALLOC(sig, 1, sizeof(*sig), GOTO_ERR(handle, ALPM_ERR_MEMORY, cleanup));
 		MALLOC(sig->fileurl, len, FREE(sig); GOTO_ERR(handle, ALPM_ERR_MEMORY, cleanup));
-		snprintf(sig->fileurl, len, "%s.sig", effective_url);
+		snprintf(sig->fileurl, len, "%s.sig", url);

 		if(payload->trust_remote_name) {
 			/* In this case server might provide a new name for the main payload.
@@ -767,7 +788,7 @@ static int curl_add_payload(alpm_handle_t *handle, CURLM *curlm,
 		GOTO_ERR(handle, ALPM_ERR_SERVER_BAD_URL, cleanup);
 	}

-	if(payload->remote_name && strlen(payload->remote_name) > 0) {
+	if(!payload->random_partfile && payload->remote_name && strlen(payload->remote_name) > 0) {
 		if(!payload->destfile_name) {
 			payload->destfile_name = get_fullpath(localpath, payload->remote_name, "");
 		}
@@ -776,8 +797,9 @@ static int curl_add_payload(alpm_handle_t *handle, CURLM *curlm,
 			goto cleanup;
 		}
 	} else {
-		/* URL doesn't contain a filename, so make a tempfile. We can't support
-		 * resuming this kind of download; partial transfers will be destroyed */
+		/* We want a random filename or the URL does not contain a filename, so download to a
+		 * temporary location. We can not support resuming this kind of download; any partial
+		 * transfers will be destroyed */
 		payload->unlink_on_fail = 1;

 		payload->localf = create_tempfile(payload, localpath);
@@ -825,6 +847,19 @@ cleanup:
 	return ret;
 }

+/*
+ * Use to sort payloads by max size in decending order (largest -> smallest)
+ */
+static int compare_dload_payload_sizes(const void *left_ptr, const void *right_ptr)
+{
+	struct dload_payload *left, *right;
+
+	left = (struct dload_payload *) left_ptr;
+	right = (struct dload_payload *) right_ptr;
+
+	return right->max_size - left->max_size;
+}
+
 /* Returns -1 if an error happened for a required file
  * Returns 0 if a payload was actually downloaded
  * Returns 1 if no files were downloaded and all errors were non-fatal
@@ -838,6 +873,10 @@ static int curl_download_internal(alpm_handle_t *handle,
 	int max_streams = handle->parallel_downloads;
 	int updated = 0; /* was a file actually updated */
 	CURLM *curlm = handle->curlm;
+	size_t payloads_size = alpm_list_count(payloads);
+
+	/* Sort payloads by package size */
+	payloads = alpm_list_msort(payloads, payloads_size, &compare_dload_payload_sizes);

 	while(active_downloads_num > 0 || payloads) {
 		CURLMcode mc;
@@ -986,11 +1025,20 @@ int SYMEXPORT alpm_fetch_pkgurl(alpm_handle_t *handle, const alpm_list_t *urls,
 			alpm_list_append(fetched, filepath);
 		} else {
 			struct dload_payload *payload = NULL;
+			char *c;

 			ASSERT(url, GOTO_ERR(handle, ALPM_ERR_WRONG_ARGS, err));
 			CALLOC(payload, 1, sizeof(*payload), GOTO_ERR(handle, ALPM_ERR_MEMORY, err));
 			STRDUP(payload->fileurl, url, FREE(payload); GOTO_ERR(handle, ALPM_ERR_MEMORY, err));
-			payload->allow_resume = 1;
+
+			c = strrchr(url, '/');
+			if(strstr(c, ".pkg")) {
+				/* we probably have a usable package filename to download to */
+				payload->allow_resume = 1;
+			} else {
+				payload->random_partfile = 1;
+			}
+
 			payload->handle = handle;
 			payload->trust_remote_name = 1;
 			payload->download_signature = (handle->siglevel & ALPM_SIG_PACKAGE);

=====================================
lib/libalpm/dload.h
=====================================
@@ -44,6 +44,7 @@ struct dload_payload {
 	off_t prevprogress;
 	int force;
 	int allow_resume;
+	int random_partfile;
 	int errors_ok;
 	int unlink_on_fail;
 	int trust_remote_name;

=====================================
scripts/libmakepkg/meson.build
=====================================
@@ -5,6 +5,7 @@ libmakepkg_modules = [
   { 'name' : 'lint_config', 'has_subdir' : true },
   { 'name' : 'lint_package', 'has_subdir' : true },
   { 'name' : 'lint_pkgbuild', 'has_subdir' : true },
+  { 'name' : 'reproducible', 'has_subdir' : true },
   { 'name' : 'source', 'has_subdir' : true },
   { 'name' : 'srcinfo', },
   { 'name' : 'tidy', 'has_subdir' : true },

=====================================
scripts/libmakepkg/reproducible.sh.in
=====================================
@@ -0,0 +1,29 @@
+#!/bin/bash
+#
+# reproducible.sh - utilities for improving package reproducibility
+#
+# Copyright (c) 2021 Pacman Development Team <pacman-dev@archlinux.org>
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program. If not, see <http://www.gnu.org/licenses/>.
+#
+
+[[ -n "$LIBMAKEPKG_REPRODUCIBLE_SH" ]] && return
+LIBMAKEPKG_REPRODUCIBLE_SH=1
+
+LIBRARY=${LIBRARY:-'@libmakepkgdir@'}
+
+
+for lib in "$LIBRARY/reproducible/"*.sh; do
+	source "$lib"
+done

=====================================
scripts/libmakepkg/reproducible/meson.build
=====================================
@@ -0,0 +1,17 @@
+libmakepkg_module = 'reproducible'
+
+sources = [
+  'python.sh.in',
+]
+
+foreach src : sources
+  output_dir = join_paths(get_option('datadir'), 'makepkg', libmakepkg_module)
+
+  custom_target(
+    libmakepkg_module + '_' + src.underscorify(),
+    command : [ SCRIPT_EDITOR, '@INPUT@', '@OUTPUT@' ],
+    input : src,
+    output : '@BASENAME@',
+    install : true,
+    install_dir : output_dir)
+endforeach

=====================================
scripts/libmakepkg/reproducible/python.sh.in
=====================================
@@ -0,0 +1,29 @@
+#!/bin/bash
+#
+# python.sh - creating reproducible python packages
+#
+# Copyright (c) 2021 Pacman Development Team <pacman-dev@archlinux.org>
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program. If not, see <http://www.gnu.org/licenses/>.
+#
+
+[[ -n "$LIBMAKEPKG_REPRODUCIBLE_PYTHON_SH" ]] && return
+LIBMAKEPKG_REPRODUCIBLE_PYTHON_SH=1
+
+
+LIBRARY=${LIBRARY:-'@libmakepkgdir@'}
+
+
+# disable hash randomization when creating .pyc files
+export PYTHONHASHSEED=0

=====================================
test/pacman/tests/upgrade-download-pkg-and-sig-with-filename.py
=====================================
@@ -22,6 +22,12 @@
     '/redir-dest.pkg': 'redir-dest',
     '/redir-dest.pkg.sig': 'redir-dest.sig',

+    # redirect cdn
+    '/redir-cdn.pkg': { 'code': 303, 'headers': { 'Location': '/cdn-1' } },
+    '/redir-cdn.pkg.sig': { 'code': 303, 'headers': { 'Location': '/cdn-2' } },
+    '/cdn-1': 'redir-dest',
+    '/cdn-2': 'redir-dest.sig',
+
     # content-disposition and redirect
     '/cd-redir.pkg': { 'code': 303, 'headers': { 'Location': '/cd-redir-dest.pkg' } },
     '/cd-redir-dest.pkg': {
@@ -30,6 +36,18 @@
     },
     '/cd-redir-dest.pkg.sig': 'cd-redir-dest.sig',

+    # content-disposition and redirect to cdn
+    '/cd-redir-cdn.pkg': { 'code': 303, 'headers': { 'Location': '/cdn-3' } },
+    '/cd-redir-cdn.pkg.sig': { 'code': 303, 'headers': { 'Location': '/cdn-4' } },
+    '/cdn-3': {
+        'headers': { 'Content-Disposition': 'attachment; filename="cdn-alt.pkg"' },
+        'body': 'cdn-alt'
+    },
+    '/cdn-4': {
+        'headers': { 'Content-Disposition': 'attachment; filename="cdn-alt.pkg.sig"' },
+        'body': 'cdn-alt.sig'
+    },
+
     # TODO: absolutely terrible hack to prevent pacman from attempting to
     # validate packages, which causes failure under --valgrind thanks to
     # a memory leak in gpgme that is too general for inclusion in valgrind.supp
@@ -38,7 +56,7 @@
     '': 'fallback',
 })

-self.args = '-Uw {url}/simple.pkg {url}/cd.pkg {url}/redir.pkg {url}/cd-redir.pkg {url}/404'.format(url=url)
+self.args = '-Uw {url}/simple.pkg {url}/cd.pkg {url}/redir.pkg {url}/redir-cdn.pkg {url}/cd-redir.pkg {url}/cd-redir-cdn.pkg {url}/404'.format(url=url)

 # packages/sigs are not valid, error is expected
 self.addrule('!PACMAN_RETCODE=0')
@@ -59,3 +77,6 @@
 self.addrule('!CACHE_FEXISTS=cd-redir-dest.pkg')
 self.addrule('CACHE_FCONTENTS=cd-redir-dest-alt.pkg|cd-redir-dest')
 self.addrule('CACHE_FCONTENTS=cd-redir-dest-alt.pkg.sig|cd-redir-dest.sig')
+
+self.addrule('CACHE_FCONTENTS=cdn-alt.pkg|cdn-alt')
+self.addrule('CACHE_FCONTENTS=cdn-alt.pkg.sig|cdn-alt.sig')

View it on GitLab: https://gitlab.archlinux.org/pacman/pacman/-/compare/fc7986485cad9c93df7e599...
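
For readers following commit 2ec6de96, the choice between the original and the effective URL when building the .sig address can be sketched in isolation roughly as below. This is only an illustration under stated assumptions: `sketch_sig_url` is a hypothetical standalone helper, not a libalpm function, it takes the database extension as a plain argument instead of calling `alpm_option_get_dbext`, and it cuts the query at the first `?` rather than the last.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of libalpm): returns a newly allocated
 * "<base>.sig" string, or NULL if allocation fails. The base is the
 * effective URL only when its final path component, with any query part
 * removed, contains dbext or ".pkg"; otherwise it is the original URL. */
char *sketch_sig_url(const char *original_url, const char *effective_url,
		const char *dbext)
{
	const char *base = original_url;
	const char *slash, *name;
	size_t name_len, len;
	char *copy, *sig;

	assert(original_url && effective_url && dbext);

	/* final path component of the effective URL */
	slash = strrchr(effective_url, '/');
	name = slash ? slash + 1 : effective_url;

	/* copy the component up to (not including) any '?' */
	name_len = strcspn(name, "?");
	copy = malloc(name_len + 1);
	if(!copy) {
		return NULL;
	}
	memcpy(copy, name, name_len);
	copy[name_len] = '\0';

	if(strstr(copy, dbext) || strstr(copy, ".pkg")) {
		base = effective_url;
	}
	free(copy);

	len = strlen(base) + strlen(".sig") + 1;
	sig = malloc(len);
	if(!sig) {
		return NULL;
	}
	snprintf(sig, len, "%s.sig", base);
	return sig;
}
```

With the URLs from the commit message, an original of http://foo.org/myrepo.db that redirects to https://cdn.foo.org/83749327439 keeps the original as the base, so the sketch yields http://foo.org/myrepo.db.sig. The caller owns the returned string and should free it.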
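
A note on the ordering added in efb714b3: the comparator in the diff returns the difference of two `max_size` values as an `int`. Where `off_t` is wider than `int`, a large difference may not be representable in the return value, so a comparison-based form might be worth considering. The following is a minimal sketch under assumptions, not the code in the tree: `struct sketch_payload` is a hypothetical stand-in for `struct dload_payload`, and the standard `qsort` on an array is used here in place of `alpm_list_msort` on a list.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical stand-in for the one field of struct dload_payload
 * that the ordering depends on. */
struct sketch_payload {
	off_t max_size;
};

/* Descending order by max_size (largest first). Returns -1, 0 or 1
 * from two comparisons, so no subtraction result has to fit in int. */
int sketch_cmp_size_desc(const void *left_ptr, const void *right_ptr)
{
	const struct sketch_payload *left = left_ptr;
	const struct sketch_payload *right = right_ptr;

	return (right->max_size > left->max_size)
		- (right->max_size < left->max_size);
}

/* Sort an array of payloads so the largest download is queued first. */
void sketch_sort_desc(struct sketch_payload *items, size_t count)
{
	qsort(items, count, sizeof(*items), sketch_cmp_size_desc);
}
```

The same comparator body could in principle be used with a list-based merge sort; the array form is chosen here only to keep the example self-contained.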