[pacman-dev] [PATCH 1/2] only use effective url for urls containing .db or .pkg
Github and other sites redirect their downloads to a cdn. So the download http://foo.org/myrepo.db may redirect to something like https://cdn.foo.org/83749327439. This then causes pacman to try and download the sig as https://cdn.foo.org/83749327439.sig which is incorrect. In this case pacman should append .sig to the original url. However urls like https://archlinux.org/packages/community/x86_64/0ad/download/ Redirect to the mirror, so .sig has to appended after the redirects and not before. So we decide if we should append .sig on the original or effective url based on if the effective url has .db or .pkg in it. Fixes FS#71148 --- lib/libalpm/dload.c | 21 +++++++++++++++++++-- 1 file changed, 19 insertions(+), 2 deletions(-) diff --git a/lib/libalpm/dload.c b/lib/libalpm/dload.c index 2c14841f..72e9cfcd 100644 --- a/lib/libalpm/dload.c +++ b/lib/libalpm/dload.c @@ -613,11 +613,28 @@ static int curl_check_finished_download(CURLM *curlm, CURLMsg *msg, /* Let's check if client requested downloading accompanion *.sig file */ if(!payload->signature && payload->download_signature && curlerr == CURLE_OK && payload->respcode < 400) { struct dload_payload *sig = NULL; + char *url = payload->fileurl; + char *_effective_filename; - int len = strlen(effective_url) + 5; + STRDUP(_effective_filename, effective_url, GOTO_ERR(handle, ALPM_ERR_MEMORY, cleanup)); + const char *effective_filename = get_filename(_effective_filename); + char *query = strrchr(effective_filename, '?'); + + if(query) { + query[0] = '\0'; + } + + /* Only use the effective url for sig downloads if the effective_url contains .db or .pkg */ + if(strstr(effective_filename, ".db") || strstr(effective_filename, ".pkg")) { + url = effective_url; + } + + free(_effective_filename); + + int len = strlen(url) + 5; CALLOC(sig, 1, sizeof(*sig), GOTO_ERR(handle, ALPM_ERR_MEMORY, cleanup)); MALLOC(sig->fileurl, len, FREE(sig); GOTO_ERR(handle, ALPM_ERR_MEMORY, cleanup)); - snprintf(sig->fileurl, len, "%s.sig", effective_url); + snprintf(sig->fileurl, len, "%s.sig", url); if(payload->trust_remote_name) { /* In this case server might provide a new name for the main payload. -- 2.32.0
Test for downloads that redirect to some sort of cdn where the redirected url does not relate to the original filename. --- ...rade-download-pkg-and-sig-with-filename.py | 23 ++++++++++++++++++- 1 file changed, 22 insertions(+), 1 deletion(-) diff --git a/test/pacman/tests/upgrade-download-pkg-and-sig-with-filename.py b/test/pacman/tests/upgrade-download-pkg-and-sig-with-filename.py index ac5072c9..002a1d2d 100644 --- a/test/pacman/tests/upgrade-download-pkg-and-sig-with-filename.py +++ b/test/pacman/tests/upgrade-download-pkg-and-sig-with-filename.py @@ -22,6 +22,12 @@ '/redir-dest.pkg': 'redir-dest', '/redir-dest.pkg.sig': 'redir-dest.sig', + # redirect cdn + '/redir-cdn.pkg': { 'code': 303, 'headers': { 'Location': '/cdn-1' } }, + '/redir-cdn.pkg.sig': { 'code': 303, 'headers': { 'Location': '/cdn-2' } }, + '/cdn-1': 'redir-dest', + '/cdn-2': 'redir-dest.sig', + # content-disposition and redirect '/cd-redir.pkg': { 'code': 303, 'headers': { 'Location': '/cd-redir-dest.pkg' } }, '/cd-redir-dest.pkg': { @@ -30,6 +36,18 @@ }, '/cd-redir-dest.pkg.sig': 'cd-redir-dest.sig', + # content-disposition and redirect to cdn + '/cd-redir-cdn.pkg': { 'code': 303, 'headers': { 'Location': '/cdn-3' } }, + '/cd-redir-cdn.pkg.sig': { 'code': 303, 'headers': { 'Location': '/cdn-4' } }, + '/cdn-3': { + 'headers': { 'Content-Disposition': 'attachment; filename="cdn-alt.pkg"' }, + 'body': 'cdn-alt' + }, + '/cdn-4': { + 'headers': { 'Content-Disposition': 'attachment; filename="cdn-alt.pkg.sig"' }, + 'body': 'cdn-alt.sig' + }, + # TODO: absolutely terrible hack to prevent pacman from attempting to # validate packages, which causes failure under --valgrind thanks to # a memory leak in gpgme that is too general for inclusion in valgrind.supp @@ -38,7 +56,7 @@ '': 'fallback', }) -self.args = '-Uw {url}/simple.pkg {url}/cd.pkg {url}/redir.pkg {url}/cd-redir.pkg {url}/404'.format(url=url) +self.args = '-Uw {url}/simple.pkg {url}/cd.pkg {url}/redir.pkg {url}/redir-cdn.pkg {url}/cd-redir.pkg {url}/cd-redir-cdn.pkg {url}/404'.format(url=url) # packages/sigs are not valid, error is expected self.addrule('!PACMAN_RETCODE=0') @@ -59,3 +77,6 @@ self.addrule('!CACHE_FEXISTS=cd-redir-dest.pkg') self.addrule('CACHE_FCONTENTS=cd-redir-dest-alt.pkg|cd-redir-dest') self.addrule('CACHE_FCONTENTS=cd-redir-dest-alt.pkg.sig|cd-redir-dest.sig') + +self.addrule('CACHE_FCONTENTS=cdn-alt.pkg|cdn-alt') +self.addrule('CACHE_FCONTENTS=cdn-alt.pkg.sig|cdn-alt.sig') -- 2.32.0
On 14/6/21 10:15 pm, morganamilo wrote:
Test for downloads that redirect to some sort of cdn where the redirected url does not relate to the original filename. --- ...rade-download-pkg-and-sig-with-filename.py | 23 ++++++++++++++++++- 1 file changed, 22 insertions(+), 1 deletion(-)
Thanks - this now passes after your effective URL patch. Applied. Allan
On 6/14/21 8:15 AM, morganamilo wrote:
Github and other sites redirect their downloads to a cdn. So the download http://foo.org/myrepo.db may redirect to something like https://cdn.foo.org/83749327439.
This then causes pacman to try and download the sig as https://cdn.foo.org/83749327439.sig which is incorrect. In this case pacman should append .sig to the original url.
However urls like https://archlinux.org/packages/community/x86_64/0ad/download/ Redirect to the mirror, so .sig has to appended after the redirects and not before.
So we decide if we should append .sig on the original or effective url based on if the effective url has .db or .pkg in it.
Fixes FS#71148 --- lib/libalpm/dload.c | 21 +++++++++++++++++++-- 1 file changed, 19 insertions(+), 2 deletions(-)
diff --git a/lib/libalpm/dload.c b/lib/libalpm/dload.c index 2c14841f..72e9cfcd 100644 --- a/lib/libalpm/dload.c +++ b/lib/libalpm/dload.c @@ -613,11 +613,28 @@ static int curl_check_finished_download(CURLM *curlm, CURLMsg *msg, /* Let's check if client requested downloading accompanion *.sig file */ if(!payload->signature && payload->download_signature && curlerr == CURLE_OK && payload->respcode < 400) { struct dload_payload *sig = NULL; + char *url = payload->fileurl; + char *_effective_filename;
- int len = strlen(effective_url) + 5; + STRDUP(_effective_filename, effective_url, GOTO_ERR(handle, ALPM_ERR_MEMORY, cleanup)); + const char *effective_filename = get_filename(_effective_filename); + char *query = strrchr(effective_filename, '?'); + + if(query) { + query[0] = '\0'; + } + + /* Only use the effective url for sig downloads if the effective_url contains .db or .pkg */ + if(strstr(effective_filename, ".db") || strstr(effective_filename, ".pkg")) {
For a .db we explicitly need it to be the last component, so we might as well check that .db isn't a redirect to .db.tar.gz, which per IRC discussion lazka's server appears to do. The .pkg case is different since we have .pkg.tar plus optional and fully arbitrary compression extensions emanating from libarchive, so I could understand taking the simple route and not bothering there.
+ url = effective_url; + } + + free(_effective_filename); + + int len = strlen(url) + 5; CALLOC(sig, 1, sizeof(*sig), GOTO_ERR(handle, ALPM_ERR_MEMORY, cleanup)); MALLOC(sig->fileurl, len, FREE(sig); GOTO_ERR(handle, ALPM_ERR_MEMORY, cleanup)); - snprintf(sig->fileurl, len, "%s.sig", effective_url); + snprintf(sig->fileurl, len, "%s.sig", url);
if(payload->trust_remote_name) { /* In this case server might provide a new name for the main payload.
-- Eli Schwartz Bug Wrangler and Trusted User
On 16/06/2021 20:39, Eli Schwartz wrote:
For a .db we explicitly need it to be the last component, so we might as well check that .db isn't a redirect to .db.tar.gz, which per IRC discussion lazka's server appears to do.
No that case is fine. This patch is for when the redirect is unrelated to the filename. So like: /foo.db -> /34783248324 /34783248324.sig would then fail. But in the case of /foo.db -> /foo.db.tar.gz /foo.db.tar.gz.sig is still valid and succeeds.
On 06/14/21 at 01:15pm, morganamilo wrote:
Github and other sites redirect their downloads to a cdn. So the download http://foo.org/myrepo.db may redirect to something like https://cdn.foo.org/83749327439.
This then causes pacman to try and download the sig as https://cdn.foo.org/83749327439.sig which is incorrect. In this case pacman should append .sig to the original url.
However urls like https://archlinux.org/packages/community/x86_64/0ad/download/ Redirect to the mirror, so .sig has to appended after the redirects and not before.
So we decide if we should append .sig on the original or effective url based on if the effective url has .db or .pkg in it.
Fixes FS#71148
At some point we need to stop chasing what sites are doing and lay out specific requirements for mirrors because any heuristic we use is going to break somebody's setup. This patch might work for github's current url scheme, but there's nothing stopping them or others from putting .db somewhere in the middle of the url. I'm really tempted to just always append .sig to the original url and say that it's up to people using redirection to make sure that's valid and all possible destinations are properly synced.
--- lib/libalpm/dload.c | 21 +++++++++++++++++++-- 1 file changed, 19 insertions(+), 2 deletions(-)
diff --git a/lib/libalpm/dload.c b/lib/libalpm/dload.c index 2c14841f..72e9cfcd 100644 --- a/lib/libalpm/dload.c +++ b/lib/libalpm/dload.c @@ -613,11 +613,28 @@ static int curl_check_finished_download(CURLM *curlm, CURLMsg *msg, /* Let's check if client requested downloading accompanion *.sig file */ if(!payload->signature && payload->download_signature && curlerr == CURLE_OK && payload->respcode < 400) { struct dload_payload *sig = NULL; + char *url = payload->fileurl; + char *_effective_filename;
Style rules say all variable declarations go here.
- int len = strlen(effective_url) + 5; + STRDUP(_effective_filename, effective_url, GOTO_ERR(handle, ALPM_ERR_MEMORY, cleanup)); + const char *effective_filename = get_filename(_effective_filename); + char *query = strrchr(effective_filename, '?'); + + if(query) { + query[0] = '\0'; + } + + /* Only use the effective url for sig downloads if the effective_url contains .db or .pkg */ + if(strstr(effective_filename, ".db") || strstr(effective_filename, ".pkg")) {
Both .db and .pkg are merely conventions.
+ url = effective_url; + } + + free(_effective_filename); + + int len = strlen(url) + 5; CALLOC(sig, 1, sizeof(*sig), GOTO_ERR(handle, ALPM_ERR_MEMORY, cleanup)); MALLOC(sig->fileurl, len, FREE(sig); GOTO_ERR(handle, ALPM_ERR_MEMORY, cleanup)); - snprintf(sig->fileurl, len, "%s.sig", effective_url); + snprintf(sig->fileurl, len, "%s.sig", url);
if(payload->trust_remote_name) { /* In this case server might provide a new name for the main payload. -- 2.32.0
On 20/6/21 11:24 am, Andrew Gregory wrote:
On 06/14/21 at 01:15pm, morganamilo wrote:
Github and other sites redirect their downloads to a cdn. So the download http://foo.org/myrepo.db may redirect to something like https://cdn.foo.org/83749327439.
This then causes pacman to try and download the sig as https://cdn.foo.org/83749327439.sig which is incorrect. In this case pacman should append .sig to the original url.
However urls like https://archlinux.org/packages/community/x86_64/0ad/download/ Redirect to the mirror, so .sig has to appended after the redirects and not before.
So we decide if we should append .sig on the original or effective url based on if the effective url has .db or .pkg in it.
Fixes FS#71148
At some point we need to stop chasing what sites are doing and lay out specific requirements for mirrors because any heuristic we use is going to break somebody's setup. This patch might work for github's current url scheme, but there's nothing stopping them or others from putting .db somewhere in the middle of the url. I'm really tempted to just always append .sig to the original url and say that it's up to people using redirection to make sure that's valid and all possible destinations are properly synced.
I agree. However, still want to accept this patch for 6.0.1 as it restores 5.2.x behaviour. We can decide limitations on what we support in future releases and make an annoucement later. Allan
On 14/6/21 10:15 pm, morganamilo wrote:
Github and other sites redirect their downloads to a cdn. So the download http://foo.org/myrepo.db may redirect to something like https://cdn.foo.org/83749327439.
This then causes pacman to try and download the sig as https://cdn.foo.org/83749327439.sig which is incorrect. In this case pacman should append .sig to the original url.
However urls like https://archlinux.org/packages/community/x86_64/0ad/download/ Redirect to the mirror, so .sig has to appended after the redirects and not before.
So we decide if we should append .sig on the original or effective url based on if the effective url has .db or .pkg in it.
Added this to the commit message: In addition, strip everything beyond "?" when considering a redirected URL.
Fixes FS#71148 --- lib/libalpm/dload.c | 21 +++++++++++++++++++-- 1 file changed, 19 insertions(+), 2 deletions(-)
diff --git a/lib/libalpm/dload.c b/lib/libalpm/dload.c index 2c14841f..72e9cfcd 100644 --- a/lib/libalpm/dload.c +++ b/lib/libalpm/dload.c @@ -613,11 +613,28 @@ static int curl_check_finished_download(CURLM *curlm, CURLMsg *msg, /* Let's check if client requested downloading accompanion *.sig file */ if(!payload->signature && payload->download_signature && curlerr == CURLE_OK && payload->respcode < 400) { struct dload_payload *sig = NULL; + char *url = payload->fileurl; + char *_effective_filename;
- int len = strlen(effective_url) + 5; + STRDUP(_effective_filename, effective_url, GOTO_ERR(handle, ALPM_ERR_MEMORY, cleanup)); + const char *effective_filename = get_filename(_effective_filename); + char *query = strrchr(effective_filename, '?');
Lots of variables being declared beyond the start of the block.
+ + if(query) { + query[0] = '\0'; + } + + /* Only use the effective url for sig downloads if the effective_url contains .db or .pkg */ + if(strstr(effective_filename, ".db") || strstr(effective_filename, ".pkg")) { + url = effective_url; + } + + free(_effective_filename); + + int len = strlen(url) + 5; CALLOC(sig, 1, sizeof(*sig), GOTO_ERR(handle, ALPM_ERR_MEMORY, cleanup)); MALLOC(sig->fileurl, len, FREE(sig); GOTO_ERR(handle, ALPM_ERR_MEMORY, cleanup)); - snprintf(sig->fileurl, len, "%s.sig", effective_url); + snprintf(sig->fileurl, len, "%s.sig", url);
if(payload->trust_remote_name) { /* In this case server might provide a new name for the main payload.
Github and other sites redirect their downloads to a cdn. So the download http://foo.org/myrepo.db may redirect to something like https://cdn.foo.org/83749327439. This then causes pacman to try and download the sig as https://cdn.foo.org/83749327439.sig which is incorrect. In this case pacman should append .sig to the original url. However urls like https://archlinux.org/packages/community/x86_64/0ad/download/ Redirect to the mirror, so .sig has to appended after the redirects and not before. So we decide if we should append .sig on the original or effective url based on if the effective url (minus the query part) has .db or .pkg in it. Fixes FS#71148 --- v2: move variable decleration to start of block --- lib/libalpm/dload.c | 24 ++++++++++++++++++++++-- 1 file changed, 22 insertions(+), 2 deletions(-) diff --git a/lib/libalpm/dload.c b/lib/libalpm/dload.c index 2c14841f..88351780 100644 --- a/lib/libalpm/dload.c +++ b/lib/libalpm/dload.c @@ -613,11 +613,31 @@ static int curl_check_finished_download(CURLM *curlm, CURLMsg *msg, /* Let's check if client requested downloading accompanion *.sig file */ if(!payload->signature && payload->download_signature && curlerr == CURLE_OK && payload->respcode < 400) { struct dload_payload *sig = NULL; + char *url = payload->fileurl; + char *_effective_filename; + const char *effective_filename; + char *query; + int len; + + STRDUP(_effective_filename, effective_url, GOTO_ERR(handle, ALPM_ERR_MEMORY, cleanup)); + effective_filename = get_filename(_effective_filename); + query = strrchr(effective_filename, '?'); + + if(query) { + query[0] = '\0'; + } + + /* Only use the effective url for sig downloads if the effective_url contains .db or .pkg */ + if(strstr(effective_filename, ".db") || strstr(effective_filename, ".pkg")) { + url = effective_url; + } + + free(_effective_filename); - int len = strlen(effective_url) + 5; + len = strlen(url) + 5; CALLOC(sig, 1, sizeof(*sig), GOTO_ERR(handle, ALPM_ERR_MEMORY, cleanup)); MALLOC(sig->fileurl, len, FREE(sig); GOTO_ERR(handle, ALPM_ERR_MEMORY, cleanup)); - snprintf(sig->fileurl, len, "%s.sig", effective_url); + snprintf(sig->fileurl, len, "%s.sig", url); if(payload->trust_remote_name) { /* In this case server might provide a new name for the main payload. -- 2.32.0
Github and other sites redirect their downloads to a cdn. So the download http://foo.org/myrepo.db may redirect to something like https://cdn.foo.org/83749327439. This then causes pacman to try and download the sig as https://cdn.foo.org/83749327439.sig which is incorrect. In this case pacman should append .sig to the original url. However urls like https://archlinux.org/packages/community/x86_64/0ad/download/ Redirect to the mirror, so .sig has to appended after the redirects and not before. So we decide if we should append .sig on the original or effective url based on if the effective url (minus the query part) has .db or .pkg in it. Fixes FS#71148 --- v2: move variable decleration to start of block v3: use dbext instead of .db --- lib/libalpm/dload.c | 25 +++++++++++++++++++++++-- 1 file changed, 23 insertions(+), 2 deletions(-) diff --git a/lib/libalpm/dload.c b/lib/libalpm/dload.c index 2c14841f..2545a425 100644 --- a/lib/libalpm/dload.c +++ b/lib/libalpm/dload.c @@ -613,11 +613,32 @@ static int curl_check_finished_download(CURLM *curlm, CURLMsg *msg, /* Let's check if client requested downloading accompanion *.sig file */ if(!payload->signature && payload->download_signature && curlerr == CURLE_OK && payload->respcode < 400) { struct dload_payload *sig = NULL; + char *url = payload->fileurl; + char *_effective_filename; + const char *effective_filename; + char *query; + const char *dbext = alpm_option_get_dbext(handle); + int len; + + STRDUP(_effective_filename, effective_url, GOTO_ERR(handle, ALPM_ERR_MEMORY, cleanup)); + effective_filename = get_filename(_effective_filename); + query = strrchr(effective_filename, '?'); + + if(query) { + query[0] = '\0'; + } + + /* Only use the effective url for sig downloads if the effective_url contains .dbext or .pkg */ + if(strstr(effective_filename, dbext) || strstr(effective_filename, ".pkg")) { + url = effective_url; + } + + free(_effective_filename); - int len = strlen(effective_url) + 5; + len = strlen(url) + 5; CALLOC(sig, 1, sizeof(*sig), GOTO_ERR(handle, ALPM_ERR_MEMORY, cleanup)); MALLOC(sig->fileurl, len, FREE(sig); GOTO_ERR(handle, ALPM_ERR_MEMORY, cleanup)); - snprintf(sig->fileurl, len, "%s.sig", effective_url); + snprintf(sig->fileurl, len, "%s.sig", url); if(payload->trust_remote_name) { /* In this case server might provide a new name for the main payload. -- 2.32.0
On 24/6/21 10:13 pm, morganamilo wrote:
Github and other sites redirect their downloads to a cdn. So the download http://foo.org/myrepo.db may redirect to something like https://cdn.foo.org/83749327439.
This then causes pacman to try and download the sig as https://cdn.foo.org/83749327439.sig which is incorrect. In this case pacman should append .sig to the original url.
However urls like https://archlinux.org/packages/community/x86_64/0ad/download/ Redirect to the mirror, so .sig has to appended after the redirects and not before.
So we decide if we should append .sig on the original or effective url based on if the effective url (minus the query part) has .db or .pkg in it.
Fixes FS#71148
Pulled. Thanks, Allan
participants (5)
-
Allan McRae
-
Andrew Gregory
-
Eli Schwartz
-
Morgan Adamiec
-
morganamilo