[PATCH 1/1] support http header 'Cache-Control: no-cache' for soft failure
From: Christian Hesse <mail@eworm.de> By setting the HTTP header 'Cache-Control: no-cache' when returning with the status code 404 (not found) the server can indicate that this is a soft failure. No error message is shown, and server's error count is not increased. This can be used by servers that are not expected to be complete, for example when serving a local cache [0]. In nginx this can be achived by adding a single directive in location block: add_header Cache-Control "no-cache"; Also this is a perfect match for pacredir [1]. [0] https://wiki.archlinux.org/title/Pacman/Tips_and_tricks#Network_shared_pacma... [1] https://git.eworm.de/cgit/pacredir/about/ Signed-off-by: Christian Hesse <mail@eworm.de> --- lib/libalpm/dload.c | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/lib/libalpm/dload.c b/lib/libalpm/dload.c index 4fa17b35..23034584 100644 --- a/lib/libalpm/dload.c +++ b/lib/libalpm/dload.c @@ -274,8 +274,10 @@ static size_t dload_parseheader_cb(void *ptr, size_t size, size_t nmemb, void *u { size_t realsize = size * nmemb; const char *fptr, *endptr = NULL; + const char * const cc_header = "Cache-Control:"; const char * const cd_header = "Content-Disposition:"; const char * const fn_key = "filename="; + const char * const nc_key = "no-cache"; struct dload_payload *payload = (struct dload_payload *)user; long respcode; @@ -302,6 +304,15 @@ static size_t dload_parseheader_cb(void *ptr, size_t size, size_t nmemb, void *u } } + /* By setting the HTTP header 'Cache-Control: no-cache' the server can indicate + that this is a soft failure which should not be cached. No error message is + shown, and server's error count is not increased. */ + if(_alpm_raw_ncmp(cc_header, ptr, strlen(cc_header)) == 0) { + if(strstr(ptr, nc_key)) { + payload->errors_ok = 1; + } + } + curl_easy_getinfo(payload->curl, CURLINFO_RESPONSE_CODE, &respcode); if(payload->respcode != respcode) { payload->respcode = respcode; -- 2.37.3
Christian Hesse <list@eworm.de> on Mon, 2022/09/19 14:52:
By setting the HTTP header 'Cache-Control: no-cache' when returning with the status code 404 (not found) the server can indicate that this is a soft failure. No error message is shown, and server's error count is not increased.
This can be used by servers that are not expected to be complete, for example when serving a local cache [0]. In nginx this can be achived by adding a single directive in location block:
add_header Cache-Control "no-cache";
Yes, I know. We had a very similar patch in May 2021 [2]. That one received a negative review, possibly because back then I wanted to introduce a custom and non-standard http header 'X-Pacman-Expected-Failure'. In January 2022 I send a patch to implement CacheServer [3], that one was neither reviewed nor merged. At least not yet, and it is a lot more complex. As this is still a problem in general (and not just a "kind of esoteric setup", see the number of caching solutions listed on pacman's wiki page) [0] I decided to make another try. In contrast to the original patch I decided to go with a standard http header. If the server sends 'Cache-Control: no-cache' in its response header pacman will honor that as a soft failure and will not increase server's error limit. That said... I am open to other suggestions for a header that matches even better. Could not find one myself. I am running this code myself for month now (with a custom built pacman package) without issue. It does exactly what I need, and will help others as well.
Also this is a perfect match for pacredir [1].
... which will receive a corresponding commit as soon as this is merged. BTW, did you know that pacredir is in our official repositories since 2017? It gives an advice to install pacman (the custom build I mentioned above) from an unofficial repository since version 6.0 entered the repositories. This could finally be dropped.
[0] https://wiki.archlinux.org/title/Pacman/Tips_and_tricks#Network_shared_pacma... [1] https://git.eworm.de/cgit/pacredir/about/
[2] https://lists.archlinux.org/archives/list/pacman-dev@lists.archlinux.org/mes... [3] https://lists.archlinux.org/archives/list/pacman-dev@lists.archlinux.org/mes... -- main(a){char*c=/* Schoene Gruesse */"B?IJj;MEH" "CX:;",b;for(a/* Best regards my address: */=0;b=c[a++];) putchar(b-1/(/* Chris cc -ox -xc - && ./x */b/42*2-3)*42);}
On 19/9/22 22:52, Christian Hesse wrote:
From: Christian Hesse <mail@eworm.de>
By setting the HTTP header 'Cache-Control: no-cache' when returning with the status code 404 (not found) the server can indicate that this is a soft failure. No error message is shown, and server's error count is not increased.
This can be used by servers that are not expected to be complete, for example when serving a local cache [0]. In nginx this can be achived by adding a single directive in location block:
add_header Cache-Control "no-cache";
I am not accepting this patch. 1) "no-cache" to indicate this a cache of packages... 2) using a directive that has a completely different meaning. Allan
Allan McRae <allan@archlinux.org> on Mon, 2022/09/26 20:44:
On 19/9/22 22:52, Christian Hesse wrote:
From: Christian Hesse <mail@eworm.de>
By setting the HTTP header 'Cache-Control: no-cache' when returning with the status code 404 (not found) the server can indicate that this is a soft failure. No error message is shown, and server's error count is not increased.
This can be used by servers that are not expected to be complete, for example when serving a local cache [0]. In nginx this can be achived by adding a single directive in location block:
add_header Cache-Control "no-cache";
I am not accepting this patch. 1) "no-cache" to indicate this a cache of packages...
You missed the point. This does not tell not to cache packages. It does tell not to cache an error. ;) Fun fact: Searching for the keywords yields a lot of matches for some well known cloud providers and their proxy software. Looks like the same is used there not to cache errors.
2) using a directive that has a completely different meaning.
So still unanswered... Could there be a directive that we can agree on? Anyway - still expected. :-p So hoping for the CacheServer thing now. -- main(a){char*c=/* Schoene Gruesse */"B?IJj;MEH" "CX:;",b;for(a/* Best regards my address: */=0;b=c[a++];) putchar(b-1/(/* Chris cc -ox -xc - && ./x */b/42*2-3)*42);}
On 26/9/22 21:54, Christian Hesse wrote:
Allan McRae <allan@archlinux.org> on Mon, 2022/09/26 20:44:
On 19/9/22 22:52, Christian Hesse wrote:
From: Christian Hesse <mail@eworm.de>
By setting the HTTP header 'Cache-Control: no-cache' when returning with the status code 404 (not found) the server can indicate that this is a soft failure. No error message is shown, and server's error count is not increased.
This can be used by servers that are not expected to be complete, for example when serving a local cache [0]. In nginx this can be achived by adding a single directive in location block:
add_header Cache-Control "no-cache";
I am not accepting this patch. 1) "no-cache" to indicate this a cache of packages...
You missed the point. This does not tell not to cache packages. It does tell not to cache an error. ;)
Fun fact: Searching for the keywords yields a lot of matches for some well known cloud providers and their proxy software. Looks like the same is used there not to cache errors.
This is the sort of information that would make considering this as an option more likely. Can you expand on this with some examples? Allan
Allan McRae <allan@archlinux.org> on Mon, 2022/09/26 22:35:
On 26/9/22 21:54, Christian Hesse wrote:
Allan McRae <allan@archlinux.org> on Mon, 2022/09/26 20:44:
On 19/9/22 22:52, Christian Hesse wrote:
From: Christian Hesse <mail@eworm.de>
By setting the HTTP header 'Cache-Control: no-cache' when returning with the status code 404 (not found) the server can indicate that this is a soft failure. No error message is shown, and server's error count is not increased.
This can be used by servers that are not expected to be complete, for example when serving a local cache [0]. In nginx this can be achived by adding a single directive in location block:
add_header Cache-Control "no-cache";
I am not accepting this patch. 1) "no-cache" to indicate this a cache of packages...
You missed the point. This does not tell not to cache packages. It does tell not to cache an error. ;)
Fun fact: Searching for the keywords yields a lot of matches for some well known cloud providers and their proxy software. Looks like the same is used there not to cache errors.
This is the sort of information that would make considering this as an option more likely. Can you expand on this with some examples?
Oh, did not expect that to decide the thing. Here is a selection of links on the topic... https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/HTTPStatu... https://cloud.google.com/cdn/docs/using-negative-caching https://community.cloudflare.com/t/how-to-avoid-caching-404/196262 https://community.cloudflare.com/t/cache-everything-and-dont-cache-404-at-al... For Cloudflare I found the forum discussions only, not sure if the relevant parts have been removed from documentation... There must have been more. :) Default caching on 404 happens for something between several seconds and minutes by default. This can be changed to different time (CacheControl: age=...) or disabled (CacheControl: no-cache). -- main(a){char*c=/* Schoene Gruesse */"B?IJj;MEH" "CX:;",b;for(a/* Best regards my address: */=0;b=c[a++];) putchar(b-1/(/* Chris cc -ox -xc - && ./x */b/42*2-3)*42);}
On 9/19/22 14:52, Christian Hesse wrote:
In nginx this can be achived by adding a single directive in location block:
add_header Cache-Control "no-cache";
If it's added directly to the location block this would disable caching for all responses.
kpcyrd <kpcyrd@rxv.cc> on Mon, 2022/09/26 17:47:
On 9/19/22 14:52, Christian Hesse wrote:
In nginx this can be achived by adding a single directive in location block:
add_header Cache-Control "no-cache";
If it's added directly to the location block this would disable caching for all responses.
Yes, true... Though pacman ignores this. :) And I found another issue: By default nginx does not add headers when return code is 404, but it can be forced with extra parameter "always" [0]. So we should show something like this in the commit message for a complete server block: server { server name _default; listen 8080; root /var/cache/pacman/pkg; error_page 404 = @no-cache; location @no-cache { add_header CacheControl "no-cache" always; } } Allan, if you decide to accept the patch I would like to re-send with an updated commit message. [0] https://nginx.org/en/docs/http/ngx_http_headers_module.html#add_header -- main(a){char*c=/* Schoene Gruesse */"B?IJj;MEH" "CX:;",b;for(a/* Best regards my address: */=0;b=c[a++];) putchar(b-1/(/* Chris cc -ox -xc - && ./x */b/42*2-3)*42);}
From: Christian Hesse <mail@eworm.de> The idea has been borrowed from cloud providers' reverse proxies and load balancers. These accept http headers to control caching behaviour, even to control 'negative caching' when error codes are returned. By setting the http header 'Cache-Control: no-cache' when returning with the status code 404 (not found) the server can indicate that this is a soft failure. No error message is shown, and server's error count is not increased. This can be used by servers that are not expected to be complete, for example when serving a local cache [0]. In nginx this can be achived by acting on error_page and sending the extra header by adding a directive: server { listen 8080 default_server; root /var/cache/pacman/pkg; error_page 404 = @no-cache; location @no-cache { add_header Cache-Control "no-cache" always; } } Also this is a perfect match for pacredir [1]. [0] https://wiki.archlinux.org/title/Pacman/Tips_and_tricks#Network_shared_pacma... [1] https://git.eworm.de/cgit/pacredir/about/ Signed-off-by: Christian Hesse <mail@eworm.de> --- lib/libalpm/dload.c | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/lib/libalpm/dload.c b/lib/libalpm/dload.c index 4fa17b35..74462bba 100644 --- a/lib/libalpm/dload.c +++ b/lib/libalpm/dload.c @@ -274,8 +274,10 @@ static size_t dload_parseheader_cb(void *ptr, size_t size, size_t nmemb, void *u { size_t realsize = size * nmemb; const char *fptr, *endptr = NULL; + const char * const cc_header = "Cache-Control:"; const char * const cd_header = "Content-Disposition:"; const char * const fn_key = "filename="; + const char * const nc_key = "no-cache"; struct dload_payload *payload = (struct dload_payload *)user; long respcode; @@ -302,6 +304,15 @@ static size_t dload_parseheader_cb(void *ptr, size_t size, size_t nmemb, void *u } } + /* By setting the http header 'Cache-Control: no-cache' the server can indicate + that this is a soft failure which should not be cached. No error message is + shown, and server's error count is not increased. */ + if(_alpm_raw_ncmp(cc_header, ptr, strlen(cc_header)) == 0) { + if(strstr(ptr, nc_key)) { + payload->errors_ok = 1; + } + } + curl_easy_getinfo(payload->curl, CURLINFO_RESPONSE_CODE, &respcode); if(payload->respcode != respcode) { payload->respcode = respcode;
On 09/26/22 at 04:24pm, Christian Hesse wrote:
Allan McRae <allan@archlinux.org> on Mon, 2022/09/26 22:35:
On 26/9/22 21:54, Christian Hesse wrote:
Allan McRae <allan@archlinux.org> on Mon, 2022/09/26 20:44:
On 19/9/22 22:52, Christian Hesse wrote:
From: Christian Hesse <mail@eworm.de>
By setting the HTTP header 'Cache-Control: no-cache' when returning with the status code 404 (not found) the server can indicate that this is a soft failure. No error message is shown, and server's error count is not increased.
This can be used by servers that are not expected to be complete, for example when serving a local cache [0]. In nginx this can be achived by adding a single directive in location block:
add_header Cache-Control "no-cache";
I am not accepting this patch. 1) "no-cache" to indicate this a cache of packages...
You missed the point. This does not tell not to cache packages. It does tell not to cache an error. ;)
Fun fact: Searching for the keywords yields a lot of matches for some well known cloud providers and their proxy software. Looks like the same is used there not to cache errors.
This is the sort of information that would make considering this as an option more likely. Can you expand on this with some examples?
Oh, did not expect that to decide the thing.
Here is a selection of links on the topic...
https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/HTTPStatu... https://cloud.google.com/cdn/docs/using-negative-caching https://community.cloudflare.com/t/how-to-avoid-caching-404/196262 https://community.cloudflare.com/t/cache-everything-and-dont-cache-404-at-al...
For Cloudflare I found the forum discussions only, not sure if the relevant parts have been removed from documentation... There must have been more. :)
Default caching on 404 happens for something between several seconds and minutes by default. This can be changed to different time (CacheControl: age=...) or disabled (CacheControl: no-cache).
None of these links looks to me like what you want pacman to do with the no-cache header. The caching is still all per-object. Getting a 404 with no-cache set means to retry the request for that same object, not to make inferences about whether requests for other objects will 404. pacman isn't caching anything in the sense that Cache-Control was meant to affect, so I don't see how the Cache-Control header has any relevance here. apg
Andrew Gregory <andrew.gregory.8@gmail.com> on Mon, 2022/10/10 19:55:
None of these links looks to me like what you want pacman to do with the no-cache header. The caching is still all per-object. Getting a 404 with no-cache set means to retry the request for that same object, not to make inferences about whether requests for other objects will 404.
pacman isn't caching anything in the sense that Cache-Control was meant to affect, so I don't see how the Cache-Control header has any relevance here.
I have to agree with what you say. But I can argue the other way round: A 404 response is for a specific object, not the hole server. The server error limit excludes servers for some missed objects, and does not try others - though it did not check the server for that same object. Is that any better? I can understand to skip a server for hard errors. For example I do not want to run into the same timeout again and again, or resolve a non-existent host again and again. For soft errors (where the server exists and answers) this should be different. Well, perhaps the current behavior is even fine for default behavior and the majority of people, but there should by a way to opt out. In an older thread someone denied to add an option to disable the server error limit. How about a configuration options that disables the server error limit for soft errors only (and keep the behavior for hard errors as is)? Something like `NoServerSoftErrorLimit` (and possibly `--noserversofterrorlimit`) that makes cache servers work? Soft errors are not resource hungry in any way. This es even more true with parallel downloads where no time is wasted for the request when a download is still running. -- main(a){char*c=/* Schoene Gruesse */"B?IJj;MEH" "CX:;",b;for(a/* Best regards my address: */=0;b=c[a++];) putchar(b-1/(/* Chris cc -ox -xc - && ./x */b/42*2-3)*42);}
On 10/25/22 at 04:55pm, Christian Hesse wrote:
Andrew Gregory <andrew.gregory.8@gmail.com> on Mon, 2022/10/10 19:55:
None of these links looks to me like what you want pacman to do with the no-cache header. The caching is still all per-object. Getting a 404 with no-cache set means to retry the request for that same object, not to make inferences about whether requests for other objects will 404.
pacman isn't caching anything in the sense that Cache-Control was meant to affect, so I don't see how the Cache-Control header has any relevance here.
I have to agree with what you say.
But I can argue the other way round: A 404 response is for a specific object, not the hole server. The server error limit excludes servers for some missed objects, and does not try others - though it did not check the server for that same object. Is that any better?
I can understand to skip a server for hard errors. For example I do not want to run into the same timeout again and again, or resolve a non-existent host again and again.
For soft errors (where the server exists and answers) this should be different. Well, perhaps the current behavior is even fine for default behavior and the majority of people, but there should by a way to opt out.
In an older thread someone denied to add an option to disable the server error limit. How about a configuration options that disables the server error limit for soft errors only (and keep the behavior for hard errors as is)? Something like `NoServerSoftErrorLimit` (and possibly `--noserversofterrorlimit`) that makes cache servers work?
Soft errors are not resource hungry in any way. This es even more true with parallel downloads where no time is wasted for the request when a download is still running.
A 404 may not necessarily mean that the server is missing other files as well, but it does indicate that the server is not fully synced, which is generally the expectation. There are countless varieties of unusual setups and it has been our policy of late to not support the many options required by them in the internal downloader. Users with specialized needs can use the external downloader, which lacks any special error handling, or a proxy server that conforms to pacman's expectations. As I understand pacredir, users should still be able to use it fine with the external downloader right now, or pacredir could be adjusted to redirect to an appropriate full mirror instead of returning 404. The only real advantage I see to changing pacman is to allow more intelligent behavior for different types of download (e.g. you should probably not pull the databases from the cache). The simpler ignore-404s or cache-headers options do not provide that ability. apg
participants (4)
-
Allan McRae
-
Andrew Gregory
-
Christian Hesse
-
kpcyrd