[pacman-dev] mirrors.kernel.org serves chunked transfer downloads
Hi, I'm a developer with Arch Linux. We recently saw a report[1] of our package management tool, pacman, behaving inconsistently when reporting download progress. We narrowed this down to the Arch Linux package mirror on mirrors.kernel.org. The symptoms appear to be caused by an inconsistently served Content-Length header -- that is, sometimes it's simply missing.

Further analysis of the response headers shows that nginx is using chunked transfer encoding, so it does not provide a Content-Length header. However, once the file is cached, Varnish serves it with a Content-Length. Some shell to show this effect:

    # A valid URL with some garbage query string to fool Varnish into
    # serving a cache miss.
    url=http://mirrors.kernel.org/archlinux/core/os/x86_64/acl-2.2.52-2-x86_64.pkg.t... RANDOM*SECONDS ))

    # This will be the uncached reply served by nginx -- notice that no
    # Content-Length is present.
    curl -I "$url"

    # Repeat the same URL; it'll be served by Varnish, and has a
    # Content-Length.
    curl -I "$url"

Would it be possible to turn off chunked transfer encoding so that nginx serves a Content-Length header? This is highly preferable: for a static file, the only overhead in calculating the response size is a single stat syscall. In addition, knowing the response body size up front potentially allows downloaders to match the remote file size against local metadata, as a way of detecting corrupted or tampered-with files.

I'd also point out, in passing, that your cache varies on the query string. Do you really need to do this for static content? It actually works against you in the case of a DoS attack: a malicious user could potentially evict a large amount of the cache by flooding it with variations on a single large blob. If mirrors.kernel.org shares a cache with other sites, this might be a Bad Thing™. In fact, if the Varnish instance used for mirrors.kernel.org is shared with other subdomains, you might consider disabling it entirely for files served from mirrors.kernel.org.
Relying on the kernel's page cache alone seems like a better strategy.

Thanks for your consideration.

Cheers,
dR

[1] https://bbs.archlinux.org/viewtopic.php?id=192604
Dave Reisner
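On the client side, the difference between the two `curl -I` replies described above can be detected mechanically. What follows is a minimal POSIX sh sketch, assuming the raw header block comes from something like `curl -sI "$url"`. The helper names `content_length`, `is_chunked` and `size_matches`, and the idea of an "expected" size taken from local metadata, are hypothetical illustrations, not part of pacman or curl.

```sh
#!/bin/sh
# Sketch only: these helper names are hypothetical, not part of pacman or curl.
# Each helper reads a raw HTTP header block on stdin, e.g. `curl -sI "$url"`.

# Print the Content-Length value and succeed; fail (status 1) if the
# header is absent, as in the uncached, chunked replies described above.
content_length() {
    awk '
        { sub(/\r$/, "") }                  # strip the CR of CRLF line endings
        tolower($1) == "content-length:" { len = $2; found = 1 }
        END { if (found) print len; exit (found ? 0 : 1) }
    '
}

# Succeed if the response declares chunked transfer encoding.
is_chunked() {
    awk '
        { sub(/\r$/, "") }
        tolower($1) == "transfer-encoding:" && tolower($0) ~ /chunked/ { found = 1 }
        END { exit (found ? 0 : 1) }
    '
}

# Compare the advertised size with an expected size from local metadata
# (a hypothetical input; where it comes from is outside this sketch).
# Status: 0 = sizes match, 1 = mismatch, 2 = no Content-Length to compare.
size_matches() {
    expected=$1
    actual=$(content_length) || return 2
    [ "$actual" = "$expected" ]
}
```

For example, `curl -sI "$url" | is_chunked && echo chunked` would be expected to flag the first, uncached request, while `curl -sI "$url" | content_length` on the repeated, Varnish-served request should print a size. awk is used rather than a plain grep so that header names match case-insensitively (HTTP header names are case-insensitive) and CRLF line endings are stripped before comparison.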