[pacman-dev] mirrors.kernel.org serves chunked transfer downloads

Dave Reisner d at falconindy.com
Mon Jan 26 04:53:19 UTC 2015


Hi,

I'm a developer with Arch Linux. We recently saw a report[1] of our
package management tool, pacman, behaving inconsistently with regard to
reporting download progress. We narrowed this down to usage of the Arch
Linux package mirror on mirrors.kernel.org.

The symptoms appear to be caused by an inconsistently served
Content-Length header -- that is, sometimes it's simply missing. Further
analysis of the response headers shows that nginx is using chunked
transfer encoding, so it does not provide a Content-Length header.
However, once cached, Varnish serves up the Content-Length. Some shell
to show this effect:

  # a valid URL with some garbage querystring to fool varnish into
  # serving a cache miss.
  url="http://mirrors.kernel.org/archlinux/core/os/x86_64/acl-2.2.52-2-x86_64.pkg.tar.xz?$(( RANDOM*SECONDS ))"

  # This will be the uncached reply served by nginx -- notice no
  # content-length is present.
  curl -I "$url"

  # Repeat the same URL. This time it'll be served by varnish, and it
  # has a content-length.
  curl -I "$url"
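
To make the difference easier to spot than by reading raw headers, a
small filter along these lines could summarize how the body length is
signalled. This is only a sketch: classify_headers is a name made up for
illustration, not part of curl or pacman, and it assumes the headers
arrive on stdin in the form that curl -I prints them.

```shell
# Hypothetical helper (illustration only): read HTTP response headers on
# stdin and report how the length of the body is signalled.
classify_headers() {
  # Strip the CRs that terminate HTTP header lines, then split each line
  # into a header name and a value at the first ": ".
  tr -d '\r' | awk -F': *' '
    { name = tolower($1) }
    name == "content-length"                               { len = $2 }
    name == "transfer-encoding" && tolower($2) ~ /chunked/ { chunked = 1 }
    END {
      if (len != "")    print "content-length=" len
      else if (chunked) print "chunked"
      else              print "unknown"
    }'
}
```

Running curl -sI "$url" | classify_headers twice in a row should then
show the two cases described above, assuming the mirror behaves as it
did when I tested.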

Would it be possible to turn off chunked transfer so that nginx serves a
Content-Length header? This is highly preferable -- the overhead in
calculating the response size is that of a simple stat syscall. In
addition, knowing the response body size up front potentially allows
downloaders to match the remote file size against local metadata, as a
method of detecting corrupted or tampered-with files.
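
As a rough illustration of that size check, something like the
following could compare the advertised length against a size already
known from local metadata. It is a sketch under assumptions:
size_matches is a hypothetical name, the expected size in bytes is
passed in by the caller, and the headers are read from stdin.

```shell
# Hypothetical helper (illustration only): succeed only if the
# Content-Length found in the headers on stdin equals the expected size
# in bytes given as the first argument.
size_matches() {
  expected=$1
  # Take the first Content-Length header, ignoring case and the trailing CR.
  actual=$(tr -d '\r' |
    awk -F': *' 'tolower($1) == "content-length" { print $2; exit }')
  # A missing header (e.g. a chunked reply) counts as a failed check.
  [ -n "$actual" ] && [ "$actual" -eq "$expected" ]
}
```

With a chunked reply there is no Content-Length to compare, so the
check fails before any of the body is downloaded.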

Also, as an aside, I notice that your cache varies on querystring. Do
you really need to do this for static content? This actually works
against you in the case of a DoS attack -- a malicious user could
potentially evict a large amount of the cache by flooding it with
variations on a single large blob. If mirrors.kernel.org shares a cache
with other sites, it might be a Bad Thing™. Actually, if the Varnish
instance used for mirrors.kernel.org is shared with other subdomains,
you might consider disabling it entirely for files below
mirrors.kernel.org. Relying on the kernel's page cache alone seems like
a better strategy.

Thanks for your consideration.

Cheers,
dR

[1] https://bbs.archlinux.org/viewtopic.php?id=192604

