[pacman-dev] [RFC] support for URL query strings and fragments

Xyne xyne at archlinux.ca
Fri May 10 15:11:51 EDT 2013


On 2013-05-10 18:02 +0300
Mohammad_Alsaleh wrote:

>> The first would allow for file names in query strings (e.g. "?file=$file"). It
>> could be used for a build server, for example. Of course, you can extract the
>> file name from the request path but that requires hacking the server code or
>> using something like mod_rewrite to mangle URLs. Having the file name sent in a
>> GET variable is much more convenient for server-side programming.
>> 
>
>Maybe I didn't understand your problem. But wouldn't using
>the Content-Disposition header solve it?

That is unrelated. Going back to the Pacserve example: the server runs on
localhost. When a package is requested via an HTTP GET request, it checks the
local cache for the package and returns it if it is there. If not, it queries
other Pacserves on the LAN and sends a simple 303 response to redirect to a
Pacserve server that has the package, if there is one.
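
To make the flow above concrete, here is a rough sketch in Python. It is not
Pacserve's actual code: the function name, the peer list and the peer_has()
callback are all made up for illustration.

```python
from typing import Callable, Iterable, Optional, Set, Tuple


def local_response(
    filename: str,
    cache: Set[str],
    peers: Iterable[str],
    peer_has: Callable[[str, str], bool],
) -> Optional[Tuple[int, Optional[str]]]:
    """Decide how to answer a GET request for a package file.

    Returns (status, location), or None when neither the local cache nor
    any LAN peer has the file and the caller must fall back to a mirror.
    All names here are hypothetical.
    """
    # Serve straight from the local cache when possible.
    if filename in cache:
        return 200, None
    # Otherwise ask each LAN peer and redirect to the first one that has it.
    for peer in peers:
        if peer_has(peer, filename):
            return 303, f"{peer}/{filename}"
    # Nobody on the LAN has it.
    return None
```

The None case is the one that causes the trouble described next.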

Because Pacserve only serves packages (and not databases), there is no need to
have different directories for different repos and architectures. The
architecture is contained in the package name and the repo is unimportant. The
same reasons permit the official servers to keep all of their packages in the
"pool" directories without any confusion.

The problem arises when no local Pacserve servers have the package and Pacserve
needs to redirect to an external mirror. It then needs to know which repo and
which architecture the package is for so that it can select the correct URL
from pacman.conf and replace the "$repo" and "$arch" variables in the URL
before returning it to the client with the package name appended.
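
As an illustration of that last step, a minimal sketch assuming the mirror URL
has already been read from pacman.conf as a plain string (the mirror host and
the function name below are invented):

```python
def mirror_url(server_template: str, repo: str, arch: str, filename: str) -> str:
    """Fill in $repo and $arch, then append the package file name.

    server_template is assumed to be a Server value taken from pacman.conf,
    e.g. "http://mirror.example.org/$repo/os/$arch" (hypothetical host).
    """
    base = server_template.replace("$repo", repo).replace("$arch", arch)
    # Avoid a double slash if the configured URL already ends with one.
    return base.rstrip("/") + "/" + filename


if __name__ == "__main__":
    print(mirror_url("http://mirror.example.org/$repo/os/$arch",
                     "core", "x86_64", "foo-1.3-4-x86_64.pkg.tar.xz"))
```

Without the repo and arch values there is nothing to substitute, which is why
the request has to carry them somehow.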

Currently this information must be gathered by creating paths such as
/core/os/i686
/core/os/x86_64
/extra/os/i686
/extra/os/x86_64
...
on the server. Even if you have access to the server software and can tweak the
configuration or settings for path rewriting (e.g. with Apache's mod_rewrite,
directly or via .htaccess), it's still a pain and it's silly if all you need
are the repo and arch values. It should be possible to pass those values via
GET parameters in such a way that pacman can convert:

Server = http://example.com/pkgs/?repo=$repo&arch=$arch

to

http://example.com/pkgs/foo-1.3-4-x86_64.pkg.tar.xz?repo=bar&arch=x86_64

Pacman blindly interpolates $repo and $arch, so that works (although it really
should percent-encode them to be sure), but does not understand the query
string and fragment parts of the URI, so it can't append the name to the path.
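
To show what I mean by understanding the query string and fragment, here is a
sketch of the intended behaviour in Python using urllib.parse. It only
illustrates the idea; it is not a proposal for how pacman or libalpm should
implement it, and package_url() is a made-up name.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def package_url(server: str, repo: str, arch: str, filename: str) -> str:
    """Build a download URL from a Server value and a package file name.

    The file name is appended to the path component only, so any query
    string or fragment in the Server value is left where it belongs.
    """
    # Percent-encode the interpolated values rather than pasting them in raw.
    server = server.replace("$repo", quote(repo, safe=""))
    server = server.replace("$arch", quote(arch, safe=""))

    parts = urlsplit(server)
    path = parts.path.rstrip("/") + "/" + quote(filename, safe="")
    return urlunsplit((parts.scheme, parts.netloc, path,
                       parts.query, parts.fragment))


if __name__ == "__main__":
    print(package_url("http://example.com/pkgs/?repo=$repo&arch=$arch",
                      "bar", "x86_64", "foo-1.3-4-x86_64.pkg.tar.xz"))
```

Run on the Server line above with repo "bar" and arch "x86_64", this yields the
target URL given above.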

For now I have worked around it with

Server = http://localhost:15678/pkg/?repo=$repo&arch=$arch&file=

but that is a kludge that requires additional server processing, and it still
generates malformed URLs because it will be converted to "...&file=/foo...". The
forward slash should be percent-encoded and there is no way to get pacman to
omit the slash, so even if it works in some cases, it is technically wrong.
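
For the curious, the extra server processing looks roughly like the sketch
below. naive_append() only mimics the behaviour described above (a slash plus
the file name stuck on the end of the Server value); both function names are
hypothetical and this is not Pacserve's real code.

```python
from typing import Tuple
from urllib.parse import parse_qs, urlsplit


def naive_append(server: str, filename: str) -> str:
    """Mimic the current behaviour: append "/" and the file name."""
    return server + "/" + filename


def parse_workaround_request(url: str) -> Tuple[str, str, str]:
    """Recover (repo, arch, filename) from a URL built by the workaround."""
    query = parse_qs(urlsplit(url).query)
    # The "file" value arrives with a stray leading slash to strip off.
    filename = query["file"][0].lstrip("/")
    return query["repo"][0], query["arch"][0], filename


if __name__ == "__main__":
    url = naive_append(
        "http://localhost:15678/pkg/?repo=core&arch=x86_64&file=",
        "foo-1.3-4-x86_64.pkg.tar.xz",
    )
    print(url)
    print(parse_workaround_request(url))
```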

Obviously I have started this discussion because I could really use this for
Pacserve, but it would also be very useful for scripting package servers. You
could send all the desired information to the server via GET parameters (repo,
arch, package) and have the server locate or build the package. With this,
everything is controlled entirely by a single script on the server. Without it,
the path has to be mangled, which requires access to the server software or
particular settings.
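
As a sketch of what such a script would need to do with a request, assuming
pacman produced URLs of the form proposed above (file name in the path, repo
and arch in the query string). The function name and the dict layout are
invented for the example.

```python
import posixpath
from typing import Dict, Optional
from urllib.parse import parse_qs, unquote, urlsplit


def parse_package_request(request_target: str) -> Dict[str, Optional[str]]:
    """Extract repo, arch and package file name from a request target.

    Example input: "/pkgs/foo-1.3-4-x86_64.pkg.tar.xz?repo=bar&arch=x86_64"
    """
    parts = urlsplit(request_target)
    query = parse_qs(parts.query)
    return {
        # Missing parameters are reported as None rather than raising.
        "repo": query.get("repo", [None])[0],
        "arch": query.get("arch", [None])[0],
        # The last path segment is the package file name.
        "package": unquote(posixpath.basename(parts.path)),
    }


if __name__ == "__main__":
    print(parse_package_request(
        "/pkgs/foo-1.3-4-x86_64.pkg.tar.xz?repo=bar&arch=x86_64"))
```

With those three values in hand the script can locate or build the package
however it likes, with no path rewriting involved.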

To preempt one possible argument, Arch might not officially support such URIs,
but pacman aims to be distro-agnostic. Besides, modularizing code and making it
more robust is a good thing.


I hope that clears up the idea. I'm tired and likely rambling, so I'll shut up
now.


