[aur-dev] Making the AUR package list more useful
d at falconindy.com
Wed May 18 12:58:46 UTC 2016
On Mon, May 02, 2016 at 08:53:35AM +0200, Lukas Fleischer wrote:
I didn't really mean to drop this; it's just been a busy month for me.
I'm about to be travelling for work for the next week, too.
> On Sat, 30 Apr 2016 at 18:16:54, Dave Reisner wrote:
> > Hrmm, I don't know that this is an equal comparison. Here's my
> > perception of the current world:
> > pacman relies on distribution of the *entire* DB to mirrors around the
> > world. Due to the tiered mirror system, you can basically only rely on
> > eventual consistency of tier N>0 with tier 0, but the DBs at any given
> > point in time should be consistent with themselves (i.e. assuming
> > they're well-behaved, they won't advertise packages which they don't
> > have). In addition to the sync tarballs, pacman relies on a local
> > database which it mutates as packages are installed, upgraded, and
> > removed.
> Yeah, as I mentioned further below, providing the full database might be
> a long-term goal.
> Are the mirrors part of the basic concept behind pacman? I always had
> the impression they primarily exist to improve download performance. One
> could also distribute the database among servers all over the world and
> allow clients to perform remote procedure calls on each of them. We
> could introduce those mirrors to the AUR as well, independent of whether
> the database is transferred to the clients or not.
I wouldn't consider the mirrors to be a "basic concept" since the
mirrorlist is strictly an Arch Linux provided thing. They act as a
best-effort load balancing strategy since we don't do anything
intelligent with redirects and simply rely on people to pick a mirror
which fulfills some value of "performs well" for their purposes. They
also offer a level of redundancy -- we clearly don't suffer global
outages when mirrors are unavailable.
> > pacman has reduced functionality when it has no reachable mirror -- it's
> > still capable of removing packages, modifying the local DB (to adjust
> > install reasons), and installing packages which are present in a file
> > cache.
> I am not sure I follow. How are orthogonal features relevant to the
> discussion of whether the sync DB should be copied to the clients or
> accessed via requests to a server? It really shouldn't matter which
> additional operations on other objects are supported (and even if it
> does, there are clients like yaourt which provide a similar interface).
> The only thing I can think of in this context is that due to copying the
> database, one can perform queries on the sync database while being
> offline (i.e. if you want to find out the name of a package you cannot
> remember using -Ss). The AUR would benefit from that as well.
The capacity of yaourt (and other similar helpers) to do things outside
of the AUR depends on pacman itself. I understand your suggestion
about -Ss being an "offline" operation, but in the current form of the
pkgnames tarball, you'd be getting something more similar to 'pacman
-Slq | grep "$1"'. pacman -Ss would return results based on substring
matches in descriptions, not just names. Sync'ing the entire AUR DB to
clients would allow -Ss as well as richer queries like -Si.
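To illustrate the difference, here is a minimal Python sketch. The
package names and descriptions are made up, and the second function only
approximates the -Ss behaviour described above (plain case-insensitive
substring matching on names and descriptions); it is not pacman's actual
implementation.

```python
# Hypothetical sample data: package name -> description.
PACKAGES = {
    "foo-git": "Frobnicate widgets (development version)",
    "bar": "A foo-compatible widget viewer",
    "baz": "Unrelated tool",
}


def search_names(names, term):
    """Name-only substring match, roughly what grepping a name list gives."""
    return [name for name in names if term in name]


def search_names_and_descriptions(packages, term):
    """Substring match on name or description, closer in spirit to -Ss."""
    term = term.lower()
    return [
        name
        for name, desc in packages.items()
        if term in name.lower() or term in desc.lower()
    ]
```

With only the name list, a search for "foo" finds foo-git but misses bar,
whose description is the only place the term appears.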
> > In contrast, the AUR currently only offers an API to support adhoc
> > queries. There are no mirrors, and the RPC interface offers strong
> > consistency with the contents of the AUR. I think we can agree that in
> > the current form, packages.gz and pkgbases.gz files aren't very useful
> > as they tend to lag too far behind reality.
> Of course. Which is why I started this thread (even though, actually, I
> do not think it is *too* bad to lag an hour or two behind; it should not
> matter in 99.9% of the use cases).
> > AUR clients currently have a hard dependency on the network. If they
> > cannot reach the AUR, they cannot do anything useful.
> Yeah, again, the same would apply to pacman if we split the sync
> operations into a separate utility as we do in the case of the AUR.
> Conversely, there is yaourt providing the pacman interface and there are
> other AUR helpers that cannot download a source package but still
> build/install a downloaded package when you are offline.
> I might be missing something...
Consider the current API offered by the AUR -- a query interface and
some links to tarballs you can download. In contrast, pacman is a
full-featured package manager which not only allows queries and
downloading of tarballs, but also manipulates your filesystem.
> > Your proposal to make the pkgname/pkgbase tarballs more closely
> > consistent doesn't change the network dependency. All it seems to do is
> > offload the ability to perform more precise searching to the client,
> > *if* they choose to implement it. I'm suggesting that the server should
> > do this, such that we have a single implementation which *everyone* can
> > take advantage of. Not just clients of the RPC interface, but the web UI
> > as well.
> Having the web UI make extensive use of the RPC interface is a good
> argument against moving towards my suggestions indeed. However, that
> would mean we fundamentally change the principles aurweb is currently
> built upon. Everything should work without any annoyances with JS
> disabled and using only a text-mode browser. Maybe that is too
> old-fashioned thinking; maybe even among Arch users, only a few users need
> support for that and everyone else might benefit from a more "modern"
GET requests against itself and renders the returned JSON in some
meaningful way. Can't that be done in PHP?
> > Agreed. Regular expressions aren't necessarily what we want to end up
> > with. As an alternative, prefix and suffix matching would be
> > substantially cheaper, less prone to abuse/dos, and would probably
> > fulfill the needs of most people.
> True. But then again, there are some use cases for regular expressions
> that are not covered by matching prefixes and suffixes and we might,
> again, have people requesting support for them (we had people explicitly
> requesting regular expression support a couple of times). If we decide
> to reject those requests and tell people "Hey, you cannot do that on the
> AUR.", this simply means that they will revive their web scrapers and
> build their own package name databases based on the web pages, as they
> did before we introduced packages.gz. I even did that myself to build
> the database for aurdupes (which is another good example that requires
> the full set of names to be available locally) before packages.gz was
> there. This is Arch after all, and our users become creative when they
> are not given the interface they want.
To be clear, I'm not opposed to adding support for regex. If you're in
favor of it, I have to wonder why it hasn't been added yet...
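For comparison, here is a rough Python sketch of why prefix and suffix
matching can be cheap: with a sorted list of names, a prefix lookup is a
binary search plus a short scan, and suffix lookup is the same thing on
reversed names. The function names and the in-memory list are
assumptions for illustration only; this says nothing about how aurweb
stores or queries package names.

```python
import bisect


def prefix_matches(sorted_names, prefix):
    """Return all names starting with prefix; sorted_names must be sorted."""
    matches = []
    # All names sharing a prefix are contiguous in sorted order,
    # starting at the insertion point of the prefix itself.
    i = bisect.bisect_left(sorted_names, prefix)
    while i < len(sorted_names) and sorted_names[i].startswith(prefix):
        matches.append(sorted_names[i])
        i += 1
    return matches


def suffix_matches(sorted_reversed_names, suffix):
    """Return all names ending with suffix.

    sorted_reversed_names must hold every name reversed, then sorted,
    so that a suffix search becomes a prefix search.
    """
    hits = prefix_matches(sorted_reversed_names, suffix[::-1])
    return [hit[::-1] for hit in hits]
```

Usage would look like prefix_matches(sorted(names), "foo") and
suffix_matches(sorted(n[::-1] for n in names), "-git").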
> > If you wanted to offer the ability to return just the size of the
> > resultset for some advanced search method, you could add another
> > parameter to the current search interface which would elide the
> > 'results' list in the response JSON. You already have a 'resultcount'
> > field with the size.
> That is what I meant. We need to change the interface for every feature
> request that pops up.
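As a sketch of how small that particular change could be, assuming a
response that carries only the 'results' and 'resultcount' fields
mentioned above: the count_only parameter and the function name are
hypothetical, and this is Python for illustration rather than anything
taken from the aurweb code.

```python
import json


def build_search_response(matches, count_only=False):
    """Build the JSON body for a search; matches is a list of result dicts.

    count_only is a hypothetical parameter: when set, the 'results'
    list is left out and only 'resultcount' is returned.
    """
    response = {"resultcount": len(matches)}
    if not count_only:
        response["results"] = matches
    return json.dumps(response)
```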
> > > * Directly publish all the information required to answer all possible
> > > requests. Let the clients do whatever they want. Currently, we only
> > > provide package names but in the future, this could be extended to a
> > > more complete database like the one pacman uses.
> > This has the same problems as the current gz files -- you can only
> > offer eventual consistency. It also only scales well if you can
> > distribute the load in the same way that pacman does with a tiered
> > mirror system. This comes with a non-zero maintenance cost.
> True. If it turns out that one server is not sufficient, mirrors need to
> be added. However, since we already have all the infrastructure, I
> expect the extra maintenance cost to be rather small. I also think that
> adding something like zsync support can reduce traffic by orders of
> magnitude. This is also something pacman/libalpm itself could benefit