[aur-dev] Making the AUR package list more useful

Wed May 18 12:58:46 UTC 2016

On Mon, May 02, 2016 at 08:53:35AM +0200, Lukas Fleischer wrote:

I didn't mean to drop this; it's just been a busy month for me.
I'm about to be travelling for work for the next week, too.

> On Sat, 30 Apr 2016 at 18:16:54, Dave Reisner wrote:
> > Hrmm, I don't know that this is an equal comparison. Here's my
> > perception of the current world:
> > 
> > pacman relies on distribution of the *entire* DB to mirrors around the
> > world. Due to the tiered mirror system, you can basically only rely on
> > eventual consistency of tier N>0 with tier 0, but the DBs at any given
> > point in time should be consistent with themselves (i.e. assuming
> > they're well-behaved, they won't advertise packages which they don't
> > have). In addition to the sync tarballs, pacman relies on a local
> > database which it mutates as packages are installed, upgraded, and
> > removed.
> > 
> 
> Yeah, as I mentioned further below, providing the full database might be
> a long-term goal.
> 
> Are the mirrors part of the basic concept behind pacman? I always had
> the impression they primarily exist to improve download performance. One
> could also distribute the database among servers all over the world and
> allow clients to perform remote procedure calls on each of them. We
> could introduce those mirrors to the AUR as well, independent of whether
> the database is transferred to the clients or not.

I wouldn't consider the mirrors to be a "basic concept" since the
mirrorlist is strictly an Arch Linux provided thing. They act as a
best-effort load balancing strategy since we don't do anything
intelligent with redirects and simply rely on people to pick a mirror
which fulfills some value of "performs well" for their purposes. They
also offer a level of redundancy -- we clearly don't suffer global
outages when mirrors are unavailable.

> > pacman has reduced functionality when it has no reachable mirror -- it's
> > still capable of removing packages, modifying the local DB (to adjust
> > install reasons), and installing packages which are present in a file
> > cache.
> > 
> 
> I am not sure I follow. How are orthogonal features relevant to the
> discussion of whether the sync DB should be copied to the clients or
> accessed via requests to a server? It really shouldn't matter which
> additional operations on other objects are supported (and even if it
> does, there are clients like yaourt which provide a similar interface).
> The only thing I can think of in this context is that due to copying the
> database, one can perform queries on the sync database while being
> offline (i.e. if you want to find out the name of a package you cannot
> remember using -Ss). The AUR would benefit from that as well.
> 

yaourt's (and other similar helpers') capacity to do things outside of
the AUR is dependent on pacman itself. I understand your suggestion
about -Ss being an "offline" operation, but in the current form of the
pkgnames tarball, you'd be getting something more similar to
'pacman -Slq | grep "$1"'. pacman -Ss would also match against package
descriptions, not just names. Syncing the entire AUR DB to clients
would allow -Ss as well as richer queries like -Si.
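To make the difference concrete, here is a rough Python sketch of what a
client can do with a name-only list. It assumes packages.gz is a gzip
file with one package name per line and skips any line starting with '#'
as a header comment; load_names and name_search are made-up names for
illustration, not part of any existing tool.

```python
import gzip
from typing import List


def load_names(gz_bytes: bytes) -> List[str]:
    """Decompress a packages.gz-style payload into a list of names.

    Assumption: one package name per line; lines starting with '#'
    (e.g. a generation header) are treated as comments and skipped.
    """
    text = gzip.decompress(gz_bytes).decode("utf-8")
    return [line.strip() for line in text.splitlines()
            if line.strip() and not line.startswith("#")]


def name_search(names: List[str], needle: str) -> List[str]:
    """Roughly what 'pacman -Slq | grep needle' gives you: names only.

    Descriptions are not in the list, so a word that only appears in a
    package description can never match here (unlike pacman -Ss).
    """
    return [n for n in names if needle in n]


if __name__ == "__main__":
    sample = gzip.compress(b"# AUR package list (sample header)\n"
                           b"cower\npacaur\naurvote\nyaourt\n")
    names = load_names(sample)
    print(name_search(names, "aur"))  # ['pacaur', 'aurvote']
```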

> > In contrast, the AUR currently only offers an API to support ad hoc
> > queries. There are no mirrors, and the RPC interface offers strong
> > consistency with the contents of the AUR. I think we can agree that in
> > the current form, packages.gz and pkgbases.gz files aren't very useful
> > as they tend to lag too far behind reality.
> > 
> 
> Of course. Which is why I started this thread (even though, actually, I
> do not think it is *too* bad to lag an hour or two behind; it should not
> matter in 99.9% of the use cases).
> 
> > AUR clients currently have a hard dependency on the network. If they
> > cannot reach the AUR, they cannot do anything useful.
> > 
> 
> Yeah, again, the same would apply to pacman if we split the sync
> operations into a separate utility as we do in the case of the AUR.
> Conversely, there is yaourt providing the pacman interface and there are
> other AUR helpers that cannot download a source package but still
> build/install a downloaded package when you are offline.
> 
> I might be missing something...

Consider the current API offered by the AUR -- a query interface and
some links to tarballs you can download. In contrast, pacman is a
full-featured package manager which not only allows queries and
downloading of tarballs, but also manipulates your filesystem.

> > Your proposal to make the pkgname/pkgbase tarballs more closely
> > consistent doesn't change the network dependency. All it seems to do is
> > offload the ability to perform more precise searching to the client,
> > *if* they choose to implement it. I'm suggesting that the server should
> > do this, such that we have a single implementation which *everyone* can
> > take advantage of. Not just clients of the RPC interface, but the web UI
> > as well.
> > 
> 
> Having the web UI make extensive use of the RPC interface is a good
> argument against moving towards my suggestions indeed. However, that
> would mean we fundamentally change the principles aurweb is currently
> built upon. Everything should work without any annoyances with JS
> disabled and using only a text-mode browser. Maybe that is too
> old-fashioned thinking; maybe even among Arch users, only a few users need
> support for that and everyone else might benefit from a more "modern"
> interface...

Why would you need JavaScript? All I'm suggesting is that aurweb issues
GET requests against itself and renders the returned JSON in some
meaningful way. Can't that be done in PHP?
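aurweb itself is PHP, but the idea does not depend on the language. A
minimal Python sketch, assuming the JSON has already been fetched from the
RPC search endpoint and using the resultcount, Name, Version and
Description fields of the RPC response; render_results is a hypothetical
name. All of the work happens on the server, so the browser only ever
receives plain HTML and a text-mode browser works fine.

```python
import json
from html import escape


def render_results(rpc_json: str) -> str:
    """Render an RPC search response as a plain HTML table (no JavaScript)."""
    data = json.loads(rpc_json)
    rows = []
    for pkg in data.get("results", []):
        # Fields may be null in the JSON (e.g. Description), hence 'or ""'.
        rows.append("<tr><td>{}</td><td>{}</td><td>{}</td></tr>".format(
            escape(pkg.get("Name") or ""),
            escape(pkg.get("Version") or ""),
            escape(pkg.get("Description") or "")))
    header = "<p>Results: {}</p>".format(int(data.get("resultcount", 0)))
    return header + "<table>" + "".join(rows) + "</table>"
```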

> > Agreed. Regular expressions aren't necessarily what we want to end up
> > with. As an alternative, prefix and suffix matching would be
> > substantially cheaper, less prone to abuse/dos, and would probably
> > fulfill the needs of most people.
> > 
> 
> True. But then again, there are some use cases for regular expressions
> that are not covered by matching prefixes and suffixes and we might,
> again, have people requesting support for them (we had people explicitly
> requesting regular expression support a couple of times). If we decide
> to reject those requests and tell people "Hey, you cannot do that on the
> AUR.", this simply means that they will revive their web scrapers and
> build their own package name databases based on the web pages, as they
> did before we introduced packages.gz. I even did that myself to build
> the database for aurdupes (which is another good example that requires
> the full set of names to be available locally) before packages.gz was
> there. This is Arch after all, and our users become creative when they
> are not given the interface they want.
> 

To be clear, I'm not opposed to adding support for regex. If you're in
favor of it, I have to wonder why it hasn't been added yet...
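A rough sketch of why prefix and suffix matching is cheap while regular
expressions are not, assuming the full name list fits in memory;
NameIndex is a hypothetical helper. A sorted list lets a prefix query
binary-search to the first candidate and stop at the first non-match, and
sorting the reversed names turns a suffix query into a prefix query. A
regular expression has to be tried against every name, and with a
backtracking engine such as Python's re a hostile pattern can also take a
very long time per name.

```python
import bisect
import re
from typing import List


class NameIndex:
    """Sorted name lists that make prefix/suffix lookups O(log n + k)."""

    def __init__(self, names: List[str]) -> None:
        self.forward = sorted(names)
        # Reversing every name turns suffix matching into prefix matching.
        self.backward = sorted(n[::-1] for n in names)

    @staticmethod
    def _prefix_range(sorted_names: List[str], prefix: str) -> List[str]:
        # All names sharing a prefix are contiguous in sorted order and
        # start at the first entry >= prefix.
        i = bisect.bisect_left(sorted_names, prefix)
        out = []
        while i < len(sorted_names) and sorted_names[i].startswith(prefix):
            out.append(sorted_names[i])
            i += 1
        return out

    def prefix(self, p: str) -> List[str]:
        return self._prefix_range(self.forward, p)

    def suffix(self, s: str) -> List[str]:
        hits = self._prefix_range(self.backward, s[::-1])
        return sorted(h[::-1] for h in hits)

    def regex(self, pattern: str) -> List[str]:
        # No index helps here: every single name has to be tested.
        rx = re.compile(pattern)
        return [n for n in self.forward if rx.search(n)]
```

For example, suffix("-git") answers the common "all VCS packages"
question without looking at most of the list, whereas a pattern like
^python2?-.*-git$ still needs the full scan.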

> > If you wanted to offer the ability to return just the size of the
> > resultset for some advanced search method, you could add another
> > parameter to the current search interface which would elide the
> > 'results' list in the response JSON. You already have a 'resultcount'
> > field with the size.
> > 
> 
> That is what I meant. We need to change the interface for every feature
> request that pops up.
> 
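A minimal sketch of how small such a change could be, assuming a
hypothetical countonly=1 query parameter (it is not part of the current
RPC interface). Since responses already carry resultcount, the flag only
drops the results array; clients that never send it get the same output
as before. The server still runs the query, so the saving is in response
size rather than in database work.

```python
import json
from typing import Any, Dict, List
from urllib.parse import parse_qs


def wants_count_only(query_string: str) -> bool:
    """True if the (hypothetical) countonly=1 parameter is present."""
    params = parse_qs(query_string)
    return params.get("countonly", ["0"])[0] == "1"


def build_search_response(results: List[Dict[str, Any]], count_only: bool) -> str:
    """Serialize a search response, omitting 'results' if only the count is wanted."""
    response: Dict[str, Any] = {"type": "search", "resultcount": len(results)}
    if not count_only:
        response["results"] = results
    return json.dumps(response)
```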
> > > * Directly publish all the information required to answer all possible
> > >   requests. Let the clients do whatever they want. Currently, we only
> > >   provide package names but in the future, this could be extended to a
> > >   more complete database like the one pacman uses.
> > 
> > This has the same problems as the current gz files -- you can only
> > offer eventual consistency. It also only scales well if you can
> > distribute the load in the same way that pacman does with a tiered
> > mirror system. This comes with a non-zero maintenance cost.
> > 
> 
> True. If it turns out that one server is not sufficient, mirrors need to
> be added. However, since we already have all the infrastructure, I
> expect the extra maintenance cost to be rather small. I also think that
> adding something like zsync support can reduce traffic by orders of
> magnitude. This is also something pacman/libalpm itself could benefit
> from.
> 
> Regards,
> Lukas
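For reference, zsync works by publishing per-block checksums next to a
file so that each client can work out locally which blocks it already has
and fetch only the rest, typically via HTTP range requests. The Python
sketch below shows the rsync-style rolling-checksum matching behind that
idea; it is not zsync's control-file format or code. The block size is
tiny for readability, MD5 stands in for the strong hash, and, as a
simplification, a trailing partial block is always re-downloaded.

```python
import hashlib
from typing import Dict, List, Set, Tuple

BLOCK = 4      # tiny for readability; real deployments use much larger blocks
MOD = 1 << 16


def weak_sum(data: bytes) -> Tuple[int, int]:
    """rsync-style weak checksum parts (a, b) of a whole window."""
    n = len(data)
    a = sum(data) % MOD
    b = sum((n - i) * x for i, x in enumerate(data)) % MOD
    return a, b


def block_signatures(new: bytes, size: int = BLOCK) -> List[Tuple[int, str]]:
    """Server side: (weak, strong) checksums for each block of the new file."""
    sigs = []
    for off in range(0, len(new), size):
        chunk = new[off:off + size]
        a, b = weak_sum(chunk)
        sigs.append(((b << 16) | a, hashlib.md5(chunk).hexdigest()))
    return sigs


def blocks_to_fetch(old: bytes, sigs: List[Tuple[int, str]], new_len: int,
                    size: int = BLOCK) -> Set[int]:
    """Client side: indices of new-file blocks that cannot be found in 'old'."""
    full = new_len // size  # blocks before this index are full-sized
    by_weak: Dict[int, List[int]] = {}
    for idx, (weak, _) in enumerate(sigs[:full]):
        by_weak.setdefault(weak, []).append(idx)
    have: Set[int] = set()
    if len(old) >= size and by_weak:
        a, b = weak_sum(old[:size])
        for k in range(len(old) - size + 1):
            if k > 0:
                # Slide the window one byte: drop old[k-1], add old[k+size-1].
                out, inc = old[k - 1], old[k + size - 1]
                a = (a - out + inc) % MOD
                b = (b - size * out + a) % MOD
            for idx in by_weak.get((b << 16) | a, []):
                # The weak sum can collide, so confirm with the strong hash.
                if idx not in have and \
                        hashlib.md5(old[k:k + size]).hexdigest() == sigs[idx][1]:
                    have.add(idx)
    return set(range(len(sigs))) - have


if __name__ == "__main__":
    old = b"aaaabbbbccccdddd"
    new = b"aaaaXbbbbccccdddd"  # one byte inserted after the first block
    print(sorted(blocks_to_fetch(old, block_signatures(new), len(new))))  # [1, 4]
```

With a one-byte insertion near the start, only the block containing the
new byte (plus the short tail) has to be fetched, because the rolling
checksum still finds the shifted blocks at their new offsets.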