[aur-dev] Making the AUR package list more useful

Sat May 21 08:00:09 UTC 2016

On Wed, 18 May 2016 at 14:58:46, Dave Reisner wrote:
> On Mon, May 02, 2016 at 08:53:35AM +0200, Lukas Fleischer wrote:
> [...]
> > On Sat, 30 Apr 2016 at 18:16:54, Dave Reisner wrote:
> > > Hrmm, I don't know that this is an equal comparison. Here's my
> > > perception of the current world:
> > > 
> > > pacman relies on distribution of the *entire* DB to mirrors around the
> > > world. Due to the tiered mirror system, you can basically only rely on
> > > eventual consistency of tier N>0 with tier 0, but the DBs at any given
> > > point in time should be consistent with themselves (i.e. assuming
> > > they're well-behaved, they won't advertise packages which they don't
> > > have). In addition to the sync tarballs, pacman relies on a local
> > > database which it mutates as packages are installed, upgraded, and
> > > removed.
> > > 
> > 
> > Yeah, as I mentioned further below, providing the full database might be
> > a long-term goal.
> > 
> > Are the mirrors part of the basic concept behind pacman? I always had
> > the impression they primarily exist to improve download performance. One
> > could also distribute the database among servers all over the world and
> > allow clients to perform remote procedure calls on each of them. We
> > could introduce those mirrors to the AUR as well, independent of whether
> > the database is transferred to the clients or not.
> 
> I wouldn't consider the mirrors to be a "basic concept" since the
> mirrorlist is strictly an Arch Linux provided thing. They act as a
> best-effort load balancing strategy since we don't do anything
> intelligent with redirects and simply rely on people to pick a mirror
> which fulfills some value of "performs well" for their purposes. They
> also offer a level of redundancy -- we clearly don't suffer global
> outages when mirrors are unavailable.

Okay, I only wondered why you brought the mirrors up in this discussion.
Both the RPC approach and the full database replication approach work
with mirrors, right? So I think it is fine to ignore them when
discussing the pros and cons of those concepts.

> 
> > > pacman has reduced functionality when it has no reachable mirror -- it's
> > > still capable of removing packages, modifying the local DB (to adjust
> > > install reasons), and installing packages which are present in a file
> > > cache.
> > > 
> > 
> > I am not sure I follow. How are orthogonal features relevant to the
> > discussion of whether the sync DB should be copied to the clients or
> > accessed via requests to a server? It really shouldn't matter which
> > additional operations on other objects are supported (and even if it
> > does, there are clients like yaourt which provide a similar interface).
> > The only thing I can think of in this context is that due to copying the
> > database, one can perform queries on the sync database while being
> > offline (i.e. if you want to find out the name of a package you cannot
> > remember using -Ss). The AUR would benefit from that as well.
> > 
> 
> yaourt's (and other similar helpers') capacity to do things outside of
> the AUR depends on pacman itself. I understand your suggestion
> about -Ss being an "offline" operation, but in the current form of the
> pkgnames tarball, you'd be getting something more similar to
> 'pacman -Slq | grep "$1"'. pacman -Ss would return results based on
> substring matches in descriptions, not just names. Syncing the entire
> AUR DB to clients would allow -Ss as well as richer queries like -Si.
> [...]
> Consider the current API offered by the AUR -- a query interface and
> some links to tarballs you can download. In contrast, pacman is a
> full-featured package manager which not only allows queries and
> downloading of tarballs, but also manipulates your filesystem.

yaourt also has support for pulling PKGBUILDs from the ABS, which is
something that does not directly depend on pacman. But again, I still do
not see how any of the orthogonal functionality a tool provides is
relevant to this discussion.

And, as I wrote earlier, extending the package name list to a full
database is what we ultimately aim for in the approach I suggested.
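Here is a rough illustration of the difference Dave describes between a name-only list and a full database. With just the names (as in packages.gz), a client can do something like 'pacman -Slq | grep'. An -Ss style search also needs descriptions. The Python sketch below is hypothetical throughout: the PackageRecord type, its fields and the sample packages are made up for illustration and do not reflect the actual aurweb schema.

```python
from dataclasses import dataclass


@dataclass
class PackageRecord:
    # Hypothetical record in a "full database"; field names are illustrative only.
    name: str
    description: str


def search_names(names, term):
    """Name-only substring search, roughly what 'pacman -Slq | grep term' gives you."""
    return [n for n in names if term in n]


def search_full(records, term):
    """-Ss style search (case-insensitive in this sketch): match names *and* descriptions."""
    term = term.lower()
    return [r.name for r in records
            if term in r.name.lower() or term in r.description.lower()]


if __name__ == "__main__":
    # Made-up sample data.
    records = [
        PackageRecord("foo", "A tool that talks to the AUR"),
        PackageRecord("aur-bar", "Another example package"),
    ]
    names = [r.name for r in records]
    print(search_names(names, "aur"))   # ['aur-bar']
    print(search_full(records, "aur"))  # ['foo', 'aur-bar']
```

Only the second function needs more than the name list, which is what extending packages.gz to a full database would provide.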

> Why would you need JavaScript? All I'm suggesting is that aurweb issues
> GET requests against itself and renders the returned JSON in some
> meaningful way. Can't that be done in PHP?
> 

Sure, that can be done. But I do not see how that would be useful. The
RPC interface and the web page backend should use the same library
functions internally, sure. But why should we make huge efforts to
replace regular function calls with HTTP GET requests and encode results
in JSON, only to immediately decode them on the same machine afterwards?
Do we plan on splitting the RPC server and the website backend?
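Below is a minimal sketch of how the RPC interface and the web backend could share library functions. It is written in Python for illustration, even though aurweb itself is PHP. All names (search_packages, rpc_search, render_search_page, the PACKAGES table) are hypothetical, and the JSON keys are only loosely modelled on the RPC output. The point is that JSON encoding happens once, at the RPC boundary. The web page calls the same function directly instead of issuing a GET request against itself.

```python
import html
import json

# Hypothetical in-memory table standing in for the real package database.
PACKAGES = [
    {"Name": "foo", "Description": "An example package"},
    {"Name": "foo-git", "Description": "Development version of foo"},
]


def search_packages(term):
    """Shared library function used by both front ends (hypothetical)."""
    return [p for p in PACKAGES if term in p["Name"]]


def rpc_search(term):
    """RPC front end: the only place where results are encoded as JSON."""
    results = search_packages(term)
    return json.dumps({"resultcount": len(results), "results": results})


def render_search_page(term):
    """Web front end: a plain function call, no HTTP request or JSON round trip."""
    rows = "".join(
        f"<li>{html.escape(p['Name'])}: {html.escape(p['Description'])}</li>"
        for p in search_packages(term)
    )
    return f"<ul>{rows}</ul>"
```

Splitting the RPC server from the website backend would only require moving search_packages behind a network call. Until then, the direct call avoids an encode/decode cycle on the same machine.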

> > > Agreed. Regular expressions aren't necessarily what we want to end up
> > > with. As an alternative, prefix and suffix matching would be
> > > substantially cheaper, less prone to abuse/DoS, and would probably
> > > fulfill the needs of most people.
> > > 
> > 
> > True. But then again, there are some use cases for regular expressions
> > that are not covered by matching prefixes and suffixes and we might,
> > again, have people requesting support for them (we had people explicitly
> > requesting regular expression support a couple of times). If we decide
> > to reject those requests and tell people "Hey, you cannot do that on the
> > AUR.", this simply means that they will revive their web scrapers and
> > build their own package name databases based on the web pages, as they
> > did before we introduced packages.gz. I even did that myself to build
> > the database for aurdupes (which is another good example that requires
> > the full set of names to be available locally) before packages.gz was
> > there. This is Arch after all, and our users become creative when they
> > are not given the interface they want.
> > 
> 
> To be clear, I'm not opposed to adding support for regex. If you're in
> favor of it, I have to wonder why it hasn't been added yet...

I am not in favor of adding regular expressions on the server side. I
mentioned some of the reasons in an earlier reply and you agreed.
However, instead of saying that we need something less powerful (like
prefix and suffix matching) on the server side, I think that adding
support for regular expressions (and maybe even more powerful things) on
the client side is the way to go.
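This sketch illustrates the trade-off. Prefix matching on a sorted name list needs only a binary search, which is why it is cheap to offer on the server. Arbitrary regular expressions can be evaluated on the client once it has the full list. The sketch assumes packages.gz is a gzip-compressed file with one package name per line. As a further assumption, it skips blank lines and lines starting with '#'. The function names are hypothetical.

```python
import bisect
import gzip
import re
from itertools import islice, takewhile


def parse_package_list(gz_bytes):
    """Decode a gzip-compressed name list (assumed: one name per line, '#' = comment)."""
    text = gzip.decompress(gz_bytes).decode("utf-8")
    return sorted(line.strip() for line in text.splitlines()
                  if line.strip() and not line.startswith("#"))


def prefix_matches(sorted_names, prefix):
    """Server-friendly prefix search: names sharing a prefix are contiguous once sorted,
    so one binary search finds the start and we read until the prefix stops matching."""
    start = bisect.bisect_left(sorted_names, prefix)
    return list(takewhile(lambda n: n.startswith(prefix),
                          islice(sorted_names, start, None)))


def regex_matches(names, pattern):
    """Client-side regex search: any pattern, evaluated locally at no cost to the server."""
    rx = re.compile(pattern)
    return [n for n in names if rx.search(n)]


if __name__ == "__main__":
    sample = gzip.compress(b"# sample header\nfoo\nfoo-git\nbar-git\nbaz\n")
    names = parse_package_list(sample)
    print(prefix_matches(names, "foo"))    # ['foo', 'foo-git']
    print(regex_matches(names, r"-git$"))  # ['bar-git', 'foo-git']
```

Suffix matching could be served the same way from a second sorted list of reversed names. To try the client side against the real list, one could pass the bytes from urllib.request.urlopen("https://aur.archlinux.org/packages.gz").read() to parse_package_list, assuming the file is still published at that location.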

Lukas