[aur-dev] pkgbase queries via RPC interface
Hi,
The RPC search, info and multiinfo methods return no data when the pkgbase is passed as the argument.
I would like to discuss the following ideas:
A) Return all of the packages in the pkgbase for search, info and multiinfo queries. For search and multiinfo this would be natural. For info it would be left to the client to detect that the result is an array instead of an object and thus infer that the result is from a pkgbase (or simply add that data to a new field in the result object). With the advent of the query version parameter this should not be an issue for backwards compatibility.
B) Add a parameter to query pkgbase data (e.g. object=pkgbase) and return a JSON object with pkgbase-specific data (everything that still applies from regular package data, plus an array of the included packages). Given that there are AUR pages for pkgbases, it would be consistent to provide that data via the RPC interface.
I believe this would be very useful. What do you think? Would it be difficult?
Regards, Xyne
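As a rough illustration of option B, the sketch below groups package records into a single pkgbase object that carries shared data plus an array of the included packages. The field names (`Name`, `PackageBase`, `Maintainer`, `Packages`), the sample records and the helper `pkgbase_object` are all assumptions made for this example, not the actual RPC schema.

```python
import json

# Hypothetical package records; the field names are assumptions for this sketch.
PACKAGES = [
    {"Name": "foo", "PackageBase": "foo", "Maintainer": "alice"},
    {"Name": "foo-docs", "PackageBase": "foo", "Maintainer": "alice"},
    {"Name": "bar", "PackageBase": "bar", "Maintainer": "bob"},
]

# Fields assumed to apply to the package base as a whole.
SHARED_FIELDS = ("Maintainer",)


def pkgbase_object(pkgbase, packages):
    """Return a dict describing one package base, or None if nothing matches."""
    members = [p for p in packages if p["PackageBase"] == pkgbase]
    if not members:
        return None
    result = {"PackageBase": pkgbase}
    # Copy the shared fields from the first member package.
    for field in SHARED_FIELDS:
        result[field] = members[0][field]
    # List the names of all packages built from this base.
    result["Packages"] = [p["Name"] for p in members]
    return result


print(json.dumps(pkgbase_object("foo", PACKAGES), indent=2))
```

A client could then tell a pkgbase result from a package result by the presence of the `Packages` array, which is one possible answer to the detection problem raised under option A.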
(A)
Although, I thought `info` lookups had been converted to return Arrays as well.
If not, might I resuggest the merging of `info` and `multiinfo`? Fuse them into just `info`, and return any number of results as an Array, given any number of arguments, also including the ability to search by pkgbase.
On 12 August 2014 21:56, Xyne <xyne@archlinux.ca> wrote:
Hi,
The RPC search, info and multiinfo methods return no data when the pkgbase is passed as the argument.
I would like to discuss the following ideas:
A) Return all of the packages in the pkgbase for search, info and multiinfo queries. For search and multiinfo this would be natural. For info it would be left to the client to detect that the result is an array instead of an object and thus infer that the result is from a pkgbase (or simply add that data to a new field in the result object). With the advent of the query version parameter this should not be an issue for backwards compatibility.
B) Add a parameter to query pkgbase data (e.g. object=pkgbase) and return a JSON object with pkgbase-specific data (everything that still applies from regular package data, plus an array of the included packages). Given that there are AUR pages for pkgbases, it would be consistent to provide that data via the RPC interface.
I believe this would be very useful. What do you think? Would it be difficult?
Regards, Xyne
On Wed, Aug 13, 2014 at 12:28:10PM -0700, Colin Woodbury wrote:
(A)
Although, I thought `info` lookups had been converted to return Arrays as well.
This is only true of RPCv3: https://bugs.archlinux.org/task/40963
If not, might I resuggest the merging of `info` and `multiinfo`? Fuse them into just `info`, and return any number of results as an Array, given any number of arguments, also including the ability to search by pkgbase.
On 2014-08-13 16:12 -0400 Dave Reisner wrote:
On Wed, Aug 13, 2014 at 12:28:10PM -0700, Colin Woodbury wrote:
(A)
Although, I thought `info` lookups had been converted to return Arrays as well.
This is only true of RPCv3:
Incidentally, shouldn't the optdepends be returned as an object? E.g. {"foolib":"support for foo", "bar":"pub functionality"}
On Aug 14, 2014 8:41 AM, "Xyne" <xyne@archlinux.ca> wrote:
On 2014-08-13 16:12 -0400 Dave Reisner wrote:
On Wed, Aug 13, 2014 at 12:28:10PM -0700, Colin Woodbury wrote:
(A)
Although, I thought `info` lookups had been converted to return Arrays as well.
This is only true of RPCv3:
Incidentally, shouldn't the optdepends be returned as an object? E.g.
{"foolib":"support for foo", "bar":"pub functionality"}
Why? makepkg requires that optdeps are well formed. Splitting them in the response here seems like an arbitrary decision. Why not also then split depends, in case they're versioned? What about the pkgver and pkgrel (which are combined)?
On 2014-08-14 08:45 -0400 Dave Reisner wrote:
Incidentally, shouldn't the optdepends be returned as an object? E.g.
{"foolib":"support for foo", "bar":"pub functionality"}
Why? makepkg requires that optdeps are well formed. Splitting them in the response here seems like an arbitrary decision. Why not also then split depends, in case they're versioned? What about the pkgver and pkgrel (which are combined)?
I see the optdeps as a set of key-value pairs that lends itself naturally to a JSON object. I would actually be in favor of splitting deps and the combined pkgver just for the sake of returning a full digest of the information instead of a partial one that leaves the rest to the user, but it's admittedly trivial and probably a waste of CPU cycles on the server so I'm not actually suggesting it. Thinking about it, I also realize that it wouldn't work for ranged deps, e.g. (foo>=3.0, foo<4.0) without relatively complex objects.
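For the sake of discussion, here is a minimal sketch of what turning optdepends into a key-value object could look like. It assumes each entry is a string of the form "name: description" and splits on the first colon followed by a space. The helper name `optdepends_to_object` is hypothetical, and versioned or ranged dependencies are deliberately not handled, for the reason given above.

```python
def optdepends_to_object(optdepends):
    """Map each optional dependency name to its description ("" if none given)."""
    result = {}
    for entry in optdepends:
        # Split on the first ": " only, so later colons stay in the description.
        name, _, description = entry.partition(": ")
        result[name.strip()] = description.strip()
    return result


entries = ["foolib: support for foo", "bar: pub functionality", "baz"]
print(optdepends_to_object(entries))
```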
On Wed, 13 Aug 2014 at 06:56:00, Xyne wrote:
Hi,
The RPC search, info and multiinfo methods return no data when the pkgbase is passed as the argument.
I would like to discuss the following ideas:
A) Return all of the packages in the pkgbase for search, info and multiinfo queries. For search and multiinfo this would be natural. For info it would be left to the client to detect that the result is an array instead of an object and thus infer that the result is from a pkgbase (or simply add that data to a new field in the result object). With the advent of the query version parameter this should not be an issue for backwards compatibility.
I don't quite understand the idea. What is the argument that is passed to the RPC?
* A package name? If so, I personally think it is very counterintuitive to return all "related" packages as well. Also, we are going to waste bandwidth by often returning information the user is not interested in.
* A package base name? Does that mean we will no longer be able to search for packages?
* Both? What happens if there is a package base and a package with the same name (and the package doesn't belong to that package base)?
B) Add a parameter to query pkgbase data (e.g. object=pkgbase) and return a JSON object with pkgbase-specific data (everything that still applies from regular package data, plus an array of the included packages). Given that there are AUR pages for pkgbases, it would be consistent to provide that data via the RPC interface.
This sounds better to me.
I believe this would be very useful. What do you think? Would it be difficult?
I like this idea. Should not be too hard to implement. Maybe we should make the whole thing more flexible by replacing everything with a single search method (instead of search, info and multiinfo) with different search modes (packages and package bases), a filter to specify the fields one is interested in and different query types (by name, by name and description).
Regards, Xyne
On 2014-08-14 23:05 +0200 Lukas Fleischer wrote:
B) Add a parameter to query pkgbase data (e.g. object=pkgbase) and return a JSON object with pkgbase-specific data (everything that still applies from regular package data, plus an array of the included packages). Given that there are AUR pages for pkgbases, it would be consistent to provide that data via the RPC interface.
This sounds better to me.
I believe this would be very useful. What do you think? Would it be difficult?
I like this idea. Should not be too hard to implement. Maybe we should make the whole thing more flexible by replacing everything with a single search method (instead of search, info and multiinfo) with different search modes (packages and package bases), a filter to specify the fields one is interested in and different query types (by name, by name and description).
That works for me. In that case, the method should accept a list of fields to search (pkgname, pkgbase, pkgdesc, maintainer, deps, url?, etc.). There should be a way to search for exact matches (e.g. to retrieve data about a specific package or package base). That could be done either with regexes (too much server overhead?) or an extra parameter that forces a perfect match.
The returned objects should include a "type" field to specify what kind of object they are ("package", "package base").
A filter to reduce the returned fields may be useful in some cases but it's easy enough to filter on the client-side. I suppose it's a matter of CPU vs bandwidth for the server.
Search for packages "foo" and "bar" (multiple arguments -> multiple returned objects):
https://aur.archlinux.org/search.php?by=pkgname&exact=true&arg=foo&arg=bar
Search for package base "baz":
https://aur.archlinux.org/search.php?by=pkgbase&exact=true&arg=baz
Search for all packages depending on "foo" (not exact to allow for versioned deps):
https://aur.archlinux.org/search.php?by=depends&arg=foo
You could even split the query string by the "by" parameters to enable multifield searches, e.g. to search for all python packages maintained by "foo":
https://aur.archlinux.org/search.php?by=pkgname&arg=python-&by=maintainer&exact=true&arg=foo
There may be multiple arguments to each "by" (e.g. by=pkgname&arg=python-&arg=python3-), so this may be tricky to do with a single backend pass, but it would be easy with subsequent refining passes for each "by".
I'm not sure what the best way to build in boolean logic would be ("and", "or", "xor"?, etc.) or if it is even something that you would want to implement. Maybe with a custom "advanced" parameter that accepts a string that the server can parse directly (using some existing syntax?).
I'm just kicking around some ideas for the sake of discussion.
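To make the "split the query string by the by parameters" idea concrete, here is a small sketch using Python's standard `urllib.parse`. The parameter names (`by`, `exact`, `arg`) come from the example URLs in this message. The helper names `build_query` and `split_by_groups` and the group layout are assumptions made for illustration; none of this is an existing AUR endpoint.

```python
from urllib.parse import parse_qsl, urlencode


def build_query(groups):
    """Encode [(field, exact, [args...]), ...] as repeated by/exact/arg parameters."""
    pairs = []
    for field, exact, args in groups:
        pairs.append(("by", field))
        if exact:
            pairs.append(("exact", "true"))
        pairs.extend(("arg", a) for a in args)
    return urlencode(pairs)


def split_by_groups(query):
    """Split a query string into one group per "by" parameter, in order."""
    groups = []
    for key, value in parse_qsl(query):
        if key == "by":
            groups.append({"by": value, "exact": False, "args": []})
        elif not groups:
            continue  # ignore parameters that appear before the first "by"
        elif key == "exact":
            groups[-1]["exact"] = value == "true"
        elif key == "arg":
            groups[-1]["args"].append(value)
    return groups


query = build_query([("pkgname", False, ["python-"]), ("maintainer", True, ["foo"])])
print(query)
print(split_by_groups(query))
```

Each group could then be applied as one refining pass over the previous result set, as suggested above.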
On Sat, 16 Aug 2014 at 05:25:12, Xyne wrote:
[...] That works for me. In that case, the method should accept a list of fields to search (pkgname, pkgbase, pkgdesc, maintainer, deps, url?, etc.). There should be a way to search for exact matches (e.g. to retrieve data about a specific package or package base). That could be done either with regexes (too much server overhead?) or an extra parameter that forces a perfect match.
The returned objects should include a "type" field to specify what kind of object they are ("package", "package base").
A filter to reduce the returned fields may be useful in some cases but it's easy enough to filter on the client-side. I suppose it's a matter of cpu vs bandwidth for the server. [...] I'm not sure what the best way to build in boolean logic would be ("and", "or", "xor"?, etc.) or if it is even something that you would want to implement. Maybe with a custom "advanced" parameter that accepts a string that the server can parse directly (using some existing syntax?).
I'm just kicking around some ideas for the sake of discussion.
I'd rather not overcomplicate things. Having a "by" parameter, the possibility to pass one or multiple (fixed) strings and an option to enable exact matching is what I was thinking of. I do not think that combining search types gives a substantial benefit.
If we really need to support very powerful queries, it might be better to reconsider another idea I had earlier: Replace the RPC interface with a static database. Basically, the result of an RPC query that matches every single package is computed every hour (or so) and stored in a flat file which can be downloaded, similar to pacman databases. AUR helpers can download that file and do whatever they want. Note that this file will probably be quite large, though (roughly 5-10MiB when compressed, did not check with the latest set of packages). I am not sure whether this is the best thing to do, since, unlike in the case of the official repositories, users are usually only interested in a tiny amount of AUR packages.
On 2014-08-16 16:52 +0200 Lukas Fleischer wrote:
I'd rather not overcomplicate things. Having a "by" parameter, the possibility to pass one or multiple (fixed) strings and an option to enable exact matching is what I was thinking of. I do not think that combining search types gives a substantial benefit.
If we really need to support very powerful queries, it might be better to reconsider another idea I had earlier: Replace the RPC interface with a static database. Basically, the result of an RPC query that matches every single package is computed every hour (or so) and stored in a flat file which can be downloaded, similar to pacman databases. AUR helpers can download that file and do whatever they want. Note that this file will probably be quite large, though (roughly 5-10MiB when compressed, did not check with the latest set of packages). I am not sure whether this is the best thing to do, since, unlike in the case of the official repositories, users are usually only interested in a tiny amount of AUR packages.
The advantage of real-time results is that they can be used to confirm uploads and other operations. Making a compressed database available may be useful in its own right but I would not want to see it replace the current system.
Given that the current search already searches by both name and description, would a single "by" parameter be able to at least accept a character-delimited list to search multiple fields (e.g. pkgname:pkgbase:pkgdesc)? If not, what about accepting multiple "by"s for a combined OR search?
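A colon-delimited "by" value with OR semantics might behave like the following sketch. The in-memory records, the lowercase field names and the `search` helper are assumptions for illustration only; a real implementation would run against the AUR database rather than a Python list.

```python
# Hypothetical records; field names mirror the ones mentioned in the thread.
PACKAGES = [
    {"pkgname": "foo", "pkgbase": "foo", "pkgdesc": "A tool"},
    {"pkgname": "bar", "pkgbase": "bar", "pkgdesc": "Works with foo"},
    {"pkgname": "baz", "pkgbase": "baz", "pkgdesc": "Unrelated"},
]


def search(packages, by, arg, exact=False):
    """Return packages where any field listed in `by` matches `arg` (OR search)."""
    fields = by.split(":")
    matches = []
    for pkg in packages:
        values = [str(pkg.get(field, "")) for field in fields]
        if exact:
            hit = any(value == arg for value in values)
        else:
            hit = any(arg in value for value in values)
        if hit:
            matches.append(pkg)
    return matches


print([p["pkgname"] for p in search(PACKAGES, "pkgname:pkgdesc", "foo")])
print([p["pkgname"] for p in search(PACKAGES, "pkgname", "foo", exact=True)])
```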
(Aura author here)
I personally use real-time results to confirm uploads all the time. I wouldn't be in favour of a DB replacing the RPC.
That said, every once in a while I have feature requests from users along the lines of "Well if you had a local AUR db then feature X would be trivial."
On 16 August 2014 11:24, Xyne <xyne@archlinux.ca> wrote:
On 2014-08-16 16:52 +0200 Lukas Fleischer wrote:
I'd rather not overcomplicate things. Having a "by" parameter, the possibility to pass one or multiple (fixed) strings and an option to enable exact matching is what I was thinking of. I do not think that combining search types gives a substantial benefit.
If we really need to support very powerful queries, it might be better to reconsider another idea I had earlier: Replace the RPC interface with a static database. Basically, the result of an RPC query that matches every single package is computed every hour (or so) and stored in a flat file which can be downloaded, similar to pacman databases. AUR helpers can download that file and do whatever they want. Note that this file will probably be quite large, though (roughly 5-10MiB when compressed, did not check with the latest set of packages). I am not sure whether this is the best thing to do, since, unlike in the case of the official repositories, users are usually only interested in a tiny amount of AUR packages.
The advantage of real-time results is that they can be used to confirm uploads and other operations. Making a compressed database available may be useful in its own right but I would not want to see it replace the current system.
Given that the current search already searches by both name and description, would a single "by" parameter be able to at least accept a character-delimited list to search multiple fields (e.g. pkgname:pkgbase:pkgdesc)? If not, what about accepting multiple "by"s for a combined OR search?
On Mon, Aug 18, 2014 at 7:50 PM, Colin Woodbury <colingw@gmail.com> wrote:
That said, every once in a while I have feature requests from users along the lines of "Well if you had a local AUR db then feature X would be trivial."
Do you mean something like https://aur.archlinux.org/packages/aurlist/ ?
`aurlist` seems to do the searching/formatting for you. I meant more of a local source of parsable data for AUR packages.
On 18 August 2014 11:34, Karol Blazewicz <karol.blazewicz@gmail.com> wrote:
On Mon, Aug 18, 2014 at 7:50 PM, Colin Woodbury <colingw@gmail.com> wrote:
That said, every once in a while I have feature requests from users along the lines of "Well if you had a local AUR db then feature X would be trivial."
Do you mean something like https://aur.archlinux.org/packages/aurlist/ ?
And for helpers to have to redownload that that frequently would be a pain. Would it update when a new package was uploaded or changed?
On 16 August 2014 07:52, Lukas Fleischer <archlinux@cryptocrack.de> wrote:
On Sat, 16 Aug 2014 at 05:25:12, Xyne wrote:
[...] That works for me. In that case, the method should accept a list of fields to search (pkgname, pkgbase, pkgdesc, maintainer, deps, url?, etc.). There should be a way to search for exact matches (e.g. to retrieve data about a specific package or package base). That could be done either with regexes (too much server overhead?) or an extra parameter that forces a perfect match.
The returned objects should include a "type" field to specify what kind of object they are ("package", "package base").
A filter to reduce the returned fields may be useful in some cases but it's easy enough to filter on the client-side. I suppose it's a matter of cpu vs bandwidth for the server. [...] I'm not sure what the best way to build in boolean logic would be ("and", "or", "xor"?, etc.) or if it is even something that you would want to implement. Maybe with a custom "advanced" parameter that accepts a string that the server can parse directly (using some existing syntax?).
I'm just kicking around some ideas for the sake of discussion.
I'd rather not overcomplicate things. Having a "by" parameter, the possibility to pass one or multiple (fixed) strings and an option to enable exact matching is what I was thinking of. I do not think that combining search types gives a substantial benefit.
If we really need to support very powerful queries, it might be better to reconsider another idea I had earlier: Replace the RPC interface with a static database. Basically, the result of an RPC query that matches every single package is computed every hour (or so) and stored in a flat file which can be downloaded, similar to pacman databases. AUR helpers can download that file and do whatever they want. Note that this file will probably be quite large, though (roughly 5-10MiB when compressed, did not check with the latest set of packages). I am not sure whether this is the best thing to do, since, unlike in the case of the official repositories, users are usually only interested in a tiny amount of AUR packages.
On Sat, 16 Aug 2014 at 20:42:21, Colin Woodbury wrote:
And for helpers to have to redownload that that frequently would be a pain. Would it update when a new package was uploaded or changed? [...]
Well, pacman needs to frequently reload the package databases as well (every time you run `pacman -Sy`). If we have delta support, it should not be too bad. As I mentioned before, it would update periodically (every hour or so, and maybe additional delay if mirrors are going to be used). You would have the same delay that you currently have when installing packages from the official repositories.
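On the client side, an hourly rebuild suggests that a helper would only need to re-download the file when its local copy is older than the rebuild interval. The sketch below shows such a staleness check. The file name, the one-hour interval and the `is_stale` helper are assumptions for illustration, and the download step and delta support are left out.

```python
import os
import tempfile
import time

MAX_AGE_SECONDS = 3600  # assumed to match an hourly server-side rebuild


def is_stale(path, max_age=MAX_AGE_SECONDS, now=None):
    """Return True if the local snapshot is missing or older than max_age."""
    if not os.path.exists(path):
        return True
    if now is None:
        now = time.time()
    return now - os.path.getmtime(path) > max_age


with tempfile.TemporaryDirectory() as tmp:
    snapshot = os.path.join(tmp, "aur-snapshot.json")  # hypothetical file name
    print(is_stale(snapshot))  # no local copy yet
    with open(snapshot, "w") as handle:
        handle.write("[]")
    print(is_stale(snapshot))  # freshly written copy
```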
On 16/08/14 15:52, Lukas Fleischer wrote:
On Sat, 16 Aug 2014 at 05:25:12, Xyne wrote:
I'm just kicking around some ideas for the sake of discussion.
I'd rather not overcomplicate things. Having a "by" parameter, the possibility to pass one or multiple (fixed) strings and an option to enable exact matching is what I was thinking of. I do not think that combining search types gives a substantial benefit.
I think that this is about right. The biggest weakness with the current search interface is not being able to query anything other than the name and description - so something like a "by" parameter would be very welcome.
The only other thing that can't be fully implemented on the client side is negation. It's possible to exclude terms if they are combined with at least one other positive term. For instance, to search for "foo" AND NOT "bar", you can first search for "foo" and then do some post-filtering on the client side to exclude "bar". But what if you wanted to search for *all* packages that do not contain "bar"? This is currently impossible, because you'd need to pull the entire package list before it could be filtered on the client-side. So I would like to suggest adding something like an "narg" parameter to complement the "arg" parameter, to allow for the possibility of a list of terms to be excluded from the search results.
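The client-side post-filtering described here for "foo" AND NOT "bar" could be sketched as follows. The result records, the `Name` and `Description` field names and the `exclude_terms` helper are assumptions for illustration. The list `foo_results` stands in for whatever a search for "foo" returned.

```python
def exclude_terms(results, nargs, fields=("Name", "Description")):
    """Drop results whose name or description contains any excluded term."""
    kept = []
    for pkg in results:
        # Compare case-insensitively against the selected fields.
        text = " ".join(str(pkg.get(field, "")) for field in fields).lower()
        if not any(term.lower() in text for term in nargs):
            kept.append(pkg)
    return kept


# Stand-in for the results of a positive search for "foo".
foo_results = [
    {"Name": "foo", "Description": "The foo tool"},
    {"Name": "foo-bar", "Description": "foo with bar support"},
    {"Name": "libfoo", "Description": "Library used by Bar"},
]
print([pkg["Name"] for pkg in exclude_terms(foo_results, ["bar"])])
```

This only works when a positive term has already narrowed the result set, which is the limitation that motivates a server-side "narg".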
If we really need to support very powerful queries, it might be better to reconsider another idea I had earlier: Replace the RPC interface with a static database. Basically, the result of an RPC query that matches every single package is computed every hour (or so) and stored in a flat file which can be downloaded, similar to pacman databases. AUR helpers can download that file and do whatever they want. Note that this file will probably be quite large, though (roughly 5-10MiB when compressed, did not check with the latest set of packages). I am not sure whether this is the best thing to do, since, unlike in the case of the official repositories, users are usually only interested in a tiny amount of AUR packages.
I think the most significant difference between the official repos and the AUR is volatility. The AUR is subject to far greater and more frequent change, and, unlike the official repos, those changes are far less co-ordinated (to put it mildly). So for the AUR, it is perhaps more important to have the most up-to-date information available, than it is to have greater control over the entire database.
Having said that, I don't think having a downloadable static database is a bad idea in itself - I'm sure many people who maintain AUR helpers would like to have it there as an option. But I think it would be a mistake to have it replace the RPC interface altogether.
On Sat, 16 Aug 2014 at 21:04:05, kachelaqa wrote:
[...] The only other thing that can't be fully implemented on the client side is negation. It's possible to exclude terms if they are combined with at least one other positive term. For instance, to search for "foo" AND NOT "bar", you can first search for "foo" and then do some post-filtering on the client side to exclude "bar". But what if you wanted to search for *all* packages that do not contain "bar"? This is currently impossible, because you'd need to pull the entire package list before it could be filtered on the client-side. So I would like to suggest adding something like an "narg" parameter to complement the "arg" parameter, to allow for the possibility of a list of terms to be excluded from the search results. [...]
Just out of curiosity, can you give an example of such a negative query that would result in a result set with less than 5000 packages? I am asking because queries with results of more than 5000 packages fail.
Having said that, I think that having something like narg is a good idea. It is easy to implement and has real use cases.
On 16/08/14 20:20, Lukas Fleischer wrote:
On Sat, 16 Aug 2014 at 21:04:05, kachelaqa wrote:
[...] The only other thing that can't be fully implemented on the client side is negation. It's possible to exclude terms if they are combined with at least one other positive term. For instance, to search for "foo" AND NOT "bar", you can first search for "foo" and then do some post-filtering on the client side to exclude "bar". But what if you wanted to search for *all* packages that do not contain "bar"? This is currently impossible, because you'd need to pull the entire package list before it could be filtered on the client-side. So I would like to suggest adding something like an "narg" parameter to complement the "arg" parameter, to allow for the possibility of a list of terms to be excluded from the search results. [...]
Just out of curiosity, can you give an example of such a negative query that would result in a result set with less than 5000 packages? I am asking because queries with results of more than 5000 packages fail.
I think I see what you're getting at: my actual example is somewhat contrived, because "bare" negations are probably a quite rare use-case. However, you've pointed at a much better use-case: searching for "foo" alone might exceed the 5000 limit, but searching for "foo" AND NOT "bar" could often help keep it below that limit. So, really, "bare" negations are just the most extreme example of that.
Having said that, I think that having something like narg is a good idea. It is easy to implement and has real use cases.
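The interaction between "narg" and the result limit can be sketched like this: exclusions are applied before the limit is checked, so a query that would fail on "foo" alone can succeed once "bar" is excluded. Everything here is an assumption for illustration, including the `search` helper, the `TooManyResults` exception, matching on package names only and requiring all positive terms to match. The limit is lowered to 2 in the demo so the effect is visible.

```python
RESULT_LIMIT = 5000  # the limit mentioned in the thread


class TooManyResults(Exception):
    """Raised when a query matches more packages than the limit allows."""


def search(names, args, nargs=(), limit=RESULT_LIMIT):
    """Return names containing every term in args and no term in nargs."""
    matches = []
    for name in names:
        if all(a in name for a in args) and not any(n in name for n in nargs):
            matches.append(name)
            if len(matches) > limit:
                raise TooManyResults("more than %d results" % limit)
    return matches


NAMES = ["foo", "foo-bar", "foo-baz", "foo-bar-git"]

try:
    search(NAMES, ["foo"], limit=2)
except TooManyResults as error:
    print("failed:", error)

print(search(NAMES, ["foo"], nargs=["bar"], limit=2))
```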
participants (6)
- Colin Woodbury
- Dave Reisner
- kachelaqa
- Karol Blazewicz
- Lukas Fleischer
- Xyne