Re: [aur-dev] Making the AUR package list more useful
On Apr 29, 2016 02:50, "Lukas Fleischer" <lfleischer@archlinux.org> wrote:
Recently, there were a couple of feature requests to make the AUR package search more powerful. While I do not plan on adding more patterns or regular expressions to the RPC interface itself, my idea is that more tools should be using the package name list. However, there seem to be two issues with that:
What are your reasons for not wanting to extend the RPC interface? Splitting the API in this way is cumbersome for clients. We now are supposed to download and maintain data which is essentially an index (which ought to be internal to the server to allow strong consistency). You're also asking clients of a read-only interface to become stateful, which isn't something that really interests me.
1. The list is outdated. Right now, it is updated every two hours. I do not think there is a good reason for those long intervals. Reducing it to, say, ten minutes should be totally fine. Or maybe even trigger list generation whenever a package is created or deleted (which is clearly a lot more work, though). Thoughts?
Or, change the storage for the name list such that updates can be fast. Turns out, you already have such a thing, you'd just need an index on the Packages and PackageBases tables.
2. Transferring the whole package name list is inefficient. Even if we use gzip compression here, the whole list is several hundred kilobytes in size. We need to retransfer the full list, even if only a single package is added. Maybe we can do better than pacman here. My idea is to add zsync support to the lists such that only relevant parts are downloaded (for those who do not know: zsync is like rsync but it works via HTTP as well and does not require any special software on the server side). I have not yet experimented with how much bandwidth we can actually save this way. Maybe the block size needs to be adjusted. Are there any opinions or other suggestions on this topic?
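For those curious how much a block-based transfer could save, here is a small self-contained sketch. This is not real zsync (which uses rolling checksums so data that merely shifts position still matches); this simplified version only detects changes at fixed block offsets, which happens to fit the append-mostly pattern of a sorted name list:

```python
import hashlib

BLOCK_SIZE = 2048  # real zsync picks block sizes per file; small here for demo

def block_digests(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Split data into fixed-size blocks and hash each one."""
    return [
        hashlib.md5(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]

def blocks_to_fetch(old: bytes, new: bytes, block_size: int = BLOCK_SIZE) -> list[int]:
    """Return indices of blocks in `new` that differ from `old` at the same
    offset, i.e. the only blocks a client holding `old` would download."""
    old_sums = block_digests(old, block_size)
    new_sums = block_digests(new, block_size)
    return [
        i for i, digest in enumerate(new_sums)
        if i >= len(old_sums) or old_sums[i] != digest
    ]

# Simulate a sorted package list where a single new name sorts to the end.
old_list = b"\n".join(b"package-%05d" % i for i in range(10000))
new_list = b"\n".join(
    sorted([b"package-%05d" % i for i in range(10000)] + [b"zzz-new-package"])
)

changed = blocks_to_fetch(old_list, new_list)
total = (len(new_list) + BLOCK_SIZE - 1) // BLOCK_SIZE
print(f"{len(changed)} of {total} blocks changed")
```

With one appended name, only the final block differs, so a client would transfer a couple of kilobytes instead of the full list.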
I'm not understanding why any of this is considered a good direction for the API. What are the reasons for wanting the whole list of package names on the client side? Are there use cases other than search?
Regards, Lukas
On Fri, 29 Apr 2016 at 13:56:58, Dave Reisner wrote:
What are your reasons for not wanting to extend the RPC interface? Splitting the API in this way is cumbersome for clients. We now are supposed to download and maintain data which is essentially an index (which ought to be internal to the server to allow strong consistency). You're also asking clients of a read-only interface to become stateful, which isn't something that really interests me.
Isn't that exactly the way pacman works, though? It downloads a copy of the database locally and uses that database to answer requests and obtain package information. My vision is that, optimally, the official repositories and the AUR build upon the same basic concept. Apart from binary vs. source packages, the only real difference between the official repositories and the AUR is the number of packages, but if we figure out a good way to solve point (2) from my initial email, that should be a non-issue.

Apart from that, there are two general directions we can go:

* Do everything on the server. Keep extending the server for every feature that is needed by some client. What happens if a user only wants to know the number of packages matched by a given expression? Do we really want to force her to fetch the whole list of matched packages, just to obtain its size, or do we add another request type? And even if regular expressions were the last missing thing, adding them demands a bit more thought than one might expect (what kind of expressions do we support, do we need to care about ReDoS or is that handled by the engines themselves, etc.)

* Directly publish all the information required to answer all possible requests. Let the clients do whatever they want. Currently, we only provide package names but in the future, this could be extended to a more complete database like the one pacman uses.

I am not saying that the second option is the holy grail. For a simple web application that retrieves information on a package or for a single basic package search, downloading everything might be overkill. That is why I suggest keeping the very basic RPC interface we have right now and, additionally, providing package databases for fancier applications.

I am not set on this idea yet. It just seems like the most natural and Arch way of handling this kind of thing. I am open to discussion!
Or, change the storage for the name list such that updates can be fast. Turns out, you already have such a thing, you'd just need an index on the Packages and PackageBases tables.
Those indices are there already. Dropping the package list cache completely might be an option (got to investigate the performance impact).
I'm not understanding why any of this is considered a good direction for the API. What are the reasons for wanting the whole list of package names on the client side? Are there use cases other than search?
Search could be extended in many ways, especially now that we have useful metadata. One could build full dependency trees of the AUR and add proper support for package groups, just to name two examples.
Regards, Lukas
On Sat, Apr 30, 2016 at 12:08:07AM +0200, Lukas Fleischer wrote:
On Fri, 29 Apr 2016 at 13:56:58, Dave Reisner wrote:
What are your reasons for not wanting to extend the RPC interface? Splitting the API in this way is cumbersome for clients. We now are supposed to download and maintain data which is essentially an index (which ought to be internal to the server to allow strong consistency). You're also asking clients of a read-only interface to become stateful, which isn't something that really interests me.
Isn't that exactly the way pacman works, though? It downloads a copy of the database locally and uses that database to answer requests and obtain package information. My vision is that, optimally, the official repositories and the AUR build upon the same basic concept. Apart from binary vs. source packages, the only real difference between the official repositories and the AUR is the number of packages, but if we figure out a good way to solve point (2) from my initial email, that should be a non-issue.
Hrmm, I don't know that this is an equal comparison. Here's my perception of the current world:

pacman relies on distribution of the *entire* DB to mirrors around the world. Due to the tiered mirror system, you can basically only rely on eventual consistency of tier N>0 with tier 0, but the DBs at any given point in time should be consistent with themselves (i.e. assuming they're well-behaved, they won't advertise packages which they don't have). In addition to the sync tarballs, pacman relies on a local database which it mutates as packages are installed, upgraded, and removed.

pacman has reduced functionality when it has no reachable mirror -- it's still capable of removing packages, modifying the local DB (to adjust install reasons), and installing packages which are present in a file cache.

In contrast, the AUR currently only offers an API to support adhoc queries. There are no mirrors, and the RPC interface offers strong consistency with the contents of the AUR. I think we can agree that in the current form, packages.gz and pkgbases.gz files aren't very useful as they tend to lag too far behind reality. AUR clients currently have a hard dependency on the network. If they cannot reach the AUR, they cannot do anything useful.

Your proposal to make the pkgname/pkgbase tarballs more closely consistent doesn't change the network dependency. All it seems to do is offload the ability to perform more precise searching to the client, *if* they choose to implement it. I'm suggesting that the server should do this, such that we have a single implementation which *everyone* can take advantage of. Not just clients of the RPC interface, but the web UI as well.

I might go so far as to say that we should try and find ways to *remove* packages.gz and pkgbases.gz, and devise better solutions to the problems people want to solve with these files.
Apart from that, there are two general directions we can go:
* Do everything on the server. Keep extending the server for every feature that is needed by some client. What happens if a user only wants to know the number of packages matched by a given expression? Do we really want to force her to fetch the whole list of matched packages, just to obtain its size, or do we add another request type? And even if regular expressions were the last missing thing, adding them demands a bit more thought than one might expect (what kind of expressions do we support, do we need to care about ReDoS or is that handled by the engines themselves, etc.)
Agreed. Regular expressions aren't necessarily what we want to end up with. As an alternative, prefix and suffix matching would be substantially cheaper, less prone to abuse/dos, and would probably fulfill the needs of most people. If you wanted to offer the ability to return just the size of the resultset for some advanced search method, you could add another parameter to the current search interface which would elide the 'results' list in the response JSON. You already have a 'resultcount' field with the size.
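As a sketch of that idea, a hypothetical `count_only` parameter could shape the response like this (the 'type', 'resultcount', and 'results' field names mirror the existing RPC output; the parameter name itself is made up for illustration):

```python
def build_search_response(matches: list[dict], count_only: bool = False) -> dict:
    """Shape an RPC-style search response; when count_only is set, the
    'results' list is elided and only 'resultcount' is returned."""
    response = {
        "type": "search",
        "resultcount": len(matches),
    }
    if not count_only:
        response["results"] = matches
    return response

matches = [{"Name": "cower"}, {"Name": "cower-git"}]
print(build_search_response(matches, count_only=True))
```

A client interested only in the match count would then avoid transferring (and the server would avoid serializing) the full result list.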
* Directly publish all the information required to answer all possible requests. Let the clients do whatever they want. Currently, we only provide package names but in the future, this could be extended to a more complete database like the one pacman uses.
This has the same problems as the current gz files -- you can only offer eventual consistency. It also only scales well if you can distribute the load in the same way that pacman does with a tiered mirror system. This comes with a non-zero maintenance cost.
I am not saying that the second option is the holy grail. For a simple web application that retrieves information on a package or for a single basic package search, downloading everything might be overkill. That is why I suggest keeping the very basic RPC interface we have right now and, additionally, providing package databases for fancier applications.
I am not set on this idea yet. It just seems like the most natural and Arch way of handling this kind of thing. I am open to discussion!
Or, change the storage for the name list such that updates can be fast. Turns out, you already have such a thing, you'd just need an index on the Packages and PackageBases tables.
Those indices are there already. Dropping the package list cache completely might be an option (got to investigate the performance impact).
Serving the package list from the index would essentially be a full table scan -- I don't think it's going to go well.
I'm not understanding why any of this is considered a good direction for the API. What are the reasons for wanting the whole list of package names on the client side? Are there use cases other than search?
Search could be extended in many ways, especially now that we have useful metadata. One could build full dependency trees of the AUR and add proper support for package groups, just to name two examples.
I agree. These are useful ideas.
Regards, Lukas
On Sat, 30 Apr 2016 at 18:16:54, Dave Reisner wrote:
Hrmm, I don't know that this is an equal comparison. Here's my perception of the current world:
pacman relies on distribution of the *entire* DB to mirrors around the world. Due to the tiered mirror system, you can basically only rely on eventual consistency of tier N>0 with tier 0, but the DBs at any given point in time should be consistent with themselves (i.e. assuming they're well-behaved, they won't advertise packages which they don't have). In addition to the sync tarballs, pacman relies on a local database which it mutates as packages are installed, upgraded, and removed.
Yeah, as I mentioned further below, providing the full database might be a long-term goal. Are the mirrors part of the basic concept behind pacman? I always had the impression they primarily exist to improve download performance. One could also distribute the database among servers all over the world and allow clients to perform remote procedure calls on each of them. We could introduce those mirrors to the AUR as well, independent of whether the database is transferred to the clients or not.
pacman has reduced functionality when it has no reachable mirror -- it's still capable of removing packages, modifying the local DB (to adjust install reasons), and installing packages which are present in a file cache.
I am not sure I follow. How are orthogonal features relevant to the discussion of whether the sync DB should be copied to the clients or accessed via requests to a server? It really shouldn't matter which additional operations on other objects are supported (and even if it does, there are clients like yaourt which provide a similar interface). The only thing I can think of in this context is that due to copying the database, one can perform queries on the sync database while being offline (i.e. if you want to find out the name of a package you cannot remember using -Ss). The AUR would benefit from that as well.
In contrast, the AUR currently only offers an API to support adhoc queries. There are no mirrors, and the RPC interface offers strong consistency with the contents of the AUR. I think we can agree that in the current form, packages.gz and pkgbases.gz files aren't very useful as they tend to lag too far behind reality.
Of course. Which is why I started this thread (even though, actually, I do not think it is *too* bad to lag an hour or two behind; it should not matter in 99.9% of the use cases).
AUR clients currently have a hard dependency on the network. If they cannot reach the AUR, they cannot do anything useful.
Yeah, again, the same would apply to pacman if we split the sync operations into a separate utility as we do in the case of the AUR. Conversely, there is yaourt providing the pacman interface and there are other AUR helpers that cannot download a source package but still build/install a downloaded package when you are offline. I might be missing something...
Your proposal to make the pkgname/pkgbase tarballs more closely consistent doesn't change the network dependency. All it seems to do is offload the ability to perform more precise searching to the client, *if* they choose to implement it. I'm suggesting that the server should do this, such that we have a single implementation which *everyone* can take advantage of. Not just clients of the RPC interface, but the web UI as well.
Having the web UI make extensive use of the RPC interface is a good argument against moving towards my suggestions indeed. However, that would mean we fundamentally change the principles aurweb is currently built upon. Everything should work without any annoyances with JS disabled and using only a text-mode browser. Maybe that is too old-fashioned thinking; maybe even among Arch users, only few users need support for that and everyone else might benefit from a more "modern" interface...
Agreed. Regular expressions aren't necessarily what we want to end up with. As an alternative, prefix and suffix matching would be substantially cheaper, less prone to abuse/dos, and would probably fulfill the needs of most people.
True. But then again, there are some use cases for regular expressions that are not covered by matching prefixes and suffixes and we might, again, have people requesting support for them (we had people explicitly requesting regular expression support a couple of times). If we decide to reject those requests and tell people "Hey, you cannot do that on the AUR.", this simply means that they will revive their web scrapers and build their own package name databases based on the web pages, as they did before we introduced packages.gz. I even did that myself to build the database for aurdupes (which is another good example that requires the full set of names to be available locally) before packages.gz was there. This is Arch after all, and our users become creative when they are not given the interface they want.
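For illustration, the client-side approach amounts to very little code. A minimal sketch, with inline sample data standing in for the real packages.gz file:

```python
import gzip
import re

# Sample data standing in for a downloaded packages.gz; the real file is a
# gzip-compressed newline-separated list of package names.
sample = gzip.compress(b"cower\ncower-git\npacaur\nyaourt\nyaourt-git\n")

def search(pattern: str, raw: bytes) -> list[str]:
    """Decompress the name list and return names matching the regex."""
    names = gzip.decompress(raw).decode().splitlines()
    rx = re.compile(pattern)
    return [n for n in names if rx.search(n)]

print(search(r"-git$", sample))
```

Arbitrary regular expressions, prefix/suffix matching, and anything fancier all come for free once the list is local, without the server having to guard against expensive queries.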
If you wanted to offer the ability to return just the size of the resultset for some advanced search method, you could add another parameter to the current search interface which would elide the 'results' list in the response JSON. You already have a 'resultcount' field with the size.
That is what I meant. We need to change the interface for every feature request that pops up.
* Directly publish all the information required to answer all possible requests. Let the clients do whatever they want. Currently, we only provide package names but in the future, this could be extended to a more complete database like the one pacman uses.
This has the same problems as the current gz files -- you can only offer eventual consistency. It also only scales well if you can distribute the load in the same way that pacman does with a tiered mirror system. This comes with a non-zero maintenance cost.
True. If it turns out that one server is not sufficient, mirrors need to be added. However, since we already have all the infrastructure, I expect the extra maintenance cost to be rather small. I also think that adding something like zsync support can reduce traffic by orders of magnitude. This is also something pacman/libalpm itself could benefit from. Regards, Lukas
On Mon, 02 May 2016 at 08:53:35, Lukas Fleischer wrote:
In contrast, the AUR currently only offers an API to support adhoc queries. There are no mirrors, and the RPC interface offers strong consistency with the contents of the AUR. I think we can agree that in the current form, packages.gz and pkgbases.gz files aren't very useful as they tend to lag too far behind reality.
Of course. Which is why I started this thread (even though, actually, I do not think it is *too* bad to lag an hour or two behind; it should not matter in 99.9% of the use cases).
Note that I decreased the update interval to five minutes for now which should make the list at least a bit more useful.
On Mon, May 02, 2016 at 08:53:35AM +0200, Lukas Fleischer wrote:
I didn't mean to drop this; it's just been a busy month for me. I'm about to be travelling for work for the next week, too.
On Sat, 30 Apr 2016 at 18:16:54, Dave Reisner wrote:
Hrmm, I don't know that this is an equal comparison. Here's my perception of the current world:
pacman relies on distribution of the *entire* DB to mirrors around the world. Due to the tiered mirror system, you can basically only rely on eventual consistency of tier N>0 with tier 0, but the DBs at any given point in time should be consistent with themselves (i.e. assuming they're well-behaved, they won't advertise packages which they don't have). In addition to the sync tarballs, pacman relies on a local database which it mutates as packages are installed, upgraded, and removed.
Yeah, as I mentioned further below, providing the full database might be a long-term goal.
Are the mirrors part of the basic concept behind pacman? I always had the impression they primarily exist to improve download performance. One could also distribute the database among servers all over the world and allow clients to perform remote procedure calls on each of them. We could introduce those mirrors to the AUR as well, independent of whether the database is transferred to the clients or not.
I wouldn't consider the mirrors to be a "basic concept" since the mirrorlist is strictly an Arch Linux provided thing. They act as a best-effort load balancing strategy since we don't do anything intelligent with redirects and simply rely on people to pick a mirror which fulfills some value of "performs well" for their purposes. They also offer a level of redundancy -- we clearly don't suffer global outages when mirrors are unavailable.
pacman has reduced functionality when it has no reachable mirror -- it's still capable of removing packages, modifying the local DB (to adjust install reasons), and installing packages which are present in a file cache.
I am not sure I follow. How are orthogonal features relevant to the discussion of whether the sync DB should be copied to the clients or accessed via requests to a server? It really shouldn't matter which additional operations on other objects are supported (and even if it does, there are clients like yaourt which provide a similar interface). The only thing I can think of in this context is that due to copying the database, one can perform queries on the sync database while being offline (i.e. if you want to find out the name of a package you cannot remember using -Ss). The AUR would benefit from that as well.
yaourt's (and other similar helpers') capacity to do things outside of the AUR is dependent upon pacman itself. I understand your suggestion about -Ss being an "offline" operation, but in the current form of the pkgnames tarball, you'd be getting something more similar to 'pacman -Slq | grep "$1"'. pacman -Ss would return results based on substring matches in descriptions, not just names. Sync'ing the entire AUR DB to clients would allow -Ss as well as richer queries like -Si.
In contrast, the AUR currently only offers an API to support adhoc queries. There are no mirrors, and the RPC interface offers strong consistency with the contents of the AUR. I think we can agree that in the current form, packages.gz and pkgbases.gz files aren't very useful as they tend to lag too far behind reality.
Of course. Which is why I started this thread (even though, actually, I do not think it is *too* bad to lag an hour or two behind; it should not matter in 99.9% of the use cases).
AUR clients currently have a hard dependency on the network. If they cannot reach the AUR, they cannot do anything useful.
Yeah, again, the same would apply to pacman if we split the sync operations into a separate utility as we do in the case of the AUR. Conversely, there is yaourt providing the pacman interface and there are other AUR helpers that cannot download a source package but still build/install a downloaded package when you are offline.
I might be missing something...
Consider the current API offered by the AUR -- a query interface and some links to tarballs you can download. In contrast, pacman is a full-featured package manager which not only allows queries and downloading of tarballs, but also manipulates your filesystem.
Your proposal to make the pkgname/pkgbase tarballs more closely consistent doesn't change the network dependency. All it seems to do is offload the ability to perform more precise searching to the client, *if* they choose to implement it. I'm suggesting that the server should do this, such that we have a single implementation which *everyone* can take advantage of. Not just clients of the RPC interface, but the web UI as well.
Having the web UI make extensive use of the RPC interface is a good argument against moving towards my suggestions indeed. However, that would mean we fundamentally change the principles aurweb is currently built upon. Everything should work without any annoyances with JS disabled and using only a text-mode browser. Maybe that is too old-fashioned thinking; maybe even among Arch users, only few users need support for that and everyone else might benefit from a more "modern" interface...
Why would you need javascript? All I'm suggesting is that aurweb issues GET requests against itself and renders the returned JSON in some meaningful way. Can't that be done in PHP?
Agreed. Regular expressions aren't necessarily what we want to end up with. As an alternative, prefix and suffix matching would be substantially cheaper, less prone to abuse/dos, and would probably fulfill the needs of most people.
True. But then again, there are some use cases for regular expressions that are not covered by matching prefixes and suffixes and we might, again, have people requesting support for them (we had people explicitly requesting regular expression support a couple of times). If we decide to reject those requests and tell people "Hey, you cannot do that on the AUR.", this simply means that they will revive their web scrapers and build their own package name databases based on the web pages, as they did before we introduced packages.gz. I even did that myself to build the database for aurdupes (which is another good example that requires the full set of names to be available locally) before packages.gz was there. This is Arch after all, and our users become creative when they are not given the interface they want.
To be clear, I'm not opposed to adding support for regex. If you're in favor of it, I have to wonder why it hasn't been added yet...
If you wanted to offer the ability to return just the size of the resultset for some advanced search method, you could add another parameter to the current search interface which would elide the 'results' list in the response JSON. You already have a 'resultcount' field with the size.
That is what I meant. We need to change the interface for every feature request that pops up.
* Directly publish all the information required to answer all possible requests. Let the clients do whatever they want. Currently, we only provide package names but in the future, this could be extended to a more complete database like the one pacman uses.
This has the same problems as the current gz files -- you can only offer eventual consistency. It also only scales well if you can distribute the load in the same way that pacman does with a tiered mirror system. This comes with a non-zero maintenance cost.
True. If it turns out that one server is not sufficient, mirrors need to be added. However, since we already have all the infrastructure, I expect the extra maintenance cost to be rather small. I also think that adding something like zsync support can reduce traffic by orders of magnitude. This is also something pacman/libalpm itself could benefit from.
Regards, Lukas
On Wed, 18 May 2016 at 14:58:46, Dave Reisner wrote:
On Mon, May 02, 2016 at 08:53:35AM +0200, Lukas Fleischer wrote: [...]
On Sat, 30 Apr 2016 at 18:16:54, Dave Reisner wrote:
Hrmm, I don't know that this is an equal comparison. Here's my perception of the current world:
pacman relies on distribution of the *entire* DB to mirrors around the world. Due to the tiered mirror system, you can basically only rely on eventual consistency of tier N>0 with tier 0, but the DBs at any given point in time should be consistent with themselves (i.e. assuming they're well-behaved, they won't advertise packages which they don't have). In addition to the sync tarballs, pacman relies on a local database which it mutates as packages are installed, upgraded, and removed.
Yeah, as I mentioned further below, providing the full database might be a long-term goal.
Are the mirrors part of the basic concept behind pacman? I always had the impression they primarily exist to improve download performance. One could also distribute the database among servers all over the world and allow clients to perform remote procedure calls on each of them. We could introduce those mirrors to the AUR as well, independent of whether the database is transferred to the clients or not.
I wouldn't consider the mirrors to be a "basic concept" since the mirrorlist is strictly an Arch Linux provided thing. They act as a best-effort load balancing strategy since we don't do anything intelligent with redirects and simply rely on people to pick a mirror which fulfills some value of "performs well" for their purposes. They also offer a level of redundancy -- we clearly don't suffer global outages when mirrors are unavailable.
Okay, I only wondered why you brought the mirrors up in this discussion. Both the RPC approach and the full database replication approach work with mirrors, right? So I think it is fine to ignore them when discussing the pros and cons of those concepts.
pacman has reduced functionality when it has no reachable mirror -- it's still capable of removing packages, modifying the local DB (to adjust install reasons), and installing packages which are present in a file cache.
I am not sure I follow. How are orthogonal features relevant to the discussion of whether the sync DB should be copied to the clients or accessed via requests to a server? It really shouldn't matter which additional operations on other objects are supported (and even if it does, there are clients like yaourt which provide a similar interface). The only thing I can think of in this context is that due to copying the database, one can perform queries on the sync database while being offline (i.e. if you want to find out the name of a package you cannot remember using -Ss). The AUR would benefit from that as well.
yaourt's (and other similar helpers') capacity to do things outside of the AUR is dependent upon pacman itself. I understand your suggestion about -Ss being an "offline" operation, but in the current form of the pkgnames tarball, you'd be getting something more similar to 'pacman -Slq | grep "$1"'. pacman -Ss would return results based on substring matches in descriptions, not just names. Sync'ing the entire AUR DB to clients would allow -Ss as well as richer queries like -Si. [...] Consider the current API offered by the AUR -- a query interface and some links to tarballs you can download. In contrast, pacman is a full-featured package manager which not only allows queries and downloading of tarballs, but also manipulates your filesystem.
yaourt also has support for pulling PKGBUILDs from the ABS which is something that does not directly depend on pacman. But again, I still do not see how any of the orthogonal functionality a tool provides is relevant to this discussion. And, as I wrote earlier, extending the package name list to a full database is what we ultimately aim for in the approach I suggested.
Why would you need javascript? All I'm suggesting is that aurweb issues GET requests against itself and renders the returned JSON in some meaningful way. Can't that be done in PHP?
Sure, that can be done. But I do not see how that would be useful. The RPC interface and the web page backend should use the same library functions internally, sure. But why should we make huge efforts to replace regular function calls with HTTP GET requests and encode results in JSON, only to immediately decode them on the same machine afterwards? Do we plan on splitting the RPC server and the website backend?
Agreed. Regular expressions aren't necessarily what we want to end up with. As an alternative, prefix and suffix matching would be substantially cheaper, less prone to abuse/dos, and would probably fulfill the needs of most people.
True. But then again, there are some use cases for regular expressions that are not covered by matching prefixes and suffixes and we might, again, have people requesting support for them (we had people explicitly requesting regular expression support a couple of times). If we decide to reject those requests and tell people "Hey, you cannot do that on the AUR.", this simply means that they will revive their web scrapers and build their own package name databases based on the web pages, as they did before we introduced packages.gz. I even did that myself to build the database for aurdupes (which is another good example that requires the full set of names to be available locally) before packages.gz was there. This is Arch after all, and our users become creative when they are not given the interface they want.
To be clear, I'm not opposed to adding support for regex. If you're in favor of it, I have to wonder why it hasn't been added yet...
I am not in favor of adding regular expressions on the server side. I mentioned some of the reasons in an earlier reply and you agreed. However, instead of saying that we need something less powerful (like prefix and suffix matching) on the server side, I think that adding support for regular expressions (and maybe even more powerful things) on the client side is the way to go. Lukas
Lukas Fleischer wrote:
I am not in favor of adding regular expressions on the server side. I mentioned some of the reasons in an earlier reply and you agreed. However, instead of saying that we need something less powerful (like prefix and suffix matching) on the server side, I think that adding support for regular expressions (and maybe even more powerful things) on the client side is the way to go.
Lukas
Hi ho,

I'm nearing the end of a big rewrite of powerpill/bauerbill that provides a completely generic interface for extending existing binary repos with build support as well as adding build-only repos such as the AUR. While working on interfaces to generalize Pacman's various sync functionality (-Ss, -Si, -Sg, -Sp, etc.), I have had to provide reduced information for operations that query the full list of packages in the AUR due to limitations of the pkglist (e.g. no versions in -Sl). I have also had to cobble together lazy lookups to get all necessary build information, given that the current AUR RPC interface does not provide all fields in the SRCINFO (when I last checked).

Having access to a full offline database of SRCINFO would be *very* useful, especially if it can be downloaded incrementally (e.g. with zsync as suggested). Providing the SRCINFO directly without any additional formatting would establish a de-facto (official?) standard that is reliable and directly tied to pacman/makepkg. AUR-specific information could be added to a second file (either in JSON format or one that follows SRCINFO). Besides the AUR ID/URL, votes, etc., you could also include the hash of the commit that was used to generate the database entry. This would prevent errors due to the package on the server leading the package in the downloaded database.

How big would such a database be? How much overhead is involved? How much overhead would the zsync operation introduce when people begin to regularly query updates?

Regards,
Xyne
On Tue, 24 May 2016 at 23:52:49, Xyne wrote:
How big would such a database be? How much overhead is involved? How much overhead would the zsync operation introduce when people begin to regularly query updates?
Concatenating the .SRCINFO metadata of all AUR packages yields a file which is roughly 20MiB in size (4MiB compressed). There are also plans to add source package support to pacman. I do not know yet whether we should use that for the AUR, especially since it probably will not have support for examining PKGBUILDs before building, but maybe we can at least use the underlying sync database schema. Regards, Lukas
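A concatenated .SRCINFO database of this kind could be consumed with a very small client-side parser. The sketch below assumes the `key = value` layout that makepkg emits, with each 'pkgbase' line starting a new entry; it is deliberately simplified (every key maps to a list of values, and per-pkgname sub-sections are not split out):

```python
def parse_srcinfo(text: str) -> list[dict]:
    """Parse concatenated .SRCINFO data into one dict per pkgbase entry.
    Simplified: all values are collected as lists, since most SRCINFO
    keys (depends, source, ...) are repeatable."""
    entries = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition(" = ")
        if key == "pkgbase":
            current = {}  # a pkgbase line starts a new entry
            entries.append(current)
        if current is not None:
            current.setdefault(key, []).append(value)
    return entries

# Inline sample standing in for the (roughly 20MiB) concatenated database.
sample = """\
pkgbase = aurdupes
\tpkgver = 1.0
\tpkgdesc = find duplicate packages
pkgname = aurdupes

pkgbase = cower
\tpkgver = 18
pkgname = cower
"""

db = parse_srcinfo(sample)
print([entry["pkgbase"][0] for entry in db])
```

With the full database parsed locally, clients could answer -Si-style queries (versions, dependencies, descriptions) entirely offline.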