[aur-dev] Making the AUR package list more useful
Recently, there were a couple of feature requests to make the AUR package search more powerful. While I do not plan on adding more patterns or regular expressions to the RPC interface itself, my idea is that more tools should be using the package name list. However, there seem to be two issues with that:

1. The list is outdated. Right now, it is updated every two hours. I do not think there is a good reason for those long intervals. Reducing it to, say, ten minutes should be totally fine. Or maybe even trigger list generation whenever a package is created or deleted (which is clearly a lot more work, though). Thoughts?

2. Transferring the whole package name list is inefficient. Even if we use gzip compression here, the whole list is several hundred kilobytes large. We need to retransfer the full list, even if only a single package is added. Maybe we can do better than pacman here. My idea is to add zsync support to the lists such that only the relevant parts are downloaded (for those who do not know: zsync is like rsync, but it works via HTTP as well and does not require any special software on the server side). I have not experimented with how much bandwidth we can actually save using this yet. Maybe the block size needs to be adjusted.

Are there any opinions or other suggestions on this topic?

Regards,
Lukas
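As a rough illustration of the block-matching idea behind zsync: the server publishes per-block checksums, and the client only fetches the blocks it does not already have. This is a simplified Python sketch under the assumption of fixed, aligned blocks — real zsync uses rolling checksums and can match blocks at arbitrary offsets, so treat this as a thought experiment, not the actual algorithm:

```python
import hashlib

BLOCK_SIZE = 2048  # zsync lets the server pick this; tuning it is the open question


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Hash each fixed-size block, like the checksum list in a .zsync control file."""
    return [
        hashlib.md5(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def blocks_to_fetch(local: bytes, remote_hashes: list[str],
                    block_size: int = BLOCK_SIZE) -> list[int]:
    """Return indices of blocks the client is missing or has in an outdated form."""
    local_hashes = block_hashes(local, block_size)
    return [
        i for i, h in enumerate(remote_hashes)
        if i >= len(local_hashes) or local_hashes[i] != h
    ]


# A package appended at the end only dirties the final block.
old = b"pkg-a\npkg-b\n" * 500
new = old + b"pkg-new\n"
print(blocks_to_fetch(old, block_hashes(new)))  # only the last block index
```

Since appends only touch the tail of the file, an append-heavy file like a sorted package list is close to the best case for this kind of delta transfer.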
Hey,

On 29/04, Lukas Fleischer wrote:
1. The list is outdated. Right now, it is updated every two hours. I do not think there is a good reason for those long intervals. Reducing it to, say, ten minutes should be totally fine. Or maybe even trigger list generation whenever a package is created or deleted (which is clearly a lot more work, though). Thoughts?
Generating it more often sounds good to me, and triggering it on create/delete does sound like a good thing to implement eventually, but would probably want to have a task queue thing set up for that, and have workers do the actual generation.
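A minimal sketch of that queue-plus-worker idea: create/delete events go into a queue, and a single worker drains bursts so that many events close together still cause only one regeneration. All names here (`regenerate`, the `"action:name"` event format) are illustrative, not actual aurweb code:

```python
import queue
import threading

EVENT_QUEUE: "queue.Queue[str]" = queue.Queue()
generations: list[list[str]] = []  # record of every regenerated list, for inspection


def regenerate(packages: set) -> None:
    # Stand-in for writing out the real package-name list file.
    generations.append(sorted(packages))


def worker(packages: set, stop: threading.Event) -> None:
    """Drain events, coalescing bursts into a single list regeneration."""
    while not stop.is_set() or not EVENT_QUEUE.empty():
        try:
            event = EVENT_QUEUE.get(timeout=0.1)
        except queue.Empty:
            continue
        action, name = event.split(":", 1)
        (packages.add if action == "create" else packages.discard)(name)
        # Coalesce: apply whatever else queued up before regenerating once.
        while not EVENT_QUEUE.empty():
            action, name = EVENT_QUEUE.get_nowait().split(":", 1)
            (packages.add if action == "create" else packages.discard)(name)
        regenerate(packages)
```

In practice this would presumably be an external queue (the web frontend and git hooks both need to enqueue events), but the coalescing logic is the part that keeps bursty activity from regenerating the list once per package.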
2. Transferring the whole package name list is inefficient. Even if we use gzip compression here, the whole list is several hundred kilobytes large. We need to retransfer the full list, even if only a single package is added. Maybe we can do better than pacman here. My idea is to add zsync support to the lists such that only the relevant parts are downloaded (for those who do not know: zsync is like rsync, but it works via HTTP as well and does not require any special software on the server side). I have not experimented with how much bandwidth we can actually save using this yet. Maybe the block size needs to be adjusted. Are there any opinions or other suggestions on this topic?
Did some testing locally with zsync, and curling the file locally took between 0.005 and 0.010 seconds. A first zsync download takes between 0.011 and 0.016 seconds. A zsync with no changes takes between 0.004 and 0.012 seconds.

It's a bit tricky to reliably test a zsync with a changed packages file since I don't have multiple different ones saved down, and modifying them myself will give different results from what the AUR creates, so it'd be hard to get representative results.

When I get a few versions of the file generated by the AUR I'll try doing it from my server, and with multiple zsyncs, but I'm not sure if it'll really matter much, since it takes just ~0.4 seconds to download the file from the AUR in the first place.

--
Sincerely,
  Johannes Löthberg
  PGP Key ID: 0x50FB9B273A9D0BB5
  https://theos.kyriasis.com/~kyrias/
On 29.04.2016 13:29, Johannes Löthberg wrote:
Hey,
On 29/04, Lukas Fleischer wrote:
1. The list is outdated. Right now, it is updated every two hours. I do not think there is a good reason for those long intervals. Reducing it to, say, ten minutes should be totally fine. Or maybe even trigger list generation whenever a package is created or deleted (which is clearly a lot more work, though). Thoughts?
Generating it more often sounds good to me, and triggering it on create/delete does sound like a good thing to implement eventually, but would probably want to have a task queue thing set up for that, and have workers do the actual generation.
2. Transferring the whole package name list is inefficient. Even if we use gzip compression here, the whole list is several hundred kilobytes large. We need to retransfer the full list, even if only a single package is added. Maybe we can do better than pacman here. My idea is to add zsync support to the lists such that only the relevant parts are downloaded (for those who do not know: zsync is like rsync, but it works via HTTP as well and does not require any special software on the server side). I have not experimented with how much bandwidth we can actually save using this yet. Maybe the block size needs to be adjusted. Are there any opinions or other suggestions on this topic?
Did some testing locally with zsync, and curling the file locally took between 0.005 and 0.010 seconds. A first zsync download takes between 0.011 and 0.016 seconds. A zsync with no changes takes between 0.004 and 0.012 seconds.
It's a bit tricky to reliably test a zsync with a changed packages file since I don't have multiple different ones saved down, and modifying them myself will give different results from what the AUR creates, so it'd be hard to get representative results.
When I get a few versions of the file generated by the AUR I'll try doing it from my server, and with multiple zsyncs, but I'm not sure if it'll really matter much, since it takes just ~0.4 seconds to download the file from the AUR in the first place.
As a note, differences between downloading a complete version of the package archive, and just the differences, should be mainly noticeable on poor internet connections (dial-up, 3G, etc.)
On Fri, 29 Apr 2016 at 13:29:04, Johannes Löthberg wrote:
Generating it more often sounds good to me, and triggering it on create/delete does sound like a good thing to implement eventually, but would probably want to have a task queue thing set up for that, and have workers do the actual generation.
Agreed.
Did some testing locally with zsync, and curling the file locally took between 0.005 and 0.010 seconds. A first zsync download takes between 0.011 and 0.016 seconds. A zsync with no changes takes between 0.004 and 0.012 seconds.
It's a bit tricky to reliably test a zsync with a changed packages file since I don't have multiple different ones saved down, and modifying them myself will give different results from what the AUR creates, so it'd be hard to get representative results.
When I get a few versions of the file generated by the AUR I'll try doing it from my server, and with multiple zsyncs, but I'm not sure if it'll really matter much, since it takes just ~0.4 seconds to download the file from the AUR in the first place.
Besides measuring download time, it might be interesting to actually measure the amount of data transferred. It might well be that there is no big difference in running times, but the zsync version might still require less bandwidth (because it does multiple small requests instead of one huge request/reply). As I said in the previous email, parameters like the block size could also have a large impact on this, and we might need to adjust them.

Regards,
Lukas
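One back-of-the-envelope way to compare the two quantities is to count the bytes for re-sending the whole gzipped list against the bytes in changed fixed-size blocks. This sketch assumes a simplified aligned-block scheme (real zsync also transfers the control file and per-block checksums); the fake package list and the 2 KiB block size are illustrative placeholders:

```python
import gzip
import hashlib

BLOCK_SIZE = 2048  # arbitrary guess -- exactly the parameter that may need tuning


def changed_bytes(old: bytes, new: bytes, block_size: int = BLOCK_SIZE) -> int:
    """Bytes in blocks of `new` that differ from the same-indexed block of `old`."""
    digest = lambda b: hashlib.md5(b).digest()
    old_digests = [digest(old[i:i + block_size])
                   for i in range(0, len(old), block_size)]
    total = 0
    for idx, i in enumerate(range(0, len(new), block_size)):
        block = new[i:i + block_size]
        if idx >= len(old_digests) or old_digests[idx] != digest(block):
            total += len(block)
    return total


# Fake package list: ~10k names, then a single new package appended.
old_list = "\n".join(f"package-{n}" for n in range(10000)).encode()
new_list = old_list + b"\npackage-new"

full_transfer = len(gzip.compress(new_list))   # re-send everything, gzipped
delta_transfer = changed_bytes(old_list, new_list)  # fetch only dirty blocks
print(full_transfer, delta_transfer)
```

With a single appended name, only the final block is dirty, so the delta stays bounded by one block size regardless of how large the list grows, while the full gzipped transfer scales with the list. A smaller block size shrinks the delta further but inflates the checksum metadata, which is the trade-off to measure.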
On 29/04, Lukas Fleischer wrote:
Besides measuring download time, it might be interesting to actually measure the amount of data transferred.
It might well be that there is no big difference in running times but the zsync version might still require less bandwidth (because it does multiple small requests instead of one huge request/reply). As I said in the previous email, parameters like the block size could also have a large impact on this and we might need to adjust them.
Yeah, that's why I was planning on doing it when I had a couple of them generated by the AUR.

--
Sincerely,
  Johannes Löthberg
  PGP Key ID: 0x50FB9B273A9D0BB5
  https://theos.kyriasis.com/~kyrias/
participants (3)
- Alad Wenter
- Johannes Löthberg
- Lukas Fleischer