A few more kinks have been worked out, and aur.kmkeen.com is fully synced once more. New features:

- A bundle of all PKGBUILDs (updated daily) at http://aur.kmkeen.com/all_pkgbuilds.tar.xz
  It is around 5 MB.
- A regex search over package names only (descriptions later). In the spirit of Arch, netcat is the only supported client. Patterns use Python regex syntax:

      echo 'names .*pac.*' | nc aur.kmkeen.com 1819
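For the curious, here is a rough idea of how a query line like the one above could be answered on the server side. This is a hypothetical sketch, not the actual aur.kmkeen.com code. The function name handle_query, the in-memory list of names, and the use of re.search (so a pattern may match anywhere in a name) are all assumptions.

```python
import re


def handle_query(line, package_names):
    """Answer one query line such as "names .*pac.*".

    Hypothetical: only the "names" command from the example is handled.
    """
    # Split once, so a pattern containing spaces stays intact.
    command, _, pattern = line.strip().partition(" ")
    if command != "names":
        raise ValueError(f"unsupported command: {command!r}")
    regex = re.compile(pattern)
    # Assumption: re.search, so the pattern may match anywhere in the name.
    return [name for name in package_names if regex.search(name)]


if __name__ == "__main__":
    names = ["pacman-git", "yaourt", "packer", "clyde"]
    print(handle_query("names .*pac.*", names))  # ['pacman-git', 'packer']
```

With re.search, '.*pac.*' and plain 'pac' return the same names. Anchors like '^c' still work if you want prefix matches.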
On 1/27/11, Loui Chang <louipc.ist@gmail.com> wrote:
> I thought it might be a good idea to just give some resourceful users convenient access to the data.
Well, there is no filtering by user agent, so that is pretty convenient already. You can easily download everything except for who voted for what.

There are two main groups of people interested in mirrors. The first group probably represents the majority* of the interest, and they are easy to appease. They want ABS for the AUR, possibly for bulk static analysis. That could be something as trivial as counting the occurrences of "return 1" or as heavy as testing a new PKGBUILD parser against the scariest PKGBUILDs known to man. The all_pkgbuilds bundle is meant for them.

* Sample size three. There is not much interest.

The other group of people (sample size two) want a true mirror, and they will never be happy, because a mirror will always lag behind the original. Right now I am mirroring by polling the RSS feed every minute. The RSS window is tiny, and I've seen a single person swamp it with one bulk update. Changes like deletions or comments are hard to catch and require a brute-force scan; mine loops through everything every 24 hours. If I killed all the delays and multithreaded it, a full clone could be hammered out in 30 minutes.

Brute-force scanning kind of sucks. Figuring out whether a package has been deleted is the most complicated bit of logic in my crawler, and I am not 100% certain it works properly.
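As an example of the trivial end of bulk static analysis, here is a sketch that counts "return 1" across every PKGBUILD in the bundle. It assumes the tarball holds one file named PKGBUILD per package directory (for example somepkg/PKGBUILD). The actual layout of all_pkgbuilds.tar.xz may differ, and count_in_pkgbuilds is a hypothetical name.

```python
import tarfile


def count_in_pkgbuilds(bundle_path, needle="return 1"):
    """Count occurrences of `needle` across every PKGBUILD in a tarball.

    Assumption: each package is stored as <dir>/PKGBUILD inside the archive.
    """
    total = 0
    # "r:*" lets tarfile detect the compression (xz here) on its own.
    with tarfile.open(bundle_path, "r:*") as tar:
        for member in tar:
            # Skip directories and any file not named exactly PKGBUILD.
            if not member.isfile() or member.name.rsplit("/", 1)[-1] != "PKGBUILD":
                continue
            data = tar.extractfile(member).read()
            total += data.decode("utf-8", errors="replace").count(needle)
    return total
```

After downloading the bundle, count_in_pkgbuilds("all_pkgbuilds.tar.xz") gives the total. Passing a different needle works for other simple text checks.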
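The two mirroring problems above, a swamped RSS window and deletions, can each be reduced to a simple check. This is a minimal sketch, not the crawler described here. It assumes every feed entry carries an identifier that is unique per update. The names feed_overflowed and find_deleted are hypothetical.

```python
def feed_overflowed(last_seen_id, feed_ids):
    """Return True if updates may have been missed between two polls.

    last_seen_id is the newest entry ID from the previous poll; feed_ids
    are the entry IDs in the current feed window. If the old entry has
    scrolled out of the window, an unknown number of updates happened in
    between and only a full scan can recover them.
    """
    if last_seen_id is None:  # first poll: nothing to compare against
        return True
    return last_seen_id not in feed_ids


def find_deleted(mirrored_names, scanned_names):
    """Names held by the mirror that a complete scan no longer found."""
    return sorted(set(mirrored_names) - set(scanned_names))
```

The catch with find_deleted is that it is only trustworthy after a complete scan. A package whose page failed to download looks exactly like a deleted one, which may be part of why deletion detection is the fiddly bit.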
> I'd like to test a theory that one reason we haven't seen much development is because all the data is held hostage on the AUR server.
The AUR is hardly a walled garden. If you want the data, you can get it with minimal effort (two lines of bash for the download). I would sooner blame apathy or inertia.
> Also, do you have an SCM repo with the code you've used to implement your interface? Thanks.
Yes, but it's not public. I've done enough horrible things to the AUR already; an accidental DDoS is the last thing I need on my shoulders.

-Kyle
http://kmkeen.com