[aur-general] new aur mirror
Hi all. I am now mirroring the AUR continuously, and am starting to open it up to public access. It is not a perfect byte-for-byte clone, as I felt like adding a few features :-)

For now, there is an enhanced RPC database. It's got everything from the original RPC calls, plus some extra fields that people always seem to want. Access it through http://aur.kmkeen.com/rpc/name_of_package

Yes, it is 100% static. This was the only way my little VPS could keep up with the beastly Sigurd machine.

I've also got all the tarballs mirrored, but forgot to add the links to the RPC info (mirror/pkgname/pkgname.tar.gz). Will fix that shortly. There are also search indexes prepared for description, depends and required-by but the regex search is not open to the world yet. Maybe after I get some sleep.

Much thanks goes to Falconindy for ideas, encouragement and patience.

I am currently taking feature requests. Go wild.

-Kyle http://kmkeen.com
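For readers who want to try the static RPC tree, here is a minimal sketch of a fetch helper. Only the URL layout (rpc/name_of_package under aur.kmkeen.com) comes from the announcement; the function names rpc_url and rpc_fetch are made up for illustration, and the response format is not specified here, so the helper just prints whatever the server returns.

```shell
# Hypothetical helpers; only the URL layout comes from the announcement.
AUR_MIRROR="http://aur.kmkeen.com"

# Print the static RPC URL for one package name.
rpc_url() {
    printf '%s/rpc/%s\n' "$AUR_MIRROR" "$1"
}

# Fetch the record for one package and print it unmodified.
# -f: fail on HTTP errors, -sS: no progress meter but still show errors.
rpc_fetch() {
    curl -fsS "$(rpc_url "$1")"
}
```

Something like `rpc_fetch cower` should then print the stored record, assuming the package is present in the mirror.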
On Thu, Jan 27, 2011 at 1:23 PM, keenerd <keenerd@gmail.com> wrote:
Hi all. I am now mirroring the AUR continuously, and am starting to open it up to public access. It is not a perfect byte-for-byte clone, as I felt like adding a few features :-)
For now, there is an enhanced RPC database. It's got everything from the original RPC calls, plus some extra fields that people always seem to want. Access it through http://aur.kmkeen.com/rpc/name_of_package Yes, it is 100% static. This was the only way my little VPS could keep up with the beastly Sigurd machine.
I've also got all the tarballs mirrored, but forgot to add the links to the RPC info (mirror/pkgname/pkgname.tar.gz). Will fix that shortly. There are also search indexes prepared for description, depends and required-by but the regex search is not open to the world yet. Maybe after I get some sleep.
Much thanks goes to Falconindy for ideas, encouragement and patience.
I am currently taking feature requests. Go wild.
-Kyle http://kmkeen.com
Is your mirroring complete? It seems there are some packages that are not mirrored (xbmc-git, xmobar-git, perl-config-properties, perl-rpsl-parser, ...). I've only tested the RPC, not the tarballs. -- Cédric Girard
On 1/27/11, Cédric Girard <girard.cedric@gmail.com> wrote:
Is your mirroring complete? It seems there are some packages that are not mirrored (xbmc-git, xmobar-git, perl-config-properties, perl-rpsl-parser, ...). I've only tested the RPC, not the tarballs.
Most strange. It thinks those packages have been deleted from the AUR. Looking at the logs, it seems to think everything from the AUR has been deleted. Oh bother. Erm, ignore that previous announcement (the bit involving things working correctly) while this gets fixed. -Kyle http://kmkeen.com
On Thu, 2011-01-27 at 07:23 -0500, keenerd wrote:
Hi all. I am now mirroring the AUR continuously, and am starting to open it up to public access. It is not a perfect byte-for-byte clone, as I felt like adding a few features :-)
<snip>
I am currently taking feature requests. Go wild.
I'd like two cups of world domination to go.
-Kyle http://kmkeen.com
On Thu, Jan 27, 2011 at 1:23 PM, keenerd <keenerd@gmail.com> wrote:
Hi all. I am now mirroring the AUR continuously, and am starting to open it up to public access. It is not a perfect byte-for-byte clone, as I felt like adding a few features :-)
For now, there is an enhanced RPC database. It's got everything from the original RPC calls, plus some extra fields that people always seem to want. Access it through http://aur.kmkeen.com/rpc/name_of_package Yes, it is 100% static. This was the only way my little VPS could keep up with the beastly Sigurd machine.
I've also got all the tarballs mirrored, but forgot to add the links to the RPC info (mirror/pkgname/pkgname.tar.gz). Will fix that shortly. There are also search indexes prepared for description, depends and required-by but the regex search is not open to the world yet. Maybe after I get some sleep.
Much thanks goes to Falconindy for ideas, encouragement and patience.
I am currently taking feature requests. Go wild.
Nice. I'm always glad to see people working on different ways to access the AUR. It seems like there are a few people that are interested in mirroring the AUR. I've been thinking about how we could improve it for a long time, but recently I thought it might be a good idea to just give some resourceful users convenient access to the data so they can implement novel ways of managing and distributing the packages. I'd like to test a theory that one reason we haven't seen much development is because all the data is held hostage on the AUR server. Also, do you have an scm repo with the code you've used to implement your interface? Thanks.
A few more kinks have been worked out, and aur.kmkeen.com is fully synced once more. New features:

A bundle of all pkgbuilds (updated daily) at http://aur.kmkeen.com/all_pkgbuilds.tar.xz It is around 5 MB.

A regex search. Names only (descriptions later). In the spirit of Arch, netcat is the only supported client.

echo 'names .*pac.*' | nc aur.kmkeen.com 1819

It uses Python regex.
On 1/27/11, Loui Chang <louipc.ist@gmail.com> wrote:
I thought it might be a good idea to just give some resourceful users convenient access to the data
Well, there is no filtering by user agent, so that is pretty convenient already. You can easily download everything except for who voted for what.

There are two main groups of people interested in mirrors. The first group probably represents the majority* of the interest, and they are easy to appease. They want ABS for the AUR, possibly for bulk static analysis. Could be something as trivial as counting the number of "return 1" or as heavy as testing a new pkgbuild parser against the scariest pkgbuilds known to man. The all_pkgbuilds bundle is meant for them.

* Sample size three. There is not much interest.

The other group of people (sample size two) will never be happy. They want a mirror, and they will never be happy because there will always be lag behind the original. Right now I am mirroring by hitting the RSS every minute. The RSS window is tiny, and I've seen a single person swamp it with a bulk update. Things like deletions or comments are hard to get and require a brute force scan. Mine loops through everything every 24 hours. If I kill all the delays and multithread it, a clone can be hammered out in 30 minutes.

Brute force scanning kind of sucks. Figuring out if a package has been deleted is the most complicated bit of logic in my crawler, and I am not 100% certain it works properly.
I'd like to test a theory that one reason we haven't seen much development is because all the data is held hostage on the AUR server.
The AUR is hardly a walled garden. If you want the data, you can get it with minimal effort. (Two lines of bash for the download.) I am more likely to credit apathy or inertia.
Also, do you have an scm repo with the code you've used to implement your interface? Thanks.
Yes, but not public. I've done enough horrible things to the AUR, an accidental DDoS is the last thing I need on my shoulders. -Kyle http://kmkeen.com
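To make the deletion problem above concrete, here is a minimal sketch of the simplest possible check: diff the name list from the previous full scan against the current one. This is not the crawler's actual logic (that code is not public); the function name, the one-name-per-line file format and the "lost more than half" threshold are all assumptions. The guard exists because a failed scan otherwise looks exactly like every package having been deleted, which is the symptom reported earlier in this thread.

```shell
# Hypothetical sketch, not the crawler's actual logic.
# Usage: deleted_packages OLD_LIST NEW_LIST
# Each file holds one package name per line. Prints names that were in
# OLD_LIST but are missing from NEW_LIST.
deleted_packages() (
    old=$1
    new=$2
    # Count non-empty lines; grep exits non-zero on a count of 0, hence || true.
    old_n=$(grep -c . "$old" || true)
    new_n=$(grep -c . "$new" || true)
    # Guard (assumed threshold): if the new scan has fewer than half of the
    # old names, treat it as a broken scan rather than a mass deletion.
    if [ "$new_n" -lt $((old_n / 2)) ]; then
        echo "scan looks incomplete ($new_n of $old_n names), refusing" >&2
        return 1
    fi
    tmp=$(mktemp -d) || return 1
    sort -u "$old" > "$tmp/old"
    sort -u "$new" > "$tmp/new"
    # comm -23 keeps lines found only in the first file: the vanished names.
    comm -23 "$tmp/old" "$tmp/new"
    rm -rf "$tmp"
)
```

A set difference like this only says a name vanished between two scans; it cannot tell a deletion from a rename or a fetch error on a single package, which is presumably part of why the real logic is harder.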
On Fri, Jan 28, 2011 at 04:00:13PM -0500, keenerd wrote:
A few more kinks have been worked out, and aur.kmkeen.com is fully synced once more. New features:
A bundle of all pkgbuilds (updated daily) at http://aur.kmkeen.com/all_pkgbuilds.tar.xz It is around 5 MB.
A regex search. Names only (descriptions later). In the spirit of Arch, netcat is the only supported client.
echo 'names .*pac.*' | nc aur.kmkeen.com 1819

It uses Python regex.
netcat? Too heavy. Bash is my raw socket client.

$ exec 3<>/dev/tcp/aur.kmkeen.com/1819
$ echo -e 'names cower.*' >&3
$ cat <&3
cower
cower-git

dave
Regex searching of descriptions is up. Hit it with
echo 'd .*pacman.*' | nc aur.kmkeen.com 1819

Similarly, names can be searched with just "n".
I am not pleased with the performance. Description queries are 10x slower than names, but whatever. -Kyle http://kmkeen.com
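The one-line protocol shown in this thread (a search kind, a space, then a regex) can be wrapped in a small helper. This is a sketch only: the host, port 1819 and the "names", "n" and "d" prefixes come from the messages above, while the function names and the input check are made up for illustration.

```shell
# Hypothetical wrapper around the search service described in the thread.
AUR_SEARCH_HOST="aur.kmkeen.com"
AUR_SEARCH_PORT=1819

# Build one request line: a search kind followed by a regex.
# Only the kinds mentioned in the thread are accepted.
search_request() {
    case $1 in
        n|names|d) printf '%s %s\n' "$1" "$2" ;;
        *) echo "unknown search kind: $1" >&2; return 1 ;;
    esac
}

# Send the request with netcat and print whatever the server returns.
aur_search() {
    req=$(search_request "$1" "$2") || return 1
    printf '%s\n' "$req" | nc "$AUR_SEARCH_HOST" "$AUR_SEARCH_PORT"
}
```

For example, `aur_search d '.*pacman.*'` sends the same line as the echo command above. Keep the regex in single quotes so the shell does not expand it.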
Comments were added last week, forgot to post that to the ML. Get them in JSON form at aur3.org/mirror/pkgname/comments.gz

The new thing is uploads to AUR3. I was stuck on a few parts, like how to split a .sig from a .gpg (for single file uploads) and then after that how to PUT more than one file at a time. Turns out neither is possible, so I hacked up something really ugly with CGI. But except for the one little wart of a multifile POST, all the beauty of signed packages is there.

If a package has a sig, its repo (in the rpc) is marked as aur3 and there will be a file at aur3.org/mirror/pkgname/pkgname.tar.gz.sig. But as I have not set up a keyserver yet, sigs are mostly for novelty value. If you want to fool around with signing your own packages, download the aur3 reference client (now in bash!) and email me your pubkey. Not the most secure means of establishing trust, but hey. Arch finally has signed (source) packages.

More on http://aur3.org

-Kyle http://kmkeen.com
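A minimal sketch of checking one of these signatures by hand, under several assumptions: that the .sig file is a detached GnuPG signature over the tarball, that the tarball sits next to it at mirror/pkgname/pkgname.tar.gz on aur3.org, that the site is reachable over plain http, and that the signer's public key is already in your keyring. The function names are made up; this is not the aur3 reference client.

```shell
# Hypothetical helpers; the scheme and tarball location are assumptions,
# only the .sig path is given in the post.
AUR3_MIRROR="http://aur3.org/mirror"

tarball_url() {
    printf '%s/%s/%s.tar.gz\n' "$AUR3_MIRROR" "$1" "$1"
}

sig_url() {
    printf '%s.sig\n' "$(tarball_url "$1")"
}

# Download the tarball and its signature into a temp dir, then verify.
# Succeeds only if gpg accepts the signature; prints the directory on success.
fetch_and_verify() (
    pkg=$1
    dir=$(mktemp -d) || exit 1
    cd "$dir" || exit 1
    curl -fsS -o "$pkg.tar.gz" "$(tarball_url "$pkg")" || exit 1
    curl -fsS -o "$pkg.tar.gz.sig" "$(sig_url "$pkg")" || exit 1
    # Detached-signature form: signature file first, signed file second.
    gpg --verify "$pkg.tar.gz.sig" "$pkg.tar.gz" || exit 1
    echo "$dir"
)
```

A good signature here only shows the tarball matches some key in your keyring; as the post says, without a keyserver or another way to establish trust in that key, it is mostly of novelty value.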
participants (5): Cédric Girard, Dave Reisner, keenerd, Loui Chang, Ng Oon-Ee