[aur-general] new aur mirror

Fri Jan 28 16:00:13 EST 2011

A few more kinks have been worked out, and aur.kmkeen.com is fully
synced once more.  New features:

A bundle of all pkgbuilds (updated daily) at
http://aur.kmkeen.com/all_pkgbuilds.tar.xz
It is around 5 MB.

A regex search.  Names only (descriptions later).  In the spirit of
Arch, netcat is the only supported client.
 > echo 'names .*pac.*' | nc aur.kmkeen.com 1819
It uses Python regex.
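
For anyone who would rather script the query than pipe it through
netcat, here is a minimal Python sketch of the same request.  The host,
port and 'names <regex>' line come from the example above; everything
else is an assumption, in particular that the server reads one
newline-terminated line and closes the connection when it is done.
The function names are made up for illustration.  filter_names() is
only a local stand-in for trying patterns offline; whether the server
anchors the pattern is not stated, and '.*pac.*' behaves the same
either way.

```python
import re
import socket


def query_names(pattern, host="aur.kmkeen.com", port=1819, timeout=10.0):
    """Send 'names <pattern>' and return the raw reply as text.

    Assumption: the server answers one newline-terminated request and
    then closes the connection, as the nc one-liner suggests.
    """
    request = "names {}\n".format(pattern).encode("utf-8")
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request)
        while True:
            data = sock.recv(4096)
            if not data:  # empty read means the peer closed the socket
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def filter_names(pattern, names):
    """Local stand-in for the search: apply a Python regex to names."""
    rx = re.compile(pattern)
    return [name for name in names if rx.search(name)]
```

Calling query_names('.*pac.*') would be the equivalent of the nc line,
provided the assumptions about the protocol hold.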

On 1/27/11, Loui Chang <louipc.ist at gmail.com> wrote:
> I thought it might be a good idea to just give some
> resourceful users convenient access to the data

Well, there is no filtering by user agent, so that is pretty
convenient already.  You can easily download everything except for who
voted for what.

There are two main groups of people interested in mirrors.  The first
group probably represents the majority* of the interest, and they are
easy to appease.  They want ABS for the AUR, possibly for bulk static
analysis.  Could be something as trivial as counting the number of
"return 1" or as heavy as testing a new pkgbuild parser against the
scariest pkgbuilds known to man.  The all_pkgbuilds bundle is meant
for them.
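
As an illustration of the trivial end of that range, the "return 1"
count could be sketched in Python roughly as follows.  The layout
inside the bundle is not described here, so this sketch scans every
regular file in the archive rather than looking for a particular
path; count_in_bundle is a hypothetical name.

```python
import tarfile


def count_in_bundle(bundle, needle=b"return 1"):
    """Count occurrences of needle in every regular file of a .tar.xz.

    bundle is either a filesystem path or a binary file object.
    Returns (total, per_file), where per_file maps member names to
    their hit counts and only lists files with at least one hit.
    Assumption: the archive layout is unknown, so nothing is filtered
    by file name.
    """
    if isinstance(bundle, str):
        tar = tarfile.open(bundle, mode="r:xz")
    else:
        tar = tarfile.open(fileobj=bundle, mode="r:xz")

    total = 0
    per_file = {}
    with tar:
        for member in tar:
            if not member.isfile():  # skip directories, links, etc.
                continue
            hits = tar.extractfile(member).read().count(needle)
            if hits:
                per_file[member.name] = hits
                total += hits
    return total, per_file
```

With the bundle downloaded, count_in_bundle("all_pkgbuilds.tar.xz")
would return the overall count and a per-file breakdown.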

* Sample size three.  There is not much interest.

The other group of people (sample size two) will never be happy.  They
want a mirror, and they will never be happy because there will always
be lag behind the original.  Right now I am mirroring by hitting the
RSS every minute.  The RSS window is tiny, and I've seen a single
person swamp it with a bulk update.  Things like deletions or comments
are hard to get and require a brute force scan.  Mine loops through
everything every 24 hours.  If I kill all the delays and multithread
it, a clone can be hammered out in 30 minutes.  Brute force scanning
kind of sucks.  Figuring out if a package has been deleted is the most
complicated bit of logic in my crawler, and I am not 100% certain it
works properly.
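
To make the deletion problem concrete, one possible approach is
sketched below in Python.  This is not the crawler's actual logic,
which is not shown here; it is an assumed scheme that compares the
set of names the mirror holds against the names seen in the latest
full scan, and only treats a package as deleted after it has been
absent from several consecutive scans, so a single failed fetch does
not wipe a package.  The function name and the threshold of two scans
are made up.

```python
def update_missing(known, seen, missing_counts, threshold=2):
    """Return the names judged deleted after the latest full scan.

    known          -- set of package names the mirror currently holds
    seen           -- set of names found by the latest full scan
    missing_counts -- dict of name -> consecutive scans it was absent
                      from (updated in place)
    threshold      -- absences required before calling it deleted
                      (hypothetical value, tune to taste)
    """
    deleted = set()
    for name in known:
        if name in seen:
            # Package is back (or never left): reset its counter.
            missing_counts.pop(name, None)
        else:
            missing_counts[name] = missing_counts.get(name, 0) + 1
            if missing_counts[name] >= threshold:
                deleted.add(name)
    return deleted
```

The caller would remove the returned names from its own store after
each scan.  A higher threshold trades fewer false deletions for more
lag, which with a 24 hour loop means days.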

> I'd like to test a theory that one reason we haven't seen much
> development is because all the data is held hostage on the AUR server.

The AUR is hardly a walled garden.  If you want the data, you can get
it with minimal effort.  (Two lines of bash for the download.)  I am
more likely to credit apathy or inertia.

> Also, do you have an scm repo with the code you've used to implement
> your interface? Thanks.

Yes, but not public.  I've done enough horrible things to the AUR; an
accidental DDoS is the last thing I need on my shoulders.

-Kyle
http://kmkeen.com