Re: [aur-dev] [PATCH 3/3] Segment the upload directory by package name prefix

29 Jul 2011


On Fri, Jul 29, 2011 at 03:50:44PM -0500, Dan McGee wrote:
...
On Fri, Jul 29, 2011 at 3:32 PM, Lukas Fleischer
<archlinux@cryptocrack.de> wrote:
...
On Thu, Jul 28, 2011 at 01:59:07PM -0500, Dan McGee wrote:
...
This implements the following scheme:
* /packages/cower/ --> /packages/co/cower/
* /packages/j/     --> /packages/j/j/
* /packages/zqy/   --> /packages/zq/y/
I hope there's a typo in the last example; otherwise, I must have
misunderstood something :)
Yes, typo.
You might want to amend this when resubmitting the patch (details
follow) :)
...
...
...
We take up to the first two characters of each package name as an
intermediate subdirectory, and then the full package name lives
underneath that.
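A minimal sketch of the mapping, written in Python rather than the PHP the patch actually touches. The helper name `package_dir` is hypothetical, and the `/packages` root is taken from the examples above:

```python
def package_dir(name: str, root: str = "/packages") -> str:
    """Map a package name to its segmented upload directory.

    The first two characters of the name (or the whole name, if it is
    shorter) form the intermediate directory; the full name sits below it.
    """
    prefix = name[:2]  # slicing past the end just yields the whole string
    return f"{root}/{prefix}/{name}/"


for pkg in ("cower", "j", "zqy"):
    print(pkg, "->", package_dir(pkg))
```

Under this sketch the last example comes out as /packages/zq/zqy/, i.e. the full package name always forms the leaf directory.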
Why, you ask? Well, because earlier today the AUR hit 32,000 entries in
the unsupported/ directory, making new package uploads impossible. While
some might argue we shouldn't have so many damn packages in the repos,
we should be able to handle this case.
Why two characters instead of one? Our two biggest two-char groups, 'pe'
and 'py', both start with 'p', and have nearly 2000 packages each. Go
Python and Perl.
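The one-character versus two-character trade-off comes down to the size of the most crowded bucket. A rough sketch for checking that against a list of names; the sample below is made up for illustration and is not real AUR data:

```python
from collections import Counter


def largest_bucket(names, width):
    """Return (prefix, count) for the most crowded prefix of the given width."""
    counts = Counter(name[:width] for name in names)
    return counts.most_common(1)[0]


# Hypothetical sample; not real AUR data.
sample = ["perl-foo", "perl-bar", "python-baz", "python-qux",
          "python2-quux", "pacman-tool", "cower"]

print(largest_bucket(sample, 1))  # ('p', 6): every p* name shares one bucket
print(largest_bucket(sample, 2))  # ('py', 3): 'pe' and 'pa' get their own buckets
```

Run against the real package list, the same function would show whether any single bucket is approaching the per-directory limit.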
Time to move to ext4, eh? No, seriously: Something tells me that we
should neither be filesystem-dependent nor depend on the current
distribution of package names. Using some better hash algorithm might
fix the second problem while reducing predictability for the end user.
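For comparison, a sketch of the hash-based alternative being suggested here. SHA-1 and a two-hex-digit bucket are arbitrary choices for illustration, not anything proposed in the thread; the point is that bucket sizes no longer depend on how many names share a prefix such as "py", at the cost of a path that can be computed but not guessed by eye:

```python
import hashlib


def hashed_dir(name: str, root: str = "/packages", width: int = 2) -> str:
    """Bucket a package by the leading hex digits of a hash of its name.

    With width=2 there are 16**2 = 256 buckets, and a good hash spreads
    names across them regardless of how many share a common prefix.
    """
    digest = hashlib.sha1(name.encode("utf-8")).hexdigest()
    return f"{root}/{digest[:width]}/{name}/"


print(hashed_dir("python-foo"))
print(hashed_dir("python-bar"))  # usually lands in a different bucket
```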
We wouldn't be the first to use a scheme like this, so it felt like
the right choice. See for example:
http://pypi.python.org/packages/source/D/Django/Django-1.3.tar.gz (one
letter only)
http://search.cpan.org/CPAN/authors/id/S/SR/SRI/Mojolicious-1.68.tar.gz
(segmented by author, multiple levels)
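The directory shapes in those two URLs, as inferred from the URLs alone (the function names are hypothetical):

```python
def pypi_style_dir(name: str) -> str:
    """One level: first letter of the project name, e.g. D/Django."""
    return f"{name[:1]}/{name}"


def cpan_style_dir(author_id: str) -> str:
    """Two levels: first letter, first two letters, then the full ID."""
    return f"{author_id[:1]}/{author_id[:2]}/{author_id}"


print(pypi_style_dir("Django"))  # D/Django
print(cpan_style_dir("SRI"))     # S/SR/SRI
```

The AUR scheme in this patch sits between the two: a single level, but keyed on two characters instead of one.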
Yeah, I've seen this before.
...
Ext4 came up as an option yesterday, but it's best to prevent one from
shooting themselves in the foot like this, and this seems like the
more proper solution. I do share some thoughts that we shouldn't
depend on a certain distribution of package names, but reducing
predictability seemed like a big enough drawback that I didn't want to
go that way, and moving to this scheme greatly increases the time
before we'd hit any problems. If anyone has predictable but scalable
solutions, I'm more than open to hearing them. Of course, we do provide
the URL in the JSON response for a reason.
Well, one predictable and scalable solution would be to split after
every character or after every two characters and create nested
directories (as we only allow a subset of all possible file names, that
would still result in less than ~32000 subdirectories per directory).
This would result in a very inscrutable directory structure, though.
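A sketch of the nested variant and the bound behind "less than ~32000 subdirectories per directory". The allowed character set below is an assumption for illustration (the real AUR naming rules may differ), and 32000 is the figure quoted in this thread:

```python
import string

# Assumed character set for package names: lowercase letters, digits and
# "@._+-". This is a guess for illustration; the real AUR rules may differ.
ALLOWED = string.ascii_lowercase + string.digits + "@._+-"
EXT3_SUBDIR_LIMIT = 32000  # approximate figure quoted in this thread


def nested_dir(name: str, chunk: int) -> str:
    """Split a name into `chunk`-sized pieces, one directory level each."""
    return "/".join(name[i:i + chunk] for i in range(0, len(name), chunk)) + "/"


def max_children(chunk: int, alphabet: int = len(ALLOWED)) -> int:
    """Upper bound on entries in one directory level.

    A level holds the next `chunk` characters, or fewer at the end of a
    name, so children are strings of length 1..chunk.
    """
    return sum(alphabet ** k for k in range(1, chunk + 1))


print(nested_dir("cower", 2))  # co/we/r/
for chunk in (1, 2, 3):
    bound = max_children(chunk)
    print(chunk, bound, bound < EXT3_SUBDIR_LIMIT)
# 1 41 True
# 2 1722 True
# 3 70643 False
```

With a 41-character alphabet, splitting every one or two characters stays far below the limit, while three-character chunks could in principle exceed it.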

As I mentioned before, I'm fine with using the 2-character prefix for
now. It feels like the best compromise. We can think about more
individualized solutions later.
...
...
Given that we will probably run into the same problem again soon (according to
current statistics, there are about two months left), I will apply this
temporary workaround and prepare a release soon, though.
Yeah, we were able to free ~1000 "spots" or so, so we have breathing
room, but not loads of time. I did this via the cleanup script, if that
wasn't obvious; I realize I didn't really say that anywhere.
That was kind of obvious, yeah :) Especially since you submitted the
cleanup script patch as well.
...
...
...
Still needed is a "move the existing data" script, as well as a set of
rewrite rules for those wishing to preserve backward compatible URLs for
any helper programs doing the wrong thing and relying on them.
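One possible shape for the "move the existing data" script, sketched in Python under the assumption that each package currently lives in its own directory directly under the old root (as in /packages/cower/). The name `migrate` is hypothetical. It writes into a separate destination so that a package whose name is itself a prefix (for example "co") cannot collide with the intermediate directory created for "cower":

```python
import os
import shutil


def migrate(old_root: str, new_root: str) -> list:
    """Move each package directory from the flat layout into the prefixed one.

    Returns a list of (source, destination) pairs for logging.
    """
    moves = []
    for name in sorted(os.listdir(old_root)):
        src = os.path.join(old_root, name)
        if not os.path.isdir(src):
            continue  # skip stray files; only package directories are moved
        bucket = os.path.join(new_root, name[:2])
        os.makedirs(bucket, exist_ok=True)
        dst = os.path.join(bucket, name)
        shutil.move(src, dst)
        moves.append((src, dst))
    return moves
```

Once the moves succeed, the new tree can be swapped into place; any rewrite rules for old URLs would then map /packages/<name>/ onto /packages/<first two chars>/<name>/.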
If we provide backward compatible URLs, why not keep them as default? I
doubt this will affect performance...
It wouldn't; I was more concerned with keeping the AUR deployable as
easily as possible, and not tied to a specific webserver and its
hairy configuration of rewrite rules. Making them optional prevents us
from having to muck with things at that level.
Ack.
...
...
...
Signed-off-by: Dan McGee <dan@archlinux.org>
---
 scripts/cleanup              |   24 ++++++++++++++++--------
 web/html/pkgsubmit.php       |    2 +-
 web/lib/aurjson.class.php    |    2 +-
 web/template/pkg_details.php |    2 +-
 4 files changed, 19 insertions(+), 11 deletions(-)