[aur-dev] [PATCH 1/3] scripts/cleanup: use native PHP only
No need to shell out to the system here. Also fix the script so it actually works.

Signed-off-by: Dan McGee <dan@archlinux.org>
---
 scripts/cleanup |   20 +++++++++++---------
 1 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/scripts/cleanup b/scripts/cleanup
index 4fc9ea2..f287350 100755
--- a/scripts/cleanup
+++ b/scripts/cleanup
@@ -16,21 +16,23 @@ if (empty($dir)) {
 }
 
 set_include_path(get_include_path() . PATH_SEPARATOR . "$dir/lib");
-include("config.inc");
-include("aur.inc");
-include("pkgfuncs.inc");
-
-exec('ls ' . INCOMING_DIR, $files);
+include("config.inc.php");
+include("aur.inc.php");
+include("pkgfuncs.inc.php");
 
 $count = 0;
 
+$files = scandir(INCOMING_DIR);
 foreach ($files as $pkgname) {
-	if (!package_exists($pkgname)) {
-		echo 'Removing ' . INCOMING_DIR . "$pkgname\n";
-		system('rm -r ' . INCOMING_DIR . $pkgname);
+	if ($pkgname == '.' || $pkgname == '..') {
+		continue;
+	}
+	$fullpath = INCOMING_DIR . $pkgname;
+	if (!package_exists($pkgname) && is_dir($fullpath)) {
+		echo 'Removing ' . $fullpath . "\n";
+		rm_tree($fullpath);
 		$count++;
 	}
 }
 
 echo "\nRemoved $count directories.\n";
-
-- 
1.7.6
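For readers unfamiliar with the approach, here is a minimal sketch of removing a directory tree with built-in PHP functions only, in the spirit of the patch above. The function name remove_tree_sketch() is hypothetical; it is not the AUR's rm_tree(), whose implementation is not shown in this thread.

```php
<?php
// Hypothetical helper, NOT the AUR's rm_tree(): removes a directory tree
// using only built-in PHP functions instead of shelling out to `rm -r`.
function remove_tree_sketch($path) {
	// Symlinks and plain files are unlinked; symlinks are never followed.
	if (is_link($path) || !is_dir($path)) {
		return unlink($path);
	}
	foreach (scandir($path) as $entry) {
		// scandir() also returns the '.' and '..' entries; skip them.
		if ($entry == '.' || $entry == '..') {
			continue;
		}
		if (!remove_tree_sketch($path . '/' . $entry)) {
			return false;
		}
	}
	// The directory is empty now, so rmdir() can remove it.
	return rmdir($path);
}
```

Because no shell is involved, file names containing spaces or shell metacharacters need no escaping.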
We shouldn't require this as it is a new config parameter and it causes PHP warnings to be spewed everywhere.

Signed-off-by: Dan McGee <dan@archlinux.org>
---
 web/lib/aur.inc.php |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/web/lib/aur.inc.php b/web/lib/aur.inc.php
index 00a8c8c..55cc8a9 100644
--- a/web/lib/aur.inc.php
+++ b/web/lib/aur.inc.php
@@ -235,7 +235,7 @@ function db_query($query="", $db_handle="") {
 		die("DB handle was not provided to db_query");
 	}
 
-	if (SQL_DEBUG == 1) {
+	if (defined('SQL_DEBUG') && SQL_DEBUG == 1) {
 		$bt = debug_backtrace();
 		error_log("DEBUG: ".$bt[0]['file'].":".$bt[0]['line']." query: $query\n");
 	}
-- 
1.7.6
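The guard in this patch can be illustrated in isolation. The following is only a sketch; the wrapper function sql_debug_enabled() is a hypothetical name used for illustration and does not appear in the AUR code.

```php
<?php
// Hypothetical wrapper around the check used in the patch.
// defined() is evaluated first, so the SQL_DEBUG constant is only read
// when it actually exists and PHP has nothing to complain about.
function sql_debug_enabled() {
	return defined('SQL_DEBUG') && SQL_DEBUG == 1;
}

// With no SQL_DEBUG in the configuration, debugging is simply off.
var_dump(sql_debug_enabled());
```

A configuration that does contain define('SQL_DEBUG', 1) turns the check on without any further change.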
On Thu, Jul 28, 2011 at 01:59:06PM -0500, Dan McGee wrote:
> We shouldn't require this as it is a new config parameter and it causes PHP warnings to be spewed everywhere.
Well, "UPGRADING" tells you to merge "web/lib/config.inc.php.proto" with "web/lib/config.inc.php", so you should never see those warnings unless you run a snapshot/Git checkout and forgot to add this. Anyway, this additional check makes our code cleaner and the patch is simple enough to not require any discussion. I'll apply this as-is.
On Fri, Jul 29, 2011 at 2:26 PM, Lukas Fleischer <archlinux@cryptocrack.de> wrote:
> On Thu, Jul 28, 2011 at 01:59:06PM -0500, Dan McGee wrote:
>> We shouldn't require this as it is a new config parameter and it causes PHP warnings to be spewed everywhere.
> Well, "UPGRADING" tells you to merge "web/lib/config.inc.php.proto" with "web/lib/config.inc.php", so you should never see those warnings unless you run a snapshot/Git checkout and forgot to add this.
I tend to not read UPGRADING carefully when trying to debug production AUR, that's all. While I agree it is the prudent thing to do, I wouldn't have expected a patch like this to require changing my config if I didn't want debug.
> Anyway, this additional check makes our code cleaner and the patch is simple enough to not require any discussion. I'll apply this as-is.
Thanks!
This implements the following scheme:

* /packages/cower/ --> /packages/co/cower/
* /packages/j/ --> /packages/j/j/
* /packages/zqy/ --> /packages/zq/y/

We take up to the first two characters of each package name as a intermediate subdirectory, and then the full package name lives underneath that.

Why, you ask? Well because earlier today the AUR hit 32,000 entries in the unsupported/ directory, making new package uploads impossible. While some might argue we shouldn't have so many damn packages in the repos, we should be able to handle this case.

Why two characters instead of one? Our two biggest two-char groups, 'pe' and 'py', both start with 'p', and have nearly 2000 packages each. Go Python and Perl.

Still needed is a "move the existing data" script, as well as a set of rewrite rules for those wishing to preserve backward compatible URLs for any helper programs doing the wrong thing and relying on them.

Signed-off-by: Dan McGee <dan@archlinux.org>
---
 scripts/cleanup              |   24 ++++++++++++++++--------
 web/html/pkgsubmit.php       |    2 +-
 web/lib/aurjson.class.php    |    2 +-
 web/template/pkg_details.php |    2 +-
 4 files changed, 19 insertions(+), 11 deletions(-)

diff --git a/scripts/cleanup b/scripts/cleanup
index f287350..813d577 100755
--- a/scripts/cleanup
+++ b/scripts/cleanup
@@ -22,16 +22,24 @@ include("pkgfuncs.inc.php");
 
 $count = 0;
 
-$files = scandir(INCOMING_DIR);
-foreach ($files as $pkgname) {
-	if ($pkgname == '.' || $pkgname == '..') {
+$buckets = scandir(INCOMING_DIR);
+foreach ($buckets as $bucket) {
+	$bucketpath = INCOMING_DIR . $bucket;
+	if ($bucket == '.' || $bucket == '..' || !is_dir($bucketpath)) {
 		continue;
 	}
-	$fullpath = INCOMING_DIR . $pkgname;
-	if (!package_exists($pkgname) && is_dir($fullpath)) {
-		echo 'Removing ' . $fullpath . "\n";
-		rm_tree($fullpath);
-		$count++;
+	$files = scandir(INCOMING_DIR . $bucket);
+	foreach ($files as $pkgname) {
+		if ($pkgname == '.' || $pkgname == '..') {
+			continue;
+		}
+		$fullpath = INCOMING_DIR . $bucket . "/" . $pkgname;
+		echo $fullpath . "\n";
+		if (!package_exists($pkgname) && is_dir($fullpath)) {
+			echo 'Removing ' . $fullpath . "\n";
+			#rm_tree($fullpath);
+			$count++;
+		}
 	}
 }
 
diff --git a/web/html/pkgsubmit.php b/web/html/pkgsubmit.php
index fd51c7e..9637dcd 100644
--- a/web/html/pkgsubmit.php
+++ b/web/html/pkgsubmit.php
@@ -256,7 +256,7 @@ if ($uid):
 	}
 
 	if (isset($pkg_name)) {
-		$incoming_pkgdir = INCOMING_DIR . $pkg_name;
+		$incoming_pkgdir = INCOMING_DIR . substr($pkg_name, 0, 2) . "/" . $pkg_name;
 	}
 
 	if (!$error) {
diff --git a/web/lib/aurjson.class.php b/web/lib/aurjson.class.php
index 5d15b89..277c824 100644
--- a/web/lib/aurjson.class.php
+++ b/web/lib/aurjson.class.php
@@ -125,7 +125,7 @@ class AurJSON {
             $search_data = array();
             while ( $row = mysql_fetch_assoc($result) ) {
                 $name = $row['Name'];
-                $row['URLPath'] = URL_DIR . $name . "/" . $name . ".tar.gz";
+                $row['URLPath'] = URL_DIR . substr($name, 0, 2) . "/" . $name . "/" . $name . ".tar.gz";
 
                 if ($type == 'info') {
                     $search_data = $row;
diff --git a/web/template/pkg_details.php b/web/template/pkg_details.php
index 0658063..5239123 100644
--- a/web/template/pkg_details.php
+++ b/web/template/pkg_details.php
@@ -90,7 +90,7 @@ $out_of_date_time = ($row["OutOfDateTS"] == 0) ? $msg : gmdate("r", intval($row[
 
 	<p><span class='f3'>
 	<?php
-		$urlpath = URL_DIR . $row['Name'];
+		$urlpath = URL_DIR . substr($row['Name'], 0, 2) . "/" . $row['Name'];
 		print "<a href='$urlpath/" . $row['Name'] . ".tar.gz'>".__("Tarball")."</a> :: ";
 		print "<a href='$urlpath/PKGBUILD'>".__("PKGBUILD")."</a></span>";
 
-- 
1.7.6
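The path mapping that this patch inlines at each call site can be sketched as a single function. The helper name bucketed_path() and the '/packages/' base used in the example calls are assumptions made for illustration; the patch itself concatenates INCOMING_DIR or URL_DIR directly.

```php
<?php
// Hypothetical helper; the patch repeats this expression inline instead.
// substr($pkgname, 0, 2) returns the whole name when it is shorter than
// two characters, so one-letter packages land in a one-letter bucket.
function bucketed_path($base, $pkgname) {
	return $base . substr($pkgname, 0, 2) . '/' . $pkgname;
}

echo bucketed_path('/packages/', 'cower'), "\n"; // /packages/co/cower
echo bucketed_path('/packages/', 'j'), "\n";     // /packages/j/j
echo bucketed_path('/packages/', 'zqy'), "\n";   // /packages/zq/zqy
```

Keeping the expression in one place would also make a later change of bucketing scheme a one-line edit, though the patch as posted does not do this.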
On Thu, Jul 28, 2011 at 01:59:07PM -0500, Dan McGee wrote:
> This implements the following scheme:
>
> * /packages/cower/ --> /packages/co/cower/
> * /packages/j/ --> /packages/j/j/
> * /packages/zqy/ --> /packages/zq/y/
I hope there's a typo in the last example, otherwise I must have misunderstood something :)
> We take up to the first two characters of each package name as a intermediate subdirectory, and then the full package name lives underneath that.
>
> Why, you ask? Well because earlier today the AUR hit 32,000 entries in the unsupported/ directory, making new package uploads impossible. While some might argue we shouldn't have so many damn packages in the repos, we should be able to handle this case.
>
> Why two characters instead of one? Our two biggest two-char groups, 'pe' and 'py', both start with 'p', and have nearly 2000 packages each. Go Python and Perl.
Time to move to ext4, eh? No, seriously: Something tells me that we should neither be filesystem dependent nor depend on the current distribution of package names. Using some better hash algorithm might fix the second problem while reducing predictability for the end user.
Given that we will probably run into the same problem again soon (according to current statistics, there are about two months left), I will apply this temporary workaround and prepare a release soon, though.
> Still needed is a "move the existing data" script, as well as a set of rewrite rules for those wishing to preserve backward compatible URLs for any helper programs doing the wrong thing and relying on them.
If we provide backward compatible URLs, why not keep them as default? I doubt this will affect performance...
On Fri, Jul 29, 2011 at 3:32 PM, Lukas Fleischer <archlinux@cryptocrack.de> wrote:
> On Thu, Jul 28, 2011 at 01:59:07PM -0500, Dan McGee wrote:
>> This implements the following scheme:
>>
>> * /packages/cower/ --> /packages/co/cower/
>> * /packages/j/ --> /packages/j/j/
>> * /packages/zqy/ --> /packages/zq/y/
> I hope there's a typo in the last example, otherwise I must have misunderstood something :)
Yes, typo.
>> We take up to the first two characters of each package name as a intermediate subdirectory, and then the full package name lives underneath that.
>>
>> Why, you ask? Well because earlier today the AUR hit 32,000 entries in the unsupported/ directory, making new package uploads impossible. While some might argue we shouldn't have so many damn packages in the repos, we should be able to handle this case.
>>
>> Why two characters instead of one? Our two biggest two-char groups, 'pe' and 'py', both start with 'p', and have nearly 2000 packages each. Go Python and Perl.
> Time to move to ext4, eh? No, seriously: Something tells me that we should neither be filesystem dependent nor depend on the current distribution of package names. Using some better hash algorithm might fix the second problem while reducing predictability for the end user.
We wouldn't be the first to use a scheme like this, so it felt like the right choice. See for example:
http://pypi.python.org/packages/source/D/Django/Django-1.3.tar.gz (one letter only)
http://search.cpan.org/CPAN/authors/id/S/SR/SRI/Mojolicious-1.68.tar.gz (segmented by author, multiple levels)
Ext4 came up as an option yesterday, but it's best to prevent one from shooting themselves in the foot like this, and this seems like the more proper solution. I do share some thoughts that we shouldn't depend on a certain distribution of package names, but reducing predictability seemed like a big enough downside that I didn't want to go that way, and moving to this scheme greatly increases the time before we'd hit any problems. If anyone has predictable but scalable solutions I'm more than open to hearing them. Of course, we do provide the URL in the JSON request for a reason.
> Given that we will probably run into the same again soon (according to current statistics, there are about two months left), I will apply this temporary workaround and prepare a release soon, though.
Yeah, we were able to free ~1000 "spots" or so, so we have breathing room, but not loads of time. I did this via the cleanup script, if that wasn't obvious; I realize I didn't really say that anywhere.
>> Still needed is a "move the existing data" script, as well as a set of rewrite rules for those wishing to preserve backward compatible URLs for any helper programs doing the wrong thing and relying on them.
> If we provide backward compatible URLs, why not keep them as default? I doubt this will affect performance...
It wouldn't. I was more concerned with keeping the AUR deployable as easily as possible, and not tied to a specific webserver and its hairy configuration of rewrite rules. Making them optional prevents us from having to muck with things at that level.
On Fri, Jul 29, 2011 at 03:50:44PM -0500, Dan McGee wrote:
> On Fri, Jul 29, 2011 at 3:32 PM, Lukas Fleischer <archlinux@cryptocrack.de> wrote:
>> On Thu, Jul 28, 2011 at 01:59:07PM -0500, Dan McGee wrote:
>>> This implements the following scheme:
>>>
>>> * /packages/cower/ --> /packages/co/cower/
>>> * /packages/j/ --> /packages/j/j/
>>> * /packages/zqy/ --> /packages/zq/y/
>> I hope there's a typo in the last example, otherwise I must have misunderstood something :)
> Yes, typo.
You might want to amend this when resubmitting the patch (details follow) :)
>>> We take up to the first two characters of each package name as a intermediate subdirectory, and then the full package name lives underneath that.
>>>
>>> Why, you ask? Well because earlier today the AUR hit 32,000 entries in the unsupported/ directory, making new package uploads impossible. While some might argue we shouldn't have so many damn packages in the repos, we should be able to handle this case.
>>>
>>> Why two characters instead of one? Our two biggest two-char groups, 'pe' and 'py', both start with 'p', and have nearly 2000 packages each. Go Python and Perl.
>> Time to move to ext4, eh? No, seriously: Something tells me that we should neither be filesystem dependent nor depend on the current distribution of package names. Using some better hash algorithm might fix the second problem while reducing predictability for the end user.
> We wouldn't be the first to use a scheme like this, so it felt like the right choice. See for example:
> http://pypi.python.org/packages/source/D/Django/Django-1.3.tar.gz (one letter only)
> http://search.cpan.org/CPAN/authors/id/S/SR/SRI/Mojolicious-1.68.tar.gz (segmented by author, multiple levels)
Yeah, I've seen this before.
> Ext4 came up as an option yesterday, but it's best to prevent one from shooting themselves in the foot like this, and this seems like the more proper solution. I do share some thoughts that we shouldn't depend on a certain distribution of package names, but reducing predictability seemed like a big enough downside that I didn't want to go that way, and moving to this scheme greatly increases the time before we'd hit any problems. If anyone has predictable but scalable solutions I'm more than open to hearing them. Of course, we do provide the URL in the JSON request for a reason.
Well, one predictable and scalable solution would be to split after every character or after every two characters and create nested directories (as we only allow a subset of all possible file names, that would still result in fewer than ~32000 subdirectories per directory). This would result in a very inscrutable directory structure, though. As I mentioned before, I'm fine with using the 2-character prefix for now. It feels like the best compromise. We can think about more individualized solutions later.
>> Given that we will probably run into the same again soon (according to current statistics, there are about two months left), I will apply this temporary workaround and prepare a release soon, though.
> Yeah, we were able to free ~1000 "spots" or so, so we have breathing room, but not loads of time. I did this via the cleanup script, if that wasn't obvious; I realize I didn't really say that anywhere.
That was kind of obvious, yeah :) Especially since you submitted the cleanup script patch as well.
>>> Still needed is a "move the existing data" script, as well as a set of rewrite rules for those wishing to preserve backward compatible URLs for any helper programs doing the wrong thing and relying on them.
>> If we provide backward compatible URLs, why not keep them as default? I doubt this will affect performance...
> It wouldn't. I was more concerned with keeping the AUR deployable as easily as possible, and not tied to a specific webserver and its hairy configuration of rewrite rules. Making them optional prevents us from having to muck with things at that level.
Ack.
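The alternative mentioned in this message, splitting the package name after every one or two characters into nested directories, might look roughly like the sketch below. This is only an illustration of the idea as described; nested_path() and its $width parameter are hypothetical names, and nothing like it was proposed as a patch in this thread.

```php
<?php
// Hypothetical sketch of the nested scheme: every $width characters of
// the package name become one directory level.
function nested_path($base, $pkgname, $width = 2) {
	// str_split() cuts the name into chunks of $width characters;
	// the final chunk may be shorter.
	return $base . implode('/', str_split($pkgname, $width)) . '/';
}

echo nested_path('/packages/', 'cower'), "\n";  // /packages/co/we/r/
echo nested_path('/packages/', 'zqy', 1), "\n"; // /packages/z/q/y/
```

The example paths show why the result was called inscrutable: the package name no longer appears as a single path component, and a package named 'co' would share its directory with the subtree of every longer name starting with 'co'.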
On Thu, Jul 28, 2011 at 01:59:07PM -0500, Dan McGee wrote:
> This implements the following scheme:
>
> * /packages/cower/ --> /packages/co/cower/
> * /packages/j/ --> /packages/j/j/
> * /packages/zqy/ --> /packages/zq/y/
This should be fixed, as I mentioned before :)
> We take up to the first two characters of each package name as a intermediate subdirectory, and then the full package name lives underneath that.
>
> Why, you ask? Well because earlier today the AUR hit 32,000 entries in the unsupported/ directory, making new package uploads impossible. While some might argue we shouldn't have so many damn packages in the repos, we should be able to handle this case.
>
> Why two characters instead of one? Our two biggest two-char groups, 'pe' and 'py', both start with 'p', and have nearly 2000 packages each. Go Python and Perl.
>
> Still needed is a "move the existing data" script, as well as a set of rewrite rules for those wishing to preserve backward compatible URLs for any helper programs doing the wrong thing and relying on them.
>
> Signed-off-by: Dan McGee <dan@archlinux.org>
> ---
>  scripts/cleanup              |   24 ++++++++++++++++--------
>  web/html/pkgsubmit.php       |    2 +-
>  web/lib/aurjson.class.php    |    2 +-
>  web/template/pkg_details.php |    2 +-
>  4 files changed, 19 insertions(+), 11 deletions(-)
>
> diff --git a/scripts/cleanup b/scripts/cleanup
> index f287350..813d577 100755
> --- a/scripts/cleanup
> +++ b/scripts/cleanup
> @@ -22,16 +22,24 @@ include("pkgfuncs.inc.php");
>
>  $count = 0;
>
> -$files = scandir(INCOMING_DIR);
> -foreach ($files as $pkgname) {
> -	if ($pkgname == '.' || $pkgname == '..') {
> +$buckets = scandir(INCOMING_DIR);
> +foreach ($buckets as $bucket) {
> +	$bucketpath = INCOMING_DIR . $bucket;
> +	if ($bucket == '.' || $bucket == '..' || !is_dir($bucketpath)) {
>  		continue;
>  	}
> -	$fullpath = INCOMING_DIR . $pkgname;
> -	if (!package_exists($pkgname) && is_dir($fullpath)) {
> -		echo 'Removing ' . $fullpath . "\n";
> -		rm_tree($fullpath);
> -		$count++;
> +	$files = scandir(INCOMING_DIR . $bucket);
> +	foreach ($files as $pkgname) {
> +		if ($pkgname == '.' || $pkgname == '..') {
> +			continue;
> +		}
> +		$fullpath = INCOMING_DIR . $bucket . "/" . $pkgname;
> +		echo $fullpath . "\n";
> +		if (!package_exists($pkgname) && is_dir($fullpath)) {
> +			echo 'Removing ' . $fullpath . "\n";
> +			#rm_tree($fullpath);
Any reason you commented this out here?
> +			$count++;
> +		}
>  	}
>  }
>
> diff --git a/web/html/pkgsubmit.php b/web/html/pkgsubmit.php
> index fd51c7e..9637dcd 100644
> --- a/web/html/pkgsubmit.php
> +++ b/web/html/pkgsubmit.php
> @@ -256,7 +256,7 @@ if ($uid):
>  	}
>
>  	if (isset($pkg_name)) {
> -		$incoming_pkgdir = INCOMING_DIR . $pkg_name;
> +		$incoming_pkgdir = INCOMING_DIR . substr($pkg_name, 0, 2) . "/" . $pkg_name;
This won't work. You need to patch the mkdir() invocation below and enable "$recursive":

----
if (!@mkdir($incoming_pkgdir, 0777, true)) {
----
>  	}
>
>  	if (!$error) {
> diff --git a/web/lib/aurjson.class.php b/web/lib/aurjson.class.php
> index 5d15b89..277c824 100644
> --- a/web/lib/aurjson.class.php
> +++ b/web/lib/aurjson.class.php
> @@ -125,7 +125,7 @@ class AurJSON {
>              $search_data = array();
>              while ( $row = mysql_fetch_assoc($result) ) {
>                  $name = $row['Name'];
> -                $row['URLPath'] = URL_DIR . $name . "/" . $name . ".tar.gz";
> +                $row['URLPath'] = URL_DIR . substr($name, 0, 2) . "/" . $name . "/" . $name . ".tar.gz";
>
>                  if ($type == 'info') {
>                      $search_data = $row;
> diff --git a/web/template/pkg_details.php b/web/template/pkg_details.php
> index 0658063..5239123 100644
> --- a/web/template/pkg_details.php
> +++ b/web/template/pkg_details.php
> @@ -90,7 +90,7 @@ $out_of_date_time = ($row["OutOfDateTS"] == 0) ? $msg : gmdate("r", intval($row[
>
>  	<p><span class='f3'>
>  	<?php
> -		$urlpath = URL_DIR . $row['Name'];
> +		$urlpath = URL_DIR . substr($row['Name'], 0, 2) . "/" . $row['Name'];
>  		print "<a href='$urlpath/" . $row['Name'] . ".tar.gz'>".__("Tarball")."</a> :: ";
>  		print "<a href='$urlpath/PKGBUILD'>".__("PKGBUILD")."</a></span>";
>
> -- 
> 1.7.6
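The review comment about mkdir() in this message can be shown with a small sketch. With the bucketed layout the parent directory (for example "co/") may not exist yet, so a plain mkdir() of "co/cower" fails; passing true as the third argument creates the missing parents as well. The wrapper name make_incoming_dir() is hypothetical and only serves the illustration.

```php
<?php
// Hypothetical wrapper: creates the package directory together with any
// missing bucket directory above it.
function make_incoming_dir($incoming_pkgdir) {
	if (is_dir($incoming_pkgdir)) {
		return true;
	}
	// Third argument true = recursive. The 0777 mode is still reduced
	// by the process umask.
	return mkdir($incoming_pkgdir, 0777, true);
}
```

A non-recursive call such as mkdir($base . '/co/cower') returns false and emits a warning when $base . '/co' is missing, which is the failure the review points out.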
This implements the following scheme:

* /packages/cower/ --> /packages/co/cower/
* /packages/j/ --> /packages/j/j/
* /packages/zqy/ --> /packages/zq/zqy/

We take up to the first two characters of each package name as a intermediate subdirectory, and then the full package name lives underneath that. Shorter named packages live in a single letter directory.

Why, you ask? Well because earlier today the AUR hit 32,000 entries in the unsupported/ directory, making new package uploads impossible. While some might argue we shouldn't have so many damn packages in the repos, we should be able to handle this case.

Why two characters instead of one? Our two biggest two-char groups, 'pe' and 'py', both start with 'p', and have nearly 2000 packages each. Go Python and Perl.

Still needed is a "move the existing data" script, as well as a set of rewrite rules for those wishing to preserve backward compatible URLs for any helper programs doing the wrong thing and relying on them.

Signed-off-by: Dan McGee <dan@archlinux.org>
---
* commit message fixed
* mkdir call is now recursive; mkdir/chdir also have silly @ operator removed as we have no reason not to log those errors to a webserver error log
* cleanup script has debug stuff removed (and rm_tree is not commented)

 scripts/cleanup              |   23 +++++++++++++++--------
 web/html/pkgsubmit.php       |    7 ++++---
 web/lib/aurjson.class.php    |    2 +-
 web/template/pkg_details.php |    2 +-
 4 files changed, 21 insertions(+), 13 deletions(-)

diff --git a/scripts/cleanup b/scripts/cleanup
index f287350..d3ba3f9 100755
--- a/scripts/cleanup
+++ b/scripts/cleanup
@@ -22,16 +22,23 @@ include("pkgfuncs.inc.php");
 
 $count = 0;
 
-$files = scandir(INCOMING_DIR);
-foreach ($files as $pkgname) {
-	if ($pkgname == '.' || $pkgname == '..') {
+$buckets = scandir(INCOMING_DIR);
+foreach ($buckets as $bucket) {
+	$bucketpath = INCOMING_DIR . $bucket;
+	if ($bucket == '.' || $bucket == '..' || !is_dir($bucketpath)) {
 		continue;
 	}
-	$fullpath = INCOMING_DIR . $pkgname;
-	if (!package_exists($pkgname) && is_dir($fullpath)) {
-		echo 'Removing ' . $fullpath . "\n";
-		rm_tree($fullpath);
-		$count++;
+	$files = scandir(INCOMING_DIR . $bucket);
+	foreach ($files as $pkgname) {
+		if ($pkgname == '.' || $pkgname == '..') {
+			continue;
+		}
+		$fullpath = INCOMING_DIR . $bucket . "/" . $pkgname;
+		if (!package_exists($pkgname) && is_dir($fullpath)) {
+			echo 'Removing ' . $fullpath . "\n";
+			rm_tree($fullpath);
+			$count++;
+		}
 	}
 }
 
diff --git a/web/html/pkgsubmit.php b/web/html/pkgsubmit.php
index fd51c7e..6d1b11f 100644
--- a/web/html/pkgsubmit.php
+++ b/web/html/pkgsubmit.php
@@ -256,7 +256,7 @@ if ($uid):
 	}
 
 	if (isset($pkg_name)) {
-		$incoming_pkgdir = INCOMING_DIR . $pkg_name;
+		$incoming_pkgdir = INCOMING_DIR . substr($pkg_name, 0, 2) . "/" . $pkg_name;
 	}
 
 	if (!$error) {
@@ -268,7 +268,8 @@ if ($uid):
 				rm_tree($incoming_pkgdir);
 			}
 
-			if (!@mkdir($incoming_pkgdir)) {
+			# The mode is masked by the current umask, so not as scary as it looks
+			if (!mkdir($incoming_pkgdir, 0777, true)) {
 				$error = __( "Could not create directory %s.", $incoming_pkgdir);
 			}
 		} else {
@@ -286,7 +287,7 @@ if ($uid):
 	}
 
 	if (!$error) {
-		if (!@chdir($incoming_pkgdir)) {
+		if (!chdir($incoming_pkgdir)) {
 			$error = __("Could not change directory to %s.", $incoming_pkgdir);
 		}
 
diff --git a/web/lib/aurjson.class.php b/web/lib/aurjson.class.php
index 5d15b89..277c824 100644
--- a/web/lib/aurjson.class.php
+++ b/web/lib/aurjson.class.php
@@ -125,7 +125,7 @@ class AurJSON {
             $search_data = array();
             while ( $row = mysql_fetch_assoc($result) ) {
                 $name = $row['Name'];
-                $row['URLPath'] = URL_DIR . $name . "/" . $name . ".tar.gz";
+                $row['URLPath'] = URL_DIR . substr($name, 0, 2) . "/" . $name . "/" . $name . ".tar.gz";
 
                 if ($type == 'info') {
                     $search_data = $row;
diff --git a/web/template/pkg_details.php b/web/template/pkg_details.php
index 0658063..5239123 100644
--- a/web/template/pkg_details.php
+++ b/web/template/pkg_details.php
@@ -90,7 +90,7 @@ $out_of_date_time = ($row["OutOfDateTS"] == 0) ? $msg : gmdate("r", intval($row[
 
 	<p><span class='f3'>
 	<?php
-		$urlpath = URL_DIR . $row['Name'];
+		$urlpath = URL_DIR . substr($row['Name'], 0, 2) . "/" . $row['Name'];
 		print "<a href='$urlpath/" . $row['Name'] . ".tar.gz'>".__("Tarball")."</a> :: ";
 		print "<a href='$urlpath/PKGBUILD'>".__("PKGBUILD")."</a></span>";
 
-- 
1.7.6
On Thu, Jul 28, 2011 at 01:59:05PM -0500, Dan McGee wrote:
> No need to shell out to the system here. Also fix the script so it actually works.
>
> Signed-off-by: Dan McGee <dan@archlinux.org>
> ---
>  scripts/cleanup |   20 +++++++++++---------
>  1 files changed, 11 insertions(+), 9 deletions(-)
Looks good, thanks!