[arch-dev-public] [PATCH 0/4] create-filelists patchset
This is a series of patches to make create-filelists a lot more efficient at what it does, and also to make the files database a lot more useful. The end-user result is that the files database also includes the 'desc' and 'depends' entries found in a normal .db.tar.gz database.

As far as other changes go, the package loop rework patch is the most important one. Rather than inefficiently unzipping every package to get the .PKGINFO file and determine its name and version, we use the .db.tar.gz directly, which saves us a ton of work in most cases.

Comments welcome. I'm not sure who the head honcho is who will pull these in, but this does help set up re-adding file support in archweb, in addition to making this cron job suck a bit less. We might even think about running it more often than once a day now.

-Dan

Dan McGee (4):
  create-filelists: general cleanups
  create-filelists: s/REPO_DB_FILE/FILES_DB_FILE/g
  create-filelists: rework the package loop completely
  create-filelists: include desc/depends entries

 cron-jobs/create-filelists | 78 ++++++++++++++++++++++++++++----------------
 1 files changed, 50 insertions(+), 28 deletions(-)
[arch-dev-public] [PATCH 1/4] create-filelists: general cleanups
* Specify lock name once
* Use new script name everywhere
* Clean up tabs/spaces and add a modeline. This isn't necessarily the one we
wanted to standardize on, but I picked the one the entire file is written
to at the moment.
Signed-off-by: Dan McGee
[arch-dev-public] [PATCH 2/4] create-filelists: s/REPO_DB_FILE/FILES_DB_FILE/g
This sets up upcoming changes where we actually use the real repo's
DB file, so I don't want any confusion over variable names.
Signed-off-by: Dan McGee
[arch-dev-public] [PATCH 3/4] create-filelists: rework the package loop completely
Instead of wasting time extracting .PKGINFO twice from every single package
in the repos, use the package DB to eliminate most of the heavy lifting.
This way we only need to worry about looking at the packages that actually
have changed since the last time we built the package database.
This should give the job a noticeable performance boost, in addition to
reducing IO load and avoiding needlessly reading every package file.
Signed-off-by: Dan McGee
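To make the idea behind the reworked loop concrete, here is a minimal shell sketch. It assumes the package DB and the previous files DB have both been extracted into directories with one pkgname-pkgver-pkgrel/ subdirectory per package, and that a files DB entry counts as current once it contains a 'files' file. The function name changed_entries is hypothetical; this is not the code from the patch.

```shell
#!/bin/bash
# Hypothetical helper, not the code from the actual patch.
# changed_entries DBDIR FILESDIR
#   DBDIR    - an extracted package DB (.db.tar.gz), one
#              pkgname-pkgver-pkgrel/ directory per package
#   FILESDIR - the previously generated files DB, same layout
# Prints every package entry in DBDIR that has no 'files' entry in
# FILESDIR yet, i.e. new or updated packages. All other entries can
# be reused without opening the package file at all.
changed_entries() {
	local dbdir=$1 filesdir=$2 entry
	for entry in "$dbdir"/*/; do
		# an unmatched glob stays literal; skip it
		[ -d "$entry" ] || continue
		entry=$(basename "$entry")
		if [ ! -f "$filesdir/$entry/files" ]; then
			printf '%s\n' "$entry"
		fi
	done
}
```

Only the printed entries would still need their package file opened to build a file list. Entries for packages that have since been removed from the repo would also need pruning from the files DB, which this sketch leaves out.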
[arch-dev-public] [PATCH 4/4] create-filelists: include desc/depends entries
Make the files DB include everything the original packages DB includes
instead of just being 'files' entries. This will allow tools to do more with
these generated files and they can be used as a drop-in replacement for a
regular package database.
Signed-off-by: Dan McGee
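A rough sketch of what including the desc/depends entries means for a single package, assuming the same extracted-directory layout as above (one pkgname-pkgver-pkgrel/ subdirectory per package, with 'desc' and 'depends' files in the package DB). The helper add_db_entries is hypothetical and not taken from the patch.

```shell
#!/bin/bash
# Hypothetical helper, not the code from the actual patch.
# add_db_entries DBDIR FILESDIR ENTRY
# Copies the 'desc' and 'depends' files of one package entry from an
# extracted package DB into the matching files DB entry, so the files
# DB carries the same per-package metadata as the regular DB plus its
# own 'files' list.
add_db_entries() {
	local src="$1/$3" dst="$2/$3" f
	mkdir -p "$dst" || return 1
	for f in desc depends; do
		# copy only what the package DB entry actually has
		if [ -f "$src/$f" ]; then
			cp "$src/$f" "$dst/$f" || return 1
		fi
	done
}
```

Once every entry carries desc, depends, and files, packing the directory back into a .tar.gz should give a database that tools can treat like a regular package DB.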
Pierre/Aaron/Thomas, you guys have worked on these the most of anyone.
Any thoughts, or can I just push these?
-Dan
On Sat, Feb 27, 2010 at 12:01 PM, Dan McGee wrote:
> This is a series of patches to make create-filelists a lot more efficient at what it does, and also to make the files database a lot more useful. The end-user result is that the files database also includes the 'desc' and 'depends' entries found in a normal .db.tar.gz database.
>
> As far as other changes go, the package loop rework patch is the most important one. Rather than inefficiently unzipping every package to get the .PKGINFO file and determine its name and version, we use the .db.tar.gz directly, which saves us a ton of work in most cases.
>
> Comments welcome. I'm not sure who the head honcho is who will pull these in, but this does help set up re-adding file support in archweb, in addition to making this cron job suck a bit less. We might even think about running it more often than once a day now.
>
> -Dan
>
> Dan McGee (4):
>   create-filelists: general cleanups
>   create-filelists: s/REPO_DB_FILE/FILES_DB_FILE/g
>   create-filelists: rework the package loop completely
>   create-filelists: include desc/depends entries
>
>  cron-jobs/create-filelists | 78 ++++++++++++++++++++++++++++----------------
>  1 files changed, 50 insertions(+), 28 deletions(-)
On Sun, 28 Feb 2010 19:16:10 -0600, Dan McGee wrote:
> Pierre/Aaron/Thomas, you guys have worked on these the most of anyone.
> Any thoughts, or can I just push these?

I didn't have time to really think about it. It looks fine, though I don't remember why I didn't implement it that way in the first place.

Maybe in the long term we should add this to repo-add as an optional task. This would be the most efficient and consistent way of doing this instead of running this script via cron job.

However, you can push these changes if you like. And don't worry about the coding style right now; I'll take care of it once I have finished the "Bash Coding Style" document. From then on we will only accept patches that follow these guidelines. :-)

--
Pierre Schmitz, https://users.archlinux.de/~pierre
On Mon, Mar 1, 2010 at 01:16, Pierre Schmitz wrote:
> Maybe in the long term we should add this to repo-add as an optional task.
> This would be the most efficient and consistent way of doing this instead of
> running this script via cron job.

http://bugs.archlinux.org/task/11302
It would be the best. I haven't implemented it yet, though.
On Saturday, 27 February 2010 19:01:32, Dan McGee wrote:
> As far as other changes go, the package loop rework patch is the most important one. Rather than inefficiently unzipping every package to get the .PKGINFO file and determine its name and version, we use the .db.tar.gz directly, which saves us a ton of work in most cases.

Do you think this script should lock the repo while reading its contents? Otherwise there is a chance it reads a package or DB file while it's being modified. We might even lose all previous data, which would cause the script to reread all packages on the next run.

The problem with all this might be that we end up with a bunch of scripts and operations which block each other. At the moment, if we move a lot of packages, db-move will lock and unlock the repo for each of them. If one of our cron jobs gets lucky and catches the lock between two packages, the move script will fail and we'll end up with an inconsistent repo.

I guess one step toward solving this is this (hard to read) patch:
http://code.phraktured.net/cgit.cgi/dbscripts/commit/?h=working&id=bfaec9eb47c1fe042b83f9539f81dca1cad609a2

And maybe we should even have read and write locks.

--
Pierre Schmitz, https://users.archlinux.de/~pierre
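The read/write lock idea could look roughly like this, sketched with flock(1) from util-linux. This is a swapped-in technique for illustration, not necessarily how dbscripts locks its repos, and the function names and lock file are hypothetical. Readers such as create-filelists take a shared lock and can run side by side, while a writer takes an exclusive lock and waits until every other holder is gone.

```shell
#!/bin/bash
# Sketch only: read/write locking with flock(1) from util-linux.
# Function names and the lock file are hypothetical.

# with_read_lock LOCKFILE CMD...
# Runs CMD while holding a shared lock. Several readers can hold it
# at the same time; it blocks only while a writer holds the lock.
with_read_lock() {
	local lockfile=$1
	shift
	(
		flock -s 9 || exit 1
		"$@"
	) 9>>"$lockfile"
	# fd 9 is closed, releasing the lock, when the subshell exits
}

# with_write_lock LOCKFILE CMD...
# Runs CMD while holding an exclusive lock, after all readers and
# writers have released theirs.
with_write_lock() {
	local lockfile=$1
	shift
	(
		flock -x 9 || exit 1
		"$@"
	) 9>>"$lockfile"
}
```

With this split, db-move could wrap its whole loop over packages in a single with_write_lock call, so a cron job taking a read lock would simply wait instead of slipping in between two moves.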
participants (3)
- Daenyth Blank
- Dan McGee
- Pierre Schmitz