[arch-dev-public] New /arch/db-* scripts
I started writing new database scripts to replace the old db-*/updatesync-many/pkgdb2 scripts. The reasons are support for the new pacman 3 naming scheme and fixing some design issues.
The old scripts rely heavily on PKGBUILDs for their operation and don't check the package file for consistency with the PKGBUILD (only the filename). They also move every file from the staging dir, regardless of whether the package will be added to any repository. And they recreate much of the functionality of updatesync, sort of reinventing the wheel.
My new draft has a cleaner design, but some new problems appear. It performs the following steps:
1) Check every file in the staging/add dir using a small libalpm-based tool and obtain the pkgname, pkgver, pkgrel and architecture. Compare the arch specified on the command line with the arch from the package (this step is missing in the old scripts). Find the PKGBUILD and compare pkgname, pkgver and pkgrel with the values from the package. If all checks pass, add the package to a list. If, additionally, a force flag is set in the PKGBUILD, add it to a "force-list".
2) Check every file in the staging/del dir and obtain its pkgname. Add this package to a delete list.
3) Lock the database.
4) Move all packages and force-packages to the ftp dir and add them with repo-add.
5) Pass the package files to a pkgdb2-like tool to add them to the web interface.
6) Delete all packages in the delete list with repo-remove.
7) Pass the package names to a pkgdb2-like tool to remove them from the web interface.
8) Release the database lock.
The problems I run into are these:
a) In step 4), I can't determine the filename of the old package in order to remove it from the ftp. I could scan for certain filename schemes, but keep in mind that the package filename could be anything now; the script doesn't care. We could, however, rely on our current filename scheme to find and remove the dupes (who would go through the trouble of renaming his package to something like wrongname-notapackage.zip.bz8 anyway?).
b) The same applies to step 6). But we mv our packages to staging/del anyway, so the script won't have to remove them from the ftp.
c) This is the biggest problem currently. In step 5), due to the new script design, I no longer have any data from the PKGBUILD and don't want to go looking for it again. That means I only use data from the package file itself to add it to the mysql db. The package file lacks the "package category" and "source" fields that the web interface shows. I could only solve this by either removing this data from the web interface or adding it to the package file.
d) The web interface doesn't support different architectures, so only data for the i686 packages is shown. That is a bad thing imo, but once data for more architectures is added, we need to be more careful about removing things from the web interface, as one architecture might lose a package that the other still keeps.
I'd appreciate any comments on this, especially on problem c).
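The step 1 checks described above can be sketched roughly as follows. This is not the libalpm-based tool from the message: as a stand-in, it assumes the package metadata is available as .PKGINFO-style "key = value" text, with pkgver stored as "<pkgver>-<pkgrel>" and an arch line. The PKGBUILD values are assumed to be already parsed into a dict, and the names parse_pkginfo, check_package and classify are hypothetical. The sketch keeps the add list and the force list disjoint, which is one reading of the description.

```python
from dataclasses import dataclass


@dataclass
class PkgInfo:
    pkgname: str
    pkgver: str
    pkgrel: str
    arch: str


def parse_pkginfo(text: str) -> PkgInfo:
    """Parse "key = value" lines in the style of a package's .PKGINFO.

    Assumption: pkgver is stored as "<pkgver>-<pkgrel>", so it is split on the last dash.
    """
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or " = " not in line:
            continue
        key, value = line.split(" = ", 1)
        fields.setdefault(key, value)  # multi-valued keys (depend, ...) keep only the first value
    version, sep, rel = fields["pkgver"].rpartition("-")
    if not sep:
        raise ValueError("pkgver without a -pkgrel suffix: " + fields["pkgver"])
    return PkgInfo(fields["pkgname"], version, rel, fields["arch"])


def check_package(info: PkgInfo, pkgbuild: dict, cmdline_arch: str) -> list:
    """Return a list of problems; an empty list means the package passes step 1."""
    problems = []
    if info.arch != cmdline_arch:
        problems.append(f"arch: package has {info.arch}, command line says {cmdline_arch}")
    for key in ("pkgname", "pkgver", "pkgrel"):
        if getattr(info, key) != pkgbuild.get(key):
            problems.append(f"{key}: package has {getattr(info, key)}, PKGBUILD has {pkgbuild.get(key)}")
    return problems


def classify(candidates, cmdline_arch):
    """Split (filename, PkgInfo, pkgbuild_fields) triples into an add list and a force list."""
    add_list, force_list = [], []
    for filename, info, pkgbuild in candidates:
        if check_package(info, pkgbuild, cmdline_arch):
            continue  # rejected: the file stays in staging/add
        if pkgbuild.get("force") == "y":
            force_list.append(filename)
        else:
            add_list.append(filename)
    return add_list, force_list
```

Only packages that pass every check reach the lists used by step 4, so a mismatched arch or a stale pkgrel never gets moved out of staging.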
On Sun, July 8, 2007 19:39, Thomas Bächler wrote:
1) Check every file in the staging/add dir using a small libalpm-based tool and obtain the pkgname, pkgver, pkgrel and architecture. Compare the arch specified in the commandline with the arch from the package (this step is missing in the old scripts). Find the PKGBUILD and compare pkgname, pkgver and pkgrel with the values from the package. If all checks are okay, add the package to a list. If additionally, a force flag is set in the PKGBUILD, add it to a "force-list".
IIRC, you also need the PKGBUILD for force=y. That might have been fixed in pacman3. James
On Sun, July 8, 2007 22:10, James Rayner wrote:
IIRC, you also need the PKGBUILD for force=y. That might have been fixed in pacman3.
hrm. ignore me, was in a hurry, didn't read.
Thomas Bächler wrote:
c) This is the biggest problem currently. In step 5), due to the new script design, I don't have any data from the PKGBUILD any more and don't want to go find it again. That means I only use data from the package file itself to add it to the mysql db. The package file lacks the "package category" and "source" which are in the web interface. I could only solve this by either removing this data from the web interface or adding it to the package file.
My recommendation is that while you're in the PKGBUILD, you grab these things and keep that data around, associated with that pkgname. At least that's how I do it in tupkgupdate for the TUs. Then you'll have it when you need to add it to the mysql db.
I also match up every package in the repo directory with every PKGBUILD in the source tree, so I know which binary packages are missing. At that time, I note the full filename of that package so I can delete it later.
Of course, I'm changing most of these tactics in repoman. There'll be a lightweight upload server that takes the file and figures out what to do with it. It will have the full sql db and repo at its disposal, and will know which files were committed by developers but haven't yet been uploaded, as well as their sha1sums.
- P
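The recommendation above (grab the extra fields while the PKGBUILD is open in step 1 and keep them keyed by pkgname) could look something like this minimal sketch. It is a naive text scan, not bash: it only reads unindented one-line assignments, leaves $variables unexpanded and does not handle multi-line arrays. The names read_pkgbuild_fields and remember are hypothetical, and the category is assumed to be supplied by the caller because the message does not say where it comes from.

```python
import re
import shlex

ASSIGNMENT = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*)=(.*)$")


def read_pkgbuild_fields(text: str) -> dict:
    """Collect simple one-line assignments such as pkgname=foo or source=(a b).

    This does NOT run bash: $variables stay unexpanded and multi-line arrays
    are not supported. Only unindented (top-level) lines are considered.
    """
    fields = {}
    for line in text.splitlines():
        match = ASSIGNMENT.match(line)
        if not match:
            continue
        key, value = match.groups()
        value = value.strip()
        if value.startswith("(") and value.endswith(")"):
            fields[key] = shlex.split(value[1:-1])  # bash array -> list of words
        else:
            words = shlex.split(value, comments=True)  # drops a trailing "# comment"
            fields[key] = words[0] if words else ""
    return fields


def remember(cache: dict, pkgbuild_text: str, category: str) -> dict:
    """Store the web-interface fields for one PKGBUILD, keyed by pkgname."""
    fields = read_pkgbuild_fields(pkgbuild_text)
    source = fields.get("source", [])
    if isinstance(source, str):
        source = [source]  # a scalar source=foo.tar.gz becomes a one-element list
    entry = {
        "pkgver": fields.get("pkgver"),
        "pkgrel": fields.get("pkgrel"),
        "source": source,
        # Assumption: the category is passed in by the caller; it is not parsed here.
        "category": category,
    }
    cache[fields["pkgname"]] = entry
    return entry
```

Step 5 can then look up cache[pkgname] instead of reopening the PKGBUILD.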
On 7/8/07, Thomas Bächler <thomas@archlinux.org> wrote:
a) In step 4) I can't determine the filename of the old package and remove it from the ftp. I could scan for certain filename schemes, but take into account that the package filename could be anything now, the script doesn't care. We could however rely on our current filename scheme to find/remove the dupes (who would go through the trouble to rename his package to sth like wrongname-notapackage.zip.bz8 anyway?).
For now I'd rely on the packages being <pkgname>-<pkgver>-<pkgrel>*.pkg.tar.gz names. This should cover the bases for both new and old names. We don't need to cover every hard case for our own butts.
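Relying on the <pkgname>-<pkgver>-<pkgrel>*.pkg.tar.gz convention, old files for a package can be found by parsing names from the right, which avoids the trap where a plain "foo-*" glob also matches foo-bar. This is a sketch under stated assumptions: the optional trailing "-<arch>" of the pacman 3 names is assumed to be one of i686 or x86_64 (the two architectures in this thread), and split_filename and old_files_for are hypothetical names.

```python
SUFFIX = ".pkg.tar.gz"
ARCHES = ("i686", "x86_64")  # assumption: the architectures discussed in this thread


def split_filename(filename: str):
    """Return (pkgname, pkgver, pkgrel), or None if the name doesn't follow the scheme."""
    if not filename.endswith(SUFFIX):
        return None
    base = filename[: -len(SUFFIX)]
    for arch in ARCHES:
        if base.endswith("-" + arch):
            base = base[: -len(arch) - 1]  # new-style name: strip the -<arch> part
            break
    parts = base.rsplit("-", 2)  # pkgname may contain dashes, pkgver and pkgrel may not
    if len(parts) != 3 or not all(parts):
        return None
    return tuple(parts)


def old_files_for(pkgname: str, filenames, keep: str):
    """Files in filenames that belong to pkgname, excluding the newly added file keep."""
    matches = []
    for name in filenames:
        parsed = split_filename(name)
        if parsed and parsed[0] == pkgname and name != keep:
            matches.append(name)
    return matches
```

In a real script, filenames would come from something like os.listdir() on the ftp dir; files that don't follow the scheme are simply ignored, which matches the "don't cover every hard case" stance.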
c) This is the biggest problem currently. In step 5), due to the new script design, I don't have any data from the PKGBUILD any more and don't want to go find it again. That means I only use data from the package file itself to add it to the mysql db. The package file lacks the "package category" and "source" which are in the web interface. I could only solve this by either removing this data from the web interface or adding it to the package file.
I personally think we should eventually add category information to a package. This is quite different from a group in that it isn't something you would normally install as a whole with pacman -S multimedia. The real reason it would be great is something like "I need a movie player, what is available?" (pacman -Q --category multimedia). See here: <http://archlinux.org/pipermail/pacman-dev/2007-June/008556.html>
However, for now, can't we just use whatever category the previous revision was in? That way most of them will stay looking good, and we might end up with a few that aren't categorized.
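The "reuse the previous revision's category" idea amounts to a lookup with a fallback. In this minimal sketch, `previous` is assumed to be a pkgname-to-category mapping read from the existing web database before the update (its schema isn't described in the thread), and UNCATEGORIZED, category_for and web_row are hypothetical names.

```python
from typing import Dict

UNCATEGORIZED = "uncategorized"  # hypothetical placeholder for packages with no history


def category_for(pkgname: str, previous: Dict[str, str]) -> str:
    """Reuse the category of the previous revision, if there was one."""
    return previous.get(pkgname, UNCATEGORIZED)


def web_row(pkgname: str, pkgver: str, pkgrel: str, previous: Dict[str, str]) -> dict:
    """Build a web-interface row from package-file data plus the old category."""
    return {
        "pkgname": pkgname,
        "version": f"{pkgver}-{pkgrel}",
        "category": category_for(pkgname, previous),
    }
```

Only brand-new packages fall back to the placeholder, which matches the expectation of ending up with "a few that aren't categorized".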
d) The web interface doesn't support different architectures, so only data for the i686 packages is shown. That is a bad thing imo, but once data for more architectures is added, we need to be more careful about removing things from the web interface, as one architecture might lose a package that the other still keeps.
We need to get this up and running soon. I don't have a 64-bit machine, but it makes no sense not to give it the same first-class recognition as the i686 packages. (By the way, on this same note, I am starting to notice a bit too much of a black-and-white distinction between the two architectures; we should be working together as much as possible. By this I mean calling people a 32-bit or 64-bit dev: sure, they have those machines, but I'm sure everyone would be willing to build for either if they had the equipment and time. But this is OT, sorry.)
-Dan
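For problem d) (quoted above), one way to be "more careful" when the web interface holds several architectures is to track, per package, the set of architectures that still ship it and only drop the package once that set is empty. This is a minimal in-memory sketch; WebIndex is a hypothetical stand-in for whatever table the web interface actually uses.

```python
from collections import defaultdict
from typing import Dict, Set


class WebIndex:
    """Hypothetical in-memory stand-in for the web interface's package table."""

    def __init__(self) -> None:
        # pkgname -> set of architectures that still ship it
        self.arches: Dict[str, Set[str]] = defaultdict(set)

    def add(self, pkgname: str, arch: str) -> None:
        self.arches[pkgname].add(arch)

    def remove(self, pkgname: str, arch: str) -> bool:
        """Drop pkgname for one arch; return True only if no arch keeps it."""
        remaining = self.arches.get(pkgname)
        if remaining is None:
            return False  # unknown package: nothing to do
        remaining.discard(arch)
        if remaining:
            return False  # another architecture still has the package
        del self.arches[pkgname]
        return True
```

With this shape, step 7 for i686 cannot delete a package that x86_64 still provides.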
Dan McGee wrote:
(By the way- on this same note, I am starting to notice a bit too much of a black and white distinction between the two architectures- we should be working together as much as possible. By this I mean calling people a 32 bit or 64 bit dev- sure, they have those machines, but I'm sure everyone would be willing to build for either if they had the equipment and time. But this is OT, sorry.)
We have actually started that development already:
- 64 bit developers can now be package maintainers themselves (this discrimination was weird, I'm glad it is gone)
- a number of people with 64 bit hardware compile for both x86_64 and i686 (like pierre, andyrtr, myself). That is easily possible due to the backwards compatibility of x86_64.
On 7/11/07, Thomas Bächler <thomas@archlinux.org> wrote:
We have actually started that development already:
- 64 bit developers can now be package maintainers themselves (this discrimination was weird, I'm glad it is gone)
- a number of people with 64 bit hardware compile for both x86_64 and i686 (like pierre, andyrtr, myself). That is easily possible due to the backwards compatibility of x86_64.
It'd still be nice if we could integrate architecture more closely into the dashboard... perhaps, seeing as no one has the drive to do some django development, we could grab community members to work on it. I have a free box if someone wants to set up a sandbox for playing with the arch site on it... cactus, ya interested?
It'd still be nice if we could integrate architecture more closely into the dashboard... perhaps, seeing as no one has the drive to do some django development, we could grab community members to work on it. I have a free box if someone wants to set up a sandbox for playing with the arch site on it... cactus, ya interested?
yeah. interested. we can pow-wow later about it more. got a few things on my plate, but I also have some open plate.
On 7/11/07, eliott <eliott@cactuswax.net> wrote:
yeah. interested. we can pow-wow later about it more. got a few things on my plate, but I also have some open plate.
Make sure to put some vegetables on there too! They'll keep you healthy!
participants (6)
- Aaron Griffin
- Dan McGee
- eliott
- James Rayner
- Paul Mattal
- Thomas Bächler