On 7/8/07, Thomas Bächler <thomas@archlinux.org> wrote:
I started writing new database scripts to replace the old db-*/updatesync-many/pkgdb2 scripts. The reasons are support for the new pacman 3 naming scheme and fixes for some design issues.
The old scripts use PKGBUILDs a lot for their operation and don't check the package file for consistency with the PKGBUILD (only the filename). They also move every file from the staging dir, regardless of whether the package will be added in any repository. And they recreate much of the functionality from updatesync, sort of reinventing the wheel.
My new draft has a cleaner design, but some new problems appear. It performs the following steps:
1) Check every file in the staging/add dir using a small libalpm-based tool and obtain the pkgname, pkgver, pkgrel and architecture. Compare the arch specified on the command line with the arch from the package (this step is missing in the old scripts). Find the PKGBUILD and compare pkgname, pkgver and pkgrel with the values from the package. If all checks are okay, add the package to a list. If, additionally, a force flag is set in the PKGBUILD, add it to a "force-list".
2) Check every file in the staging/del dir and obtain its pkgname. Add this package to a delete list.
3) Lock the database
4) Move all packages and force-packages to the ftp dir, add them with repo-add.
5) Pass the package files to a pkgdb2-like tool to add them to the web interface.
6) Delete all packages from the delete-list with repo-remove.
7) Pass the package names to a pkgdb2-like tool to remove them from the web interface.
8) Release the database lock.
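The checks in step 1) could be sketched roughly like this. It is only an illustration in Python, not the actual draft script: PkgInfo and classify are hypothetical names, and the metadata is assumed to have been read already (from the package via the libalpm-based tool, and from the PKGBUILD).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PkgInfo:
    """Metadata for one package (hypothetical container, not a libalpm type)."""
    pkgname: str
    pkgver: str
    pkgrel: str
    arch: str = ""
    force: bool = False


def classify(pkg, pkgbuild, wanted_arch):
    """Return 'reject', 'add' or 'force' for one staged package file.

    pkg         -- values read from the package file itself
    pkgbuild    -- values read from the matching PKGBUILD
    wanted_arch -- the arch given on the command line
    """
    # The arch check that the old scripts do not perform.
    if pkg.arch != wanted_arch:
        return "reject"
    # The package must agree with its PKGBUILD on name, version and release.
    if (pkg.pkgname, pkg.pkgver, pkg.pkgrel) != (
        pkgbuild.pkgname, pkgbuild.pkgver, pkgbuild.pkgrel
    ):
        return "reject"
    # A force flag in the PKGBUILD puts the package on the force-list.
    return "force" if pkgbuild.force else "add"
```

Only files classified as 'add' or 'force' would be moved in step 4); rejected files stay in the staging dir.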
The problems I run into are these:
a) In step 4) I can't determine the filename of the old package and remove it from the ftp. I could scan for certain filename schemes, but take into account that the package filename could be anything now, the script doesn't care. We could however rely on our current filename scheme to find/remove the dupes (who would go through the trouble to rename his package to something like wrongname-notapackage.zip.bz8 anyway?).
For now I'd rely on the packages being <pkgname>-<pkgver>-<pkgrel>*.pkg.tar.gz names. This should cover the bases for both new and old names. We don't need to cover every hard case for our own butts.
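A rough sketch of what relying on the filename scheme could look like, in Python. The function names, the list of known arches and the rule that an old-style name (no arch) belongs to whatever repo dir it sits in are all assumptions for illustration.

```python
SUFFIX = ".pkg.tar.gz"
KNOWN_ARCHES = ("i686", "x86_64")  # assumption: the arches we build for


def parse_pkg_filename(filename):
    """Split <pkgname>-<pkgver>-<pkgrel>[-<arch>].pkg.tar.gz into its parts.

    Returns (pkgname, pkgver, pkgrel, arch), with arch None for old-style
    names, or None if the filename does not follow the scheme.
    """
    if not filename.endswith(SUFFIX):
        return None
    stem = filename[:-len(SUFFIX)]
    arch = None
    for candidate in KNOWN_ARCHES:
        if stem.endswith("-" + candidate):
            arch = candidate
            stem = stem[:-len(candidate) - 1]
            break
    # pkgname may contain hyphens, pkgver and pkgrel may not,
    # so split from the right.
    parts = stem.rsplit("-", 2)
    if len(parts) != 3 or not all(parts):
        return None
    name, ver, rel = parts
    return name, ver, rel, arch


def find_old_files(filenames, pkgname, arch, keep):
    """List files holding another build of pkgname for this arch.

    keep is the filename of the package that was just added.
    """
    old = []
    for filename in filenames:
        parsed = parse_pkg_filename(filename)
        if parsed is None or filename == keep:
            continue
        name, _ver, _rel, file_arch = parsed
        # Assumption: old-style names (no arch) count for the current arch.
        if name == pkgname and file_arch in (None, arch):
            old.append(filename)
    return old
```

Anything that does not parse is simply left alone, which matches the "we don't need to cover every hard case" approach.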
b) The same applies to step 6). But we mv our packages to the staging/del anyway so the script won't have to remove them from the ftp.
c) This is the biggest problem currently. In step 5), due to the new script design, I don't have any data from the PKGBUILD any more and don't want to go find it again. That means I only use data from the package file itself to add it to the mysql db. The package file lacks the "package category" and "source" which are in the web interface. I could only solve this by either removing this data from the web interface or adding it to the package file.
I personally think we should eventually add category information to a package. This is a lot different than a group in that it isn't something you would normally pacman -S multimedia. The real reason it would be great is something like "I need a movie player, what is available?" (pacman -Q --category multimedia). See here: <http://archlinux.org/pipermail/pacman-dev/2007-June/008556.html> However, for now, can't we just use whatever category the previous revision was in? That way most of them will stay looking good, and we might end up with a few that aren't categorized.
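Keeping the previous revision's category could be as simple as never touching that column on update. A minimal sketch in Python, using sqlite3 as a stand-in for the mysql db; the table layout and function names are made up for illustration and are not the real web interface schema.

```python
import sqlite3


def create_schema(conn):
    """Create a hypothetical packages table (not the real web schema)."""
    conn.execute(
        "CREATE TABLE packages ("
        " pkgname TEXT PRIMARY KEY,"
        " pkgver TEXT NOT NULL,"
        " pkgrel TEXT NOT NULL,"
        " category TEXT)"
    )


def upsert_package(conn, pkgname, pkgver, pkgrel):
    """Record a new revision using only data from the package file.

    An existing row keeps whatever category it already had; a brand new
    package is inserted with no category.
    """
    cur = conn.execute(
        "UPDATE packages SET pkgver = ?, pkgrel = ? WHERE pkgname = ?",
        (pkgver, pkgrel, pkgname),
    )
    if cur.rowcount == 0:
        conn.execute(
            "INSERT INTO packages (pkgname, pkgver, pkgrel, category)"
            " VALUES (?, ?, ?, NULL)",
            (pkgname, pkgver, pkgrel),
        )
```

New packages end up uncategorized until someone sets the category by hand, which is the "few that aren't categorized" case.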
d) The web interface doesn't support different architectures, so only data for the i686 packages is shown. That is a bad thing imo, but once data for more architectures is added, we need to be more careful about removing things from the web interface, as one architecture might lose a package that the other still keeps.
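One way to picture the "be more careful about removing things" part, sketched in Python: track which arches still ship each package and only drop the web entry when none are left. The dict-of-sets bookkeeping and the function name are assumptions, not anything the web interface does today.

```python
def remove_for_arch(arches_by_pkg, pkgname, arch):
    """Mark pkgname as removed for one arch.

    arches_by_pkg maps pkgname -> set of arches that still ship it.
    Returns True only if no arch is left, i.e. the web interface entry
    can be deleted as a whole.
    """
    arches = arches_by_pkg.get(pkgname, set())
    arches.discard(arch)
    if arches:
        # Another architecture still keeps this package.
        return False
    arches_by_pkg.pop(pkgname, None)
    return True
```

For example, removing a package from i686 while x86_64 still has it returns False and leaves the entry in place.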
We need to get this up and running soon. I don't have a 64-bit machine, but it makes no sense not to give it the same first-class recognition as the i686 packages. (By the way, on this same note, I am starting to notice a bit too much of a black and white distinction between the two architectures; we should be working together as much as possible. By this I mean calling people a 32 bit or 64 bit dev: sure, they have those machines, but I'm sure everyone would be willing to build for either if they had the equipment and time. But this is OT, sorry.) -Dan