[pacman-dev] Repo database(s) layout
Hi,

We currently have .db and .files databases, with .files being a superset of .db.

An idea was formed on IRC to completely separate these, i.e. .db stays as it is and .files only includes the file lists. We would then add .source to include the source package information. I would set repo-add to automatically create all these files.

We would then add something like "-S --refresh-files" and "-S --refresh-source" to download those files as a one-off, printing a warning when using them if they are out of date compared to the repo. Another option is to use Usage as a flag for when to download them, but refreshing all of those on every update seems excessive.

This would also allow us to have some basic pkgfile functionality in pacman (-So).

So, there is much to work out, but does the general idea sound good to people?

Allan
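A minimal sketch of the split layout proposed above, where one pass over the packages produces a metadata archive and a separate file-list-only archive. This is not repo-add's implementation: the function names (write_split, member_names), the input dictionary shape, and the sample package are all hypothetical, and the only layout assumption is that each database is a compressed tar archive with one directory per package.

```python
import io
import tarfile


def _add(tar, path, text):
    """Add one text member to an open tar archive."""
    data = text.encode()
    info = tarfile.TarInfo(name=path)
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))


def write_split(packages):
    """Write a metadata archive and a file-list-only archive in one pass.

    `packages` is a hypothetical list of dicts with the keys
    name, version, desc (str) and files (list of paths).
    """
    db_buf, files_buf = io.BytesIO(), io.BytesIO()
    with tarfile.open(fileobj=db_buf, mode="w:gz") as db, \
         tarfile.open(fileobj=files_buf, mode="w:gz") as files:
        for pkg in packages:
            pkgdir = f"{pkg['name']}-{pkg['version']}"
            # Package metadata goes only into the .db-style archive.
            _add(db, f"{pkgdir}/desc", pkg["desc"])
            # File lists go only into the .files-style archive.
            listing = "%FILES%\n" + "\n".join(pkg["files"]) + "\n"
            _add(files, f"{pkgdir}/files", listing)
    return db_buf.getvalue(), files_buf.getvalue()


def member_names(archive_bytes):
    """Return the sorted member paths of an in-memory tar archive."""
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:*") as tar:
        return sorted(tar.getnames())


# Hypothetical sample package, for illustration only.
sample = [{
    "name": "foo",
    "version": "1.0-1",
    "desc": "%NAME%\nfoo\n",
    "files": ["usr/", "usr/bin/", "usr/bin/foo"],
}]
db_bytes, files_bytes = write_split(sample)
print(member_names(db_bytes))
print(member_names(files_bytes))
```

Writing both archives in the same loop means they are generated from the same package set, although a client that downloads them at different times can still end up with copies that disagree.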
On Mon, Nov 4, 2013 at 7:23 PM, Allan McRae <allan@archlinux.org> wrote:
> Hi,
>
> We currently have .db and .files databases, with .files being a superset of .db.
>
> An idea was formed on IRC to completely separate these, i.e. .db stays as it is and .files only includes the file lists. We would then add .source to include the source package information. I would set repo-add to automatically create all these files.
>
> We would then add something like "-S --refresh-files" and "-S --refresh-source" to download those files as a one-off, printing a warning when using them if they are out of date compared to the repo. Another option is to use Usage as a flag for when to download them, but refreshing all of those on every update seems excessive.
>
> This would also allow us to have some basic pkgfile functionality in pacman (-So).
>
> So, there is much to work out, but does the general idea sound good to people?
No, this sounds like a step backwards to me, so -1 (multiplied by as many times as I'm allowed to vote -1).
For a while, repo-add didn't know how to create .files databases. This was added in January 2011: https://projects.archlinux.org/pacman.git/commit/scripts/repo-add.sh.in?id=e...

That allowed us to commit a large change to the way dbscripts generated these package files (which was error-prone and slow, and the files were not immediately up to date like they are now): https://projects.archlinux.org/dbscripts.git/commit/?id=fc6a6ab07bde03c7f20d...

Why did I start down this road? Because it was absolutely impossible to get consistent, "transactional" database data in any way, shape, or form that didn't require 82 special cases in Archweb to handle parsing and loading the data into a database. Once I open a .files database file, I know I don't need anything else to have a consistent view of that database. As soon as we had to pry into two different files, things were an absolute mess: one had to cross-reference the two files, guess and pray that the architectures were actually correct on the files data (because there is no way to tell if you don't have the other data, keep this in mind), and had no real way of telling which database file lagged the other.

I'm not sure what the rationale is for removing the non-files data from the files databases. Does it make them notably larger or slower to process?

-Dan
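The cross-referencing problem described above can be illustrated with a minimal sketch. It assumes each database is a compressed tar archive with one directory per package; the helper names (build_db, package_dirs, compare_split) and the sample packages are hypothetical, and this is not how Archweb or dbscripts actually load the data.

```python
import io
import tarfile


def build_db(entries):
    """Build an in-memory tar.gz archive from a {path: text} mapping."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for path, text in entries.items():
            data = text.encode()
            info = tarfile.TarInfo(name=path)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def package_dirs(db_bytes):
    """Return the set of top-level package directories in an archive."""
    with tarfile.open(fileobj=io.BytesIO(db_bytes), mode="r:*") as tar:
        return {m.name.split("/", 1)[0] for m in tar.getmembers() if m.isfile()}


def compare_split(db_bytes, files_bytes):
    """List package entries present in one archive but not the other."""
    db, files = package_dirs(db_bytes), package_dirs(files_bytes)
    return {
        "only_in_db": sorted(db - files),
        "only_in_files": sorted(files - db),
    }


# Hypothetical data: the two archives disagree about the version of foo.
db = build_db({
    "foo-1.0-2/desc": "%NAME%\nfoo\n",
    "bar-2.1-1/desc": "%NAME%\nbar\n",
})
files_db = build_db({
    "foo-1.0-1/desc": "%NAME%\nfoo\n",
    "foo-1.0-1/files": "%FILES%\nusr/bin/foo\n",
    "bar-2.1-1/desc": "%NAME%\nbar\n",
    "bar-2.1-1/files": "%FILES%\nusr/bin/bar\n",
})
result = compare_split(db, files_db)
print(result)
```

The comparison shows that the two archives disagree about foo, but nothing in either archive says which one is stale. A single superset archive avoids the question because there is only one snapshot to read.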
On 05/11/13 13:58, Dan McGee wrote:
> On Mon, Nov 4, 2013 at 7:23 PM, Allan McRae <allan@archlinux.org> wrote:
>> Hi,
>>
>> We currently have .db and .files databases, with .files being a superset of .db.
>>
>> An idea was formed on IRC to completely separate these, i.e. .db stays as it is and .files only includes the file lists. We would then add .source to include the source package information. I would set repo-add to automatically create all these files.
>>
>> We would then add something like "-S --refresh-files" and "-S --refresh-source" to download those files as a one-off, printing a warning when using them if they are out of date compared to the repo. Another option is to use Usage as a flag for when to download them, but refreshing all of those on every update seems excessive.
>>
>> This would also allow us to have some basic pkgfile functionality in pacman (-So).
>>
>> So, there is much to work out, but does the general idea sound good to people?
>
> No, this sounds like a step backwards to me, so -1 (multiplied by as many times as I'm allowed to vote -1).
>
> For a while, repo-add didn't know how to create .files databases. This was added in January 2011: https://projects.archlinux.org/pacman.git/commit/scripts/repo-add.sh.in?id=e...
>
> That allowed us to commit a large change to the way dbscripts generated these package files (which was error-prone and slow, and the files were not immediately up to date like they are now): https://projects.archlinux.org/dbscripts.git/commit/?id=fc6a6ab07bde03c7f20d...
>
> Why did I start down this road? Because it was absolutely impossible to get consistent, "transactional" database data in any way, shape, or form that didn't require 82 special cases in Archweb to handle parsing and loading the data into a database. Once I open a .files database file, I know I don't need anything else to have a consistent view of that database. As soon as we had to pry into two different files, things were an absolute mess: one had to cross-reference the two files, guess and pray that the architectures were actually correct on the files data (because there is no way to tell if you don't have the other data, keep this in mind), and had no real way of telling which database file lagged the other.
>
> I'm not sure what the rationale is for removing the non-files data from the files databases. Does it make them notably larger or slower to process?
The non-files data makes up ~5% of the files database. But I do not understand your argument against this.

My idea is to have repo-add ALWAYS create both the .db and .files databases, instead of having to run repo-add twice to generate the separate files. In that case I find it redundant to have the .db information within the .files database. But I really want to implement repo-add generating/updating both the .db and .files databases in a single call, regardless of what information stays in the .files database.

I suppose this comes down to the following questions. Where should the source package information go? The .db file? At a rough guess, the PGP signature for the source package would increase the repo database by an extra 30-40%. So perhaps a separate .source db? If separate, what information should go there? And should there be a type of database containing ALL information?

Allan
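A share like the ~5% quoted above could be measured along the following lines. This is a sketch under assumptions: the thread does not say how the figure was obtained, so the code simply compares uncompressed member sizes, treating every member not named "files" as non-files data. The function names (build_db, non_files_share) and the sample archive are hypothetical.

```python
import io
import tarfile


def build_db(entries):
    """Build an in-memory tar.gz archive from a {path: text} mapping."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for path, text in entries.items():
            data = text.encode()
            info = tarfile.TarInfo(name=path)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def non_files_share(db_bytes):
    """Fraction of uncompressed bytes held by members other than 'files'."""
    total = other = 0
    with tarfile.open(fileobj=io.BytesIO(db_bytes), mode="r:*") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue
            total += member.size
            # The member's base name distinguishes file lists from metadata.
            if member.name.rsplit("/", 1)[-1] != "files":
                other += member.size
    return other / total if total else 0.0


# Hypothetical archive: 5 bytes of metadata next to 95 bytes of file list.
sample = build_db({
    "foo-1.0-1/desc": "d" * 5,
    "foo-1.0-1/files": "f" * 95,
})
share = non_files_share(sample)
print(f"non-files share: {share:.0%}")
```

Compressed sizes would likely give a different ratio, since file lists and metadata do not necessarily compress equally well.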