[pacman-dev] Repo database(s) layout

Mon Nov 4 22:58:38 EST 2013

On Mon, Nov 4, 2013 at 7:23 PM, Allan McRae <allan at archlinux.org> wrote:

> Hi,
>
> We currently have .db and .files databases, with .files being a
> superset of .db.
>
> An idea was formed on IRC to completely separate these.   I.e. .db stays
> as it is and .files only includes the file lists.   We would then add
> .source to include the source package information.   I would set
> repo-add to automatically create all these files.
>
> We would then add something like "-S --refresh-files" and "-S
> --refresh-source" to download those files as a one off, printing a
> warning when using them if they are out of date compared to the repo.
> Another option is to use Usage as a flag for when to download them, but
> refreshing all those every update seems excessive.
>
> This would also allow us to have some basic pkgfile functionality in
> pacman (-So).
>
> So, there is much to work out, but does the general idea sound good to people?
>
>

No, this sounds like a step backwards to me, so -1 (multiplied by as many
times as I'm allowed to vote -1).

For a while, repo-add didn't know how to create .files databases. This was
added in January 2011:
https://projects.archlinux.org/pacman.git/commit/scripts/repo-add.sh.in?id=eda4d9ec00be1108ab4336a438299a283c5a0a90

That allowed us to commit a large change to the way dbscripts generated
these package files (the old process was error-prone and slow, and the
files were not immediately up-to-date like they are now):
https://projects.archlinux.org/dbscripts.git/commit/?id=fc6a6ab07bde03c7f20d5a4ed971f8e699ee9b20

Why did I start down this road? Because it was absolutely impossible to get
consistent, "transactional" database data in any way, shape, or form that
didn't require 82 special cases in Archweb to handle parsing and loading
the data into a database. Once I open a .files database file, I know I
don't need anything else to have a consistent view of that database. As
soon as we had to pry into two different files, things were an absolute
mess: one had to cross-reference the two files, guess and pray that the
architectures were actually correct on the files data (because there is no
way to tell if you don't have the other data; keep this in mind), and had
no real way of telling which database file lagged the other.
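
To make the single-file argument concrete, here is a rough Python sketch of
what a consumer can do when one archive carries both the metadata and the
file lists. This is not Archweb's actual code. It assumes the database is a
tar archive with one directory per package holding "desc" and "files"
entries made of %KEY% headed blocks; the function names and the demo package
"foo" are hypothetical and exist only for illustration.

```python
import io
import tarfile
from collections import defaultdict


def parse_sections(text):
    """Parse '%KEY%' headed blocks into {KEY: [value, ...]}.

    Assumes blocks are separated by blank lines.
    """
    sections = {}
    key = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            key = None  # a blank line ends the current block
        elif key is None and len(line) > 2 and line[0] == "%" and line[-1] == "%":
            key = line[1:-1]
            sections[key] = []
        elif key is not None:
            sections[key].append(line)
    return sections


def load_files_db(fileobj):
    """Read every package's desc and files data from ONE archive.

    Both entries come from the same file, so the metadata (name, arch)
    and the file list cannot be out of step with each other.
    """
    packages = defaultdict(dict)
    with tarfile.open(fileobj=fileobj, mode="r:*") as tar:
        for member in tar:
            if not member.isfile():
                continue
            pkgdir, _, entry = member.name.partition("/")
            if entry not in ("desc", "files"):
                continue
            text = tar.extractfile(member).read().decode("utf-8")
            packages[pkgdir].update(parse_sections(text))
    return dict(packages)


def build_demo_db():
    """Build a tiny in-memory archive in the assumed layout (demo data only)."""
    entries = {
        "foo-1.0-1/desc": "%NAME%\nfoo\n\n%VERSION%\n1.0-1\n\n%ARCH%\nx86_64\n",
        "foo-1.0-1/files": "%FILES%\nusr/\nusr/bin/\nusr/bin/foo\n",
    }
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, text in entries.items():
            data = text.encode("utf-8")
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


db = load_files_db(build_demo_db())
pkg = db["foo-1.0-1"]
print(pkg["NAME"][0], pkg["ARCH"][0], len(pkg["FILES"]))
```

With the data split across two archives, the same loader would need a second
pass plus a join on the package directory name, and it would still have no
way of knowing whether the two archives were generated at the same time.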

I'm not sure what the rationale is for removing the non-files data from the
files databases. Does it make them notably larger or slower to process?

-Dan