[pacman-dev] Thoughts on database initialization and loading process
Hey guys, Hopefully someone has some insight, but I fear that this involves some pretty intimate knowledge with the alpm dirty internals. I implemented the local DB "version" check as something on the transaction process, due to the fact that we call get_pkgcache or find_package and all those fun methods all over the place with the expectation that they normally won't fail in any way. This was a bit of a hack, and has become more apparent as I look at some of the GPG issues. Due to the lazy loading of the package cache for databases, we never really know when a database may be initialized, or which call site will do so. This makes it hard for us to 1) do the local DB sanity check ASAP and in the right place, and 2) implement signature checking on databases each time it is accessed, which seems like the prudent decision to me. That leaves us with a few options: 1) Don't check database signatures every time, only when we "need to", e.g. right after download. Not a fan of this one. 2) Only check database signatures if we go through the transaction layer- just like we do now with the local version check. 3) Add some explicit alpm_dbs_set_in_stone() method that needs to be called before databases can be used, but after they are registered, that takes care of some bookkeeping stuff like this. Stinks because it introduces a rather kludgy API that depends on methods being called in procedural order (not to mention all frontends will need modification), but by far the easiest way to resolve this. 4) Fix callsites everywhere that may force a load of the pkgcache to actually check error return values for the "local db too old" and "gpg sig check failed" cases, among other, rather than just assuming the pkgcache is NULL without looking at pm_errno. Best solution as it requires no API change, but will definitely have a large LOC impact. Thoughts? -Dan
On 17/04/11 09:40, Dan McGee wrote:
Hey guys,
Hopefully someone has some insight, but I fear that this involves some pretty intimate knowledge with the alpm dirty internals.
I implemented the local DB "version" check as something on the transaction process, due to the fact that we call get_pkgcache or find_package and all those fun methods all over the place with the expectation that they normally won't fail in any way. This was a bit of a hack, and has become more apparent as I look at some of the GPG issues.
Due to the lazy loading of the package cache for databases, we never really know when a database may be initialized, or which call site will do so. This makes it hard for us to 1) do the local DB sanity check ASAP and in the right place, and 2) implement signature checking on databases each time it is accessed, which seems like the prudent decision to me.
That leaves us with a few options: 1) Don't check database signatures every time, only when we "need to", e.g. right after download. Not a fan of this one.
I am not a fan of this either, but I was slowly leaning that way earlier... In order for someone to tamper with your database once you have downloaded and verified it, they would need root access on your machine. If I have root access to your machine, I am not going to spend my time altering a database that will be overwritten on the next update. Saying that, I want to see this done properly, and checking a signature every time we load the db falls into what I define as "properly".
2) Only check database signatures if we go through the transaction layer- just like we do now with the local version check.
-1. That could give interesting issues. e.g. "pacman -Ss pkg" -> everything seems fine. "pacman -S pkg" -> your database is not signed properly. That is just not consistent.
3) Add some explicit alpm_dbs_set_in_stone() method that needs to be called before databases can be used, but after they are registered, that takes care of some bookkeeping stuff like this. Stinks because it introduces a rather kludgy API that depends on methods being called in procedural order (not to mention all frontends will need modification), but by far the easiest way to resolve this. 4) Fix callsites everywhere that may force a load of the pkgcache to actually check error return values for the "local db too old" and "gpg sig check failed" cases, among other, rather than just assuming the pkgcache is NULL without looking at pm_errno. Best solution as it requires no API change, but will definitely have a large LOC impact.
I think this is the probably the best solution, not because lack of API change but because it seems the most correct thing to be doing. I also think we should do more return value checking in a lot of places in the code base... Allan
On Sun 17 April 2011 at 10:53 +1000, Allan McRae wrote:
On 17/04/11 09:40, Dan McGee wrote:
4) Fix callsites everywhere that may force a load of the pkgcache to actually check error return values for the "local db too old" and "gpg sig check failed" cases, among other, rather than just assuming the pkgcache is NULL without looking at pm_errno. Best solution as it requires no API change, but will definitely have a large LOC impact.
I think this is the probably the best solution, not because lack of API change but because it seems the most correct thing to be doing. I also think we should do more return value checking in a lot of places in the code base...
We are looking at a broader problem: it is actually impossible to do any sane return value checking for all functions returning an alpm_list_t* since NULL is used both as the error value and as the empty list. [adding to this the fact that I am really not a fan of linked lists especially when there doesn't seem to be any reason to use them. NULL-terminated arrays of pointers work in almost all cases] I suggest adding a special value to return when an error occurs. Something like this: callers would check the value of pm_errno whenever return value == pm_listerror. I strongly support solving this return value problem (independently of the DB signature checking protocol). diff --git a/lib/libalpm/alpm_list.c b/lib/libalpm/alpm_list.c index 0a3f650..501a33c 100644 --- a/lib/libalpm/alpm_list.c +++ b/lib/libalpm/alpm_list.c @@ -75,6 +75,9 @@ void SYMEXPORT alpm_list_free_inner(alpm_list_t *list, alpm_list_fn_free fn) } } +/** Error value */ +static struct __alpm_list_t _alpm_list_dummy = { NULL, NULL, NULL }; +alpm_list_t * const pm_listerror SYMEXPORT = &_alpm_list_dummy; /* Mutators */ diff --git a/lib/libalpm/alpm_list.h b/lib/libalpm/alpm_list.h index 1f6393a..6fb7fde 100644 --- a/lib/libalpm/alpm_list.h +++ b/lib/libalpm/alpm_list.h @@ -42,6 +42,9 @@ typedef struct __alpm_list_t { struct __alpm_list_t *next; } alpm_list_t; +/** Return value when error occurs. */ +extern alpm_list_t * const pm_listerror; + #define FREELIST(p) do { alpm_list_free_inner(p, free); alpm_list_free(p); p = NULL; } while(0) typedef void (*alpm_list_fn_free)(void *); /* item deallocation callback */ diff --git a/lib/libalpm/handle.c b/lib/libalpm/handle.c index b55b02a..4939926 100644 --- a/lib/libalpm/handle.c +++ b/lib/libalpm/handle.c @@ -269,7 +269,7 @@ alpm_list_t SYMEXPORT *alpm_option_get_syncdbs() { if (handle == NULL) { pm_errno = PM_ERR_HANDLE_NULL; - return NULL; + return pm_listerror; } return handle->dbs_sync; } -- Rémy.
On Sun, Apr 17, 2011 at 9:32 AM, Rémy Oudompheng <remyoudompheng@gmail.com> wrote:
We are looking at a broader problem: it is actually impossible to do any sane return value checking for all functions returning an alpm_list_t* since NULL is used both as the error value and as the empty list.
[adding to this the fact that I am really not a fan of linked lists especially when there doesn't seem to be any reason to use them. NULL-terminated arrays of pointers work in almost all cases]
I suggest adding a special value to return when an error occurs. Something like this: callers would check the value of pm_errno whenever return value == pm_listerror.
IMO it looks better to return an int and give an alpm_list ** as argument.
On Sun 17 April 2011 at 17:48 +0200, Xavier Chantry wrote:
On Sun, Apr 17, 2011 at 9:32 AM, Rémy Oudompheng <remyoudompheng@gmail.com> wrote:
We are looking at a broader problem: it is actually impossible to do any sane return value checking for all functions returning an alpm_list_t* since NULL is used both as the error value and as the empty list.
[adding to this the fact that I am really not a fan of linked lists especially when there doesn't seem to be any reason to use them. NULL-terminated arrays of pointers work in almost all cases]
I suggest adding a special value to return when an error occurs. Something like this: callers would check the value of pm_errno whenever return value == pm_listerror.
IMO it looks better to return an int and give an alpm_list ** as argument.
I agree with that. -- Rémy
participants (4)
-
Allan McRae
-
Dan McGee
-
Rémy Oudompheng
-
Xavier Chantry