Re: [pacman-dev] logging output crazy
2007/7/4, Xavier <shiningxc@gmail.com>:
I ran : LANG=C src/pacman/pacman --conf pacman.conf -Ud /var/cache/pacman/pkg/filesystem-0.8-3.pkg.tar.gz --debug > log1
Check the end of the log1 file, beginning at line 575. It's just crazy...
compressing the log since it was too big otherwise..
On Wed, Jul 04, 2007 at 10:50:05PM +0200, Xavier wrote:
2007/7/4, Xavier <shiningxc@gmail.com>:
I ran : LANG=C src/pacman/pacman --conf pacman.conf -Ud /var/cache/pacman/pkg/filesystem-0.8-3.pkg.tar.gz --debug > log1
Check the end of the log1 file, beginning at line 575. It's just crazy...
compressing the log since it was too big otherwise..
hmm I made a tiny progress on this one, looks like the bug appears with this error: 550 debug: chrooting in /home/xav/dev/pacman/foo/ 551 error: could not change the root directory (Operation not permitted) This happens when : 1) running as user 2) installing a package with a scriptlet (like filesystem) When running as root, the operation is permitted, so that error doesn't appear, and pacman debug output is fine. And that chroot is apparently made for scriptlets, so when there aren't any, it doesn't happen (and in this case also, the log is fine too). So, I'm wondering several things : is this chroot needed ? if it's needed, and it isn't allowed as an user, then is there no way to correctly a package as a simple user ? That's again an unusual case, but well I would still like to know more about it :) There is a second problem (still unusual case) that I need to figure out : when first installing a package as root, then installing a package as user, I get the error and behavior as described here : http://www.archlinux.org/pipermail/pacman-dev/2007-April/008008.html ie pacman freezes and can't even be cancelled with ctrl+C. It would be nicer if it exited cleanly instead :)
On Fri, Jul 06, 2007 at 11:29:55PM +0200, Xavier wrote:
hmm I made a tiny progress on this one, looks like the bug appears with this error: 550 debug: chrooting in /home/xav/dev/pacman/foo/ 551 error: could not change the root directory (Operation not permitted)
This happens when : 1) running as user 2) installing a package with a scriptlet (like filesystem)
When running as root, the operation is permitted, so that error doesn't appear, and pacman debug output is fine. And that chroot is apparently made for scriptlets, so when there aren't any, it doesn't happen (and in this case also, the log is fine too).
Hm think I found it. a fork was made for running the scriptlet. When it succeeded, exit(0) was used, but in case of errors, it used return(1). So it probably didn't exit, and the following code was executed twice, resulting in duplicated output. If someone could confirm this is correct, it would be great. At least, the following patch fix it :
From e4834035a5d928b2a85d1ada690ecd9b0d66d4e4 Mon Sep 17 00:00:00 2001 From: Chantry Xavier <shiningxc@gmail.com> Date: Sat, 7 Jul 2007 17:11:18 +0200 Subject: [PATCH] libalpm/trans.c : exit the forked process correctly in case of errors.
Signed-off-by: Chantry Xavier <shiningxc@gmail.com> --- lib/libalpm/trans.c | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/lib/libalpm/trans.c b/lib/libalpm/trans.c index cde8bf0..3264bee 100644 --- a/lib/libalpm/trans.c +++ b/lib/libalpm/trans.c @@ -645,11 +645,11 @@ int _alpm_runscriptlet(const char *root, const char *installfn, _alpm_log(PM_LOG_DEBUG, _("chrooting in %s"), root); if(chroot(root) != 0) { _alpm_log(PM_LOG_ERROR, _("could not change the root directory (%s)"), strerror(errno)); - return(1); + exit(1); } if(chdir("/") != 0) { _alpm_log(PM_LOG_ERROR, _("could not change directory to / (%s)"), strerror(errno)); - return(1); + exit(1); } umask(0022); _alpm_log(PM_LOG_DEBUG, _("executing \"%s\""), cmdline); -- 1.5.2.2
On Fri, Jul 06, 2007 at 11:29:55PM +0200, Xavier wrote:
There is a second problem (still unusual case) that I need to figure out : when first installing a package as root, then installing a package as user, I get the error and behavior as described here : http://www.archlinux.org/pipermail/pacman-dev/2007-April/008008.html ie pacman freezes and can't even be cancelled with ctrl+C.
It would be nicer if it exited cleanly instead :)
The problem is that when a transaction fails, trans->state isn't set to STATE_COMMITED, but stays on STATE_COMMITING (see _alpm_trans_commit function in libalm/trans.c). Then when _alpm_trans_release is called, it does the following : 195 /* during a commit do not interrupt immediately, just after a target */ 196 if(trans->state == STATE_COMMITING || trans->state == STATE_INTERRUPTED) { 197 if(trans->state == STATE_COMMITING) { 198 trans->state = STATE_INTERRUPTED; 199 } 200 pm_errno = PM_ERR_TRANS_COMMITING; 201 return(-1); 202 } Does anyone understand the comment above? Anyway, now the state is set to INTERRUPTED. And the alpm_release function in libalpm/alpm.c : 77 while((dbs_left = alpm_list_count(handle->dbs_sync)) > 0) { 78 pmdb_t *db = (pmdb_t *)handle->dbs_sync->data; 79 _alpm_log(PM_LOG_DEBUG, _("removing DB %s, %d remaining..."), db->treename, 80 alpm_db_unregister(db); 81 db = NULL; 82 } There is an infinite loop there when alpm_db_unregister (db.c) fails, and it fails when handle->trans isn't NULL : 86 /* Do not unregister a database if a transaction is on-going */ 87 ASSERT(handle->trans == NULL, RET_ERR(PM_ERR_TRANS_NOT_NULL, -1)); Could we consider that when trans is not NULL, but trans->state is set to STATE_INTERRUPTED, then it isn't a on-going transaction anymore, and so we can unregister the database safely ?
From b130e08ed216e3a9caa0026bec213d9a0a5fa094 Mon Sep 17 00:00:00 2001 From: Chantry Xavier <shiningxc@gmail.com> Date: Sat, 7 Jul 2007 21:24:30 +0200 Subject: [PATCH] libalpm/db.c : allow unregistering db for interrupted transaction.
This prevents alpm_release to loop infinitely in case of an interrupted transaction, where the database wasn't unregistered. alpm_release should probably also be fixed, as it can still loop if db_unregister fails for another reason. Signed-off-by: Chantry Xavier <shiningxc@gmail.com> --- lib/libalpm/db.c | 3 ++- 1 files changed, 2 insertions(+), 1 deletions(-) diff --git a/lib/libalpm/db.c b/lib/libalpm/db.c index 95a1ecb..7cc9390 100644 --- a/lib/libalpm/db.c +++ b/lib/libalpm/db.c @@ -84,7 +84,8 @@ int SYMEXPORT alpm_db_unregister(pmdb_t *db) ASSERT(handle != NULL, RET_ERR(PM_ERR_HANDLE_NULL, -1)); ASSERT(db != NULL, RET_ERR(PM_ERR_WRONG_ARGS, -1)); /* Do not unregister a database if a transaction is on-going */ - ASSERT(handle->trans == NULL, RET_ERR(PM_ERR_TRANS_NOT_NULL, -1)); + ASSERT(handle->trans == NULL || handle->trans->state == STATE_INTERRUPTED, + RET_ERR(PM_ERR_TRANS_NOT_NULL, -1)); if(db == handle->db_local) { handle->db_local = NULL; -- 1.5.2.2
On Sat, Jul 07, 2007 at 09:27:33PM +0200, Xavier wrote:
And the alpm_release function in libalpm/alpm.c :
77 while((dbs_left = alpm_list_count(handle->dbs_sync)) > 0) { 78 pmdb_t *db = (pmdb_t *)handle->dbs_sync->data; 79 _alpm_log(PM_LOG_DEBUG, _("removing DB %s, %d remaining..."), db->treename, dbs_left); 80 alpm_db_unregister(db); 81 db = NULL; 82 }
There is an infinite loop there when alpm_db_unregister (db.c) fails,
oops the _alpm_log line was cut during pasting, fixed it :) Anyway, I forgot to mention I found where this code was introduced : http://projects.archlinux.org/git/?p=pacman.git;a=commitdiff;h=55f178c629ada... So it looks like what the previous code did was even worse :) While I'm not very happy about how it was fixed, I can't find a better way to fix it.. I still think it's not very nice to have a code like that which can loop infinitely so easily.
On Sat, Jul 07, 2007 at 09:44:47PM +0200, Xavier wrote:
On Sat, Jul 07, 2007 at 09:27:33PM +0200, Xavier wrote:
And the alpm_release function in libalpm/alpm.c :
77 while((dbs_left = alpm_list_count(handle->dbs_sync)) > 0) { 78 pmdb_t *db = (pmdb_t *)handle->dbs_sync->data; 79 _alpm_log(PM_LOG_DEBUG, _("removing DB %s, %d remaining..."), db->treename, dbs_left); 80 alpm_db_unregister(db); 81 db = NULL; 82 }
There is an infinite loop there when alpm_db_unregister (db.c) fails,
oops the _alpm_log line was cut during pasting, fixed it :)
Anyway, I forgot to mention I found where this code was introduced : http://projects.archlinux.org/git/?p=pacman.git;a=commitdiff;h=55f178c629ada...
So it looks like what the previous code did was even worse :) While I'm not very happy about how it was fixed, I can't find a better way to fix it.. I still think it's not very nice to have a code like that which can loop infinitely so easily.
Erm, what can make alpm_db_unregister(db) fail isn't even specific to the db, so the same checks are done multiple times. Couldn't we have these checks done in alpm_release() instead, and have a private _alpm_db_unregister(db) that wouldn't fail, instead of a public one? The only thing that alpm_release do is unregistering databases anyway, so the public alpm_db_unregister(db) isn't even used by pacman. That would make the code cleaner and safer without losing any functionality imo. The only thing that is worrying me is that currently, there are symetrical pmdb_t *alpm_db_register() and alpm_db_unregister(pmdb_t*) functions, which makes sense, but the following patch removes that (from the client pov).
From 642a6509094b130ca7edaf312cf5e1a75eafff1e Mon Sep 17 00:00:00 2001 From: Chantry Xavier <shiningxc@gmail.com> Date: Mon, 9 Jul 2007 17:02:29 +0200 Subject: [PATCH] makes alpm_db_unregister(pmdb_t *) private.
This allows to clean the alpm_db_unregister and alpm_release functions a bit, and prevent alpm_release from looping infinitely. Signed-off-by: Chantry Xavier <shiningxc@gmail.com> --- lib/libalpm/alpm.c | 26 +++++++++++++------------- lib/libalpm/db.c | 25 +------------------------ lib/libalpm/db.h | 1 + 3 files changed, 15 insertions(+), 37 deletions(-) diff --git a/lib/libalpm/alpm.c b/lib/libalpm/alpm.c index 11d00d4..a1abebc 100644 --- a/lib/libalpm/alpm.c +++ b/lib/libalpm/alpm.c @@ -62,23 +62,23 @@ int SYMEXPORT alpm_initialize(void) */ int SYMEXPORT alpm_release(void) { - int dbs_left = 0; - ALPM_LOG_FUNC; + alpm_list_t *i; + + /* Sanity checks */ ASSERT(handle != NULL, RET_ERR(PM_ERR_HANDLE_NULL, -1)); + /* Do not unregister a database if a transaction is on-going */ + ASSERT(handle->trans == NULL || handle->trans->state == STATE_INTERRUPTED, + RET_ERR(PM_ERR_TRANS_NOT_NULL, -1)); - /* close local database */ - if(handle->db_local) { - alpm_db_unregister(handle->db_local); - handle->db_local = NULL; - } - /* and also sync ones */ - while((dbs_left = alpm_list_count(handle->dbs_sync)) > 0) { - pmdb_t *db = (pmdb_t *)handle->dbs_sync->data; - _alpm_log(PM_LOG_DEBUG, _("removing DB %s, %d remaining..."), db->treename, dbs_left); - alpm_db_unregister(db); - db = NULL; + _alpm_db_unregister(handle->db_local); + handle->db_local = NULL; + + for(i = handle->dbs_sync; i; i = i->next) { + pmdb_t *db = i->data; + _alpm_db_unregister(db); + i->data = NULL; } _alpm_handle_free(handle); diff --git a/lib/libalpm/db.c b/lib/libalpm/db.c index 7cc9390..5177aeb 100644 --- a/lib/libalpm/db.c +++ b/lib/libalpm/db.c @@ -74,33 +74,10 @@ pmdb_t SYMEXPORT *alpm_db_register(const char *treename) * @param db pointer to the package database to unregister * @return 0 on success, -1 on error (pm_errno is set accordingly) */ -int SYMEXPORT alpm_db_unregister(pmdb_t *db) +int _alpm_db_unregister(pmdb_t *db) { - int found = 0; - - ALPM_LOG_FUNC; - /* Sanity checks */ - ASSERT(handle != NULL, RET_ERR(PM_ERR_HANDLE_NULL, -1)); ASSERT(db != NULL, RET_ERR(PM_ERR_WRONG_ARGS, -1)); - /* Do not unregister a database if a transaction is on-going */ - ASSERT(handle->trans == NULL || handle->trans->state == STATE_INTERRUPTED, - RET_ERR(PM_ERR_TRANS_NOT_NULL, -1)); - - if(db == handle->db_local) { - handle->db_local = NULL; - found = 1; - } else { - void *data; - handle->dbs_sync = alpm_list_remove(handle->dbs_sync, db, _alpm_db_cmp, &data); - if(data) { - found = 1; - } - } - - if(!found) { - RET_ERR(PM_ERR_DB_NOT_FOUND, -1); - } _alpm_log(PM_LOG_DEBUG, _("unregistering database '%s'"), db->treename); diff --git a/lib/libalpm/db.h b/lib/libalpm/db.h index 2597c8c..5aaca69 100644 --- a/lib/libalpm/db.h +++ b/lib/libalpm/db.h @@ -53,6 +53,7 @@ void _alpm_db_free(pmdb_t *db); int _alpm_db_cmp(const void *db1, const void *db2); alpm_list_t *_alpm_db_search(pmdb_t *db, const alpm_list_t *needles); pmdb_t *_alpm_db_register(const char *treename); +int _alpm_db_unregister(pmdb_t *db); /* be.c, backend specific calls */ int _alpm_db_install(pmdb_t *db, const char *dbfile); -- 1.5.2.2
On Mon, Jul 09, 2007 at 05:06:05PM +0200, Xavier wrote:
On Sat, Jul 07, 2007 at 09:44:47PM +0200, Xavier wrote:
On Sat, Jul 07, 2007 at 09:27:33PM +0200, Xavier wrote:
And the alpm_release function in libalpm/alpm.c :
77 while((dbs_left = alpm_list_count(handle->dbs_sync)) > 0) { 78 pmdb_t *db = (pmdb_t *)handle->dbs_sync->data; 79 _alpm_log(PM_LOG_DEBUG, _("removing DB %s, %d remaining..."), db->treename, dbs_left); 80 alpm_db_unregister(db); 81 db = NULL; 82 }
There is an infinite loop there when alpm_db_unregister (db.c) fails,
oops the _alpm_log line was cut during pasting, fixed it :)
Anyway, I forgot to mention I found where this code was introduced : http://projects.archlinux.org/git/?p=pacman.git;a=commitdiff;h=55f178c629ada...
So it looks like what the previous code did was even worse :) While I'm not very happy about how it was fixed, I can't find a better way to fix it.. I still think it's not very nice to have a code like that which can loop infinitely so easily.
Erm, what can make alpm_db_unregister(db) fail isn't even specific to the db, so the same checks are done multiple times. Couldn't we have these checks done in alpm_release() instead, and have a private _alpm_db_unregister(db) that wouldn't fail, instead of a public one? The only thing that alpm_release do is unregistering databases anyway, so the public alpm_db_unregister(db) isn't even used by pacman.
That would make the code cleaner and safer without losing any functionality imo. The only thing that is worrying me is that currently, there are symetrical pmdb_t *alpm_db_register() and alpm_db_unregister(pmdb_t*) functions, which makes sense, but the following patch removes that (from the client pov).
Maybe it was not ok to remove that alpm_db_unregister public function, so I kept it, and added a warning in a comment instead. But now I provide an additional alpm_db_unregister_all() function, so this should fix all problems. So this might be a better patch :
From c31156cf14201fa7533057d75be82f330ca3dad4 Mon Sep 17 00:00:00 2001 From: Chantry Xavier <shiningxc@gmail.com> Date: Mon, 9 Jul 2007 17:02:29 +0200 Subject: [PATCH] libalpm/db.c : add alpm_db_unregister_all.
This basically moves the code from alpm_release, which was mostly about unregistering all databases, to a safer alpm_db_unregister_all. This allows to avoid modifying the dbs_sync list while iterating over it, and and also prevent alpm_release from looping infinitely when a database can't be unregistered. Signed-off-by: Chantry Xavier <shiningxc@gmail.com> --- lib/libalpm/alpm.c | 16 +----------- lib/libalpm/alpm.h | 1 + lib/libalpm/db.c | 63 ++++++++++++++++++++++++++++++++++++++++++--------- lib/libalpm/db.h | 1 + 4 files changed, 56 insertions(+), 25 deletions(-) diff --git a/lib/libalpm/alpm.c b/lib/libalpm/alpm.c index e3ad211..20d4a74 100644 --- a/lib/libalpm/alpm.c +++ b/lib/libalpm/alpm.c @@ -62,24 +62,12 @@ int SYMEXPORT alpm_initialize(void) */ int SYMEXPORT alpm_release(void) { - int dbs_left = 0; - ALPM_LOG_FUNC; ASSERT(handle != NULL, RET_ERR(PM_ERR_HANDLE_NULL, -1)); - /* close local database */ - if(handle->db_local) { - alpm_db_unregister(handle->db_local); - handle->db_local = NULL; - } - /* and also sync ones */ - while((dbs_left = alpm_list_count(handle->dbs_sync)) > 0) { - pmdb_t *db = (pmdb_t *)handle->dbs_sync->data; - _alpm_log(PM_LOG_DEBUG, "removing DB %s, %d remaining...", - db->treename, dbs_left); - alpm_db_unregister(db); - db = NULL; + if(alpm_db_unregister_all() == -1) { + return(-1); } _alpm_handle_free(handle); diff --git a/lib/libalpm/alpm.h b/lib/libalpm/alpm.h index 9e641f3..98028de 100644 --- a/lib/libalpm/alpm.h +++ b/lib/libalpm/alpm.h @@ -145,6 +145,7 @@ alpm_list_t *alpm_option_get_syncdbs(); pmdb_t *alpm_db_register(const char *treename); int alpm_db_unregister(pmdb_t *db); +int alpm_db_unregister_all(); const char *alpm_db_get_name(const pmdb_t *db); const char *alpm_db_get_url(const pmdb_t *db); diff --git a/lib/libalpm/db.c b/lib/libalpm/db.c index 306d0b9..aa02352 100644 --- a/lib/libalpm/db.c +++ b/lib/libalpm/db.c @@ -70,6 +70,35 @@ pmdb_t SYMEXPORT *alpm_db_register(const char *treename) return(_alpm_db_register(treename)); } +/** Unregister all package databases + * @return 0 on success, -1 on error (pm_errno is set accordingly) + */ +int SYMEXPORT alpm_db_unregister_all() +{ + alpm_list_t *i; + + ALPM_LOG_FUNC; + + /* Sanity checks */ + ASSERT(handle != NULL, RET_ERR(PM_ERR_HANDLE_NULL, -1)); + /* Do not unregister a database if a transaction is on-going */ + ASSERT(handle->trans == NULL || handle->trans->state == STATE_INTERRUPTED, + RET_ERR(PM_ERR_TRANS_NOT_NULL, -1)); + + /* close local database */ + _alpm_db_unregister(handle->db_local); + handle->db_local = NULL; + + /* and also sync ones */ + for(i = handle->dbs_sync; i; i = i->next) { + pmdb_t *db = i->data; + _alpm_db_unregister(db); + i->data = NULL; + } + FREELIST(handle->dbs_sync); + return(0); +} + /** Unregister a package database * @param db pointer to the package database to unregister * @return 0 on success, -1 on error (pm_errno is set accordingly) @@ -84,13 +113,17 @@ int SYMEXPORT alpm_db_unregister(pmdb_t *db) ASSERT(handle != NULL, RET_ERR(PM_ERR_HANDLE_NULL, -1)); ASSERT(db != NULL, RET_ERR(PM_ERR_WRONG_ARGS, -1)); /* Do not unregister a database if a transaction is on-going */ - ASSERT(handle->trans == NULL || handle->trans->state == STATE_INTERRUPTED, + ASSERT(handle->trans == NULL || handle->trans->state == STATE_INTERRUPTED, RET_ERR(PM_ERR_TRANS_NOT_NULL, -1)); if(db == handle->db_local) { handle->db_local = NULL; found = 1; } else { + /* Warning : this function shouldn't be used to unregister all sync databases + * by walking through the list returned by alpm_option_get_syncdbs, + * because the db is removed from that list here. + */ void *data; handle->dbs_sync = alpm_list_remove(handle->dbs_sync, db, _alpm_db_cmp, &data); if(data) { @@ -102,16 +135,7 @@ int SYMEXPORT alpm_db_unregister(pmdb_t *db) RET_ERR(PM_ERR_DB_NOT_FOUND, -1); } - _alpm_log(PM_LOG_DEBUG, "unregistering database '%s'", db->treename); - - /* Cleanup */ - _alpm_db_free_pkgcache(db); - - _alpm_log(PM_LOG_DEBUG, "closing database '%s'", db->treename); - _alpm_db_close(db); - - _alpm_db_free(db); - + _alpm_db_unregister(db); return(0); } @@ -542,6 +566,23 @@ error: /** @} */ +/* Helper function for alpm_db_unregister{_all} */ +void _alpm_db_unregister(pmdb_t *db) +{ + if(db == NULL) { + return; + } + _alpm_log(PM_LOG_DEBUG, "unregistering database '%s'", db->treename); + + /* Cleanup */ + _alpm_db_free_pkgcache(db); + + _alpm_log(PM_LOG_DEBUG, "closing database '%s'", db->treename); + _alpm_db_close(db); + + _alpm_db_free(db); +} + pmdb_t *_alpm_db_new(const char *dbpath, const char *treename) { pmdb_t *db; diff --git a/lib/libalpm/db.h b/lib/libalpm/db.h index 2597c8c..d31bf60 100644 --- a/lib/libalpm/db.h +++ b/lib/libalpm/db.h @@ -53,6 +53,7 @@ void _alpm_db_free(pmdb_t *db); int _alpm_db_cmp(const void *db1, const void *db2); alpm_list_t *_alpm_db_search(pmdb_t *db, const alpm_list_t *needles); pmdb_t *_alpm_db_register(const char *treename); +void _alpm_db_unregister(pmdb_t *db); /* be.c, backend specific calls */ int _alpm_db_install(pmdb_t *db, const char *dbfile); -- 1.5.2.4
On Sat, Jul 07, 2007 at 09:27:33PM +0200, Xavier wrote:
On Fri, Jul 06, 2007 at 11:29:55PM +0200, Xavier wrote:
There is a second problem (still unusual case) that I need to figure out : when first installing a package as root, then installing a package as user, I get the error and behavior as described here : http://www.archlinux.org/pipermail/pacman-dev/2007-April/008008.html ie pacman freezes and can't even be cancelled with ctrl+C.
It would be nicer if it exited cleanly instead :)
The problem is that when a transaction fails, trans->state isn't set to STATE_COMMITED, but stays on STATE_COMMITING (see _alpm_trans_commit function in libalm/trans.c). Then when _alpm_trans_release is called, it does the following :
195 /* during a commit do not interrupt immediately, just after a target */ 196 if(trans->state == STATE_COMMITING || trans->state == STATE_INTERRUPTED) { 197 if(trans->state == STATE_COMMITING) { 198 trans->state = STATE_INTERRUPTED; 199 } 200 pm_errno = PM_ERR_TRANS_COMMITING; 201 return(-1); 202 }
Does anyone understand the comment above? Anyway, now the state is set to INTERRUPTED. And the alpm_release function in libalpm/alpm.c :
77 while((dbs_left = alpm_list_count(handle->dbs_sync)) > 0) { 78 pmdb_t *db = (pmdb_t *)handle->dbs_sync->data; 79 _alpm_log(PM_LOG_DEBUG, _("removing DB %s, %d remaining..."), db->treename, 80 alpm_db_unregister(db); 81 db = NULL; 82 }
There is an infinite loop there when alpm_db_unregister (db.c) fails, and it fails when handle->trans isn't NULL : 86 /* Do not unregister a database if a transaction is on-going */ 87 ASSERT(handle->trans == NULL, RET_ERR(PM_ERR_TRANS_NOT_NULL, -1));
Could we consider that when trans is not NULL, but trans->state is set to STATE_INTERRUPTED, then it isn't a on-going transaction anymore, and so we can unregister the database safely ?
From b130e08ed216e3a9caa0026bec213d9a0a5fa094 Mon Sep 17 00:00:00 2001 From: Chantry Xavier <shiningxc@gmail.com> Date: Sat, 7 Jul 2007 21:24:30 +0200 Subject: [PATCH] libalpm/db.c : allow unregistering db for interrupted transaction.
This prevents alpm_release to loop infinitely in case of an interrupted transaction, where the database wasn't unregistered. alpm_release should probably also be fixed, as it can still loop if db_unregister fails for another reason.
Signed-off-by: Chantry Xavier <shiningxc@gmail.com> --- lib/libalpm/db.c | 3 ++- 1 files changed, 2 insertions(+), 1 deletions(-)
diff --git a/lib/libalpm/db.c b/lib/libalpm/db.c index 95a1ecb..7cc9390 100644 --- a/lib/libalpm/db.c +++ b/lib/libalpm/db.c @@ -84,7 +84,8 @@ int SYMEXPORT alpm_db_unregister(pmdb_t *db) ASSERT(handle != NULL, RET_ERR(PM_ERR_HANDLE_NULL, -1)); ASSERT(db != NULL, RET_ERR(PM_ERR_WRONG_ARGS, -1)); /* Do not unregister a database if a transaction is on-going */ - ASSERT(handle->trans == NULL, RET_ERR(PM_ERR_TRANS_NOT_NULL, -1)); + ASSERT(handle->trans == NULL || handle->trans->state == STATE_INTERRUPTED, + RET_ERR(PM_ERR_TRANS_NOT_NULL, -1));
if(db == handle->db_local) { handle->db_local = NULL; -- 1.5.2.2
While looking at FS#7652, I found something new : this problem doesn't happen on -S operations, only on -U, because of the following code in sync.c , _alpm_sync_commit function : 1027 error: 1028 _alpm_trans_free(tr); 1029 tr = NULL; 1030 /* commiting failed, so this is still just a prepared transaction */ 1031 trans->state = STATE_PREPARED; 1032 return(-1); So if an error happens in sync code, the state goes back to STATE_PREPARED. That looks even more hackish than my patch above, but if it's better, then the same should be done in _alpm_add_commit and _alpm_remove_commit. Otherwise, it should be removed and handled somewhere else, like with my patch. Which way should we take?
On Thu, Aug 09, 2007 at 12:06:06PM +0200, Xavier wrote:
While looking at FS#7652, I found something new : this problem doesn't happen on -S operations, only on -U, because of the following code in sync.c , _alpm_sync_commit function :
1027 error: 1028 _alpm_trans_free(tr); 1029 tr = NULL; 1030 /* commiting failed, so this is still just a prepared transaction */ 1031 trans->state = STATE_PREPARED; 1032 return(-1);
So if an error happens in sync code, the state goes back to STATE_PREPARED. That looks even more hackish than my patch above, but if it's better, then the same should be done in _alpm_add_commit and _alpm_remove_commit. Otherwise, it should be removed and handled somewhere else, like with my patch. Which way should we take?
Since Dan apparently started merging my unregister branch, I removed the two above lines, since I think it doesn't make sense, and it's not consistent with the add and remove transactions.
From c2370ccbd322951d5c11f70096a440782de8c269 Mon Sep 17 00:00:00 2001 From: Chantry Xavier <shiningxc@gmail.com> Date: Fri, 17 Aug 2007 14:07:56 +0200 Subject: [PATCH] libalpm/sync.c : don't go back on STATE_PREPARED when committing fails.
In my opinion, a commiting transaction that failed isn't equivalent to a prepared transaction. Some things could have been done in the meantime. Signed-off-by: Chantry Xavier <shiningxc@gmail.com> --- lib/libalpm/sync.c | 2 -- 1 files changed, 0 insertions(+), 2 deletions(-) diff --git a/lib/libalpm/sync.c b/lib/libalpm/sync.c index 005123d..6c98aa3 100644 --- a/lib/libalpm/sync.c +++ b/lib/libalpm/sync.c @@ -1027,8 +1027,6 @@ int _alpm_sync_commit(pmtrans_t *trans, pmdb_t *db_local, alpm_list_t **data) error: _alpm_trans_free(tr); tr = NULL; - /* commiting failed, so this is still just a prepared transaction */ - trans->state = STATE_PREPARED; return(-1); } -- 1.5.2.5
On Fri, Aug 17, 2007 at 03:47:22PM +0200, Xavier wrote:
On Thu, Aug 09, 2007 at 12:06:06PM +0200, Xavier wrote:
While looking at FS#7652, I found something new : this problem doesn't happen on -S operations, only on -U, because of the following code in sync.c , _alpm_sync_commit function :
1027 error: 1028 _alpm_trans_free(tr); 1029 tr = NULL; 1030 /* commiting failed, so this is still just a prepared transaction */ 1031 trans->state = STATE_PREPARED; 1032 return(-1);
So if an error happens in sync code, the state goes back to STATE_PREPARED. That looks even more hackish than my patch above, but if it's better, then the same should be done in _alpm_add_commit and _alpm_remove_commit. Otherwise, it should be removed and handled somewhere else, like with my patch. Which way should we take?
Since Dan apparently started merging my unregister branch, I removed the two above lines, since I think it doesn't make sense, and it's not consistent with the add and remove transactions.
From c2370ccbd322951d5c11f70096a440782de8c269 Mon Sep 17 00:00:00 2001 From: Chantry Xavier <shiningxc@gmail.com> Date: Fri, 17 Aug 2007 14:07:56 +0200 Subject: [PATCH] libalpm/sync.c : don't go back on STATE_PREPARED when committing fails.
In my opinion, a commiting transaction that failed isn't equivalent to a prepared transaction. Some things could have been done in the meantime.
Signed-off-by: Chantry Xavier <shiningxc@gmail.com> --- lib/libalpm/sync.c | 2 -- 1 files changed, 0 insertions(+), 2 deletions(-)
diff --git a/lib/libalpm/sync.c b/lib/libalpm/sync.c index 005123d..6c98aa3 100644 --- a/lib/libalpm/sync.c +++ b/lib/libalpm/sync.c @@ -1027,8 +1027,6 @@ int _alpm_sync_commit(pmtrans_t *trans, pmdb_t *db_local, alpm_list_t **data) error: _alpm_trans_free(tr); tr = NULL; - /* commiting failed, so this is still just a prepared transaction */ - trans->state = STATE_PREPARED; return(-1); }
-- 1.5.2.5
That causes another problem though... in _alpm_trans_release, we have : 184 /* during a commit do not interrupt immediately, just after a target */ 185 if(trans->state == STATE_COMMITING || trans->state == STATE_INTERRUPTED) { 186 if(trans->state == STATE_COMMITING) { 187 trans->state = STATE_INTERRUPTED; 188 } 189 pm_errno = PM_ERR_TRANS_COMMITING; 190 return(-1); 191 } So since the state now stays on STATE_COMMITING (without going back to STATE_PREPARED) when a transaction fails, this code is now run, and the function returns here, without removing the lock. Now, I'm totally unable to fix this correctly, the thing is, I really don't understand the purpose of this part, what it's intended to do, in which cases, etc ... It annoyed me that pacman didn't remove the lock on each error (for example, conflicting targets), so I had to hack this, but that most likely kills totally what the original author had in mind, making the code worse than it is. I guess what I'm trying to say here is that I need someone to look at this, and fix it correctly. Maybe I could revert that previous change in sync.c, so that the state goes back to STATE_PREPARED on failure, and then do the same in add.c / remove.c , but I still don't like that idea..
From 7204d3afc160fe02f6d7d8a4be094a8cc0861dad Mon Sep 17 00:00:00 2001 From: Chantry Xavier <shiningxc@gmail.com> Date: Thu, 30 Aug 2007 00:39:57 +0200 Subject: [PATCH] libalpm/trans.c : remove the lock even on interrupted transactions.
Signed-off-by: Chantry Xavier <shiningxc@gmail.com> --- lib/libalpm/trans.c | 5 +++-- 1 files changed, 3 insertions(+), 2 deletions(-) diff --git a/lib/libalpm/trans.c b/lib/libalpm/trans.c index c0e38c7..fe37a1f 100644 --- a/lib/libalpm/trans.c +++ b/lib/libalpm/trans.c @@ -171,6 +171,7 @@ int SYMEXPORT alpm_trans_commit(alpm_list_t **data) int SYMEXPORT alpm_trans_release() { pmtrans_t *trans; + int ret = 0; ALPM_LOG_FUNC; @@ -187,7 +188,7 @@ int SYMEXPORT alpm_trans_release() trans->state = STATE_INTERRUPTED; } pm_errno = PM_ERR_TRANS_COMMITING; - return(-1); + ret = -1; } _alpm_trans_free(trans); @@ -205,7 +206,7 @@ int SYMEXPORT alpm_trans_release() alpm_option_get_lockfile()); } - return(0); + return(ret); } /** @} */ -- 1.5.2.5
Xavier wrote:
On Fri, Aug 17, 2007 at 03:47:22PM +0200, Xavier wrote:
On Thu, Aug 09, 2007 at 12:06:06PM +0200, Xavier wrote:
While looking at FS#7652, I found something new : this problem doesn't happen on -S operations, only on -U, because of the following code in sync.c , _alpm_sync_commit function :
1027 error: 1028 _alpm_trans_free(tr); 1029 tr = NULL; 1030 /* commiting failed, so this is still just a prepared transaction */ 1031 trans->state = STATE_PREPARED; 1032 return(-1);
So if an error happens in sync code, the state goes back to STATE_PREPARED. That looks even more hackish than my patch above, but if it's better, then the same should be done in _alpm_add_commit and _alpm_remove_commit. Otherwise, it should be removed and handled somewhere else, like with my patch. Which way should we take? Since Dan apparently started merging my unregister branch, I removed the two above lines, since I think it doesn't make sense, and it's not consistent with the add and remove transactions.
From c2370ccbd322951d5c11f70096a440782de8c269 Mon Sep 17 00:00:00 2001 From: Chantry Xavier <shiningxc@gmail.com> Date: Fri, 17 Aug 2007 14:07:56 +0200 Subject: [PATCH] libalpm/sync.c : don't go back on STATE_PREPARED when committing fails.
In my opinion, a commiting transaction that failed isn't equivalent to a prepared transaction. Some things could have been done in the meantime.
Signed-off-by: Chantry Xavier <shiningxc@gmail.com> --- lib/libalpm/sync.c | 2 -- 1 files changed, 0 insertions(+), 2 deletions(-)
diff --git a/lib/libalpm/sync.c b/lib/libalpm/sync.c index 005123d..6c98aa3 100644 --- a/lib/libalpm/sync.c +++ b/lib/libalpm/sync.c @@ -1027,8 +1027,6 @@ int _alpm_sync_commit(pmtrans_t *trans, pmdb_t *db_local, alpm_list_t **data) error: _alpm_trans_free(tr); tr = NULL; - /* commiting failed, so this is still just a prepared transaction */ - trans->state = STATE_PREPARED; return(-1); }
-- 1.5.2.5
That causes another problem though... in _alpm_trans_release, we have :
184 /* during a commit do not interrupt immediately, just after a target */ 185 if(trans->state == STATE_COMMITING || trans->state == STATE_INTERRUPTED) { 186 if(trans->state == STATE_COMMITING) { 187 trans->state = STATE_INTERRUPTED; 188 } 189 pm_errno = PM_ERR_TRANS_COMMITING; 190 return(-1); 191 }
So since the state now stays on STATE_COMMITING (without going back to STATE_PREPARED) when a transaction fails, this code is now run, and the function returns here, without removing the lock.
Now, I'm totally unable to fix this correctly, the thing is, I really don't understand the purpose of this part, what it's intended to do, in which cases, etc ...
It annoyed me that pacman didn't remove the lock on each error (for example, conflicting targets), so I had to hack this, but that most likely kills totally what the original author had in mind, making the code worse than it is.
I guess what I'm trying to say here is that I need someone to look at this, and fix it correctly. Maybe I could revert that previous change in sync.c, so that the state goes back to STATE_PREPARED on failure, and then do the same in add.c / remove.c , but I still don't like that idea..
I think this is another +1 for a rewrite. It seems for every bug we fix another harder to fix one pops up to take its place. Andrew
On Thu, Aug 30, 2007 at 11:50:51PM +0100, Andrew Fyfe wrote:
Xavier wrote:
That causes another problem though... in _alpm_trans_release, we have :
184 /* during a commit do not interrupt immediately, just after a target */ 185 if(trans->state == STATE_COMMITING || trans->state == STATE_INTERRUPTED) { 186 if(trans->state == STATE_COMMITING) { 187 trans->state = STATE_INTERRUPTED; 188 } 189 pm_errno = PM_ERR_TRANS_COMMITING; 190 return(-1); 191 }
So since the state now stays on STATE_COMMITING (without going back to STATE_PREPARED) when a transaction fails, this code is now run, and the function returns here, without removing the lock.
Now, I'm totally unable to fix this correctly, the thing is, I really don't understand the purpose of this part, what it's intended to do, in which cases, etc ...
It annoyed me that pacman didn't remove the lock on each error (for example, conflicting targets), so I had to hack this, but that most likely kills totally what the original author had in mind, making the code worse than it is.
I guess what I'm trying to say here is that I need someone to look at this, and fix it correctly. Maybe I could revert that previous change in sync.c, so that the state goes back to STATE_PREPARED on failure, and then do the same in add.c / remove.c , but I still don't like that idea..
I think this is another +1 for a rewrite. It seems for every bug we fix another harder to fix one pops up to take its place.
Andrew
Right, probably. But well, it at least requires a temporary fix in the meantime. I found where this part I don't understand was introduced. Fortunately, it is a small commit. Maybe looking at the commit as the whole, and the one line description can help understand it : http://projects.archlinux.org/git/?p=pacman.git;a=commit;h=54008798efcc9646f...
participants (2)
-
Andrew Fyfe
-
Xavier