[pacman-dev] [PATCH v2] Introduce alpm_dbs_update() function for parallel db updates
This is an equivalent of alpm_db_update but for multiplexed (parallel)
download. The difference is that this function accepts a list of
databases to update, and then the ALPM internals download them in
parallel if possible.

Add a stub for _alpm_multi_download, the function that will perform
parallel payload downloads in the future.

Introduce a dload_payload->filepath field that contains the URL path to
the file we download. It is like the fileurl field, but does not contain
the protocol/server part. The rationale for having this field is that
with the curl multi-download the server retry logic is going to move to
a curl callback, and the callback needs to be able to reconstruct the
'next' fileurl. One will be able to do that by taking the next server
URL from the 'servers' list and concatenating it with filepath. Once the
'parallel download' refactoring is over, the 'fileurl' field will go
away.

Signed-off-by: Anatol Pomozov <anatol.pomozov@gmail.com>
---
 lib/libalpm/alpm.h    |  29 ++++++++++
 lib/libalpm/be_sync.c | 128 ++++++++++++++++++++++++++++++++++++++++++
 lib/libalpm/dload.c   |  12 ++++
 lib/libalpm/dload.h   |   5 ++
 4 files changed, 174 insertions(+)

diff --git a/lib/libalpm/alpm.h b/lib/libalpm/alpm.h
index 5d559db1..c985429d 100644
--- a/lib/libalpm/alpm.h
+++ b/lib/libalpm/alpm.h
@@ -1049,6 +1049,35 @@ int alpm_db_remove_server(alpm_db_t *db, const char *url);
  */
 int alpm_db_update(int force, alpm_db_t *db);
 
+/** Update package databases
+ *
+ * An update of the package databases \a dbs will be attempted. Unless
+ * \a force is true, the update will only be performed if the remote
+ * databases were modified since the last update.
+ *
+ * This operation requires a database lock, and will return an applicable error
+ * if the lock could not be obtained.
+ *
+ * Example:
+ * @code
+ * alpm_list_t *dbs = alpm_get_syncdbs();
+ * ret = alpm_dbs_update(config->handle, dbs, force);
+ * if(ret < 0) {
+ *     pm_printf(ALPM_LOG_ERROR, _("failed to synchronize all databases (%s)\n"),
+ *         alpm_strerror(alpm_errno(config->handle)));
+ * }
+ * @endcode
+ *
+ * @note After a successful update, the \link alpm_db_get_pkgcache()
+ * package cache \endlink will be invalidated
+ * @param handle the context handle
+ * @param dbs list of package databases to update
+ * @param force if true, then forces the update, otherwise update only in case
+ * the databases aren't up to date
+ * @return 0 on success, -1 on error (pm_errno is set accordingly)
+ */
+int alpm_dbs_update(alpm_handle_t *handle, alpm_list_t *dbs, int force);
+
 /** Get a package entry from a package database.
  * @param db pointer to the package database to get the package from
  * @param name of the package
diff --git a/lib/libalpm/be_sync.c b/lib/libalpm/be_sync.c
index aafed15d..497e1054 100644
--- a/lib/libalpm/be_sync.c
+++ b/lib/libalpm/be_sync.c
@@ -301,6 +301,134 @@ int SYMEXPORT alpm_db_update(int force, alpm_db_t *db)
 	return ret;
 }
 
+int SYMEXPORT alpm_dbs_update(alpm_handle_t *handle, alpm_list_t *dbs, int force) {
+	char *syncpath;
+	const char *dbext = handle->dbext;
+	alpm_list_t *i;
+	int ret = -1;
+	mode_t oldmask;
+	alpm_list_t *payloads = NULL;
+
+	/* Sanity checks */
+	ASSERT(dbs != NULL, return -1);
+	handle->pm_errno = ALPM_ERR_OK;
+
+	syncpath = get_sync_dir(handle);
+	ASSERT(syncpath != NULL, return -1);
+
+	/* make sure we have a sane umask */
+	oldmask = umask(0022);
+
+	for(i = dbs; i; i = i->next) {
+		alpm_db_t *db = i->data;
+		int dbforce = force;
+		struct dload_payload *payload = NULL;
+		size_t len;
+		int siglevel;
+
+		if(!(db->usage & ALPM_DB_USAGE_SYNC)) {
+			continue;
+		}
+
+		ASSERT(db != handle->db_local, RET_ERR(handle, ALPM_ERR_WRONG_ARGS, -1));
+		ASSERT(db->servers != NULL, RET_ERR(handle, ALPM_ERR_SERVER_NONE, -1));
+
+		/* force update of invalid databases to fix potential mismatched database/signature */
+		if(db->status & DB_STATUS_INVALID) {
+			dbforce = 1;
+		}
+
+		CALLOC(payload, 1, sizeof(*payload), RET_ERR(handle, ALPM_ERR_MEMORY, -1));
+
+		/* set hard upper limit of 128MiB */
+		payload->max_size = 128 * 1024 * 1024;
+		ASSERT(db->servers != NULL, RET_ERR(handle, ALPM_ERR_SERVER_NONE, -1));
+		payload->servers = db->servers;
+
+		/* print server + filename into a buffer */
+		len = strlen(db->treename) + strlen(dbext) + 1;
+		MALLOC(payload->filepath, len, GOTO_ERR(handle, ALPM_ERR_MEMORY, cleanup));
+		snprintf(payload->filepath, len, "%s%s", db->treename, dbext);
+		payload->handle = handle;
+		payload->force = dbforce;
+		payload->unlink_on_fail = 1;
+
+		payloads = alpm_list_add(payloads, payload);
+
+		siglevel = alpm_db_get_siglevel(db);
+		if(siglevel & ALPM_SIG_DATABASE) {
+			struct dload_payload *sig_payload;
+			CALLOC(sig_payload, 1, sizeof(*sig_payload), GOTO_ERR(handle, ALPM_ERR_MEMORY, cleanup));
+
+			/* print filename into a buffer (leave space for separator and .sig) */
+			len = strlen(db->treename) + strlen(dbext) + 5;
+			MALLOC(sig_payload->filepath, len, goto cleanup);
+			snprintf(sig_payload->filepath, len, "%s%s.sig", db->treename, dbext);
+
+			sig_payload->handle = handle;
+			sig_payload->force = dbforce;
+			sig_payload->errors_ok = (siglevel & ALPM_SIG_DATABASE_OPTIONAL);
+
+			/* set hard upper limit of 16KiB */
+			sig_payload->max_size = 16 * 1024;
+			sig_payload->servers = db->servers;
+
+			payloads = alpm_list_add(payloads, sig_payload);
+		}
+	}
+
+	/* attempt to grab a lock */
+	if(_alpm_handle_lock(handle)) {
+		GOTO_ERR(handle, ALPM_ERR_HANDLE_LOCK, cleanup);
+	}
+
+	ret = _alpm_multi_download(handle, payloads, syncpath);
+	if(ret < 0) {
+		goto cleanup;
+	}
+
+	for(i = dbs; i; i = i->next) {
+		alpm_db_t *db = i->data;
+		if(!(db->usage & ALPM_DB_USAGE_SYNC)) {
+			continue;
+		}
+
+		/* Cache needs to be rebuilt */
+		_alpm_db_free_pkgcache(db);
+
+		/* clear all status flags regarding validity/existence */
+		db->status &= ~DB_STATUS_VALID;
+		db->status &= ~DB_STATUS_INVALID;
+		db->status &= ~DB_STATUS_EXISTS;
+		db->status &= ~DB_STATUS_MISSING;
+
+		/* if the download failed skip validation to preserve the download error */
+		if(sync_db_validate(db) != 0) {
+			/* pm_errno should be set */
+			ret = -1;
+		}
+	}
+
+cleanup:
+	_alpm_handle_unlock(handle);
+
+	if(ret == -1) {
+		/* pm_errno was set by the download code */
+		_alpm_log(handle, ALPM_LOG_DEBUG, "failed to sync dbs: %s\n",
+				alpm_strerror(handle->pm_errno));
+	} else {
+		handle->pm_errno = ALPM_ERR_OK;
+	}
+
+	if(payloads) {
+		alpm_list_free_inner(payloads, (alpm_list_fn_free)_alpm_dload_payload_reset);
+		FREELIST(payloads);
+	}
+	free(syncpath);
+	umask(oldmask);
+	return ret;
+}
+
 /* Forward decl so I don't reorganize the whole file right now */
 static int sync_db_read(alpm_db_t *db, struct archive *archive,
 		struct archive_entry *entry, alpm_pkg_t **likely_pkg);
diff --git a/lib/libalpm/dload.c b/lib/libalpm/dload.c
index 670da03d..7cd3e3a4 100644
--- a/lib/libalpm/dload.c
+++ b/lib/libalpm/dload.c
@@ -636,6 +636,16 @@ int _alpm_download(struct dload_payload *payload, const char *localpath,
 	}
 }
 
+int _alpm_multi_download(alpm_handle_t *handle,
+		alpm_list_t *payloads /* struct dload_payload */,
+		const char *localpath)
+{
+	(void)handle;
+	(void)payloads;
+	(void)localpath;
+	return 0;
+}
+
 static char *filecache_find_url(alpm_handle_t *handle, const char *url)
 {
 	const char *filebase = strrchr(url, '/');
@@ -738,6 +748,7 @@ void _alpm_dload_payload_reset(struct dload_payload *payload)
 	FREE(payload->destfile_name);
 	FREE(payload->content_disp_name);
 	FREE(payload->fileurl);
+	FREE(payload->filepath);
 	*payload = (struct dload_payload){0};
 }
 
@@ -746,6 +757,7 @@ void _alpm_dload_payload_reset_for_retry(struct dload_payload *payload)
 	ASSERT(payload, return);
 
 	FREE(payload->fileurl);
+	FREE(payload->filepath);
 	payload->initial_size += payload->prevprogress;
 	payload->prevprogress = 0;
 	payload->unlink_on_fail = 0;
diff --git a/lib/libalpm/dload.h b/lib/libalpm/dload.h
index 1e8f75f3..3eb7fbe1 100644
--- a/lib/libalpm/dload.h
+++ b/lib/libalpm/dload.h
@@ -31,6 +31,7 @@ struct dload_payload {
 	char *destfile_name;
 	char *content_disp_name;
 	char *fileurl;
+	char *filepath; /* download URL path */
 	alpm_list_t *servers;
 	long respcode;
 	off_t initial_size;
@@ -53,4 +54,8 @@ void _alpm_dload_payload_reset_for_retry(struct dload_payload *payload);
 int _alpm_download(struct dload_payload *payload, const char *localpath,
 		char **final_file, const char **final_url);
 
+int _alpm_multi_download(alpm_handle_t *handle,
+		alpm_list_t *payloads /* struct dload_payload */,
+		const char *localpath);
+
 #endif /* ALPM_DLOAD_H */
-- 
2.25.1
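To make the retry scheme in the commit message concrete, here is a
minimal sketch of how a callback could rebuild the next fileurl from a
server URL plus the new filepath field. build_fileurl() and the mirror
URL are illustrative only, not part of the patch:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Rebuild a full fileurl from the next server in the payload's
 * 'servers' list and the server-independent filepath, e.g.
 * "https://mirror.example/core/os/x86_64" + "core.db". */
static char *build_fileurl(const char *server, const char *filepath)
{
	/* +2: one for the '/' separator, one for the terminating NUL */
	size_t len = strlen(server) + strlen(filepath) + 2;
	char *fileurl = malloc(len);
	if(fileurl == NULL) {
		return NULL;
	}
	snprintf(fileurl, len, "%s/%s", server, filepath);
	return fileurl;
}

int main(void)
{
	char *url = build_fileurl("https://mirror.example/core/os/x86_64", "core.db");
	if(url) {
		printf("%s\n", url); /* next candidate URL for the retry */
		free(url);
	}
	return 0;
}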
On 9/3/20 6:59 am, Anatol Pomozov wrote:
> This is an equivalent of alpm_db_update but for multiplexed (parallel)
> download. The difference is that this function accepts a list of
> databases to update, and then the ALPM internals download them in
> parallel if possible.

<snip>

> +/** Update package databases
> + *
> + * An update of the package databases \a dbs will be attempted. Unless
...package databases in the list \a dbs will ...
+ * \a force is true, the update will only be performed if the remote + * databases were modified since the last update. + * + * This operation requires a database lock, and will return an applicable error + * if the lock could not be obtained. + * + * Example: + * @code + * alpm_list_t *dbs = alpm_get_syncdbs(); + * ret = alpm_dbs_update(config->handle, dbs, force); + * if(ret < 0) { + * pm_printf(ALPM_LOG_ERROR, _("failed to synchronize all databases (%s)\n"), + * alpm_strerror(alpm_errno(config->handle))); + * } + * @endcode + * + * @note After a successful update, the \link alpm_db_get_pkgcache() + * package cache \endlink will be invalidated + * @param handle the context handle + * @param dbs list of package databases to update + * @param force if true, then forces the update, otherwise update only in case + * the databases aren't up to date + * @return 0 on success, -1 on error (pm_errno is set accordingly) + */ +int alpm_dbs_update(alpm_handle_t *handle, alpm_list_t *dbs, int force); + /** Get a package entry from a package database. * @param db pointer to the package database to get the package from * @param name of the package diff --git a/lib/libalpm/be_sync.c b/lib/libalpm/be_sync.c index aafed15d..497e1054 100644 --- a/lib/libalpm/be_sync.c +++ b/lib/libalpm/be_sync.c @@ -301,6 +301,134 @@ int SYMEXPORT alpm_db_update(int force, alpm_db_t *db) return ret; }
+int SYMEXPORT alpm_dbs_update(alpm_handle_t *handle, alpm_list_t *dbs, int force) { + char *syncpath; + const char *dbext = handle->dbext;
I'll note that we don't test if handle is NULL throughout the codebase, so OK...
> +	alpm_list_t *i;
> +	int ret = -1;
> +	mode_t oldmask;
> +	alpm_list_t *payloads = NULL;
> +
> +	/* Sanity checks */
> +	ASSERT(dbs != NULL, return -1);
> +	handle->pm_errno = ALPM_ERR_OK;
> +
> +	syncpath = get_sync_dir(handle);
> +	ASSERT(syncpath != NULL, return -1);
> +
> +	/* make sure we have a sane umask */
> +	oldmask = umask(0022);
OK
> +	for(i = dbs; i; i = i->next) {
> +		alpm_db_t *db = i->data;
> +		int dbforce = force;
> +		struct dload_payload *payload = NULL;
> +		size_t len;
> +		int siglevel;
> +
> +		if(!(db->usage & ALPM_DB_USAGE_SYNC)) {
> +			continue;
> +		}
> +
> +		ASSERT(db != handle->db_local, RET_ERR(handle, ALPM_ERR_WRONG_ARGS, -1));
> +		ASSERT(db->servers != NULL, RET_ERR(handle, ALPM_ERR_SERVER_NONE, -1));
OK.
> +		/* force update of invalid databases to fix potential mismatched database/signature */
> +		if(db->status & DB_STATUS_INVALID) {
> +			dbforce = 1;
> +		}
OK.
> +		CALLOC(payload, 1, sizeof(*payload), RET_ERR(handle, ALPM_ERR_MEMORY, -1));
> +
> +		/* set hard upper limit of 128MiB */
> +		payload->max_size = 128 * 1024 * 1024;
> +		ASSERT(db->servers != NULL, RET_ERR(handle, ALPM_ERR_SERVER_NONE, -1));
Already done above.
> +		payload->servers = db->servers;
> +
> +		/* print server + filename into a buffer */
> +		len = strlen(db->treename) + strlen(dbext) + 1;
> +		MALLOC(payload->filepath, len, GOTO_ERR(handle, ALPM_ERR_MEMORY, cleanup));
> +		snprintf(payload->filepath, len, "%s%s", db->treename, dbext);
> +		payload->handle = handle;
> +		payload->force = dbforce;
> +		payload->unlink_on_fail = 1;
> +
> +		payloads = alpm_list_add(payloads, payload);
OK
> +		siglevel = alpm_db_get_siglevel(db);
> +		if(siglevel & ALPM_SIG_DATABASE) {
> +			struct dload_payload *sig_payload;
> +			CALLOC(sig_payload, 1, sizeof(*sig_payload), GOTO_ERR(handle, ALPM_ERR_MEMORY, cleanup));
OK.
> +			/* print filename into a buffer (leave space for separator and .sig) */
> +			len = strlen(db->treename) + strlen(dbext) + 5;
> +			MALLOC(sig_payload->filepath, len, goto cleanup);
> +			snprintf(sig_payload->filepath, len, "%s%s.sig", db->treename, dbext);
OK.
> +			sig_payload->handle = handle;
> +			sig_payload->force = dbforce;
Always "force" the signature. sig_payload->force = 1;
> +			sig_payload->errors_ok = (siglevel & ALPM_SIG_DATABASE_OPTIONAL);
> +
> +			/* set hard upper limit of 16KiB */
> +			sig_payload->max_size = 16 * 1024;
> +			sig_payload->servers = db->servers;
> +
> +			payloads = alpm_list_add(payloads, sig_payload);
OK.
+ } + } + + /* attempt to grab a lock */ + if(_alpm_handle_lock(handle)) { + GOTO_ERR(handle, ALPM_ERR_HANDLE_LOCK, cleanup); + } +
This should be done much earlier in the function.
> +	ret = _alpm_multi_download(handle, payloads, syncpath);
> +	if(ret < 0) {
> +		goto cleanup;
> +	}
OK.
> +	for(i = dbs; i; i = i->next) {
> +		alpm_db_t *db = i->data;
> +		if(!(db->usage & ALPM_DB_USAGE_SYNC)) {
> +			continue;
> +		}
> +
> +		/* Cache needs to be rebuilt */
> +		_alpm_db_free_pkgcache(db);
> +
> +		/* clear all status flags regarding validity/existence */
> +		db->status &= ~DB_STATUS_VALID;
> +		db->status &= ~DB_STATUS_INVALID;
> +		db->status &= ~DB_STATUS_EXISTS;
> +		db->status &= ~DB_STATUS_MISSING;
> +
> +		/* if the download failed skip validation to preserve the download error */
> +		if(sync_db_validate(db) != 0) {
> +			/* pm_errno should be set */
> +			ret = -1;
There are some issues here... If one database fails to download, but the rest succeed, we skip validating the databases that worked. Even if all succeeded, we only set the error for the validation failure for the last database in the list.
> +		}
> +	}
> +
> +cleanup:
> +	_alpm_handle_unlock(handle);
> +
> +	if(ret == -1) {
> +		/* pm_errno was set by the download code */
> +		_alpm_log(handle, ALPM_LOG_DEBUG, "failed to sync dbs: %s\n",
> +				alpm_strerror(handle->pm_errno));
Following on from above, this needs work. There could be multiple distinct failures on different databases, and only the last one is reported. Not too bad, given it is only for debug, but it would be better to report the errors to debug output as we came across them.
> +	} else {
> +		handle->pm_errno = ALPM_ERR_OK;
> +	}
> +
> +	if(payloads) {
> +		alpm_list_free_inner(payloads, (alpm_list_fn_free)_alpm_dload_payload_reset);
> +		FREELIST(payloads);
> +	}
> +	free(syncpath);
> +	umask(oldmask);
> +	return ret;
> +}
OK.
<snip>

> +int _alpm_multi_download(alpm_handle_t *handle,
> +		alpm_list_t *payloads /* struct dload_payload */,
> +		const char *localpath)
> +{
> +	(void)handle;
> +	(void)payloads;
> +	(void)localpath;
> +	return 0;
> +}
OK.
<snip>

> @@ -738,6 +748,7 @@ void _alpm_dload_payload_reset(struct dload_payload *payload)
>  	FREE(payload->destfile_name);
>  	FREE(payload->content_disp_name);
>  	FREE(payload->fileurl);
> +	FREE(payload->filepath);
>  	*payload = (struct dload_payload){0};
>  }
OK.
> @@ -746,6 +757,7 @@ void _alpm_dload_payload_reset_for_retry(struct dload_payload *payload)
>  	ASSERT(payload, return);
> 
>  	FREE(payload->fileurl);
> +	FREE(payload->filepath);
>  	payload->initial_size += payload->prevprogress;
>  	payload->prevprogress = 0;
>  	payload->unlink_on_fail = 0;
OK.
> diff --git a/lib/libalpm/dload.h b/lib/libalpm/dload.h
> @@ -31,6 +31,7 @@ struct dload_payload {
>  	char *destfile_name;
>  	char *content_disp_name;
>  	char *fileurl;
> +	char *filepath; /* download URL path */
This struct is poorly documented I know... but remote_name seems to store the same information? @Dave: can that be different in weird server configurations?
<snip>

> +int _alpm_multi_download(alpm_handle_t *handle,
> +		alpm_list_t *payloads /* struct dload_payload */,
> +		const char *localpath);
OK.
>  #endif /* ALPM_DLOAD_H */
On 13/3/20 11:16 am, Allan McRae wrote:
>> +	char *filepath; /* download URL path */
> 
> This struct is poorly documented I know... but remote_name seems to
> store the same information?
> 
> @Dave: can that be different in weird server configurations?
After chatting on IRC, we think remote_name can be different with
redirection. So adding the extra field is fine.

We need a testsuite for the download part of libalpm...

A
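For context on the redirect case: the path we request corresponds to
filepath, while remote_name is derived from wherever the server finally
sends us, which curl exposes via CURLINFO_EFFECTIVE_URL. A small sketch
(mirror.example is a placeholder host; error handling kept minimal):

#include <curl/curl.h>
#include <stdio.h>

int main(void)
{
	CURL *curl;
	char *effective = NULL;

	curl_global_init(CURL_GLOBAL_DEFAULT);
	curl = curl_easy_init();
	if(curl == NULL) {
		return 1;
	}

	/* the requested path stays fixed (this is what filepath encodes) */
	curl_easy_setopt(curl, CURLOPT_URL, "https://mirror.example/core.db");
	curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1L);

	if(curl_easy_perform(curl) == CURLE_OK) {
		/* after redirects, the final URL (and thus the name derived
		 * from it) can differ from the requested path */
		curl_easy_getinfo(curl, CURLINFO_EFFECTIVE_URL, &effective);
		printf("requested core.db, final URL: %s\n", effective);
	}
	curl_easy_cleanup(curl);
	curl_global_cleanup();
	return 0;
}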
Hi

On Thu, Mar 12, 2020 at 6:16 PM Allan McRae <allan@archlinux.org> wrote:
> On 9/3/20 6:59 am, Anatol Pomozov wrote:
<snip>
>> +/** Update package databases
>> + *
>> + * An update of the package databases \a dbs will be attempted. Unless
> 
> ...package databases in the list \a dbs will ...
Done
<snip>

>> +int SYMEXPORT alpm_dbs_update(alpm_handle_t *handle, alpm_list_t *dbs, int force) {
> 
> I'll note that we don't test if handle is NULL throughout the codebase,
> so OK...

Added CHECK_HANDLE()
<snip>
>> +		/* set hard upper limit of 128MiB */
>> +		payload->max_size = 128 * 1024 * 1024;
>> +		ASSERT(db->servers != NULL, RET_ERR(handle, ALPM_ERR_SERVER_NONE, -1));
> 
> Already done above.

Ok, removing the duplicate.
<snip>
>> +			sig_payload->handle = handle;
>> +			sig_payload->force = dbforce;
> 
> Always "force" the signature.
> 
> sig_payload->force = 1;

Good catch, done.
<snip>
+ } + } + + /* attempt to grab a lock */ + if(_alpm_handle_lock(handle)) { + GOTO_ERR(handle, ALPM_ERR_HANDLE_LOCK, cleanup); + } +
This should be done much earlier in the function.
Sure I can move it above to make it more consistent with the rest of the code. Although technically we need this lock only when we access concurrently modifiable data on disk (such as database storage). The code above iterates over non-modifiable list and reads db properties that are also read-only.
<snip>
>> +		/* if the download failed skip validation to preserve the download error */
>> +		if(sync_db_validate(db) != 0) {
>> +			/* pm_errno should be set */
>> +			ret = -1;
> 
> There are some issues here... If one database fails to download, but
> the rest succeed, we skip validating the databases that worked.

Yep, if any file fails to download then we go to cleanup right after
_alpm_multi_download(), and error code -1 is returned to the client.
Isn't that what we want?

> Even if all succeeded, we only set the error for the validation failure
> for the last database in the list.

The return code is set to -1 if *any* db fails to validate.
>> +	if(ret == -1) {
>> +		/* pm_errno was set by the download code */
>> +		_alpm_log(handle, ALPM_LOG_DEBUG, "failed to sync dbs: %s\n",
>> +				alpm_strerror(handle->pm_errno));
> 
> Following on from above, this needs work. There could be multiple
> distinct failures on different databases, and only the last one is
> reported. Not too bad, given it is only for debug, but it would be
> better to report the errors to debug output as we came across them.

Per-database download events (such as download progress, download error,
up-to-date) are going to be reported via a callback, so the client will
have a chance to react to any issue right after that download completes
(or fails).

The idea was proposed here
https://www.mail-archive.com/pacman-dev@archlinux.org/msg17770.html
and implemented in my patch set here
https://github.com/anatol/pacman/commit/ab3412fe0aa2d8bb64f524c6849664738cb9....
The CL for the download callback will follow later.

The alpm_dbs_update() return code is a summary for the whole download
transaction: 0 essentially means "all downloads completed successfully",
and -1 is returned otherwise.
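A rough sketch of what such per-payload reporting could look like from
the frontend side. The event type and callback signature here are
hypothetical, since the actual callback API only lands in a later patch:

#include <stdio.h>

/* Hypothetical event stream: one notification per payload, so one db's
 * failure cannot mask another's. These names are illustrative only. */
typedef enum {
	DL_EVENT_COMPLETED,
	DL_EVENT_UP_TO_DATE,
	DL_EVENT_FAILED
} dl_event_t;

typedef void (*dl_event_cb)(const char *filename, dl_event_t event);

/* Example frontend callback: report each failure as it happens rather
 * than relying on the single summary return code of alpm_dbs_update(). */
static void on_download_event(const char *filename, dl_event_t event)
{
	if(event == DL_EVENT_FAILED) {
		fprintf(stderr, "failed to download %s\n", filename);
	}
}

int main(void)
{
	/* simulate what the download layer would emit */
	dl_event_cb cb = on_download_event;
	cb("core.db", DL_EVENT_FAILED);
	return 0;
}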
<snip>
I am going to send an updated patch in a minute. PTAL.
On 27/3/20 6:17 am, Anatol Pomozov wrote:
> Hi
>
> Most comments were addressed. See below:
>
> On Thu, Mar 12, 2020 at 6:16 PM Allan McRae <allan@archlinux.org> wrote:
>> On 9/3/20 6:59 am, Anatol Pomozov wrote:
<snipping...>
>>> +	/* attempt to grab a lock */
>>> +	if(_alpm_handle_lock(handle)) {
>>> +		GOTO_ERR(handle, ALPM_ERR_HANDLE_LOCK, cleanup);
>>> +	}
>>
>> This should be done much earlier in the function.
> Sure, I can move it up to make it more consistent with the rest of the
> code. Although technically we need this lock only when we access
> concurrently modifiable data on disk (such as database storage): the
> code above it iterates over a non-modifiable list and reads db
> properties that are also read-only.
We don't have particularly fine grained locking here, so I prefer the
function to bail as early as possible.
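For illustration, the suggested reordering as a fragment in the patch's
own idiom (it reuses the libalpm macros and helpers from the code quoted
above, so it is a sketch rather than a literal diff):

/* sketch: grab the lock right after the sanity checks, before any
 * payloads are built, so a locked handle makes the function bail early */
ASSERT(dbs != NULL, return -1);
handle->pm_errno = ALPM_ERR_OK;

syncpath = get_sync_dir(handle);
ASSERT(syncpath != NULL, return -1);

/* moved up from just before the _alpm_multi_download() call */
if(_alpm_handle_lock(handle)) {
	free(syncpath);
	RET_ERR(handle, ALPM_ERR_HANDLE_LOCK, -1);
}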
<snip>

>>> +		/* if the download failed skip validation to preserve the download error */
>>> +		if(sync_db_validate(db) != 0) {
>>> +			/* pm_errno should be set */
>>> +			ret = -1;
>>
>> There are some issues here... If one database fails to download, but
>> the rest succeed, we skip validating the databases that worked.
> Yep, if any file fails to download then we go to cleanup right after
> _alpm_multi_download(), and error code -1 is returned to the client.
> Isn't that what we want?
>> Even if all succeeded, we only set the error for the validation failure
>> for the last database in the list.
> The return code is set to -1 if *any* db fails to validate.
My concern was about fine-grained errors being lost but...
>>> +	if(ret == -1) {
>>> +		/* pm_errno was set by the download code */
>>> +		_alpm_log(handle, ALPM_LOG_DEBUG, "failed to sync dbs: %s\n",
>>> +				alpm_strerror(handle->pm_errno));
>>
>> Following on from above, this needs work. There could be multiple
>> distinct failures on different databases, and only the last one is
>> reported.
> Per-database download events (such as download progress, download error,
> up-to-date) are going to be reported via a callback, so the client will
> have a chance to react to any issue right after that download completes
> (or fails).
>
> The idea was proposed here
> https://www.mail-archive.com/pacman-dev@archlinux.org/msg17770.html
> and implemented in my patch set here
> https://github.com/anatol/pacman/commit/ab3412fe0aa2d8bb64f524c6849664738cb9....
>
> The CL for the download callback will follow later.
... this looks like it will do what I want to be able to handle.

I had been working on a patchset that would download dbs to a temporary
location and only move them to the sync db location once validated.
Seems there is enough individual error handling to handle that, or that
can be altered to handle that. I will put the patch on my patchqueue
branch.

A
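The temporary-location idea reads roughly like this as a sketch.
download_file() and validate_db() are hypothetical stand-ins for the
real fetch and validation routines, and the .tmp staging path is an
assumption:

#include <limits.h>
#include <stdio.h>
#include <unistd.h>

/* hypothetical stand-ins for the real fetch/validation routines */
int download_file(const char *path);
int validate_db(const char *path);

/* Fetch into a staging dir, validate, then rename(2) into place, so a
 * failed download or validation never clobbers a known-good sync db. */
static int update_one_db(const char *syncpath, const char *dbname)
{
	char tmppath[PATH_MAX], finalpath[PATH_MAX];

	snprintf(tmppath, sizeof(tmppath), "%s/.tmp/%s", syncpath, dbname);
	snprintf(finalpath, sizeof(finalpath), "%s/%s", syncpath, dbname);

	if(download_file(tmppath) != 0 || validate_db(tmppath) != 0) {
		unlink(tmppath);  /* discard the bad copy; old db stays intact */
		return -1;
	}
	/* atomic within one filesystem, so readers never see a partial db */
	return rename(tmppath, finalpath);
}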
Hi

On Thu, Mar 12, 2020 at 6:16 PM Allan McRae <allan@archlinux.org> wrote:
> On 9/3/20 6:59 am, Anatol Pomozov wrote:
<snip>
>> +			sig_payload->handle = handle;
>> +			sig_payload->force = dbforce;
> 
> Always "force" the signature.
> 
> sig_payload->force = 1;
I sent a change with "->force = 1;" and it got applied to Allan's
patchqueue. But I would like to review this line one more time and
switch it back to "sig_payload->force = dbforce;"

Having "sig_payload->force = 1;" forces ALPM to ignore the modification
time and always download the file. That was fine before, when the *.sig
file was downloaded only after we found that the *.db file had changed
(in which case the *.sig had changed as well). With the new logic we
start downloading the files in parallel: the *.db file content is
downloaded only if its modification time is newer, but
"sig_payload->force = 1;" forces downloading the *.sig file content
every time, and most of the time that is unneeded. We need to download
the *.sig content only when it has changed.

Thus I propose to change this line back to "sig_payload->force =
dbforce;" (i.e. force-download the signature only when we force-download
the *.db file).
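In code, the proposed rule is simply that the signature payload inherits
the db's force decision. A minimal self-contained sketch, with an
illustrative struct and function name rather than the real dload_payload:

struct payload { int force; };

/* The .sig payload follows the .db payload's force flag: both bypass
 * the modification-time check only when the db is user-forced or its
 * on-disk copy is known to be invalid. */
static void set_force_flags(struct payload *db_pl, struct payload *sig_pl,
		int user_force, int db_invalid)
{
	int dbforce = user_force || db_invalid;
	db_pl->force = dbforce;
	sig_pl->force = dbforce; /* was: sig_pl->force = 1; */
}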
On 14/4/20 4:33 am, Anatol Pomozov wrote:
> Hi
>
> On Thu, Mar 12, 2020 at 6:16 PM Allan McRae <allan@archlinux.org> wrote:
>> On 9/3/20 6:59 am, Anatol Pomozov wrote:

<snip>

>>> +			sig_payload->handle = handle;
>>> +			sig_payload->force = dbforce;
>>
>> Always "force" the signature.
>>
>> sig_payload->force = 1;
>
> I sent a change with "->force = 1;" and it got applied to Allan's
> patchqueue. But I would like to review this line one more time and
> switch it back to "sig_payload->force = dbforce;"
>
> Having "sig_payload->force = 1;" forces ALPM to ignore the modification
> time and always download the file. That was fine before, when the *.sig
> file was downloaded only after we found that the *.db file had changed
> (in which case the *.sig had changed as well). With the new logic we
> start downloading the files in parallel: the *.db file content is
> downloaded only if its modification time is newer, but
> "sig_payload->force = 1;" forces downloading the *.sig file content
> every time, and most of the time that is unneeded. We need to download
> the *.sig content only when it has changed.
>
> Thus I propose to change this line back to "sig_payload->force =
> dbforce;" (i.e. force-download the signature only when we
> force-download the *.db file).
This is good reasoning. I'll edit that back into the patch on my
patchqueue branch.

A