[pacman-dev] [PATCH] Introduce alpm_dbs_update() function for parallel db updates

Allan McRae allan at archlinux.org
Sun Mar 8 13:04:50 UTC 2020


On 7/3/20 6:35 am, Anatol Pomozov wrote:
> This is an equivalent of alpm_db_update but for multiplexed (parallel)
> download. The difference is that this function accepts list of
> databases to update. And then ALPM internals download it in parallel if
> possible.
> 
> Add a stub for _alpm_multi_download the function that will do parallel
> payloads downloads in the future.

Thanks for splitting the patches up into smaller units.  It makes it a
lot easier for me to review.

Note that once review is complete, this patch (and the config one) will
sit on a branch until the rest of the work is ready to land in one go.

In fact... it probably makes sense to add _alpm_multi_download first,
then follow-up with this patch.

> Introduce dload_payload->filepath field that contains url path to the
> file we download. It is like fileurl field but does not contain
> protocol/server part. The rationale for having this field is that with
> the curl multidownload the server retry logic is going to move to a curl
> callback. And the callback needs to be able to reconstruct the 'next'
> fileurl. One will be able to do it by getting the next server url from
> 'servers' list and then concat with filepath. Once the 'parallel download'
> refactoring is over 'fileurl' field will go away.
> 
> Signed-off-by: Anatol Pomozov <anatol.pomozov at gmail.com>
> ---
>  lib/libalpm/alpm.h    |   2 +
>  lib/libalpm/be_sync.c | 132 ++++++++++++++++++++++++++++++++++++++++++
>  lib/libalpm/dload.c   |  12 ++++
>  lib/libalpm/dload.h   |   5 ++
>  4 files changed, 151 insertions(+)
> 
> diff --git a/lib/libalpm/alpm.h b/lib/libalpm/alpm.h
> index 93b97f44..eb0490eb 100644
> --- a/lib/libalpm/alpm.h
> +++ b/lib/libalpm/alpm.h
> @@ -1045,6 +1045,8 @@ int alpm_db_remove_server(alpm_db_t *db, const char *url);
>   */
>  int alpm_db_update(int force, alpm_db_t *db);
>  
> +int alpm_dbs_update(alpm_handle_t *handle, alpm_list_t *dbs, int force, int failfast);

As already mentioned in the other reply, I don't think failfast is needed.

It is a bit annoying that "force" and "dbs" are in a different order to
alpm_db_update, but I think this is the better order...  so keep it.

> +
>  /** Get a package entry from a package database.
>   * @param db pointer to the package database to get the package from
>   * @param name of the package
> diff --git a/lib/libalpm/be_sync.c b/lib/libalpm/be_sync.c
> index aafed15d..cdb46bd9 100644
> --- a/lib/libalpm/be_sync.c
> +++ b/lib/libalpm/be_sync.c
> @@ -301,6 +301,138 @@ int SYMEXPORT alpm_db_update(int force, alpm_db_t *db)
>  	return ret;
>  }
>  
> +/** Update list of databases. This function may run updates in parallel.
> + *
> + * @param dbs a list of alpm_db_t to update.
> + */

This should be in alpm.h and all params and return need documented.

> +int SYMEXPORT alpm_dbs_update(alpm_handle_t *handle, alpm_list_t *dbs, int force, UNUSED int failfast) {
> +	char *syncpath;
> +	const char *dbext = handle->dbext;
> +	alpm_list_t *i;
> +	int ret = -1;
> +	mode_t oldmask;
> +	alpm_list_t *payloads = NULL;
> +
> +	/* Sanity checks */
> +	ASSERT(dbs != NULL, return -1);
> +	handle->pm_errno = ALPM_ERR_OK;
> +
> +	syncpath = get_sync_dir(handle);
> +	ASSERT(syncpath != NULL, return -1);
> +
> +	/* make sure we have a sane umask */
> +	oldmask = umask(0022);
> +
> +	for(i = dbs; i; i = i->next) {
> +		alpm_db_t *db = i->data;
> +		int dbforce = force;
> +		struct dload_payload *payload = NULL;
> +		size_t len;
> +		int siglevel;
> +
> +		if(!(db->usage & ALPM_DB_USAGE_SYNC)) {
> +			continue;
> +		}
> +
> +		ASSERT(db != handle->db_local, RET_ERR(handle, ALPM_ERR_WRONG_ARGS, -1));
> +		ASSERT(db->servers != NULL, RET_ERR(handle, ALPM_ERR_SERVER_NONE, -1));
> +
> +		/* force update of invalid databases to fix potential mismatched database/signature */
> +		if(db->status & DB_STATUS_INVALID) {
> +			dbforce = 1;
> +		}
> +
> +		CALLOC(payload, 1, sizeof(*payload), RET_ERR(handle, ALPM_ERR_MEMORY, -1));
> +
> +		/* set hard upper limit of 128MiB */
> +		payload->max_size = 128 * 1024 * 1024;
> +		ASSERT(db->servers != NULL, RET_ERR(handle, ALPM_ERR_SERVER_NONE, -1));
> +		payload->servers = db->servers;
> +

I got to here.  Seems a lot of this is duplicated from the single db
path.  If both are going to coexist, can we do some refactoring?

A


More information about the pacman-dev mailing list