[pacman-dev] [PATCH] Introduce alpm_dbs_update() function for parallel db updates
Anatol Pomozov
anatol.pomozov at gmail.com
Sun Mar 8 20:55:54 UTC 2020
Hi
The failfast option got removed.
On Sun, Mar 8, 2020 at 6:05 AM Allan McRae <allan at archlinux.org> wrote:
>
> On 7/3/20 6:35 am, Anatol Pomozov wrote:
> > This is an equivalent of alpm_db_update but for multiplexed (parallel)
> > download. The difference is that this function accepts list of
> > databases to update. And then ALPM internals download it in parallel if
> > possible.
> >
> > Add a stub for _alpm_multi_download the function that will do parallel
> > payloads downloads in the future.
>
> Thanks for splitting the patches up into smaller units. It makes it a
> lot easier for me to review.
Yep. I'll try to split the work into a smaller patches around 100-200
lines each for easier and more modular review.
> Note that once review is complete, this patch (and the config one) will
> sit on a branch until the rest of the work is ready to land in one go.
It is fine. But please share this branch with me so I can rebase my
work on top of the reviewed/accepted changes. I do not want to keep
more than a couple of changes in my development branch. Once review
for previous changes is done I'll start working on the new changes.
>
> In fact... it probably makes sense to add _alpm_multi_download first,
> then follow-up with this patch.
_alpm_multi_download is not the whole part of the story. It depends on
another large chunk of download.c changes (that it is going to be in a
separate patch). It is easier for me to start from alpm_dbs_update()
part.
>
> > Introduce dload_payload->filepath field that contains url path to the
> > file we download. It is like fileurl field but does not contain
> > protocol/server part. The rationale for having this field is that with
> > the curl multidownload the server retry logic is going to move to a curl
> > callback. And the callback needs to be able to reconstruct the 'next'
> > fileurl. One will be able to do it by getting the next server url from
> > 'servers' list and then concat with filepath. Once the 'parallel download'
> > refactoring is over 'fileurl' field will go away.
> >
> > Signed-off-by: Anatol Pomozov <anatol.pomozov at gmail.com>
> > ---
> > lib/libalpm/alpm.h | 2 +
> > lib/libalpm/be_sync.c | 132 ++++++++++++++++++++++++++++++++++++++++++
> > lib/libalpm/dload.c | 12 ++++
> > lib/libalpm/dload.h | 5 ++
> > 4 files changed, 151 insertions(+)
> >
> > diff --git a/lib/libalpm/alpm.h b/lib/libalpm/alpm.h
> > index 93b97f44..eb0490eb 100644
> > --- a/lib/libalpm/alpm.h
> > +++ b/lib/libalpm/alpm.h
> > @@ -1045,6 +1045,8 @@ int alpm_db_remove_server(alpm_db_t *db, const char *url);
> > */
> > int alpm_db_update(int force, alpm_db_t *db);
> >
> > +int alpm_dbs_update(alpm_handle_t *handle, alpm_list_t *dbs, int force, int failfast);
>
> As already mentioned in the other reply, I don't think failfast is needed.
>
> It is a bit annoying that "force" and "dbs" are in a different order to
> alpm_db_update, but I think this is the better order... so keep it.
>
> > +
> > /** Get a package entry from a package database.
> > * @param db pointer to the package database to get the package from
> > * @param name of the package
> > diff --git a/lib/libalpm/be_sync.c b/lib/libalpm/be_sync.c
> > index aafed15d..cdb46bd9 100644
> > --- a/lib/libalpm/be_sync.c
> > +++ b/lib/libalpm/be_sync.c
> > @@ -301,6 +301,138 @@ int SYMEXPORT alpm_db_update(int force, alpm_db_t *db)
> > return ret;
> > }
> >
> > +/** Update list of databases. This function may run updates in parallel.
> > + *
> > + * @param dbs a list of alpm_db_t to update.
> > + */
>
> This should be in alpm.h and all params and return need documented.
docs are moved to alpm.h
>
> > +int SYMEXPORT alpm_dbs_update(alpm_handle_t *handle, alpm_list_t *dbs, int force, UNUSED int failfast) {
> > + char *syncpath;
> > + const char *dbext = handle->dbext;
> > + alpm_list_t *i;
> > + int ret = -1;
> > + mode_t oldmask;
> > + alpm_list_t *payloads = NULL;
> > +
> > + /* Sanity checks */
> > + ASSERT(dbs != NULL, return -1);
> > + handle->pm_errno = ALPM_ERR_OK;
> > +
> > + syncpath = get_sync_dir(handle);
> > + ASSERT(syncpath != NULL, return -1);
> > +
> > + /* make sure we have a sane umask */
> > + oldmask = umask(0022);
> > +
> > + for(i = dbs; i; i = i->next) {
> > + alpm_db_t *db = i->data;
> > + int dbforce = force;
> > + struct dload_payload *payload = NULL;
> > + size_t len;
> > + int siglevel;
> > +
> > + if(!(db->usage & ALPM_DB_USAGE_SYNC)) {
> > + continue;
> > + }
> > +
> > + ASSERT(db != handle->db_local, RET_ERR(handle, ALPM_ERR_WRONG_ARGS, -1));
> > + ASSERT(db->servers != NULL, RET_ERR(handle, ALPM_ERR_SERVER_NONE, -1));
> > +
> > + /* force update of invalid databases to fix potential mismatched database/signature */
> > + if(db->status & DB_STATUS_INVALID) {
> > + dbforce = 1;
> > + }
> > +
> > + CALLOC(payload, 1, sizeof(*payload), RET_ERR(handle, ALPM_ERR_MEMORY, -1));
> > +
> > + /* set hard upper limit of 128MiB */
> > + payload->max_size = 128 * 1024 * 1024;
> > + ASSERT(db->servers != NULL, RET_ERR(handle, ALPM_ERR_SERVER_NONE, -1));
> > + payload->servers = db->servers;
> > +
>
> I got to here. Seems a lot of this is duplicated from the single db
> path. If both are going to coexist, can we do some refactoring?
It depends whether we want to keep the API backward-compatible. If it
is fine to break one in pacman 6 release then we can just remove the
function from ALPM API. Otherwise alpm_db_update() need to be
reimplemented using alpm_dbs_update() functionality.
More information about the pacman-dev
mailing list