[pacman-dev] [PATCH] Introduce alpm_dbs_update() function for parallel db updates

Sun Mar 8 20:55:54 UTC 2020

Hi

The failfast option got removed.

On Sun, Mar 8, 2020 at 6:05 AM Allan McRae <allan at archlinux.org> wrote:
>
> On 7/3/20 6:35 am, Anatol Pomozov wrote:
> > This is an equivalent of alpm_db_update but for multiplexed (parallel)
> > download. The difference is that this function accepts list of
> > databases to update. And then ALPM internals download it in parallel if
> > possible.
> >
> > Add a stub for _alpm_multi_download the function that will do parallel
> > payloads downloads in the future.
>
> Thanks for splitting the patches up into smaller units.  It makes it a
> lot easier for me to review.

Yep. I'll try to split the work into smaller patches of around 100-200
lines each for easier and more modular review.

> Note that once review is complete, this patch (and the config one) will
> sit on a branch until the rest of the work is ready to land in one go.

That is fine. But please share this branch with me so I can rebase my
work on top of the reviewed/accepted changes. I do not want to keep
more than a couple of changes in my development branch. Once review
of the previous changes is done, I'll start working on the new ones.

>
> In fact... it probably makes sense to add _alpm_multi_download first,
> then follow-up with this patch.

_alpm_multi_download is not the whole story. It depends on another
large chunk of dload.c changes (that is going to be in a separate
patch). It is easier for me to start from the alpm_dbs_update() part.

>
> > Introduce dload_payload->filepath field that contains url path to the
> > file we download. It is like fileurl field but does not contain
> > protocol/server part. The rationale for having this field is that with
> > the curl multidownload the server retry logic is going to move to a curl
> > callback. And the callback needs to be able to reconstruct the 'next'
> > fileurl. One will be able to do it by getting the next server url from
> > 'servers' list and then concat with filepath. Once the 'parallel download'
> > refactoring is over 'fileurl' field will go away.
> >
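
To illustrate the filepath idea, here is a minimal sketch of how a retry
callback might rebuild the next fileurl from a server url and the
filepath. The helper name sketch_build_fileurl() is hypothetical and not
part of the patch; it only shows the "server + '/' + filepath" join
described above.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from the patch): join a server url and a
 * file path with a single '/'. Returns a malloc'd string that the
 * caller must free, or NULL if the allocation fails. */
char *sketch_build_fileurl(const char *server, const char *filepath)
{
	/* +2 covers the '/' separator and the terminating NUL */
	size_t len = strlen(server) + strlen(filepath) + 2;
	char *url = malloc(len);

	if(url == NULL) {
		return NULL;
	}
	snprintf(url, len, "%s/%s", server, filepath);
	return url;
}
```

On a failed download the callback would take the next entry from the
'servers' list and call such a helper again with the same filepath.
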
> > Signed-off-by: Anatol Pomozov <anatol.pomozov at gmail.com>
> > ---
> >  lib/libalpm/alpm.h    |   2 +
> >  lib/libalpm/be_sync.c | 132 ++++++++++++++++++++++++++++++++++++++++++
> >  lib/libalpm/dload.c   |  12 ++++
> >  lib/libalpm/dload.h   |   5 ++
> >  4 files changed, 151 insertions(+)
> >
> > diff --git a/lib/libalpm/alpm.h b/lib/libalpm/alpm.h
> > index 93b97f44..eb0490eb 100644
> > --- a/lib/libalpm/alpm.h
> > +++ b/lib/libalpm/alpm.h
> > @@ -1045,6 +1045,8 @@ int alpm_db_remove_server(alpm_db_t *db, const char *url);
> >   */
> >  int alpm_db_update(int force, alpm_db_t *db);
> >
> > +int alpm_dbs_update(alpm_handle_t *handle, alpm_list_t *dbs, int force, int failfast);
>
> As already mentioned in the other reply, I don't think failfast is needed.
>
> It is a bit annoying that "force" and "dbs" are in a different order to
> alpm_db_update, but I think this is the better order...  so keep it.
>
> > +
> >  /** Get a package entry from a package database.
> >   * @param db pointer to the package database to get the package from
> >   * @param name of the package
> > diff --git a/lib/libalpm/be_sync.c b/lib/libalpm/be_sync.c
> > index aafed15d..cdb46bd9 100644
> > --- a/lib/libalpm/be_sync.c
> > +++ b/lib/libalpm/be_sync.c
> > @@ -301,6 +301,138 @@ int SYMEXPORT alpm_db_update(int force, alpm_db_t *db)
> >       return ret;
> >  }
> >
> > +/** Update list of databases. This function may run updates in parallel.
> > + *
> > + * @param dbs a list of alpm_db_t to update.
> > + */
>
> This should be in alpm.h and all params and return need documented.

The docs are moved to alpm.h.

>
> > +int SYMEXPORT alpm_dbs_update(alpm_handle_t *handle, alpm_list_t *dbs, int force, UNUSED int failfast) {
> > +     char *syncpath;
> > +     const char *dbext = handle->dbext;
> > +     alpm_list_t *i;
> > +     int ret = -1;
> > +     mode_t oldmask;
> > +     alpm_list_t *payloads = NULL;
> > +
> > +     /* Sanity checks */
> > +     ASSERT(dbs != NULL, return -1);
> > +     handle->pm_errno = ALPM_ERR_OK;
> > +
> > +     syncpath = get_sync_dir(handle);
> > +     ASSERT(syncpath != NULL, return -1);
> > +
> > +     /* make sure we have a sane umask */
> > +     oldmask = umask(0022);
> > +
> > +     for(i = dbs; i; i = i->next) {
> > +             alpm_db_t *db = i->data;
> > +             int dbforce = force;
> > +             struct dload_payload *payload = NULL;
> > +             size_t len;
> > +             int siglevel;
> > +
> > +             if(!(db->usage & ALPM_DB_USAGE_SYNC)) {
> > +                     continue;
> > +             }
> > +
> > +             ASSERT(db != handle->db_local, RET_ERR(handle, ALPM_ERR_WRONG_ARGS, -1));
> > +             ASSERT(db->servers != NULL, RET_ERR(handle, ALPM_ERR_SERVER_NONE, -1));
> > +
> > +             /* force update of invalid databases to fix potential mismatched database/signature */
> > +             if(db->status & DB_STATUS_INVALID) {
> > +                     dbforce = 1;
> > +             }
> > +
> > +             CALLOC(payload, 1, sizeof(*payload), RET_ERR(handle, ALPM_ERR_MEMORY, -1));
> > +
> > +             /* set hard upper limit of 128MiB */
> > +             payload->max_size = 128 * 1024 * 1024;
> > +             ASSERT(db->servers != NULL, RET_ERR(handle, ALPM_ERR_SERVER_NONE, -1));
> > +             payload->servers = db->servers;
> > +
>
> I got to here.  Seems a lot of this is duplicated from the single db
> path.  If both are going to coexist, can we do some refactoring?

It depends on whether we want to keep the API backward-compatible. If
it is fine to break it in the pacman 6 release, then we can just remove
alpm_db_update() from the ALPM API. Otherwise alpm_db_update() needs to
be reimplemented using alpm_dbs_update() functionality.
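
For the second option, a rough sketch of what I have in mind is below.
The types and the sketch_* names are simplified stand-ins so the snippet
compiles on its own; they are not the real libalpm definitions, and the
multi-db function is only a stub. The point is just that the single-db
entry point wraps its argument in a one-element list and delegates.

```c
#include <stddef.h>

/* Simplified stand-ins for the libalpm types (assumptions, not the
 * real definitions). */
typedef struct sketch_list {
	void *data;
	struct sketch_list *next;
} sketch_list_t;

typedef struct sketch_handle {
	int pm_errno;
} sketch_handle_t;

typedef struct sketch_db {
	const char *treename;
	sketch_handle_t *handle;
} sketch_db_t;

/* Stub for the multi-db updater: it only walks the list and reports
 * success if at least one database was passed in. */
int sketch_dbs_update(sketch_handle_t *handle, sketch_list_t *dbs, int force)
{
	int count = 0;
	sketch_list_t *i;

	(void)handle;
	(void)force;
	for(i = dbs; i; i = i->next) {
		count++;
	}
	return count > 0 ? 0 : -1;
}

/* Single-db entry point kept for compatibility: wrap the db in a
 * one-element list on the stack and delegate to the multi-db path. */
int sketch_db_update(int force, sketch_db_t *db)
{
	sketch_list_t node;

	if(db == NULL) {
		return -1;
	}
	node.data = db;
	node.next = NULL;
	return sketch_dbs_update(db->handle, &node, force);
}
```

This way there would be only one update code path to maintain.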