[pacman-dev] [PATCH v2] Introduce alpm_dbs_update() function for parallel db updates

Anatol Pomozov anatol.pomozov at gmail.com
Mon Apr 13 18:33:37 UTC 2020


Hi

On Thu, Mar 12, 2020 at 6:16 PM Allan McRae <allan at archlinux.org> wrote:
>
> On 9/3/20 6:59 am, Anatol Pomozov wrote:
> > This is an equivalent of alpm_db_update but for multiplexed (parallel)
> > download. The difference is that this function accepts list of
> > databases to update. And then ALPM internals download it in parallel if
> > possible.
> >
> > Add a stub for _alpm_multi_download the function that will do parallel
> > payloads downloads in the future.
> >
> > Introduce dload_payload->filepath field that contains url path to the
> > file we download. It is like fileurl field but does not contain
> > protocol/server part. The rationale for having this field is that with
> > the curl multidownload the server retry logic is going to move to a curl
> > callback. And the callback needs to be able to reconstruct the 'next'
> > fileurl. One will be able to do it by getting the next server url from
> > 'servers' list and then concat with filepath. Once the 'parallel download'
> > refactoring is over 'fileurl' field will go away.
> >
> > Signed-off-by: Anatol Pomozov <anatol.pomozov at gmail.com>
> > ---
> >  lib/libalpm/alpm.h    |  29 ++++++++++
> >  lib/libalpm/be_sync.c | 128 ++++++++++++++++++++++++++++++++++++++++++
> >  lib/libalpm/dload.c   |  12 ++++
> >  lib/libalpm/dload.h   |   5 ++
> >  4 files changed, 174 insertions(+)
> >
> > diff --git a/lib/libalpm/alpm.h b/lib/libalpm/alpm.h
> > index 5d559db1..c985429d 100644
> > --- a/lib/libalpm/alpm.h
> > +++ b/lib/libalpm/alpm.h
> > @@ -1049,6 +1049,35 @@ int alpm_db_remove_server(alpm_db_t *db, const char *url);
> >   */
> >  int alpm_db_update(int force, alpm_db_t *db);
> >
> > +/** Update package databases
> > + *
> > + * An update of the package databases \a dbs will be attempted. Unless
>
> ...package databases in the list \a dbs will ...
>
> > + * \a force is true, the update will only be performed if the remote
> > + * databases were modified since the last update.
> > + *
> > + * This operation requires a database lock, and will return an applicable error
> > + * if the lock could not be obtained.
> > + *
> > + * Example:
> > + * @code
> > + * alpm_list_t *dbs = alpm_get_syncdbs();
> > + * ret = alpm_dbs_update(config->handle, dbs, force);
> > + * if(ret < 0) {
> > + *     pm_printf(ALPM_LOG_ERROR, _("failed to synchronize all databases (%s)\n"),
> > + *         alpm_strerror(alpm_errno(config->handle)));
> > + * }
> > + * @endcode
> > + *
> > + * @note After a successful update, the \link alpm_db_get_pkgcache()
> > + * package cache \endlink will be invalidated
> > + * @param handle the context handle
> > + * @param dbs list of package databases to update
> > + * @param force if true, then forces the update, otherwise update only in case
> > + * the databases aren't up to date
> > + * @return 0 on success, -1 on error (pm_errno is set accordingly)
> > + */
> > +int alpm_dbs_update(alpm_handle_t *handle, alpm_list_t *dbs, int force);
> > +
> >  /** Get a package entry from a package database.
> >   * @param db pointer to the package database to get the package from
> >   * @param name of the package
> > diff --git a/lib/libalpm/be_sync.c b/lib/libalpm/be_sync.c
> > index aafed15d..497e1054 100644
> > --- a/lib/libalpm/be_sync.c
> > +++ b/lib/libalpm/be_sync.c
> > @@ -301,6 +301,134 @@ int SYMEXPORT alpm_db_update(int force, alpm_db_t *db)
> >       return ret;
> >  }
> >
> > +int SYMEXPORT alpm_dbs_update(alpm_handle_t *handle, alpm_list_t *dbs, int force) {
> > +     char *syncpath;
> > +     const char *dbext = handle->dbext;
>
> I'll note that we don't test if handle is NULL throughout the codebase,
> so OK...
>
> > +     alpm_list_t *i;
> > +     int ret = -1;
> > +     mode_t oldmask;
> > +     alpm_list_t *payloads = NULL;
> > +
> > +     /* Sanity checks */
> > +     ASSERT(dbs != NULL, return -1);
> > +     handle->pm_errno = ALPM_ERR_OK;
> > +
> > +     syncpath = get_sync_dir(handle);
> > +     ASSERT(syncpath != NULL, return -1);
> > +
> > +     /* make sure we have a sane umask */
> > +     oldmask = umask(0022);
> > +
>
> OK
>
> > +     for(i = dbs; i; i = i->next) {
> > +             alpm_db_t *db = i->data;
> > +             int dbforce = force;
> > +             struct dload_payload *payload = NULL;
> > +             size_t len;
> > +             int siglevel;
> > +
> > +             if(!(db->usage & ALPM_DB_USAGE_SYNC)) {
> > +                     continue;
> > +             }
> > +
> > +             ASSERT(db != handle->db_local, RET_ERR(handle, ALPM_ERR_WRONG_ARGS, -1));
> > +             ASSERT(db->servers != NULL, RET_ERR(handle, ALPM_ERR_SERVER_NONE, -1));
> > +
>
> OK.
>
> > +             /* force update of invalid databases to fix potential mismatched database/signature */
> > +             if(db->status & DB_STATUS_INVALID) {
> > +                     dbforce = 1;
> > +             }
>
> OK.
>
> > +
> > +             CALLOC(payload, 1, sizeof(*payload), RET_ERR(handle, ALPM_ERR_MEMORY, -1));
> > +
> > +             /* set hard upper limit of 128MiB */
> > +             payload->max_size = 128 * 1024 * 1024;
> > +             ASSERT(db->servers != NULL, RET_ERR(handle, ALPM_ERR_SERVER_NONE, -1));
>
> Already done above.
>
> > +             payload->servers = db->servers;
> > +
> > +             /* print server + filename into a buffer */
> > +             len = strlen(db->treename) + strlen(dbext) + 1;
> > +             MALLOC(payload->filepath, len, GOTO_ERR(handle, ALPM_ERR_MEMORY, cleanup));
> > +             snprintf(payload->filepath, len, "%s%s", db->treename, dbext);
> > +             payload->handle = handle;
> > +             payload->force = dbforce;
> > +             payload->unlink_on_fail = 1;
> > +
> > +             payloads = alpm_list_add(payloads, payload);
>
> OK
>
> > +
> > +             siglevel = alpm_db_get_siglevel(db);
> > +             if(siglevel & ALPM_SIG_DATABASE) {
> > +                     struct dload_payload *sig_payload;
> > +                     CALLOC(sig_payload, 1, sizeof(*sig_payload), GOTO_ERR(handle, ALPM_ERR_MEMORY, cleanup));
> > +
>
> OK.
>
> > +                     /* print filename into a buffer (leave space for separator and .sig) */
> > +                     len = strlen(db->treename) + strlen(dbext) + 5;
> > +                     MALLOC(sig_payload->filepath, len, goto cleanup);
> > +                     snprintf(sig_payload->filepath, len, "%s%s.sig", db->treename, dbext);
> > +
>
> OK.
>
> > +                     sig_payload->handle = handle;
> > +                     sig_payload->force = dbforce;
>
> Always "force" the signature.
> sig_payload->force = 1;

I sent a change with "->force = 1;" and it got applied to Allan's
patchqueu. But I would like to review this line one more time and
switch it back to "sig_payload->force = dbforce;"

Having "sig_payload->force = 1;" forces ALPM to ignore modification
time and always download the file. It was fine before when *.sig file
downloaded only after we found that *.db file is changed (in this case
*.sig is changed as well). With the new logic we start downloading the
files in parallel. *.db file content is download only if modification
time is newer. But setting "sig_payload->force = 1;" forced
downloading *.sig file content every time. And most of the time it is
unneeded. We need to download the *.sig file content only when it
changed.

Thus I propose to make this line to "sig_payload->force = dbforce;"
(i.e. force download signature only when we force download the *.db
file).


More information about the pacman-dev mailing list