[pacman-dev] [PATCH] Introduce alpm_dbs_update() function for parallel db updates

Sun Mar 8 05:23:25 UTC 2020

On 7/3/20 6:48 am, Anatol Pomozov wrote:
> Hi
> 
> On Fri, Mar 6, 2020 at 12:35 PM Anatol Pomozov <anatol.pomozov at gmail.com> wrote:
>>
>> This is an equivalent of alpm_db_update but for multiplexed (parallel)
>> downloads. The difference is that this function accepts a list of
>> databases to update, and ALPM internals then download them in parallel
>> if possible.
>>
>> Add a stub for _alpm_multi_download, the function that will do parallel
>> payload downloads in the future.
>>
>> Introduce a dload_payload->filepath field that contains the URL path to
>> the file we download. It is like the fileurl field but does not contain
>> the protocol/server part. The rationale for having this field is that
>> with the curl multi-download the server retry logic is going to move to a
>> curl callback, and the callback needs to be able to reconstruct the 'next'
>> fileurl. One will be able to do that by getting the next server URL from
>> the 'servers' list and concatenating it with filepath. Once the 'parallel
>> download' refactoring is over, the 'fileurl' field will go away.
>>
>> Signed-off-by: Anatol Pomozov <anatol.pomozov at gmail.com>
>> ---
>>  lib/libalpm/alpm.h    |   2 +
>>  lib/libalpm/be_sync.c | 132 ++++++++++++++++++++++++++++++++++++++++++
>>  lib/libalpm/dload.c   |  12 ++++
>>  lib/libalpm/dload.h   |   5 ++
>>  4 files changed, 151 insertions(+)
>>
>> diff --git a/lib/libalpm/alpm.h b/lib/libalpm/alpm.h
>> index 93b97f44..eb0490eb 100644
>> --- a/lib/libalpm/alpm.h
>> +++ b/lib/libalpm/alpm.h
>> @@ -1045,6 +1045,8 @@ int alpm_db_remove_server(alpm_db_t *db, const char *url);
>>   */
>>  int alpm_db_update(int force, alpm_db_t *db);
>>
>> +int alpm_dbs_update(alpm_handle_t *handle, alpm_list_t *dbs, int force, int failfast);
>> +
>>  /** Get a package entry from a package database.
>>   * @param db pointer to the package database to get the package from
>>   * @param name of the package
>> diff --git a/lib/libalpm/be_sync.c b/lib/libalpm/be_sync.c
>> index aafed15d..cdb46bd9 100644
>> --- a/lib/libalpm/be_sync.c
>> +++ b/lib/libalpm/be_sync.c
>> @@ -301,6 +301,138 @@ int SYMEXPORT alpm_db_update(int force, alpm_db_t *db)
>>         return ret;
>>  }
>>
>> +/** Update list of databases. This function may run updates in parallel.
>> + *
>> + * @param dbs a list of alpm_db_t to update.
>> + */
>> +int SYMEXPORT alpm_dbs_update(alpm_handle_t *handle, alpm_list_t *dbs, int force, UNUSED int failfast) {
> 
> One question I had initially is whether we need a 'failfast' option for
> parallel downloads. Failfast means that once any error is detected, ALPM
> stops all the download streams right away and returns an error to
> the client.
> 
> But now I am not sure this option will be really useful. Currently the
> multi-download implementation waits for all download transactions to
> finish and only then returns to the client. It works just fine.
> 
> So I am thinking of removing the failfast option unless someone has a
> strong opinion that we should keep it.
> 

We don't want failfast for databases, at least not until databases are
downloaded into a temporary location.  Failfast would leave us with a
bunch of half-downloaded database files.
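
To illustrate what "downloaded into a temporary location" could look
like, here is a rough sketch only. store_db_atomically(), the ".part"
suffix and the simulate_failure flag are all made up for this example
and are not part of libalpm or the patch above; it also assumes POSIX
rename() semantics, where the rename replaces the target in one step.
The data is written to a side file, and the real database file is only
replaced once the write has fully succeeded, so an aborted transfer
never leaves a truncated database behind.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, NOT part of libalpm: write `len` bytes of `data`
 * to "<final_path>.part" and rename it over final_path only on success.
 * `simulate_failure` stands in for a download that is aborted midway. */
static int store_db_atomically(const char *final_path, const void *data,
		size_t len, int simulate_failure)
{
	char tmp_path[4096];
	FILE *fp;
	int n;
	size_t to_write = simulate_failure ? len / 2 : len;

	assert(final_path != NULL && data != NULL);

	n = snprintf(tmp_path, sizeof(tmp_path), "%s.part", final_path);
	if(n < 0 || (size_t)n >= sizeof(tmp_path)) {
		return -1;
	}

	fp = fopen(tmp_path, "wb");
	if(!fp) {
		return -1;
	}

	/* a short write or an aborted transfer only ever touches the .part file */
	if(fwrite(data, 1, to_write, fp) != to_write || simulate_failure) {
		fclose(fp);
		remove(tmp_path);
		return -1;
	}
	if(fclose(fp) != 0) {
		remove(tmp_path);
		return -1;
	}

	/* the existing database is replaced only after a complete write */
	if(rename(tmp_path, final_path) != 0) {
		remove(tmp_path);
		return -1;
	}
	return 0;
}
```

With something along these lines in place, stopping the other streams
early would cost nothing worse than a few discarded .part files.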

A