[pacman-dev] [PATCH] Order downloads by descending max_size
When downloading in parallel, sort by package size so that the larger packages are queued first to fully leverage parallelism. Addresses FS#70172 Signed-off-by: Charlie Sale <softwaresale01@gmail.com> --- lib/libalpm/dload.c | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) diff --git a/lib/libalpm/dload.c b/lib/libalpm/dload.c index ca6be7b6..58d2122f 100644 --- a/lib/libalpm/dload.c +++ b/lib/libalpm/dload.c @@ -825,6 +825,19 @@ cleanup: return ret; } +/* + * Use to sort payloads by max size in decending order (largest -> smallest) + */ +static int compare_dload_payload_sizes(const void *left_ptr, const void *right_ptr) +{ + struct dload_payload *left, *right; + + left = (struct dload_payload *) left_ptr; + right = (struct dload_payload *) right_ptr; + + return right->max_size - left->max_size; +} + /* Returns -1 if an error happened for a required file * Returns 0 if a payload was actually downloaded * Returns 1 if no files were downloaded and all errors were non-fatal @@ -838,6 +851,10 @@ static int curl_download_internal(alpm_handle_t *handle, int max_streams = handle->parallel_downloads; int updated = 0; /* was a file actually updated */ CURLM *curlm = handle->curlm; + size_t payloads_size = alpm_list_count(payloads); + + /* Sort payloads by package size */ + payloads = alpm_list_msort(payloads, payloads_size, &compare_dload_payload_sizes); while(active_downloads_num > 0 || payloads) { CURLMcode mc; -- 2.32.0
On Wed, Aug 11, 2021 at 11:59 PM Charlie Sale <softwaresale01@gmail.com> wrote:
When downloading in parallel, sort by package size so that the larger packages are queued first to fully leverage parallelism. Addresses FS#70172
Signed-off-by: Charlie Sale <softwaresale01@gmail.com> --- lib/libalpm/dload.c | 17 +++++++++++++++++ 1 file changed, 17 insertions(+)
diff --git a/lib/libalpm/dload.c b/lib/libalpm/dload.c index ca6be7b6..58d2122f 100644 --- a/lib/libalpm/dload.c +++ b/lib/libalpm/dload.c @@ -825,6 +825,19 @@ cleanup: return ret; }
+/* + * Use to sort payloads by max size in decending order (largest -> smallest) + */ +static int compare_dload_payload_sizes(const void *left_ptr, const void *right_ptr) +{ + struct dload_payload *left, *right; + + left = (struct dload_payload *) left_ptr; + right = (struct dload_payload *) right_ptr; + + return right->max_size - left->max_size; +} + /* Returns -1 if an error happened for a required file * Returns 0 if a payload was actually downloaded * Returns 1 if no files were downloaded and all errors were non-fatal @@ -838,6 +851,10 @@ static int curl_download_internal(alpm_handle_t *handle, int max_streams = handle->parallel_downloads; int updated = 0; /* was a file actually updated */ CURLM *curlm = handle->curlm; + size_t payloads_size = alpm_list_count(payloads); + + /* Sort payloads by package size */ + payloads = alpm_list_msort(payloads, payloads_size, &compare_dload_payload_sizes);
while(active_downloads_num > 0 || payloads) { CURLMcode mc; -- 2.32.0
I'll also add that this is my first contribution to pacman. Please inform me of anything I did incorrectly regarding the patch process. Thank you in advance for your patience :) Cheers! ~Charlie
On 12/08/2021 05:12, Charlie Sale wrote:
On Wed, Aug 11, 2021 at 11:59 PM Charlie Sale <softwaresale01@gmail.com> wrote:
When downloading in parallel, sort by package size so that the larger packages are queued first to fully leverage parallelism. Addresses FS#70172
Signed-off-by: Charlie Sale <softwaresale01@gmail.com> --- lib/libalpm/dload.c | 17 +++++++++++++++++ 1 file changed, 17 insertions(+)
diff --git a/lib/libalpm/dload.c b/lib/libalpm/dload.c index ca6be7b6..58d2122f 100644 --- a/lib/libalpm/dload.c +++ b/lib/libalpm/dload.c @@ -825,6 +825,19 @@ cleanup: return ret; }
+/* + * Use to sort payloads by max size in decending order (largest -> smallest) + */ +static int compare_dload_payload_sizes(const void *left_ptr, const void *right_ptr) +{ + struct dload_payload *left, *right; + + left = (struct dload_payload *) left_ptr; + right = (struct dload_payload *) right_ptr; + + return right->max_size - left->max_size; +} + /* Returns -1 if an error happened for a required file * Returns 0 if a payload was actually downloaded * Returns 1 if no files were downloaded and all errors were non-fatal @@ -838,6 +851,10 @@ static int curl_download_internal(alpm_handle_t *handle, int max_streams = handle->parallel_downloads; int updated = 0; /* was a file actually updated */ CURLM *curlm = handle->curlm; + size_t payloads_size = alpm_list_count(payloads); + + /* Sort payloads by package size */ + payloads = alpm_list_msort(payloads, payloads_size, &compare_dload_payload_sizes);
while(active_downloads_num > 0 || payloads) { CURLMcode mc; -- 2.32.0
I'll also add that this is my first contribution to pacman. Please inform me of anything I did incorrectly regarding the patch process. Thank you in advance for your patience :)
Cheers! ~Charlie
In my totally untested theory this would slow things down. The ideal situation is to have 1 large download running to soak up bandwidth and then many small downloads running so they can all be set up and handshake. Have you got any benchmarks to comfirm/deny the ones in the bug report?
On 12/8/21 3:38 pm, Morgan Adamiec wrote:
On 12/08/2021 05:12, Charlie Sale wrote:
On Wed, Aug 11, 2021 at 11:59 PM Charlie Sale <softwaresale01@gmail.com> wrote:
When downloading in parallel, sort by package size so that the larger packages are queued first to fully leverage parallelism. Addresses FS#70172
Signed-off-by: Charlie Sale <softwaresale01@gmail.com> --- lib/libalpm/dload.c | 17 +++++++++++++++++ 1 file changed, 17 insertions(+)
diff --git a/lib/libalpm/dload.c b/lib/libalpm/dload.c index ca6be7b6..58d2122f 100644 --- a/lib/libalpm/dload.c +++ b/lib/libalpm/dload.c @@ -825,6 +825,19 @@ cleanup: return ret; }
+/* + * Use to sort payloads by max size in decending order (largest -> smallest) + */ +static int compare_dload_payload_sizes(const void *left_ptr, const void *right_ptr) +{ + struct dload_payload *left, *right; + + left = (struct dload_payload *) left_ptr; + right = (struct dload_payload *) right_ptr; + + return right->max_size - left->max_size; +} + /* Returns -1 if an error happened for a required file * Returns 0 if a payload was actually downloaded * Returns 1 if no files were downloaded and all errors were non-fatal @@ -838,6 +851,10 @@ static int curl_download_internal(alpm_handle_t *handle, int max_streams = handle->parallel_downloads; int updated = 0; /* was a file actually updated */ CURLM *curlm = handle->curlm; + size_t payloads_size = alpm_list_count(payloads); + + /* Sort payloads by package size */ + payloads = alpm_list_msort(payloads, payloads_size, &compare_dload_payload_sizes);
while(active_downloads_num > 0 || payloads) { CURLMcode mc; -- 2.32.0
I'll also add that this is my first contribution to pacman. Please inform me of anything I did incorrectly regarding the patch process. Thank you in advance for your patience :)
Cheers! ~Charlie
In my totally untested theory this would slow things down. The ideal situation is to have 1 large download running to soak up bandwidth and then many small downloads running so they can all be set up and handshake.
Have you got any benchmarks to comfirm/deny the ones in the bug report?
I have done rough numbers... Compared to random order, there is a net gain. Not a big gain over the range of parameters I tried, but a gain. This sort is unlikely to optimal, but optimality would be much harder to achieve. Allan
participants (3)
-
Allan McRae
-
Charlie Sale
-
Morgan Adamiec