[pacman-dev] [PATCH] [RFC] makepkg: extract sources in parallel

Sat Apr 23 12:47:38 EDT 2011

On Sat, Apr 23, 2011 at 1:10 AM, Allan McRae <allan at archlinux.org> wrote:
> On 23/04/11 08:04, Dan McGee wrote:
>>
>> This one could definitely benefit from limiting the number of parallel
>> threads, especially when dealing with the GCC PKGBUILD. Definitely a lot
>> of contention and times varied from twice as fast to a few seconds
>> slower, depending on if the cache decided it needed to flush out some
>> data. Limiting the number of threads to the number of CPUs would
>> probably go a long way to resolving some of this contention for IO
>> bandwidth.
>>
>> Signed-off-by: Dan McGee<dan at archlinux.org>
>
> Took this for a spin and it does make a difference for the GCC PKGBUILD.
>  Reduced total extraction time from 35sec to 25sec on my laptop.  I was
> actually surprised it made that much difference given I figured this would
> be more disk speed bound than CPU bound...
>
> My main concern is still what happens if two processes try to extract the
> same directory at the same time.  I guess such an occurrence would be very
> rare, and perhaps bsdtar actually would gracefully handle this, but it is
> something to consider.

For the < 1% that would have a problem with this, I might say
noextract=() is the answer?
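
As a rough illustration of that workaround, a PKGBUILD could list the
conflicting archive in noextract and unpack it by hand. The package and
file names here are hypothetical; only source, noextract, build() and
$srcdir are the usual PKGBUILD pieces:

```bash
# Hypothetical sources: both tarballs unpack into the same directory.
source=("foo-1.0.tar.gz" "foo-data-1.0.tar.gz")

# Ask makepkg to skip automatic extraction of the second archive.
noextract=("foo-data-1.0.tar.gz")

build() {
    cd "$srcdir"
    # Extract the skipped archive manually, after makepkg has finished
    # extracting everything else.
    bsdtar -xf foo-data-1.0.tar.gz
}
```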

> Also, the extraction time actually seemed slower despite not being so due to
> the output.  On the non-parallel version, you get a visual cue on how far
> through the extraction process you are (in terms of number of files
> extracted), but with the parallel extraction, all "Extracting" output is
> printed at once and then there is a big wait.  I guess that could be
> adjusted.

Not quite true. We start X jobs, but then have to wait for those X
jobs. For sanity, we wait on each job in the order it was started, and
you are hitting the common case of the biggest file being first. Thus,
when the output from that job appears, we then notice that the X - 1
jobs following it have already finished and can immediately print the
output for them.
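
A minimal sketch of that behaviour, not the code from the patch:
fake_extract and run_in_start_order are hypothetical names, and sleep
stands in for the real extraction work. Jobs are started in the
background and then waited on in start order, so a slow first job holds
back the messages for everything behind it.

```bash
# Hypothetical stand-in for one extraction job; sleeps for $1 seconds.
fake_extract() {
    sleep "$1"
}

# Start one background job per argument, then wait on each job in the
# order it was started and report it once wait returns.
run_in_start_order() {
    local pids=() labels=() duration i
    for duration in "$@"; do
        fake_extract "$duration" &
        pids+=("$!")
        labels+=("$duration")
    done
    for i in "${!pids[@]}"; do
        wait "${pids[$i]}"
        echo "done: ${labels[$i]}s job (exit status $?)"
    done
}
```

Running "run_in_start_order 1 0 0" prints nothing until the one-second
job finishes, and then all three lines appear together, since the two
short jobs are long done by the time we get around to waiting on them.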

Not really easy to adjust, outside of iterating the jobs list
backwards with the heuristic that most people tend to put big files
first, or patching bash's wait builtin to have better semantics. Job
handling capabilities aren't exactly stellar in a shell script, so I
did the best I could here.
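
For what it's worth, the backwards iteration would look roughly like
this. Again a sketch with hypothetical names (fake_extract,
run_in_reverse_order) and sleep standing in for extraction, not
something from the patch:

```bash
# Hypothetical stand-in for one extraction job; sleeps for $1 seconds.
fake_extract() {
    sleep "$1"
}

# Start one background job per argument, then wait on them starting
# with the most recently started job and working back to the first.
run_in_reverse_order() {
    local pids=() labels=() duration i
    for duration in "$@"; do
        fake_extract "$duration" &
        pids+=("$!")
        labels+=("$duration")
    done
    for (( i = ${#pids[@]} - 1; i >= 0; i-- )); do
        wait "${pids[i]}"
        echo "done: ${labels[i]}s job"
    done
}
```

With the big file first, the small jobs get reported as soon as they
finish and the big one is reported last. It is only a heuristic: put
the big file last in the source array and you are back to one long
wait before any output.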

-Dan