On Thu, 2016-09-01 at 13:14 +1000, Allan McRae wrote:
On 01/09/16 09:44, Gordian Edenhofer wrote:
On Thu, 2016-09-01 at 08:28 +1000, Allan McRae wrote:
On 01/09/16 08:08, Dave Reisner wrote:
On Wed, Aug 31, 2016 at 11:18:32PM +0200, Gordian Edenhofer wrote:
> The second probably would not be accepted...

I urge you to reconsider. Parallelization increases the speed of this
I don't think anyone is suggesting that packaging multiple things in parallel isn't useful. I already suggested that nothing needs to be implemented in bacman proper in order for you to parallelize the work. You can write your own "pbacman" as simply as:
for arg; do bacman "$arg" & done; wait
There is a huge difference between flooding your system with ~1000 jobs and tightly controlling their maximum number. Being able to set the exact number of jobs lets you manage your resources, which is desirable in itself.
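For illustration, a bounded variant of the one-liner above that caps the number of concurrent jobs is only slightly longer. This is just a sketch, not part of any proposed bacman change; the job limit of 4 and the use of bash's wait -n (bash >= 4.3) are my own choices here:

    # hypothetical "pbacman" with a job cap
    max_jobs=4
    for pkg in "$@"; do
        # if the cap is reached, wait for any one background job to finish
        (( $(jobs -rp | wc -l) >= max_jobs )) && wait -n
        bacman "$pkg" &
    done
    wait    # wait for the remaining jobs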
Then use a program like 'parallel' which has this sort of knob. I really wonder what it is you're doing that requires running bacman with a large number of packages with any regularity.
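Assuming the 'parallel' package is installed, that knob is GNU parallel's -j option; for example, to recreate every installed package with at most four jobs at a time (the numbers are purely illustrative):

    pacman -Qq | parallel -j4 bacman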
Gathering the files etc. takes no time. It is really the compression that is being made parallel. If only there were a way to set compression to use multithreading...
The actual compression using xz (the default) is not necessarily the most time-intensive part. The linux-headers package, for example, is compressed within a few seconds, but everything that happens before xz is run takes considerably longer. This can be observed with top, or simply by running bacman once without compression and once with it, as sketched below.

Moreover, parallelizing in bacman makes the speed-up completely independent of the archive format used and still brings gains when recreating multiple packages. At the very least it would fill the gaps between the compression runs of multiple packages. Therefore it would be beneficial even if compression took the longest, which it does not always do.
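One concrete way to run the comparison described above is to time bacman with and without compression by switching PKGEXT. This is only a sketch and assumes that a PKGEXT exported in the environment takes effect; depending on the bacman version it may have to be changed in makepkg.conf instead:

    export PKGEXT='.pkg.tar'      # plain tar, no compression
    time bacman linux-headers
    export PKGEXT='.pkg.tar.xz'   # default xz compression
    time bacman linux-headers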
So read speed is the slow part? And trying to read more files at the same time helps?
Obviously read speed is not the limitation here. If it were, bacman would not speed up with an increased job count, no matter the implementation, yet it clearly does.

To have a fair comparison I ran the tests again with xz set to use multiple threads. The results can be seen here [1] and the code is available here [2]. Tuning xz certainly helps, especially for single packages, but using multiple jobs brings the real speed boost when recreating more than one package. The fact that xz can be tuned as well is no secret; it is stated in the man page and has been mentioned in the usage section from the beginning. Furthermore, the implementation is only a few additional lines of code, must be invoked explicitly and should in no case slow anyone down.

Best Regards,
Gordian Edenhofer

[1] http://edh.ddns.net/pacman_ml_bacman_benchmarks/bacman:%20simple%20benchmark.svg
[2] http://edh.ddns.net/pacman_ml_bacman_benchmarks/bacman:%20simple%20benchmark.R.txt
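For reference, xz multi-threading of the kind used for the re-run above can also be enabled without touching bacman at all, for example via xz's own XZ_OPT environment variable. This assumes the .xz compression step ultimately invokes the xz binary, which reads XZ_OPT regardless of how it is called:

    # -T0 lets xz use one thread per available core (xz >= 5.2)
    XZ_OPT='-T0' bacman linux-headers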