On Thu, 2016-09-01 at 13:14 +1000, Allan McRae wrote:
On 01/09/16 09:44, Gordian Edenhofer wrote:
On Thu, 2016-09-01 at 08:28 +1000, Allan McRae wrote:
On 01/09/16 08:08, Dave Reisner wrote:
On Wed, Aug 31, 2016 at 11:18:32PM +0200, Gordian Edenhofer wrote:
> The second probably would not be accepted...

I urge you to reconsider. Parallelization increases the speed of this
I don't think anyone is suggesting that packaging multiple things in parallel isn't useful. I already suggested that nothing needs to be implemented in bacman proper in order for you to parallelize the work. You can write your own "pbacman" as simply as:
for arg; do bacman "$arg" & done; wait
There is a huge difference between flooding your system with ~1000 jobs and tightly controlling their maximum number. Being able to set the exact number of jobs lets you manage your resources, which is desirable in itself.
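For illustration, a bounded variant of the one-liner above that caps the number of concurrent jobs is only slightly longer. This is just a sketch, not part of any proposed bacman change; the job limit of 4 and the use of bash's wait -n (bash >= 4.3) are my own choices here:

    # hypothetical "pbacman" with a job cap
    max_jobs=4
    for pkg in "$@"; do
        # if the cap is reached, wait for any one background job to finish
        (( $(jobs -rp | wc -l) >= max_jobs )) && wait -n
        bacman "$pkg" &
    done
    wait    # wait for the remaining jobs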
Then use a program like 'parallel' which has this sort of knob. I really wonder what it is you're doing that requires running bacman with a large number of packages with any regularity.
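Assuming the 'parallel' package is installed, that knob is GNU parallel's -j option; for example, to recreate every installed package with at most four jobs at a time (the numbers are purely illustrative):

    pacman -Qq | parallel -j4 bacman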
Gathering the files etc. takes no time. It is really the compression that is being made parallel. If only there were a way to set compression to use multithreading...
The actual compression using xz (the default) is not necessarily the most time-intensive part. The linux-headers package, for example, is compressed within a few seconds, but everything that happens before xz is run takes considerably longer. This can be observed with top, or simply by running bacman once without compression and once with it, as sketched below.

Moreover, parallelizing in bacman makes the speed-up completely independent of the archive format used and still brings gains when recreating multiple packages. At the very least it would fill the gaps between the compression runs of multiple packages. Therefore it would be beneficial even if compression took the longest, which it does not always do.
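One concrete way to run the comparison described above is to time bacman with and without compression by switching PKGEXT. This is only a sketch and assumes that a PKGEXT exported in the environment takes effect; depending on the bacman version it may have to be changed in makepkg.conf instead:

    export PKGEXT='.pkg.tar'      # plain tar, no compression
    time bacman linux-headers
    export PKGEXT='.pkg.tar.xz'   # default xz compression
    time bacman linux-headers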
So read speed is the slow part? And trying to read more files at the same time helps?
Obviously read speed is not the limitation here. If it were, bacman would not speed up with an increased job count, no matter the implementation, yet it clearly does.

To have a fair comparison I ran the tests again with xz set to use multiple threads. The results can be seen here [1] and the code is available here [2]. Tuning xz certainly helps, especially for single packages, but using multiple jobs brings the real speed boost when recreating more than one package. The fact that xz can be tuned as well is no secret; it is stated in the man page and has been mentioned in the usage section from the beginning. Furthermore, the implementation is only a few additional lines of code, must be invoked explicitly and should in no case slow anyone down.

Best Regards,
Gordian Edenhofer

[1] http://edh.ddns.net/pacman_ml_bacman_benchmarks/bacman:%20simple%20benchmark.svg
[2] http://edh.ddns.net/pacman_ml_bacman_benchmarks/bacman:%20simple%20benchmark.R.txt
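For reference, xz multi-threading of the kind used for the re-run above can also be enabled without touching bacman at all, for example via xz's own XZ_OPT environment variable. This assumes the .xz compression step ultimately invokes the xz binary, which reads XZ_OPT regardless of how it is called:

    # -T0 lets xz use one thread per available core (xz >= 5.2)
    XZ_OPT='-T0' bacman linux-headers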