On Sun, Dec 19, 2010 at 11:32 AM, Xavier Chantry <chantry.xavier@gmail.com> wrote:
On Wed, Dec 15, 2010 at 7:48 AM, Dan McGee <dan@archlinux.org> wrote:
The old function was written in a time before we relied on it for nearly every operation. Since then, we have switched to the archive backend and now fast parsing is a big deal.
The former function made a per-character call to the libarchive archive_read_data() function, which resulted in some 21 million calls in a typical "load all sync dbs" operation. If we instead do some buffering of our own and read the blocks directly, and then find our newlines from there, we can cut out the multiple layers of overhead and go from archive to parsed data much quicker.
Both users of the former function are switched over to the new signature, made easier by the macros now in place in the sync backend parsing code.
Performance: for a `pacman -Su` (no upgrades available), _alpm_archive_fgets() goes from being 29% of the total time to 12%. The time spent in the libarchive function being called dropped from 24% to 6%.
This pushes _alpm_pkg_find back to the title of slowest low-level function.
Signed-off-by: Dan McGee <dan@archlinux.org> ---
I wish this one were a bit easier to comprehend and grok, but the cost of this performance is an fgets function a bit more complicated than the very simplistic one we had before.
The new function revolves around the zero-copy, retrieve-an-entire-block functionality of libarchive. We create a structure to keep all the bookkeeping info we need around, including a buffer of our own where each line ends up. The buffer auto-resizes itself as we poll through an archive file entry, and cleans itself up when the file hits EOF. This means no malloc/free is necessary by the caller, which keeps most concerns inside the fgets function.
Please let me know if it is unclear how the new function works, so more comments can be put in there.
Since we have just one line, I would not have bothered trying to allocate exactly what we need. I would just have used a static or dynamic buffer of the maximum line size and been done with it. But anyway, you already implemented the more complex way, and I don't see a problem with that as long as it's tested (especially the realloc part) :)
I'm not sure what you are saying you would do with a dynamic buffer: doesn't it make sense to pick the original size from whatever we get back first, rather than some other arbitrary size that we might have to resize anyway? And I've always hated static buffer sizes; see the PATH_MAX junk for more reasons why.
I could not spot a problem by simply reading the code.
Another minor observation: in the !done case, the alloc block does a strict allocation, and thus does not take into account that we will very likely have to extend it on the next iteration (if we did not reach the last archive block).
Finally, can it actually happen with our sync archives, which contain only very small files, that a line is split across two blocks?
It can, with the smoke002.py pactest I'll include, which just blew a hole through my realloc code; the necessary changes are below. Pactest-created packages usually produce a block size of 65536 or so. I also bumped the max line size to 512KB rather than PATH_MAX (4096 bytes), so now the delayed allocation makes a bit more sense.
In any case, this is another very welcome improvement.
Thanks!
-Dan

diff --git a/lib/libalpm/util.c b/lib/libalpm/util.c
index 87880de..d34eab5 100644
--- a/lib/libalpm/util.c
+++ b/lib/libalpm/util.c
@@ -841,6 +842,8 @@ int _alpm_archive_fgets(struct archive *a, struct archive_read_buffer *b)
 			/* we've looked through the whole block but no newline, copy it */
 			size_t len = b->block + b->block_size - b->block_offset;
 			memcpy(b->line_offset, b->block_offset, len);
+			b->line_offset += len;
+			b->block_offset = i;
 		}
 	}