[pacman-dev] [PATCH 1/5] Check if a file is in the package's file list before extracting
Signed-off-by: Florian Pritz <bluewind@xinu.at> --- lib/libalpm/add.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/lib/libalpm/add.c b/lib/libalpm/add.c index 5f1a953..7930117 100644 --- a/lib/libalpm/add.c +++ b/lib/libalpm/add.c @@ -178,6 +178,11 @@ static int extract_single_file(alpm_handle_t *handle, struct archive *archive, archive_read_data_skip(archive); return 0; } else { + if (!alpm_filelist_contains(&newpkg->files, entryname)) { + _alpm_log(handle, ALPM_LOG_WARNING, _("file not found in file list for package %s. skipping extraction of %s\n"), + newpkg->name, entryname); + return 0; + } /* build the new entryname relative to handle->root */ snprintf(filename, PATH_MAX, "%s%s", handle->root, entryname); } -- 1.8.5.3
Signed-off-by: Florian Pritz <bluewind@xinu.at> --- lib/libalpm/util.c | 38 ++++++++++++++++++++++++++++++++++++++ lib/libalpm/util.h | 1 + 2 files changed, 39 insertions(+) diff --git a/lib/libalpm/util.c b/lib/libalpm/util.c index 1bbc768..5f99f35 100644 --- a/lib/libalpm/util.c +++ b/lib/libalpm/util.c @@ -1287,6 +1287,44 @@ int _alpm_fnmatch(const void *pattern, const void *string) return fnmatch(pattern, string, 0); } +/** + * Think of this as realloc with error handling and + * automatic growing bases on current/required + * + * @param data source memory space + * @param current size of the space pointed to be data + * @param required size you want + * @return new memory + */ +void *_alpm_realloc(void *data, size_t *current, const size_t required) +{ + if(required >= *current) { + size_t old_size = *current; + char *newdata; + size_t newsize = *current; + + if(*current == 0) { + newsize = ALPM_BUFFER_SIZE; + } else if (required > (*current) * 2) { + newsize = required * 2; + } else { + newsize *= 2; + } + + newdata = realloc(data, newsize); + if(!newdata) { + _alpm_alloc_fail(newsize); + return NULL; + } + /* ensure all new memory is zeroed out, in both the initial + * allocation and later reallocs */ + memset(newdata + old_size, 0, *current - old_size); + *current = newsize; + return newdata; + } + return data; +} + void _alpm_alloc_fail(size_t size) { fprintf(stderr, "alloc failure: could not allocate %zd bytes\n", size); diff --git a/lib/libalpm/util.h b/lib/libalpm/util.h index b566868..b7e6e90 100644 --- a/lib/libalpm/util.h +++ b/lib/libalpm/util.h @@ -141,6 +141,7 @@ int _alpm_raw_ncmp(const char *first, const char *second, size_t max); int _alpm_access(alpm_handle_t *handle, const char *dir, const char *file, int amode); int _alpm_fnmatch_patterns(alpm_list_t *patterns, const char *string); int _alpm_fnmatch(const void *pattern, const void *string); +void *_alpm_realloc(void *data, size_t *current, const size_t required); #ifndef HAVE_STRSEP char *strsep(char **, const char *); -- 1.8.5.3
On Tue, Jan 28, 2014 at 12:02:34AM +0100, Florian Pritz wrote:
Signed-off-by: Florian Pritz <bluewind@xinu.at> --- lib/libalpm/util.c | 38 ++++++++++++++++++++++++++++++++++++++ lib/libalpm/util.h | 1 + 2 files changed, 39 insertions(+)
diff --git a/lib/libalpm/util.c b/lib/libalpm/util.c index 1bbc768..5f99f35 100644 --- a/lib/libalpm/util.c +++ b/lib/libalpm/util.c @@ -1287,6 +1287,44 @@ int _alpm_fnmatch(const void *pattern, const void *string) return fnmatch(pattern, string, 0); }
+/** + * Think of this as realloc with error handling and + * automatic growing bases on current/required
Hrmmm, I'm not really a fan of this as is. One might argue that it doesn't really do error handling -- it returns NULL on error (the same as realloc), and merely wraps some semi-arbitrary logic for deciding how much to grow the buffer. The caller still needs to save the value if they want to avoid a crappy situation on OOM. I think we can do better, and I'd suggesting looking greedy_realloc (function and macro) in systemd's codebase (src/shared/util.[ch]). What you have here is close, but a minor switch in the return value and the first parameter makes this a whole lot more useful.
+ * + * @param data source memory space + * @param current size of the space pointed to be data + * @param required size you want + * @return new memory + */ +void *_alpm_realloc(void *data, size_t *current, const size_t required) +{ + if(required >= *current) { + size_t old_size = *current; + char *newdata; + size_t newsize = *current; + + if(*current == 0) { + newsize = ALPM_BUFFER_SIZE; + } else if (required > (*current) * 2) { + newsize = required * 2; + } else { + newsize *= 2; + } + + newdata = realloc(data, newsize); + if(!newdata) { + _alpm_alloc_fail(newsize); + return NULL; + } + /* ensure all new memory is zeroed out, in both the initial + * allocation and later reallocs */ + memset(newdata + old_size, 0, *current - old_size); + *current = newsize; + return newdata; + } + return data; +} + void _alpm_alloc_fail(size_t size) { fprintf(stderr, "alloc failure: could not allocate %zd bytes\n", size); diff --git a/lib/libalpm/util.h b/lib/libalpm/util.h index b566868..b7e6e90 100644 --- a/lib/libalpm/util.h +++ b/lib/libalpm/util.h @@ -141,6 +141,7 @@ int _alpm_raw_ncmp(const char *first, const char *second, size_t max); int _alpm_access(alpm_handle_t *handle, const char *dir, const char *file, int amode); int _alpm_fnmatch_patterns(alpm_list_t *patterns, const char *string); int _alpm_fnmatch(const void *pattern, const void *string); +void *_alpm_realloc(void *data, size_t *current, const size_t required);
#ifndef HAVE_STRSEP char *strsep(char **, const char *); -- 1.8.5.3
Signed-off-by: Florian Pritz <bluewind@xinu.at> --- v2: - Incorporated suggestions by Dave (take **data instead of *data) - Handle overflows RFC: systemd's greedy_realloc0() uses uint8_t instead of char for *newdata. Is that important or doesn't matter? lib/libalpm/util.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++ lib/libalpm/util.h | 1 + 2 files changed, 47 insertions(+) diff --git a/lib/libalpm/util.c b/lib/libalpm/util.c index 1bbc768..753cdd7 100644 --- a/lib/libalpm/util.c +++ b/lib/libalpm/util.c @@ -1287,6 +1287,52 @@ int _alpm_fnmatch(const void *pattern, const void *string) return fnmatch(pattern, string, 0); } +/** + * Think of this as realloc with error handling and + * automatic growing based on current/required + * + * @param data source memory space + * @param current size of the space pointed to be data + * @param required size you want + * @return new memory + */ +void *_alpm_realloc(void **data, size_t *current, const size_t required) +{ + size_t old_size = *current; + char *newdata; + size_t newsize = *current; + + if(*current >= required) { + return data; + } + + if(*current == 0) { + newsize = ALPM_BUFFER_SIZE; + } else if (required > (*current) * 2) { + newsize = required * 2; + } else { + newsize *= 2; + } + + /* check for overflows */ + if (newsize < required) { + return NULL; + } + + newdata = realloc(*data, newsize); + if(!newdata) { + _alpm_alloc_fail(newsize); + return NULL; + } + + /* ensure all new memory is zeroed out, in both the initial + * allocation and later reallocs */ + memset(newdata + old_size, 0, *current - old_size); + *current = newsize; + *data = newdata; + return newdata; +} + void _alpm_alloc_fail(size_t size) { fprintf(stderr, "alloc failure: could not allocate %zd bytes\n", size); diff --git a/lib/libalpm/util.h b/lib/libalpm/util.h index b566868..031254a 100644 --- a/lib/libalpm/util.h +++ b/lib/libalpm/util.h @@ -141,6 +141,7 @@ int _alpm_raw_ncmp(const char *first, const char *second, size_t max); int _alpm_access(alpm_handle_t *handle, const char *dir, const char *file, int amode); int _alpm_fnmatch_patterns(alpm_list_t *patterns, const char *string); int _alpm_fnmatch(const void *pattern, const void *string); +void *_alpm_realloc(void **data, size_t *current, const size_t required); #ifndef HAVE_STRSEP char *strsep(char **, const char *); -- 1.8.5.3
On 01/28/14 at 11:12am, Florian Pritz wrote:
Signed-off-by: Florian Pritz <bluewind@xinu.at> --- v2: - Incorporated suggestions by Dave (take **data instead of *data) - Handle overflows
RFC: systemd's greedy_realloc0() uses uint8_t instead of char for *newdata. Is that important or doesn't matter?
lib/libalpm/util.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++ lib/libalpm/util.h | 1 + 2 files changed, 47 insertions(+)
diff --git a/lib/libalpm/util.c b/lib/libalpm/util.c index 1bbc768..753cdd7 100644 --- a/lib/libalpm/util.c +++ b/lib/libalpm/util.c @@ -1287,6 +1287,52 @@ int _alpm_fnmatch(const void *pattern, const void *string) return fnmatch(pattern, string, 0); }
+/** + * Think of this as realloc with error handling and + * automatic growing based on current/required + * + * @param data source memory space + * @param current size of the space pointed to be data + * @param required size you want + * @return new memory + */
This should mention the fact that this is grow-only and "greedy" so required is really the minimum needed and won't be the final size. A note about possibly handing back unused portions of the allocated memory might not hurt either.
+void *_alpm_realloc(void **data, size_t *current, const size_t required) +{ + size_t old_size = *current;
This variable is entirely unnecessary because you don't update current until you're about to return.
+ char *newdata; + size_t newsize = *current; + + if(*current >= required) { + return data; + } + + if(*current == 0) { + newsize = ALPM_BUFFER_SIZE; + } else if (required > (*current) * 2) { + newsize = required * 2; + } else { + newsize *= 2;
I think this will read better if you set newsize = current * 2 here instead of setting newsize = current above and doubling it.
+ } + + /* check for overflows */ + if (newsize < required) { + return NULL; + } + + newdata = realloc(*data, newsize); + if(!newdata) { + _alpm_alloc_fail(newsize); + return NULL; + } + + /* ensure all new memory is zeroed out, in both the initial + * allocation and later reallocs */ + memset(newdata + old_size, 0, *current - old_size);
This does nothing because *current still equals old_size.
+ *current = newsize; + *data = newdata; + return newdata; +} + void _alpm_alloc_fail(size_t size) { fprintf(stderr, "alloc failure: could not allocate %zd bytes\n", size); diff --git a/lib/libalpm/util.h b/lib/libalpm/util.h index b566868..031254a 100644 --- a/lib/libalpm/util.h +++ b/lib/libalpm/util.h @@ -141,6 +141,7 @@ int _alpm_raw_ncmp(const char *first, const char *second, size_t max); int _alpm_access(alpm_handle_t *handle, const char *dir, const char *file, int amode); int _alpm_fnmatch_patterns(alpm_list_t *patterns, const char *string); int _alpm_fnmatch(const void *pattern, const void *string); +void *_alpm_realloc(void **data, size_t *current, const size_t required);
#ifndef HAVE_STRSEP char *strsep(char **, const char *); -- 1.8.5.3
Signed-off-by: Florian Pritz <bluewind@xinu.at> --- v3: Implement suggestions from Andrew RFC from v2 still holds: is using char for *newdata fine or is there any good reason to use uint8_t like systemd does? lib/libalpm/util.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++++ lib/libalpm/util.h | 1 + 2 files changed, 50 insertions(+) diff --git a/lib/libalpm/util.c b/lib/libalpm/util.c index 150b85e..28b8646 100644 --- a/lib/libalpm/util.c +++ b/lib/libalpm/util.c @@ -1287,6 +1287,55 @@ int _alpm_fnmatch(const void *pattern, const void *string) return fnmatch(pattern, string, 0); } +/** + * Think of this as realloc with error handling and + * automatic growing based on current/required. + * + * data will point to a memory space with at least required + * bytes, likely more. You may want to free unused memory. + * If required < current data is returned and nothing happens. + * + * @param data source memory space + * @param current size of the space pointed to be data + * @param required size you want + * @return new memory if grown; old memory otherwise; NULL on error + */ +void *_alpm_realloc(void **data, size_t *current, const size_t required) +{ + char *newdata; + size_t newsize = 0; + + if(*current >= required) { + return data; + } + + if(*current == 0) { + newsize = ALPM_BUFFER_SIZE; + } else if (required > (*current) * 2) { + newsize = required * 2; + } else { + newsize = *current * 2; + } + + /* check for overflows */ + if (newsize < required) { + return NULL; + } + + newdata = realloc(*data, newsize); + if(!newdata) { + _alpm_alloc_fail(newsize); + return NULL; + } + + /* ensure all new memory is zeroed out, in both the initial + * allocation and later reallocs */ + memset(newdata + *current, 0, newsize - *current); + *current = newsize; + *data = newdata; + return newdata; +} + void _alpm_alloc_fail(size_t size) { fprintf(stderr, "alloc failure: could not allocate %zd bytes\n", size); diff --git a/lib/libalpm/util.h b/lib/libalpm/util.h index 250c530..f348540 100644 --- a/lib/libalpm/util.h +++ b/lib/libalpm/util.h @@ -141,6 +141,7 @@ int _alpm_raw_ncmp(const char *first, const char *second, size_t max); int _alpm_access(alpm_handle_t *handle, const char *dir, const char *file, int amode); int _alpm_fnmatch_patterns(alpm_list_t *patterns, const char *string); int _alpm_fnmatch(const void *pattern, const void *string); +void *_alpm_realloc(void **data, size_t *current, const size_t required); #ifndef HAVE_STRSEP char *strsep(char **, const char *); -- 1.8.5.3
On 31/01/14 02:24, Florian Pritz wrote:
Signed-off-by: Florian Pritz <bluewind@xinu.at> ---
v3: Implement suggestions from Andrew
RFC from v2 still holds: is using char for *newdata fine or is there any good reason to use uint8_t like systemd does?
lib/libalpm/util.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++++ lib/libalpm/util.h | 1 + 2 files changed, 50 insertions(+)
diff --git a/lib/libalpm/util.c b/lib/libalpm/util.c index 150b85e..28b8646 100644 --- a/lib/libalpm/util.c +++ b/lib/libalpm/util.c @@ -1287,6 +1287,55 @@ int _alpm_fnmatch(const void *pattern, const void *string) return fnmatch(pattern, string, 0); }
+/** + * Think of this as realloc with error handling and + * automatic growing based on current/required. + * + * data will point to a memory space with at least required + * bytes, likely more. You may want to free unused memory. + * If required < current data is returned and nothing happens. + * + * @param data source memory space + * @param current size of the space pointed to be data + * @param required size you want + * @return new memory if grown; old memory otherwise; NULL on error + */ +void *_alpm_realloc(void **data, size_t *current, const size_t required) +{ + char *newdata; + size_t newsize = 0; + + if(*current >= required) { + return data; + } + + if(*current == 0) { + newsize = ALPM_BUFFER_SIZE; + } else if (required > (*current) * 2) { + newsize = required * 2; + } else { + newsize = *current * 2; + }
What is this? It takes a "required" parameter and then returns whatever it feels like and ...
+ /* check for overflows */ + if (newsize < required) { + return NULL; + }
... doing that can cause an overflow. I'd say the correct place for an overflow check should be when calculating required before calling this function.
+ newdata = realloc(*data, newsize); + if(!newdata) { + _alpm_alloc_fail(newsize); + return NULL; + } + + /* ensure all new memory is zeroed out, in both the initial + * allocation and later reallocs */ + memset(newdata + *current, 0, newsize - *current);
Do we really need that memset? Anyway, I see no need for this function at all. Just use plain realloc and check the return. You still need to check what this returns anyway. It is more work for zero gain, particularly if you just use ALPM_BUFFER_SIZE at the start and double the size each realloc. It is exactly what your formula currently does because "required > (*current) * 2" can never happen in your filelist reading from mtree function in patch #5. As an aside, using packages with mtree files on my system: 47.7% of packages will not need a realloc 15.5% will need 1 realloc 14.3% will need 2 8.6% will need 3 6.6% will need 4 ... I found 1 package on my system needing 8. That looks fine (maybe decays a bit slow, but it will do). Allan
On 02/02/14 20:39, Allan McRae wrote:
On 31/01/14 02:24, Florian Pritz wrote:
Signed-off-by: Florian Pritz <bluewind@xinu.at> ---
v3: Implement suggestions from Andrew
RFC from v2 still holds: is using char for *newdata fine or is there any good reason to use uint8_t like systemd does?
lib/libalpm/util.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++++ lib/libalpm/util.h | 1 + 2 files changed, 50 insertions(+)
diff --git a/lib/libalpm/util.c b/lib/libalpm/util.c index 150b85e..28b8646 100644 --- a/lib/libalpm/util.c +++ b/lib/libalpm/util.c @@ -1287,6 +1287,55 @@ int _alpm_fnmatch(const void *pattern, const void *string) return fnmatch(pattern, string, 0); }
+/** + * Think of this as realloc with error handling and + * automatic growing based on current/required. + * + * data will point to a memory space with at least required + * bytes, likely more. You may want to free unused memory. + * If required < current data is returned and nothing happens. + * + * @param data source memory space + * @param current size of the space pointed to be data + * @param required size you want + * @return new memory if grown; old memory otherwise; NULL on error + */ +void *_alpm_realloc(void **data, size_t *current, const size_t required) +{ + char *newdata; + size_t newsize = 0; + + if(*current >= required) { + return data; + } + + if(*current == 0) { + newsize = ALPM_BUFFER_SIZE; + } else if (required > (*current) * 2) { + newsize = required * 2; + } else { + newsize = *current * 2; + }
What is this? It takes a "required" parameter and then returns whatever it feels like and ...
+ /* check for overflows */ + if (newsize < required) { + return NULL; + }
... doing that can cause an overflow.
I'd say the correct place for an overflow check should be when calculating required before calling this function.
+ newdata = realloc(*data, newsize); + if(!newdata) { + _alpm_alloc_fail(newsize); + return NULL; + } + + /* ensure all new memory is zeroed out, in both the initial + * allocation and later reallocs */ + memset(newdata + *current, 0, newsize - *current);
Do we really need that memset?
Anyway, I see no need for this function at all. Just use plain realloc and check the return. You still need to check what this returns anyway. It is more work for zero gain, particularly if you just use ALPM_BUFFER_SIZE at the start and double the size each realloc. It is exactly what your formula currently does because "required > (*current) * 2" can never happen in your filelist reading from mtree function in patch #5.
Reading through the rest of your patches and seeing you are refactoring some of the other realloc calls into this, I'd be happy if this was just a wrapper to realloc that did the memset. I.e. You just realloc to the required value. Then do the memset if current < required. Allan
This is used to deduplicate code when using the mtree as the file list source. Signed-off-by: Florian Pritz <bluewind@xinu.at> --- lib/libalpm/be_package.c | 21 +++++++++++++++++++-- 1 file changed, 19 insertions(+), 2 deletions(-) diff --git a/lib/libalpm/be_package.c b/lib/libalpm/be_package.c index 149f281..066c220 100644 --- a/lib/libalpm/be_package.c +++ b/lib/libalpm/be_package.c @@ -353,6 +353,23 @@ int _alpm_pkg_validate_internal(alpm_handle_t *handle, } /** + * Handle the existance of simple paths for _alpm_load_pkg_internal() + * @param pkg package to change + * @param path path to examine + * @return 0 if path doesn't match any rule, 1 if it has been handled + */ +static int handle_simple_path(alpm_pkg_t *pkg, const char *path) +{ + if(strcmp(path, ".INSTALL") == 0) { + pkg->scriptlet = 1; + } else { + return 0; + } + + return 1; +} + +/** * Load a package and create the corresponding alpm_pkg_t struct. * @param handle the context handle * @param pkgfile path to the package file @@ -423,8 +440,8 @@ alpm_pkg_t *_alpm_pkg_load_internal(alpm_handle_t *handle, } config = 1; continue; - } else if(strcmp(entry_name, ".INSTALL") == 0) { - newpkg->scriptlet = 1; + } else if(handle_simple_path(newpkg, entry_name)) { + continue; } else if(*entry_name == '.') { /* for now, ignore all files starting with '.' that haven't * already been handled (for future possibilities) */ -- 1.8.5.3
On Mon, Jan 27, 2014 at 5:02 PM, Florian Pritz <bluewind@xinu.at> wrote:
This is used to deduplicate code when using the mtree as the file list source.
Signed-off-by: Florian Pritz <bluewind@xinu.at> --- lib/libalpm/be_package.c | 21 +++++++++++++++++++-- 1 file changed, 19 insertions(+), 2 deletions(-)
diff --git a/lib/libalpm/be_package.c b/lib/libalpm/be_package.c index 149f281..066c220 100644 --- a/lib/libalpm/be_package.c +++ b/lib/libalpm/be_package.c @@ -353,6 +353,23 @@ int _alpm_pkg_validate_internal(alpm_handle_t *handle, }
/** + * Handle the existance of simple paths for _alpm_load_pkg_internal() + * @param pkg package to change + * @param path path to examine + * @return 0 if path doesn't match any rule, 1 if it has been handled + */ +static int handle_simple_path(alpm_pkg_t *pkg, const char *path) +{ + if(strcmp(path, ".INSTALL") == 0) { + pkg->scriptlet = 1; + } else { + return 0; + } + + return 1;
This felt a little weird to read and understand the flow. Why not just `return 1` from in the if block, drop the else completely, and return 0 at the end of the function? +}
+ +/** * Load a package and create the corresponding alpm_pkg_t struct. * @param handle the context handle * @param pkgfile path to the package file @@ -423,8 +440,8 @@ alpm_pkg_t *_alpm_pkg_load_internal(alpm_handle_t *handle, } config = 1; continue; - } else if(strcmp(entry_name, ".INSTALL") == 0) { - newpkg->scriptlet = 1; + } else if(handle_simple_path(newpkg, entry_name)) { + continue; } else if(*entry_name == '.') { /* for now, ignore all files starting with '.' that haven't * already been handled (for future possibilities) */ -- 1.8.5.3
On 28.01.2014 00:31, Dan McGee wrote:
+static int handle_simple_path(alpm_pkg_t *pkg, const char *path) +{ + if(strcmp(path, ".INSTALL") == 0) { + pkg->scriptlet = 1; + } else { + return 0; + } + + return 1;
This felt a little weird to read and understand the flow. Why not just `return 1` from in the if block, drop the else completely, and return 0 at the end of the function?
The idea was to make it easy to expand to more files (no need to remember to add returns), but I guess YAGNI and yeah not really readable. Will be fixed in the next patch.
This is used to deduplicate code when using the mtree as the file list source. Signed-off-by: Florian Pritz <bluewind@xinu.at> --- v2: Make it easier to read by putting the return 1 into the if block. lib/libalpm/be_package.c | 20 ++++++++++++++++++-- 1 file changed, 18 insertions(+), 2 deletions(-) diff --git a/lib/libalpm/be_package.c b/lib/libalpm/be_package.c index 149f281..0801fc6 100644 --- a/lib/libalpm/be_package.c +++ b/lib/libalpm/be_package.c @@ -353,6 +353,22 @@ int _alpm_pkg_validate_internal(alpm_handle_t *handle, } /** + * Handle the existance of simple paths for _alpm_load_pkg_internal() + * @param pkg package to change + * @param path path to examine + * @return 0 if path doesn't match any rule, 1 if it has been handled + */ +static int handle_simple_path(alpm_pkg_t *pkg, const char *path) +{ + if(strcmp(path, ".INSTALL") == 0) { + pkg->scriptlet = 1; + return 1; + } + + return 0; +} + +/** * Load a package and create the corresponding alpm_pkg_t struct. * @param handle the context handle * @param pkgfile path to the package file @@ -423,8 +439,8 @@ alpm_pkg_t *_alpm_pkg_load_internal(alpm_handle_t *handle, } config = 1; continue; - } else if(strcmp(entry_name, ".INSTALL") == 0) { - newpkg->scriptlet = 1; + } else if(handle_simple_path(newpkg, entry_name)) { + continue; } else if(*entry_name == '.') { /* for now, ignore all files starting with '.' that haven't * already been handled (for future possibilities) */ -- 1.8.5.3
This is used to deduplicate code when using the mtree as the file list source. Signed-off-by: Florian Pritz <bluewind@xinu.at> --- v4: move check for files starting with . into function (suggested by Andrew in patch 5 v2) lib/libalpm/be_package.c | 27 ++++++++++++++++++++++----- 1 file changed, 22 insertions(+), 5 deletions(-) diff --git a/lib/libalpm/be_package.c b/lib/libalpm/be_package.c index 2fbb1db..5980784 100644 --- a/lib/libalpm/be_package.c +++ b/lib/libalpm/be_package.c @@ -353,6 +353,26 @@ int _alpm_pkg_validate_internal(alpm_handle_t *handle, } /** + * Handle the existance of simple paths for _alpm_load_pkg_internal() + * @param pkg package to change + * @param path path to examine + * @return 0 if path doesn't match any rule, 1 if it has been handled + */ +static int handle_simple_path(alpm_pkg_t *pkg, const char *path) +{ + if(strcmp(path, ".INSTALL") == 0) { + pkg->scriptlet = 1; + return 1; + } else if(*path == '.') { + /* for now, ignore all files starting with '.' that haven't + * already been handled (for future possibilities) */ + return 1; + } + + return 0; +} + +/** * Load a package and create the corresponding alpm_pkg_t struct. * @param handle the context handle * @param pkgfile path to the package file @@ -423,11 +443,8 @@ alpm_pkg_t *_alpm_pkg_load_internal(alpm_handle_t *handle, } config = 1; continue; - } else if(strcmp(entry_name, ".INSTALL") == 0) { - newpkg->scriptlet = 1; - } else if(*entry_name == '.') { - /* for now, ignore all files starting with '.' that haven't - * already been handled (for future possibilities) */ + } else if(handle_simple_path(newpkg, entry_name)) { + continue; } else if(full) { const size_t files_count = newpkg->files.count; alpm_file_t *current_file; -- 1.8.5.3
Signed-off-by: Florian Pritz <bluewind@xinu.at> --- v4: Pass path via argument (suggested by Andrew in patch 5 v2) lib/libalpm/be_package.c | 56 +++++++++++++++++++++++++----------------------- 1 file changed, 29 insertions(+), 27 deletions(-) diff --git a/lib/libalpm/be_package.c b/lib/libalpm/be_package.c index 5980784..393f436 100644 --- a/lib/libalpm/be_package.c +++ b/lib/libalpm/be_package.c @@ -373,6 +373,33 @@ static int handle_simple_path(alpm_pkg_t *pkg, const char *path) } /** + * Add a file to the files list for pkg. + * + * @param pkg package to add the file to + * @param files_size size of pkg->files.files + * @param entry archive entry of the file to add to the list + * @param path path of the file to be added + * @return <0 on error, 0 on success + */ +static int add_entry_to_files_list(alpm_pkg_t *pkg, size_t *files_size, + struct archive_entry *entry, const char *path) +{ + const size_t files_count = pkg->files.count; + alpm_file_t *current_file; + + if(!_alpm_realloc((void **)&pkg->files.files, files_size, (files_count + 1) * sizeof(alpm_file_t))) { + return -1; + } + + current_file = pkg->files.files + files_count; + STRDUP(current_file->name, path, return -1); + current_file->size = archive_entry_size(entry); + current_file->mode = archive_entry_mode(entry); + pkg->files.count++; + return 0; +} + +/** * Load a package and create the corresponding alpm_pkg_t struct. * @param handle the context handle * @param pkgfile path to the package file @@ -446,34 +473,9 @@ alpm_pkg_t *_alpm_pkg_load_internal(alpm_handle_t *handle, } else if(handle_simple_path(newpkg, entry_name)) { continue; } else if(full) { - const size_t files_count = newpkg->files.count; - alpm_file_t *current_file; - /* Keep track of all files for filelist generation */ - if(files_count >= files_size) { - size_t old_size = files_size; - alpm_file_t *newfiles; - if(files_size == 0) { - files_size = 4; - } else { - files_size *= 2; - } - newfiles = realloc(newpkg->files.files, - sizeof(alpm_file_t) * files_size); - if(!newfiles) { - _alpm_alloc_fail(sizeof(alpm_file_t) * files_size); - goto error; - } - /* ensure all new memory is zeroed out, in both the initial - * allocation and later reallocs */ - memset(newfiles + old_size, 0, - sizeof(alpm_file_t) * (files_size - old_size)); - newpkg->files.files = newfiles; + if(add_entry_to_files_list(newpkg, &files_size, entry, entry_name) < 0) { + goto error; } - current_file = newpkg->files.files + files_count; - STRDUP(current_file->name, entry_name, goto error); - current_file->size = archive_entry_size(entry); - current_file->mode = archive_entry_mode(entry); - newpkg->files.count++; } if(archive_read_data_skip(archive)) { -- 1.8.5.3
This greatly speeds up file list generation times by avoiding uncompressing the whole package. pacman -S base with a deliberate file conflict: before: 9.1 seconds after: 2.2 seconds Signed-off-by: Florian Pritz <bluewind@xinu.at> --- v4: - fix mem leak when cleaning up pkg->files - make mtree_*size names a little more descriptive - switch comments to c style - break some long lines, there are still a few left.. - files starting with . are handled by handle_simple_path now - pass the path directly to add_entry_to_files_list - consolidate early breaking in main loop NOT changed: - ret for archive_read_next_header: copied from original code in _alpm_pkg_load_internal, if anyone cares feel free to change it later - I didn't break all lines > 80 chars, that should probably be done in a line break cleanup only commit lib/libalpm/be_package.c | 150 +++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 146 insertions(+), 4 deletions(-) diff --git a/lib/libalpm/be_package.c b/lib/libalpm/be_package.c index 393f436..ac52ae1 100644 --- a/lib/libalpm/be_package.c +++ b/lib/libalpm/be_package.c @@ -386,13 +386,36 @@ static int add_entry_to_files_list(alpm_pkg_t *pkg, size_t *files_size, { const size_t files_count = pkg->files.count; alpm_file_t *current_file; + mode_t type; + size_t pathlen; if(!_alpm_realloc((void **)&pkg->files.files, files_size, (files_count + 1) * sizeof(alpm_file_t))) { return -1; } + type = archive_entry_filetype(entry); + + pathlen = strlen(path); + current_file = pkg->files.files + files_count; - STRDUP(current_file->name, path, return -1); + + /* mtree paths don't contain a tailing slash, those we get from + * the archive directly do (expensive way) + * Other code relies on it to detect directories so add it here.*/ + if(type == AE_IFDIR && path[pathlen - 1] != '/') { + /* 2 = 1 for / + 1 for \0 */ + char *newpath = malloc(pathlen + 2); + if (!newpath) { + _alpm_alloc_fail(pathlen + 2); + return -1; + } + strcpy(newpath, path); + newpath[pathlen] = '/'; + newpath[pathlen + 1] = '\0'; + current_file->name = newpath; + } else { + STRDUP(current_file->name, path, return -1); + } current_file->size = archive_entry_size(entry); current_file->mode = archive_entry_mode(entry); pkg->files.count++; @@ -400,6 +423,115 @@ static int add_entry_to_files_list(alpm_pkg_t *pkg, size_t *files_size, } /** + * Generate a new file list from an mtree file and add it to the package. + * An existing file list will be free()d first. + * + * archive should point to an archive struct which is already at the + * position of the mtree's header. + * + * @param handle + * @param pkg package to add the file list to + * @param archive archive containing the mtree + * @return 0 on success, <0 on error + */ +static int build_filelist_from_mtree(alpm_handle_t *handle, alpm_pkg_t *pkg, struct archive *archive) +{ + int ret = 0; + size_t mtree_maxsize = 0; + size_t mtree_cursize = 0; + size_t files_size = 0; /* we clean up the existing array so this is fine */ + char *mtree_data = NULL; + struct archive *mtree; + struct archive_entry *mtree_entry = NULL; + + _alpm_log(handle, ALPM_LOG_DEBUG, + "found mtree for package %s, getting file list\n", pkg->filename); + + /* throw away any files we might have already found */ + for (size_t i = 0; i < pkg->files.count; i++) { + free(pkg->files.files[i].name); + } + free(pkg->files.files); + pkg->files.files = NULL; + pkg->files.count = 0; + + /* create a new archive to parse the mtree and load it from archive into memory */ + /* TODO: split this into a function */ + if((mtree = archive_read_new()) == NULL) { + handle->pm_errno = ALPM_ERR_LIBARCHIVE; + goto error; + } + + _alpm_archive_read_support_filter_all(mtree); + archive_read_support_format_mtree(mtree); + + /* TODO: split this into a function */ + while(1) { + ssize_t size; + + if(!_alpm_realloc((void **)&mtree_data, &mtree_maxsize, mtree_cursize + ALPM_BUFFER_SIZE)) { + goto error; + } + + size = archive_read_data(archive, mtree_data + mtree_cursize, ALPM_BUFFER_SIZE); + + if(size < 0) { + _alpm_log(handle, ALPM_LOG_ERROR, _("error while reading package %s: %s\n"), + pkg->filename, archive_error_string(archive)); + handle->pm_errno = ALPM_ERR_LIBARCHIVE; + goto error; + } + if(size == 0) { + break; + } + + mtree_cursize += size; + } + + /* TODO: need wrapper function for archive_read_open_memory for old libarchive versions? */ + if(archive_read_open_memory(mtree, mtree_data, mtree_cursize)) { + _alpm_log(handle, ALPM_LOG_ERROR, + _("error while reading mtree of package %s: %s\n"), + pkg->filename, archive_error_string(mtree)); + handle->pm_errno = ALPM_ERR_LIBARCHIVE; + goto error; + } + + while((ret = archive_read_next_header(mtree, &mtree_entry)) == ARCHIVE_OK) { + const char *path = archive_entry_pathname(mtree_entry); + + /* strip leading "./" from path entries */ + if(path[0] == '.' && path[1] == '/') { + path += 2; + } + + if(handle_simple_path(pkg, path)) { + continue; + } + + if(add_entry_to_files_list(pkg, &files_size, mtree_entry, path) < 0) { + goto error; + } + } + + if(ret != ARCHIVE_EOF && ret != ARCHIVE_OK) { /* An error occurred */ + _alpm_log(handle, ALPM_LOG_ERROR, _("error while reading mtree of package %s: %s\n"), + pkg->filename, archive_error_string(mtree)); + handle->pm_errno = ALPM_ERR_LIBARCHIVE; + goto error; + } + + free(mtree_data); + _alpm_archive_read_free(mtree); + _alpm_log(handle, ALPM_LOG_DEBUG, "finished mtree reading for %s\n", pkg->filename); + return 0; +error: + free(mtree_data); + _alpm_archive_read_free(mtree); + return -1; +} + +/** * Load a package and create the corresponding alpm_pkg_t struct. * @param handle the context handle * @param pkgfile path to the package file @@ -409,7 +541,7 @@ static int add_entry_to_files_list(alpm_pkg_t *pkg, size_t *files_size, alpm_pkg_t *_alpm_pkg_load_internal(alpm_handle_t *handle, const char *pkgfile, int full) { - int ret, fd, config = 0; + int ret, fd, config, hit_mtree = 0; struct archive *archive; struct archive_entry *entry; alpm_pkg_t *newpkg; @@ -472,7 +604,17 @@ alpm_pkg_t *_alpm_pkg_load_internal(alpm_handle_t *handle, continue; } else if(handle_simple_path(newpkg, entry_name)) { continue; - } else if(full) { + } else if(full && strcmp(entry_name, ".MTREE") == 0) { + /* building the file list: cheap way + * get the filelist from the mtree file rather than scanning + * the whole archive */ + if(build_filelist_from_mtree(handle, newpkg, archive) < 0) { + goto error; + } + hit_mtree = 1; + continue; + } else if(full && !hit_mtree) { + /* building the file list: expensive way */ if(add_entry_to_files_list(newpkg, &files_size, entry, entry_name) < 0) { goto error; } @@ -486,7 +628,7 @@ alpm_pkg_t *_alpm_pkg_load_internal(alpm_handle_t *handle, } /* if we are not doing a full read, see if we have all we need */ - if(!full && config) { + if((!full || hit_mtree) && config) { break; } } -- 1.8.5.3
On 02/02/14 04:12, Florian Pritz wrote:
This greatly speeds up file list generation times by avoiding uncompressing the whole package.
pacman -S base with a deliberate file conflict: before: 9.1 seconds after: 2.2 seconds
Signed-off-by: Florian Pritz <bluewind@xinu.at> ---
v4:
- fix mem leak when cleaning up pkg->files - make mtree_*size names a little more descriptive - switch comments to c style - break some long lines, there are still a few left.. - files starting with . are handled by handle_simple_path now - pass the path directly to add_entry_to_files_list - consolidate early breaking in main loop
NOT changed:
- ret for archive_read_next_header: copied from original code in _alpm_pkg_load_internal, if anyone cares feel free to change it later - I didn't break all lines > 80 chars, that should probably be done in a line break cleanup only commit
lib/libalpm/be_package.c | 150 +++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 146 insertions(+), 4 deletions(-)
diff --git a/lib/libalpm/be_package.c b/lib/libalpm/be_package.c index 393f436..ac52ae1 100644 --- a/lib/libalpm/be_package.c +++ b/lib/libalpm/be_package.c @@ -386,13 +386,36 @@ static int add_entry_to_files_list(alpm_pkg_t *pkg, size_t *files_size, { const size_t files_count = pkg->files.count; alpm_file_t *current_file; + mode_t type; + size_t pathlen;
if(!_alpm_realloc((void **)&pkg->files.files, files_size, (files_count + 1) * sizeof(alpm_file_t))) { return -1; }
+ type = archive_entry_filetype(entry); + + pathlen = strlen(path); + current_file = pkg->files.files + files_count; - STRDUP(current_file->name, path, return -1); + + /* mtree paths don't contain a tailing slash, those we get from + * the archive directly do (expensive way) + * Other code relies on it to detect directories so add it here.*/ + if(type == AE_IFDIR && path[pathlen - 1] != '/') { + /* 2 = 1 for / + 1 for \0 */ + char *newpath = malloc(pathlen + 2); + if (!newpath) { + _alpm_alloc_fail(pathlen + 2); + return -1; + } + strcpy(newpath, path); + newpath[pathlen] = '/'; + newpath[pathlen + 1] = '\0'; + current_file->name = newpath; + } else { + STRDUP(current_file->name, path, return -1); + } current_file->size = archive_entry_size(entry); current_file->mode = archive_entry_mode(entry); pkg->files.count++; @@ -400,6 +423,115 @@ static int add_entry_to_files_list(alpm_pkg_t *pkg, size_t *files_size, }
/** + * Generate a new file list from an mtree file and add it to the package. + * An existing file list will be free()d first. + * + * archive should point to an archive struct which is already at the + * position of the mtree's header. + * + * @param handle + * @param pkg package to add the file list to + * @param archive archive containing the mtree + * @return 0 on success, <0 on error + */ +static int build_filelist_from_mtree(alpm_handle_t *handle, alpm_pkg_t *pkg, struct archive *archive) +{ + int ret = 0; + size_t mtree_maxsize = 0; + size_t mtree_cursize = 0; + size_t files_size = 0; /* we clean up the existing array so this is fine */ + char *mtree_data = NULL; + struct archive *mtree; + struct archive_entry *mtree_entry = NULL; + + _alpm_log(handle, ALPM_LOG_DEBUG, + "found mtree for package %s, getting file list\n", pkg->filename); + + /* throw away any files we might have already found */ + for (size_t i = 0; i < pkg->files.count; i++) { + free(pkg->files.files[i].name); + } + free(pkg->files.files); + pkg->files.files = NULL; + pkg->files.count = 0; + + /* create a new archive to parse the mtree and load it from archive into memory */ + /* TODO: split this into a function */ + if((mtree = archive_read_new()) == NULL) { + handle->pm_errno = ALPM_ERR_LIBARCHIVE; + goto error; + } + + _alpm_archive_read_support_filter_all(mtree); + archive_read_support_format_mtree(mtree); + + /* TODO: split this into a function */ + while(1) { + ssize_t size; + + if(!_alpm_realloc((void **)&mtree_data, &mtree_maxsize, mtree_cursize + ALPM_BUFFER_SIZE)) { + goto error; + } + + size = archive_read_data(archive, mtree_data + mtree_cursize, ALPM_BUFFER_SIZE); + + if(size < 0) { + _alpm_log(handle, ALPM_LOG_ERROR, _("error while reading package %s: %s\n"), + pkg->filename, archive_error_string(archive)); + handle->pm_errno = ALPM_ERR_LIBARCHIVE; + goto error; + } + if(size == 0) { + break; + } + + mtree_cursize += size; + } + + /* TODO: need wrapper function for archive_read_open_memory for old libarchive versions? */
When did support for that function get added?
+ if(archive_read_open_memory(mtree, mtree_data, mtree_cursize)) { + _alpm_log(handle, ALPM_LOG_ERROR, + _("error while reading mtree of package %s: %s\n"), + pkg->filename, archive_error_string(mtree)); + handle->pm_errno = ALPM_ERR_LIBARCHIVE; + goto error; + } + + while((ret = archive_read_next_header(mtree, &mtree_entry)) == ARCHIVE_OK) { + const char *path = archive_entry_pathname(mtree_entry); + + /* strip leading "./" from path entries */ + if(path[0] == '.' && path[1] == '/') { + path += 2; + } + + if(handle_simple_path(pkg, path)) { + continue; + } + + if(add_entry_to_files_list(pkg, &files_size, mtree_entry, path) < 0) { + goto error; + } + } + + if(ret != ARCHIVE_EOF && ret != ARCHIVE_OK) { /* An error occurred */ + _alpm_log(handle, ALPM_LOG_ERROR, _("error while reading mtree of package %s: %s\n"), + pkg->filename, archive_error_string(mtree)); + handle->pm_errno = ALPM_ERR_LIBARCHIVE; + goto error; + } + + free(mtree_data); + _alpm_archive_read_free(mtree); + _alpm_log(handle, ALPM_LOG_DEBUG, "finished mtree reading for %s\n", pkg->filename); + return 0; +error: + free(mtree_data); + _alpm_archive_read_free(mtree); + return -1; +} + +/** * Load a package and create the corresponding alpm_pkg_t struct. * @param handle the context handle * @param pkgfile path to the package file @@ -409,7 +541,7 @@ static int add_entry_to_files_list(alpm_pkg_t *pkg, size_t *files_size, alpm_pkg_t *_alpm_pkg_load_internal(alpm_handle_t *handle, const char *pkgfile, int full) { - int ret, fd, config = 0; + int ret, fd, config, hit_mtree = 0; struct archive *archive; struct archive_entry *entry; alpm_pkg_t *newpkg; @@ -472,7 +604,17 @@ alpm_pkg_t *_alpm_pkg_load_internal(alpm_handle_t *handle, continue; } else if(handle_simple_path(newpkg, entry_name)) { continue;
handle_simple_path will return 1 if the file starts with a ".". ".MTREE" starts with a ".", so....
- } else if(full) { + } else if(full && strcmp(entry_name, ".MTREE") == 0) {
... this can never happen.
+ /* building the file list: cheap way + * get the filelist from the mtree file rather than scanning + * the whole archive */ + if(build_filelist_from_mtree(handle, newpkg, archive) < 0) { + goto error; + } + hit_mtree = 1; + continue; + } else if(full && !hit_mtree) { + /* building the file list: expensive way */ if(add_entry_to_files_list(newpkg, &files_size, entry, entry_name) < 0) { goto error; } @@ -486,7 +628,7 @@ alpm_pkg_t *_alpm_pkg_load_internal(alpm_handle_t *handle, }
/* if we are not doing a full read, see if we have all we need */ - if(!full && config) { + if((!full || hit_mtree) && config) {
Why does hit_mtree matter here? if(!full) we should never read the mtree file.
break; } }
On 02.02.2014 12:13, Allan McRae wrote:
On 02/02/14 04:12, Florian Pritz wrote:
This greatly speeds up file list generation times by avoiding uncompressing the whole package.
pacman -S base with a deliberate file conflict: before: 9.1 seconds after: 2.2 seconds
Signed-off-by: Florian Pritz <bluewind@xinu.at> ---
v4:
- fix mem leak when cleaning up pkg->files - make mtree_*size names a little more descriptive - switch comments to c style - break some long lines, there are still a few left.. - files starting with . are handled by handle_simple_path now - pass the path directly to add_entry_to_files_list - consolidate early breaking in main loop
NOT changed:
- ret for archive_read_next_header: copied from original code in _alpm_pkg_load_internal, if anyone cares feel free to change it later - I didn't break all lines > 80 chars, that should probably be done in a line break cleanup only commit
lib/libalpm/be_package.c | 150 +++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 146 insertions(+), 4 deletions(-)
diff --git a/lib/libalpm/be_package.c b/lib/libalpm/be_package.c index 393f436..ac52ae1 100644 --- a/lib/libalpm/be_package.c +++ b/lib/libalpm/be_package.c @@ -386,13 +386,36 @@ static int add_entry_to_files_list(alpm_pkg_t *pkg, size_t *files_size, { const size_t files_count = pkg->files.count; alpm_file_t *current_file; + mode_t type; + size_t pathlen;
if(!_alpm_realloc((void **)&pkg->files.files, files_size, (files_count + 1) * sizeof(alpm_file_t))) { return -1; }
+ type = archive_entry_filetype(entry); + + pathlen = strlen(path); + current_file = pkg->files.files + files_count; - STRDUP(current_file->name, path, return -1); + + /* mtree paths don't contain a tailing slash, those we get from + * the archive directly do (expensive way) + * Other code relies on it to detect directories so add it here.*/ + if(type == AE_IFDIR && path[pathlen - 1] != '/') { + /* 2 = 1 for / + 1 for \0 */ + char *newpath = malloc(pathlen + 2); + if (!newpath) { + _alpm_alloc_fail(pathlen + 2); + return -1; + } + strcpy(newpath, path); + newpath[pathlen] = '/'; + newpath[pathlen + 1] = '\0'; + current_file->name = newpath; + } else { + STRDUP(current_file->name, path, return -1); + } current_file->size = archive_entry_size(entry); current_file->mode = archive_entry_mode(entry); pkg->files.count++; @@ -400,6 +423,115 @@ static int add_entry_to_files_list(alpm_pkg_t *pkg, size_t *files_size, }
/** + * Generate a new file list from an mtree file and add it to the package. + * An existing file list will be free()d first. + * + * archive should point to an archive struct which is already at the + * position of the mtree's header. + * + * @param handle + * @param pkg package to add the file list to + * @param archive archive containing the mtree + * @return 0 on success, <0 on error + */ +static int build_filelist_from_mtree(alpm_handle_t *handle, alpm_pkg_t *pkg, struct archive *archive) +{ + int ret = 0; + size_t mtree_maxsize = 0; + size_t mtree_cursize = 0; + size_t files_size = 0; /* we clean up the existing array so this is fine */ + char *mtree_data = NULL; + struct archive *mtree; + struct archive_entry *mtree_entry = NULL; + + _alpm_log(handle, ALPM_LOG_DEBUG, + "found mtree for package %s, getting file list\n", pkg->filename); + + /* throw away any files we might have already found */ + for (size_t i = 0; i < pkg->files.count; i++) { + free(pkg->files.files[i].name); + } + free(pkg->files.files); + pkg->files.files = NULL; + pkg->files.count = 0; + + /* create a new archive to parse the mtree and load it from archive into memory */ + /* TODO: split this into a function */ + if((mtree = archive_read_new()) == NULL) { + handle->pm_errno = ALPM_ERR_LIBARCHIVE; + goto error; + } + + _alpm_archive_read_support_filter_all(mtree); + archive_read_support_format_mtree(mtree); + + /* TODO: split this into a function */ + while(1) { + ssize_t size; + + if(!_alpm_realloc((void **)&mtree_data, &mtree_maxsize, mtree_cursize + ALPM_BUFFER_SIZE)) { + goto error; + } + + size = archive_read_data(archive, mtree_data + mtree_cursize, ALPM_BUFFER_SIZE); + + if(size < 0) { + _alpm_log(handle, ALPM_LOG_ERROR, _("error while reading package %s: %s\n"), + pkg->filename, archive_error_string(archive)); + handle->pm_errno = ALPM_ERR_LIBARCHIVE; + goto error; + } + if(size == 0) { + break; + } + + mtree_cursize += size; + } + + /* TODO: need wrapper function for archive_read_open_memory for old libarchive versions? */
When did support for that function get added?
Seems like it's been there from the beginning of the git repo (2007) so I guess I can remove that comment.
+ if(archive_read_open_memory(mtree, mtree_data, mtree_cursize)) { + _alpm_log(handle, ALPM_LOG_ERROR, + _("error while reading mtree of package %s: %s\n"), + pkg->filename, archive_error_string(mtree)); + handle->pm_errno = ALPM_ERR_LIBARCHIVE; + goto error; + } + + while((ret = archive_read_next_header(mtree, &mtree_entry)) == ARCHIVE_OK) { + const char *path = archive_entry_pathname(mtree_entry); + + /* strip leading "./" from path entries */ + if(path[0] == '.' && path[1] == '/') { + path += 2; + } + + if(handle_simple_path(pkg, path)) { + continue; + } + + if(add_entry_to_files_list(pkg, &files_size, mtree_entry, path) < 0) { + goto error; + } + } + + if(ret != ARCHIVE_EOF && ret != ARCHIVE_OK) { /* An error occurred */ + _alpm_log(handle, ALPM_LOG_ERROR, _("error while reading mtree of package %s: %s\n"), + pkg->filename, archive_error_string(mtree)); + handle->pm_errno = ALPM_ERR_LIBARCHIVE; + goto error; + } + + free(mtree_data); + _alpm_archive_read_free(mtree); + _alpm_log(handle, ALPM_LOG_DEBUG, "finished mtree reading for %s\n", pkg->filename); + return 0; +error: + free(mtree_data); + _alpm_archive_read_free(mtree); + return -1; +} + +/** * Load a package and create the corresponding alpm_pkg_t struct. * @param handle the context handle * @param pkgfile path to the package file @@ -409,7 +541,7 @@ static int add_entry_to_files_list(alpm_pkg_t *pkg, size_t *files_size, alpm_pkg_t *_alpm_pkg_load_internal(alpm_handle_t *handle, const char *pkgfile, int full) { - int ret, fd, config = 0; + int ret, fd, config, hit_mtree = 0; struct archive *archive; struct archive_entry *entry; alpm_pkg_t *newpkg; @@ -472,7 +604,17 @@ alpm_pkg_t *_alpm_pkg_load_internal(alpm_handle_t *handle, continue; } else if(handle_simple_path(newpkg, entry_name)) { continue;
handle_simple_path will return 1 if the file starts with a ".". ".MTREE" starts with a ".", so....
- } else if(full) { + } else if(full && strcmp(entry_name, ".MTREE") == 0) {
... this can never happen.
Thanks, messed that up when rebasing some other changes.
+ /* building the file list: cheap way + * get the filelist from the mtree file rather than scanning + * the whole archive */ + if(build_filelist_from_mtree(handle, newpkg, archive) < 0) { + goto error; + } + hit_mtree = 1; + continue; + } else if(full && !hit_mtree) { + /* building the file list: expensive way */ if(add_entry_to_files_list(newpkg, &files_size, entry, entry_name) < 0) { goto error; } @@ -486,7 +628,7 @@ alpm_pkg_t *_alpm_pkg_load_internal(alpm_handle_t *handle, }
/* if we are not doing a full read, see if we have all we need */ - if(!full && config) { + if((!full || hit_mtree) && config) {
Why does hit_mtree matter here? if(!full) we should never read the mtree file.
hit_mtree matters because we want to break the loop once we have a file list (if we need one), but only if we also have read .PKGINFO. Since .MTREE could be before .PKGINFO in the tarball I can't break just after reading the mtree. Does that make sense?
break; } }
On 03/02/14 01:32, Florian Pritz wrote:
On 02.02.2014 12:13, Allan McRae wrote:
On 02/02/14 04:12, Florian Pritz wrote:
This greatly speeds up file list generation times by avoiding uncompressing the whole package.
pacman -S base with a deliberate file conflict: before: 9.1 seconds after: 2.2 seconds
Signed-off-by: Florian Pritz <bluewind@xinu.at> ---
v4:
- fix mem leak when cleaning up pkg->files - make mtree_*size names a little more descriptive - switch comments to c style - break some long lines, there are still a few left.. - files starting with . are handled by handle_simple_path now - pass the path directly to add_entry_to_files_list - consolidate early breaking in main loop
NOT changed:
- ret for archive_read_next_header: copied from original code in _alpm_pkg_load_internal, if anyone cares feel free to change it later - I didn't break all lines > 80 chars, that should probably be done in a line break cleanup only commit
lib/libalpm/be_package.c | 150 +++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 146 insertions(+), 4 deletions(-)
diff --git a/lib/libalpm/be_package.c b/lib/libalpm/be_package.c index 393f436..ac52ae1 100644 --- a/lib/libalpm/be_package.c +++ b/lib/libalpm/be_package.c @@ -386,13 +386,36 @@ static int add_entry_to_files_list(alpm_pkg_t *pkg, size_t *files_size, { const size_t files_count = pkg->files.count; alpm_file_t *current_file; + mode_t type; + size_t pathlen;
if(!_alpm_realloc((void **)&pkg->files.files, files_size, (files_count + 1) * sizeof(alpm_file_t))) { return -1; }
+ type = archive_entry_filetype(entry); + + pathlen = strlen(path); + current_file = pkg->files.files + files_count; - STRDUP(current_file->name, path, return -1); + + /* mtree paths don't contain a tailing slash, those we get from + * the archive directly do (expensive way) + * Other code relies on it to detect directories so add it here.*/ + if(type == AE_IFDIR && path[pathlen - 1] != '/') { + /* 2 = 1 for / + 1 for \0 */ + char *newpath = malloc(pathlen + 2); + if (!newpath) { + _alpm_alloc_fail(pathlen + 2); + return -1; + } + strcpy(newpath, path); + newpath[pathlen] = '/'; + newpath[pathlen + 1] = '\0'; + current_file->name = newpath; + } else { + STRDUP(current_file->name, path, return -1); + } current_file->size = archive_entry_size(entry); current_file->mode = archive_entry_mode(entry); pkg->files.count++; @@ -400,6 +423,115 @@ static int add_entry_to_files_list(alpm_pkg_t *pkg, size_t *files_size, }
/** + * Generate a new file list from an mtree file and add it to the package. + * An existing file list will be free()d first. + * + * archive should point to an archive struct which is already at the + * position of the mtree's header. + * + * @param handle + * @param pkg package to add the file list to + * @param archive archive containing the mtree + * @return 0 on success, <0 on error + */ +static int build_filelist_from_mtree(alpm_handle_t *handle, alpm_pkg_t *pkg, struct archive *archive) +{ + int ret = 0; + size_t mtree_maxsize = 0; + size_t mtree_cursize = 0; + size_t files_size = 0; /* we clean up the existing array so this is fine */ + char *mtree_data = NULL; + struct archive *mtree; + struct archive_entry *mtree_entry = NULL; + + _alpm_log(handle, ALPM_LOG_DEBUG, + "found mtree for package %s, getting file list\n", pkg->filename); + + /* throw away any files we might have already found */ + for (size_t i = 0; i < pkg->files.count; i++) { + free(pkg->files.files[i].name); + } + free(pkg->files.files); + pkg->files.files = NULL; + pkg->files.count = 0; + + /* create a new archive to parse the mtree and load it from archive into memory */ + /* TODO: split this into a function */ + if((mtree = archive_read_new()) == NULL) { + handle->pm_errno = ALPM_ERR_LIBARCHIVE; + goto error; + } + + _alpm_archive_read_support_filter_all(mtree); + archive_read_support_format_mtree(mtree); + + /* TODO: split this into a function */ + while(1) { + ssize_t size; + + if(!_alpm_realloc((void **)&mtree_data, &mtree_maxsize, mtree_cursize + ALPM_BUFFER_SIZE)) { + goto error; + } + + size = archive_read_data(archive, mtree_data + mtree_cursize, ALPM_BUFFER_SIZE); + + if(size < 0) { + _alpm_log(handle, ALPM_LOG_ERROR, _("error while reading package %s: %s\n"), + pkg->filename, archive_error_string(archive)); + handle->pm_errno = ALPM_ERR_LIBARCHIVE; + goto error; + } + if(size == 0) { + break; + } + + mtree_cursize += size; + } + + /* TODO: need wrapper function for archive_read_open_memory for old libarchive versions? */
When did support for that function get added?
Seems like it's been there from the beginning of the git repo (2007) so I guess I can remove that comment.
+ if(archive_read_open_memory(mtree, mtree_data, mtree_cursize)) { + _alpm_log(handle, ALPM_LOG_ERROR, + _("error while reading mtree of package %s: %s\n"), + pkg->filename, archive_error_string(mtree)); + handle->pm_errno = ALPM_ERR_LIBARCHIVE; + goto error; + } + + while((ret = archive_read_next_header(mtree, &mtree_entry)) == ARCHIVE_OK) { + const char *path = archive_entry_pathname(mtree_entry); + + /* strip leading "./" from path entries */ + if(path[0] == '.' && path[1] == '/') { + path += 2; + } + + if(handle_simple_path(pkg, path)) { + continue; + } + + if(add_entry_to_files_list(pkg, &files_size, mtree_entry, path) < 0) { + goto error; + } + } + + if(ret != ARCHIVE_EOF && ret != ARCHIVE_OK) { /* An error occurred */ + _alpm_log(handle, ALPM_LOG_ERROR, _("error while reading mtree of package %s: %s\n"), + pkg->filename, archive_error_string(mtree)); + handle->pm_errno = ALPM_ERR_LIBARCHIVE; + goto error; + } + + free(mtree_data); + _alpm_archive_read_free(mtree); + _alpm_log(handle, ALPM_LOG_DEBUG, "finished mtree reading for %s\n", pkg->filename); + return 0; +error: + free(mtree_data); + _alpm_archive_read_free(mtree); + return -1; +} + +/** * Load a package and create the corresponding alpm_pkg_t struct. * @param handle the context handle * @param pkgfile path to the package file @@ -409,7 +541,7 @@ static int add_entry_to_files_list(alpm_pkg_t *pkg, size_t *files_size, alpm_pkg_t *_alpm_pkg_load_internal(alpm_handle_t *handle, const char *pkgfile, int full) { - int ret, fd, config = 0; + int ret, fd, config, hit_mtree = 0; struct archive *archive; struct archive_entry *entry; alpm_pkg_t *newpkg; @@ -472,7 +604,17 @@ alpm_pkg_t *_alpm_pkg_load_internal(alpm_handle_t *handle, continue; } else if(handle_simple_path(newpkg, entry_name)) { continue;
handle_simple_path will return 1 if the file starts with a ".". ".MTREE" starts with a ".", so....
- } else if(full) { + } else if(full && strcmp(entry_name, ".MTREE") == 0) {
... this can never happen.
Thanks, messed that up when rebasing some other changes.
+ /* building the file list: cheap way + * get the filelist from the mtree file rather than scanning + * the whole archive */ + if(build_filelist_from_mtree(handle, newpkg, archive) < 0) { + goto error; + } + hit_mtree = 1; + continue; + } else if(full && !hit_mtree) { + /* building the file list: expensive way */ if(add_entry_to_files_list(newpkg, &files_size, entry, entry_name) < 0) { goto error; } @@ -486,7 +628,7 @@ alpm_pkg_t *_alpm_pkg_load_internal(alpm_handle_t *handle, }
/* if we are not doing a full read, see if we have all we need */ - if(!full && config) { + if((!full || hit_mtree) && config) {
Why does hit_mtree matter here? if(!full) we should never read the mtree file.
hit_mtree matters because we want to break the loop once we have a file list (if we need one), but only if we also have read .PKGINFO.
Since .MTREE could be before .PKGINFO in the tarball I can't break just after reading the mtree.
Does that make sense?
Sure - I was making assumptions given the order of the files in the tarball. Allan
Signed-off-by: Florian Pritz <bluewind@xinu.at> --- lib/libalpm/be_package.c | 59 ++++++++++++++++++++++++++---------------------- 1 file changed, 32 insertions(+), 27 deletions(-) diff --git a/lib/libalpm/be_package.c b/lib/libalpm/be_package.c index 066c220..5e000bb 100644 --- a/lib/libalpm/be_package.c +++ b/lib/libalpm/be_package.c @@ -370,6 +370,36 @@ static int handle_simple_path(alpm_pkg_t *pkg, const char *path) } /** + * Add a file to the files list for pkg. + * + * @param pkg package to add the file to + * @param files_size size of pkg->files.files + * @param entry archive entry of the file to add to the list + * @return <0 on error, 0 on success + */ +static int add_entry_to_files_list(alpm_pkg_t *pkg, size_t *files_size, struct archive_entry *entry) +{ + const size_t files_count = pkg->files.count; + alpm_file_t *current_file; + const char *path; + alpm_file_t *tmp; + + if((tmp = _alpm_realloc(pkg->files.files, files_size, (files_count + 1) * sizeof(alpm_file_t))) == NULL) { + return -1; + } + pkg->files.files = tmp; + + path = archive_entry_pathname(entry); + + current_file = pkg->files.files + files_count; + STRDUP(current_file->name, path, return -1); + current_file->size = archive_entry_size(entry); + current_file->mode = archive_entry_mode(entry); + pkg->files.count++; + return 0; +} + +/** * Load a package and create the corresponding alpm_pkg_t struct. * @param handle the context handle * @param pkgfile path to the package file @@ -446,34 +476,9 @@ alpm_pkg_t *_alpm_pkg_load_internal(alpm_handle_t *handle, /* for now, ignore all files starting with '.' that haven't * already been handled (for future possibilities) */ } else if(full) { - const size_t files_count = newpkg->files.count; - alpm_file_t *current_file; - /* Keep track of all files for filelist generation */ - if(files_count >= files_size) { - size_t old_size = files_size; - alpm_file_t *newfiles; - if(files_size == 0) { - files_size = 4; - } else { - files_size *= 2; - } - newfiles = realloc(newpkg->files.files, - sizeof(alpm_file_t) * files_size); - if(!newfiles) { - _alpm_alloc_fail(sizeof(alpm_file_t) * files_size); - goto error; - } - /* ensure all new memory is zeroed out, in both the initial - * allocation and later reallocs */ - memset(newfiles + old_size, 0, - sizeof(alpm_file_t) * (files_size - old_size)); - newpkg->files.files = newfiles; + if(add_entry_to_files_list(newpkg, &files_size, entry) < 0) { + goto error; } - current_file = newpkg->files.files + files_count; - STRDUP(current_file->name, entry_name, goto error); - current_file->size = archive_entry_size(entry); - current_file->mode = archive_entry_mode(entry); - newpkg->files.count++; } if(archive_read_data_skip(archive)) { -- 1.8.5.3
Signed-off-by: Florian Pritz <bluewind@xinu.at> --- v2: Adjust for v2 of realloc patch. lib/libalpm/be_package.c | 57 +++++++++++++++++++++++++----------------------- 1 file changed, 30 insertions(+), 27 deletions(-) diff --git a/lib/libalpm/be_package.c b/lib/libalpm/be_package.c index 0801fc6..6ef84fc 100644 --- a/lib/libalpm/be_package.c +++ b/lib/libalpm/be_package.c @@ -369,6 +369,34 @@ static int handle_simple_path(alpm_pkg_t *pkg, const char *path) } /** + * Add a file to the files list for pkg. + * + * @param pkg package to add the file to + * @param files_size size of pkg->files.files + * @param entry archive entry of the file to add to the list + * @return <0 on error, 0 on success + */ +static int add_entry_to_files_list(alpm_pkg_t *pkg, size_t *files_size, struct archive_entry *entry) +{ + const size_t files_count = pkg->files.count; + alpm_file_t *current_file; + const char *path; + + if(!_alpm_realloc((void **)&pkg->files.files, files_size, (files_count + 1) * sizeof(alpm_file_t))) { + return -1; + } + + path = archive_entry_pathname(entry); + + current_file = pkg->files.files + files_count; + STRDUP(current_file->name, path, return -1); + current_file->size = archive_entry_size(entry); + current_file->mode = archive_entry_mode(entry); + pkg->files.count++; + return 0; +} + +/** * Load a package and create the corresponding alpm_pkg_t struct. * @param handle the context handle * @param pkgfile path to the package file @@ -445,34 +473,9 @@ alpm_pkg_t *_alpm_pkg_load_internal(alpm_handle_t *handle, /* for now, ignore all files starting with '.' that haven't * already been handled (for future possibilities) */ } else if(full) { - const size_t files_count = newpkg->files.count; - alpm_file_t *current_file; - /* Keep track of all files for filelist generation */ - if(files_count >= files_size) { - size_t old_size = files_size; - alpm_file_t *newfiles; - if(files_size == 0) { - files_size = 4; - } else { - files_size *= 2; - } - newfiles = realloc(newpkg->files.files, - sizeof(alpm_file_t) * files_size); - if(!newfiles) { - _alpm_alloc_fail(sizeof(alpm_file_t) * files_size); - goto error; - } - /* ensure all new memory is zeroed out, in both the initial - * allocation and later reallocs */ - memset(newfiles + old_size, 0, - sizeof(alpm_file_t) * (files_size - old_size)); - newpkg->files.files = newfiles; + if(add_entry_to_files_list(newpkg, &files_size, entry) < 0) { + goto error; } - current_file = newpkg->files.files + files_count; - STRDUP(current_file->name, entry_name, goto error); - current_file->size = archive_entry_size(entry); - current_file->mode = archive_entry_mode(entry); - newpkg->files.count++; } if(archive_read_data_skip(archive)) { -- 1.8.5.3
This greatly speeds up file list generation times by avoiding uncompressing the whole package. pacman -S base with a deliberate file conflict: before: 9.1 seconds after: 2.2 seconds Signed-off-by: Florian Pritz <bluewind@xinu.at> --- lib/libalpm/be_package.c | 155 ++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 153 insertions(+), 2 deletions(-) diff --git a/lib/libalpm/be_package.c b/lib/libalpm/be_package.c index 5e000bb..286b964 100644 --- a/lib/libalpm/be_package.c +++ b/lib/libalpm/be_package.c @@ -381,6 +381,8 @@ static int add_entry_to_files_list(alpm_pkg_t *pkg, size_t *files_size, struct a { const size_t files_count = pkg->files.count; alpm_file_t *current_file; + mode_t type; + size_t pathlen; const char *path; alpm_file_t *tmp; @@ -389,8 +391,26 @@ static int add_entry_to_files_list(alpm_pkg_t *pkg, size_t *files_size, struct a } pkg->files.files = tmp; + type = archive_entry_filetype(entry); path = archive_entry_pathname(entry); + pathlen = strlen(path); + + /* mtree paths don't contain a tailing slash, those we get from + * the archive directly do (expensive way) + * Other code relies on it to detect directories so add it here.*/ + if(type == AE_IFDIR && path[pathlen - 1] != '/') { + /* 2 = 1 for / + 1 for \0 */ + char *newpath = malloc(pathlen + 2); + if (!newpath) { + _alpm_alloc_fail(pathlen + 2); + return -1; + } + strcpy(newpath, path); + newpath[pathlen] = '/'; + newpath[pathlen + 1] = '\0'; + path = newpath; + } current_file = pkg->files.files + files_count; STRDUP(current_file->name, path, return -1); current_file->size = archive_entry_size(entry); @@ -400,6 +420,124 @@ static int add_entry_to_files_list(alpm_pkg_t *pkg, size_t *files_size, struct a } /** + * Generate a new file list from an mtree file and add it to the package. + * An existing file list will be free()d first. + * + * archive should point to an archive struct which is already at the + * position of the mtree's header. + * + * @param handle + * @param pkg package to add the file list to + * @param archive archive containing the mtree + * @return 0 on success, <0 on error + */ +static int build_filelist_from_mtree(alpm_handle_t *handle, alpm_pkg_t *pkg, struct archive *archive) +{ + int ret = 0; + size_t mtree_msize = 0; + size_t mtree_csize = 0; + size_t files_size = 0; // we clean up the existing array so this is fine + char *mtree_data = NULL; + struct archive *mtree; + struct archive_entry *mtree_entry = NULL; + + _alpm_log(handle, ALPM_LOG_DEBUG, "found mtree for package %s, getting file list\n", pkg->filename); + + /* throw away any files we might have already found */ + free(pkg->files.files); + pkg->files.files = NULL; + pkg->files.count = 0; + + /* create a new archive to parse the mtree and load it from archive into memory */ + // TODO: split this into a function + if((mtree = archive_read_new()) == NULL) { + handle->pm_errno = ALPM_ERR_LIBARCHIVE; + goto error; + } + + _alpm_archive_read_support_filter_all(mtree); + archive_read_support_format_mtree(mtree); + + // TODO: split this into a function + while(1) { + ssize_t size; + char *tmp; + + if((tmp = _alpm_realloc(mtree_data, &mtree_msize, mtree_csize)) == NULL) { + goto error; + } + mtree_data = tmp; + + size = archive_read_data(archive, mtree_data + mtree_csize, ALPM_BUFFER_SIZE); + + if(size < 0) { + _alpm_log(handle, ALPM_LOG_ERROR, _("error while reading package %s: %s\n"), + pkg->filename, archive_error_string(archive)); + handle->pm_errno = ALPM_ERR_LIBARCHIVE; + goto error; + } + if(size == 0) { + break; + } + + mtree_csize += size; + } + + // TODO: need wrapper function for archive_read_open_memory for old libarchive versions? + if(archive_read_open_memory(mtree, mtree_data, mtree_csize)) { + _alpm_log(handle, ALPM_LOG_ERROR, _("error while reading mtree of package %s: %s\n"), + pkg->filename, archive_error_string(mtree)); + handle->pm_errno = ALPM_ERR_LIBARCHIVE; + goto error; + } + + while((ret = archive_read_next_header(mtree, &mtree_entry)) == ARCHIVE_OK) { + const char *path = archive_entry_pathname(mtree_entry); + char *p = NULL; + + /* strip leading "./" from path entries */ + if(path[0] == '.' && path[1] == '/') { + path += 2; + } + + if(handle_simple_path(pkg, path)) { + continue; + } + + if(*path == '.') { + continue; + } + + /* Apparently archive_entry_copy/set_pathname doesn't like to get it's own + * pointer and produces rather strange results */ + STRDUP(p, path, goto error); + archive_entry_copy_pathname(mtree_entry, p); + + if(add_entry_to_files_list(pkg, &files_size, mtree_entry) < 0) { + free(p); + goto error; + } + free(p); + } + + if(ret != ARCHIVE_EOF && ret != ARCHIVE_OK) { /* An error occurred */ + _alpm_log(handle, ALPM_LOG_ERROR, _("error while reading mtree of package %s: %s\n"), + pkg->filename, archive_error_string(mtree)); + handle->pm_errno = ALPM_ERR_LIBARCHIVE; + goto error; + } + + free(mtree_data); + _alpm_archive_read_free(mtree); + _alpm_log(handle, ALPM_LOG_DEBUG, "finished mtree reading for %s\n", pkg->filename); + return 0; +error: + free(mtree_data); + _alpm_archive_read_free(mtree); + return -1; +} + +/** * Load a package and create the corresponding alpm_pkg_t struct. * @param handle the context handle * @param pkgfile path to the package file @@ -409,7 +547,7 @@ static int add_entry_to_files_list(alpm_pkg_t *pkg, size_t *files_size, struct a alpm_pkg_t *_alpm_pkg_load_internal(alpm_handle_t *handle, const char *pkgfile, int full) { - int ret, fd, config = 0; + int ret, fd, config, hit_mtree = 0; struct archive *archive; struct archive_entry *entry; alpm_pkg_t *newpkg; @@ -472,10 +610,23 @@ alpm_pkg_t *_alpm_pkg_load_internal(alpm_handle_t *handle, continue; } else if(handle_simple_path(newpkg, entry_name)) { continue; + } else if(full && strcmp(entry_name, ".MTREE") == 0) { + /* building the file list: cheap way + * get the filelist from the mtree file rather than scanning + * the whole archive */ + if(build_filelist_from_mtree(handle, newpkg, archive) < 0) { + goto error; + } + hit_mtree = 1; + continue; } else if(*entry_name == '.') { /* for now, ignore all files starting with '.' that haven't * already been handled (for future possibilities) */ - } else if(full) { + } else if(hit_mtree && config) { + /* we have all we need */ + break; + } else if(full && !hit_mtree) { + /* building the file list: expensive way */ if(add_entry_to_files_list(newpkg, &files_size, entry) < 0) { goto error; } -- 1.8.5.3
This greatly speeds up file list generation times by avoiding uncompressing the whole package. pacman -S base with a deliberate file conflict: before: 9.1 seconds after: 2.2 seconds Signed-off-by: Florian Pritz <bluewind@xinu.at> --- v2: Adjust for v2 of realloc patch. lib/libalpm/be_package.c | 153 ++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 151 insertions(+), 2 deletions(-) diff --git a/lib/libalpm/be_package.c b/lib/libalpm/be_package.c index 6ef84fc..d876967 100644 --- a/lib/libalpm/be_package.c +++ b/lib/libalpm/be_package.c @@ -380,14 +380,34 @@ static int add_entry_to_files_list(alpm_pkg_t *pkg, size_t *files_size, struct a { const size_t files_count = pkg->files.count; alpm_file_t *current_file; + mode_t type; + size_t pathlen; const char *path; if(!_alpm_realloc((void **)&pkg->files.files, files_size, (files_count + 1) * sizeof(alpm_file_t))) { return -1; } + type = archive_entry_filetype(entry); path = archive_entry_pathname(entry); + pathlen = strlen(path); + + /* mtree paths don't contain a tailing slash, those we get from + * the archive directly do (expensive way) + * Other code relies on it to detect directories so add it here.*/ + if(type == AE_IFDIR && path[pathlen - 1] != '/') { + /* 2 = 1 for / + 1 for \0 */ + char *newpath = malloc(pathlen + 2); + if (!newpath) { + _alpm_alloc_fail(pathlen + 2); + return -1; + } + strcpy(newpath, path); + newpath[pathlen] = '/'; + newpath[pathlen + 1] = '\0'; + path = newpath; + } current_file = pkg->files.files + files_count; STRDUP(current_file->name, path, return -1); current_file->size = archive_entry_size(entry); @@ -397,6 +417,122 @@ static int add_entry_to_files_list(alpm_pkg_t *pkg, size_t *files_size, struct a } /** + * Generate a new file list from an mtree file and add it to the package. + * An existing file list will be free()d first. + * + * archive should point to an archive struct which is already at the + * position of the mtree's header. + * + * @param handle + * @param pkg package to add the file list to + * @param archive archive containing the mtree + * @return 0 on success, <0 on error + */ +static int build_filelist_from_mtree(alpm_handle_t *handle, alpm_pkg_t *pkg, struct archive *archive) +{ + int ret = 0; + size_t mtree_msize = 0; + size_t mtree_csize = 0; + size_t files_size = 0; // we clean up the existing array so this is fine + char *mtree_data = NULL; + struct archive *mtree; + struct archive_entry *mtree_entry = NULL; + + _alpm_log(handle, ALPM_LOG_DEBUG, "found mtree for package %s, getting file list\n", pkg->filename); + + /* throw away any files we might have already found */ + free(pkg->files.files); + pkg->files.files = NULL; + pkg->files.count = 0; + + /* create a new archive to parse the mtree and load it from archive into memory */ + // TODO: split this into a function + if((mtree = archive_read_new()) == NULL) { + handle->pm_errno = ALPM_ERR_LIBARCHIVE; + goto error; + } + + _alpm_archive_read_support_filter_all(mtree); + archive_read_support_format_mtree(mtree); + + // TODO: split this into a function + while(1) { + ssize_t size; + + if(!_alpm_realloc((void **)&mtree_data, &mtree_msize, mtree_csize + ALPM_BUFFER_SIZE)) { + goto error; + } + + size = archive_read_data(archive, mtree_data + mtree_csize, ALPM_BUFFER_SIZE); + + if(size < 0) { + _alpm_log(handle, ALPM_LOG_ERROR, _("error while reading package %s: %s\n"), + pkg->filename, archive_error_string(archive)); + handle->pm_errno = ALPM_ERR_LIBARCHIVE; + goto error; + } + if(size == 0) { + break; + } + + mtree_csize += size; + } + + // TODO: need wrapper function for archive_read_open_memory for old libarchive versions? + if(archive_read_open_memory(mtree, mtree_data, mtree_csize)) { + _alpm_log(handle, ALPM_LOG_ERROR, _("error while reading mtree of package %s: %s\n"), + pkg->filename, archive_error_string(mtree)); + handle->pm_errno = ALPM_ERR_LIBARCHIVE; + goto error; + } + + while((ret = archive_read_next_header(mtree, &mtree_entry)) == ARCHIVE_OK) { + const char *path = archive_entry_pathname(mtree_entry); + char *p = NULL; + + /* strip leading "./" from path entries */ + if(path[0] == '.' && path[1] == '/') { + path += 2; + } + + if(handle_simple_path(pkg, path)) { + continue; + } + + if(*path == '.') { + continue; + } + + /* Apparently archive_entry_copy/set_pathname doesn't like to get it's own + * pointer and produces rather strange results */ + STRDUP(p, path, goto error); + archive_entry_copy_pathname(mtree_entry, p); + + if(add_entry_to_files_list(pkg, &files_size, mtree_entry) < 0) { + free(p); + goto error; + } + free(p); + } + + if(ret != ARCHIVE_EOF && ret != ARCHIVE_OK) { /* An error occurred */ + _alpm_log(handle, ALPM_LOG_ERROR, _("error while reading mtree of package %s: %s\n"), + pkg->filename, archive_error_string(mtree)); + handle->pm_errno = ALPM_ERR_LIBARCHIVE; + goto error; + } + + free(mtree_data); + _alpm_archive_read_free(mtree); + _alpm_log(handle, ALPM_LOG_DEBUG, "finished mtree reading for %s\n", pkg->filename); + return 0; +error: + free(mtree_data); + _alpm_archive_read_free(mtree); + return -1; +} + +/** * Load a package and create the corresponding alpm_pkg_t struct. * @param handle the context handle * @param pkgfile path to the package file @@ -406,7 +542,7 @@ static int add_entry_to_files_list(alpm_pkg_t *pkg, size_t *files_size, struct a alpm_pkg_t *_alpm_pkg_load_internal(alpm_handle_t *handle, const char *pkgfile, int full) { - int ret, fd, config = 0; + int ret, fd, config, hit_mtree = 0; struct archive *archive; struct archive_entry *entry; alpm_pkg_t *newpkg; @@ -469,10 +605,23 @@ alpm_pkg_t *_alpm_pkg_load_internal(alpm_handle_t *handle, continue; } else if(handle_simple_path(newpkg, entry_name)) { continue; + } else if(full && strcmp(entry_name, ".MTREE") == 0) { + /* building the file list: cheap way + * get the filelist from the mtree file rather than scanning + * the whole archive */ + if(build_filelist_from_mtree(handle, newpkg, archive) < 0) { + goto error; + } + hit_mtree = 1; + continue; } else if(*entry_name == '.') { /* for now, ignore all files starting with '.' that haven't * already been handled (for future possibilities) */ - } else if(full) { + } else if(hit_mtree && config) { + /* we have all we need */ + break; + } else if(full && !hit_mtree) { + /* building the file list: expensive way */ if(add_entry_to_files_list(newpkg, &files_size, entry) < 0) { goto error; } -- 1.8.5.3
On 01/28/14 at 11:14am, Florian Pritz wrote:
This greatly speeds up file list generation times by avoiding uncompressing the whole package.
pacman -S base with a deliberate file conflict: before: 9.1 seconds after: 2.2 seconds
Signed-off-by: Florian Pritz <bluewind@xinu.at> ---
v2: Adjust for v2 of realloc patch.
lib/libalpm/be_package.c | 153 ++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 151 insertions(+), 2 deletions(-)
diff --git a/lib/libalpm/be_package.c b/lib/libalpm/be_package.c index 6ef84fc..d876967 100644 --- a/lib/libalpm/be_package.c +++ b/lib/libalpm/be_package.c @@ -380,14 +380,34 @@ static int add_entry_to_files_list(alpm_pkg_t *pkg, size_t *files_size, struct a { const size_t files_count = pkg->files.count; alpm_file_t *current_file; + mode_t type; + size_t pathlen; const char *path;
if(!_alpm_realloc((void **)&pkg->files.files, files_size, (files_count + 1) * sizeof(alpm_file_t))) { return -1; }
+ type = archive_entry_filetype(entry); path = archive_entry_pathname(entry);
+ pathlen = strlen(path); + + /* mtree paths don't contain a tailing slash, those we get from + * the archive directly do (expensive way) + * Other code relies on it to detect directories so add it here.*/ + if(type == AE_IFDIR && path[pathlen - 1] != '/') { + /* 2 = 1 for / + 1 for \0 */ + char *newpath = malloc(pathlen + 2); + if (!newpath) { + _alpm_alloc_fail(pathlen + 2); + return -1; + } + strcpy(newpath, path); + newpath[pathlen] = '/'; + newpath[pathlen + 1] = '\0'; + path = newpath;
Memory leak. Just assign to current_file->name directly and avoid the STRDUP for this case.
+ } current_file = pkg->files.files + files_count; STRDUP(current_file->name, path, return -1); current_file->size = archive_entry_size(entry); @@ -397,6 +417,122 @@ static int add_entry_to_files_list(alpm_pkg_t *pkg, size_t *files_size, struct a }
/** + * Generate a new file list from an mtree file and add it to the package. + * An existing file list will be free()d first. + * + * archive should point to an archive struct which is already at the + * position of the mtree's header. + * + * @param handle + * @param pkg package to add the file list to + * @param archive archive containing the mtree + * @return 0 on success, <0 on error + */ +static int build_filelist_from_mtree(alpm_handle_t *handle, alpm_pkg_t *pkg, struct archive *archive) +{ + int ret = 0;
We normally use 'ret' for or own return value.
+ size_t mtree_msize = 0; + size_t mtree_csize = 0;
These either need more descriptive names or a comment explaining what they are.
+ size_t files_size = 0; // we clean up the existing array so this is fine
We only use c-style comments
+ char *mtree_data = NULL; + struct archive *mtree; + struct archive_entry *mtree_entry = NULL; + + _alpm_log(handle, ALPM_LOG_DEBUG, "found mtree for package %s, getting file list\n", pkg->filename);
Please try to keep lines closer to 80 columns.
+ + /* throw away any files we might have already found */ + free(pkg->files.files);
You need to free the file names as well.
+ pkg->files.files = NULL; + pkg->files.count = 0; + + /* create a new archive to parse the mtree and load it from archive into memory */ + // TODO: split this into a function + if((mtree = archive_read_new()) == NULL) { + handle->pm_errno = ALPM_ERR_LIBARCHIVE; + goto error; + } + + _alpm_archive_read_support_filter_all(mtree); + archive_read_support_format_mtree(mtree); + + // TODO: split this into a function + while(1) { + ssize_t size; + + if(!_alpm_realloc((void **)&mtree_data, &mtree_msize, mtree_csize + ALPM_BUFFER_SIZE)) { + goto error; + } + + size = archive_read_data(archive, mtree_data + mtree_csize, ALPM_BUFFER_SIZE); + + if(size < 0) { + _alpm_log(handle, ALPM_LOG_ERROR, _("error while reading package %s: %s\n"), + pkg->filename, archive_error_string(archive)); + handle->pm_errno = ALPM_ERR_LIBARCHIVE; + goto error; + } + if(size == 0) { + break; + } + + mtree_csize += size; + } + + // TODO: need wrapper function for archive_read_open_memory for old libarchive versions? + if(archive_read_open_memory(mtree, mtree_data, mtree_csize)) { + _alpm_log(handle, ALPM_LOG_ERROR, _("error while reading mtree of package %s: %s\n"), + pkg->filename, archive_error_string(mtree)); + handle->pm_errno = ALPM_ERR_LIBARCHIVE; + goto error; + } + + while((ret = archive_read_next_header(mtree, &mtree_entry)) == ARCHIVE_OK) { + const char *path = archive_entry_pathname(mtree_entry); + char *p = NULL; + + /* strip leading "./" from path entries */ + if(path[0] == '.' && path[1] == '/') { + path += 2; + } + + if(handle_simple_path(pkg, path)) { + continue; + } + + if(*path == '.') { + continue; + }
Please add a comment why we're skipping all files that start with '.'. Maybe this should just be moved into handle_simple_path
+ + /* Apparently archive_entry_copy/set_pathname doesn't like to get it's own + * pointer and produces rather strange results */ + STRDUP(p, path, goto error); + archive_entry_copy_pathname(mtree_entry, p); + + if(add_entry_to_files_list(pkg, &files_size, mtree_entry) < 0) {
You've already extracted the entry's pathname everywhere you call add_entry_to_file_list, how about just passing the path as an additional parameter instead of dup'ing it and copying it back to the mtree_entry?
+ free(p); + goto error; + } + free(p); + } + + if(ret != ARCHIVE_EOF && ret != ARCHIVE_OK) { /* An error occurred */ + _alpm_log(handle, ALPM_LOG_ERROR, _("error while reading mtree of package %s: %s\n"), + pkg->filename, archive_error_string(mtree)); + handle->pm_errno = ALPM_ERR_LIBARCHIVE; + goto error; + } + + free(mtree_data); + _alpm_archive_read_free(mtree); + _alpm_log(handle, ALPM_LOG_DEBUG, "finished mtree reading for %s\n", pkg->filename); + return 0; +error: + free(mtree_data); + _alpm_archive_read_free(mtree); + return -1; +} + +/** * Load a package and create the corresponding alpm_pkg_t struct. * @param handle the context handle * @param pkgfile path to the package file @@ -406,7 +542,7 @@ static int add_entry_to_files_list(alpm_pkg_t *pkg, size_t *files_size, struct a alpm_pkg_t *_alpm_pkg_load_internal(alpm_handle_t *handle, const char *pkgfile, int full) { - int ret, fd, config = 0; + int ret, fd, config, hit_mtree = 0; struct archive *archive; struct archive_entry *entry; alpm_pkg_t *newpkg; @@ -469,10 +605,23 @@ alpm_pkg_t *_alpm_pkg_load_internal(alpm_handle_t *handle, continue; } else if(handle_simple_path(newpkg, entry_name)) { continue; + } else if(full && strcmp(entry_name, ".MTREE") == 0) { + /* building the file list: cheap way + * get the filelist from the mtree file rather than scanning + * the whole archive */ + if(build_filelist_from_mtree(handle, newpkg, archive) < 0) { + goto error; + } + hit_mtree = 1; + continue; } else if(*entry_name == '.') { /* for now, ignore all files starting with '.' that haven't * already been handled (for future possibilities) */ - } else if(full) { + } else if(hit_mtree && config) { + /* we have all we need */ + break;
This is kind of a strange place for this test and is similar to an existing check at the bottom of the loop. Could you move them to the top of the loop and consolidate them?
+ } else if(full && !hit_mtree) { + /* building the file list: expensive way */ if(add_entry_to_files_list(newpkg, &files_size, entry) < 0) { goto error; } -- 1.8.5.3
I hope I incorporated all suggestions. Most patches didn't change, but I'm submitting them anyway for an easier (hopefully final) review.
Signed-off-by: Florian Pritz <bluewind@xinu.at> --- lib/libalpm/add.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/lib/libalpm/add.c b/lib/libalpm/add.c index f38afef..1199d30 100644 --- a/lib/libalpm/add.c +++ b/lib/libalpm/add.c @@ -178,6 +178,11 @@ static int extract_single_file(alpm_handle_t *handle, struct archive *archive, archive_read_data_skip(archive); return 0; } else { + if (!alpm_filelist_contains(&newpkg->files, entryname)) { + _alpm_log(handle, ALPM_LOG_WARNING, _("file not found in file list for package %s. skipping extraction of %s\n"), + newpkg->name, entryname); + return 0; + } /* build the new entryname relative to handle->root */ snprintf(filename, PATH_MAX, "%s%s", handle->root, entryname); } -- 1.8.5.3
These will be used in the following patches. Signed-off-by: Florian Pritz <bluewind@xinu.at> --- v5: Split automatic growing into _alpm_greedy_grow. lib/libalpm/util.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++++++++++ lib/libalpm/util.h | 2 ++ 2 files changed, 64 insertions(+) diff --git a/lib/libalpm/util.c b/lib/libalpm/util.c index 150b85e..1bc208c 100644 --- a/lib/libalpm/util.c +++ b/lib/libalpm/util.c @@ -1287,6 +1287,68 @@ int _alpm_fnmatch(const void *pattern, const void *string) return fnmatch(pattern, string, 0); } +/** Think of this as realloc with error handling. If realloc fails NULL will be + * returned and data will not be changed. + * + * Newly created memory will be zeroed. + * + * @param data source memory space + * @param current size of the space pointed to by data + * @param required size you want + * @return new memory; NULL on error + */ +void *_alpm_realloc(void **data, size_t *current, const size_t required) +{ + char *newdata; + + newdata = realloc(*data, required); + if(!newdata) { + _alpm_alloc_fail(required); + return NULL; + } + + if (*current < required) { + /* ensure all new memory is zeroed out, in both the initial + * allocation and later reallocs */ + memset(newdata + *current, 0, required - *current); + } + *current = required; + *data = newdata; + return newdata; +} + +/** This automatically grows data based on current/required. + * + * The memory space will be initialised to required bytes and doubled in size when required. + * + * Newly created memory will be zeroed. + * @param data source memory space + * @param current size of the space pointed to by data + * @param required size you want + * @return new memory if grown; old memory otherwise; NULL on error + */ +void *_alpm_greedy_grow(void **data, size_t *current, const size_t required) +{ + size_t newsize = 0; + + if(*current >= required) { + return data; + } + + if(*current == 0) { + newsize = required; + } else { + newsize = *current * 2; + } + + /* check for overflows */ + if (newsize < required) { + return NULL; + } + + return _alpm_realloc(data, current, required); +} + void _alpm_alloc_fail(size_t size) { fprintf(stderr, "alloc failure: could not allocate %zd bytes\n", size); diff --git a/lib/libalpm/util.h b/lib/libalpm/util.h index 250c530..20f63f6 100644 --- a/lib/libalpm/util.h +++ b/lib/libalpm/util.h @@ -141,6 +141,8 @@ int _alpm_raw_ncmp(const char *first, const char *second, size_t max); int _alpm_access(alpm_handle_t *handle, const char *dir, const char *file, int amode); int _alpm_fnmatch_patterns(alpm_list_t *patterns, const char *string); int _alpm_fnmatch(const void *pattern, const void *string); +void *_alpm_realloc(void **data, size_t *current, const size_t required); +void *_alpm_greedy_grow(void **data, size_t *current, const size_t required); #ifndef HAVE_STRSEP char *strsep(char **, const char *); -- 1.8.5.3
This is used to deduplicate code when using the mtree as the file list source. Signed-off-by: Florian Pritz <bluewind@xinu.at> --- lib/libalpm/be_package.c | 27 ++++++++++++++++++++++----- 1 file changed, 22 insertions(+), 5 deletions(-) diff --git a/lib/libalpm/be_package.c b/lib/libalpm/be_package.c index 2fbb1db..5980784 100644 --- a/lib/libalpm/be_package.c +++ b/lib/libalpm/be_package.c @@ -353,6 +353,26 @@ int _alpm_pkg_validate_internal(alpm_handle_t *handle, } /** + * Handle the existance of simple paths for _alpm_load_pkg_internal() + * @param pkg package to change + * @param path path to examine + * @return 0 if path doesn't match any rule, 1 if it has been handled + */ +static int handle_simple_path(alpm_pkg_t *pkg, const char *path) +{ + if(strcmp(path, ".INSTALL") == 0) { + pkg->scriptlet = 1; + return 1; + } else if(*path == '.') { + /* for now, ignore all files starting with '.' that haven't + * already been handled (for future possibilities) */ + return 1; + } + + return 0; +} + +/** * Load a package and create the corresponding alpm_pkg_t struct. * @param handle the context handle * @param pkgfile path to the package file @@ -423,11 +443,8 @@ alpm_pkg_t *_alpm_pkg_load_internal(alpm_handle_t *handle, } config = 1; continue; - } else if(strcmp(entry_name, ".INSTALL") == 0) { - newpkg->scriptlet = 1; - } else if(*entry_name == '.') { - /* for now, ignore all files starting with '.' that haven't - * already been handled (for future possibilities) */ + } else if(handle_simple_path(newpkg, entry_name)) { + continue; } else if(full) { const size_t files_count = newpkg->files.count; alpm_file_t *current_file; -- 1.8.5.3
Signed-off-by: Florian Pritz <bluewind@xinu.at> --- lib/libalpm/be_package.c | 56 +++++++++++++++++++++++++----------------------- 1 file changed, 29 insertions(+), 27 deletions(-) diff --git a/lib/libalpm/be_package.c b/lib/libalpm/be_package.c index 5980784..e276f53 100644 --- a/lib/libalpm/be_package.c +++ b/lib/libalpm/be_package.c @@ -373,6 +373,33 @@ static int handle_simple_path(alpm_pkg_t *pkg, const char *path) } /** + * Add a file to the files list for pkg. + * + * @param pkg package to add the file to + * @param files_size size of pkg->files.files + * @param entry archive entry of the file to add to the list + * @param path path of the file to be added + * @return <0 on error, 0 on success + */ +static int add_entry_to_files_list(alpm_pkg_t *pkg, size_t *files_size, + struct archive_entry *entry, const char *path) +{ + const size_t files_count = pkg->files.count; + alpm_file_t *current_file; + + if(!_alpm_greedy_grow((void **)&pkg->files.files, files_size, (files_count + 1) * sizeof(alpm_file_t))) { + return -1; + } + + current_file = pkg->files.files + files_count; + STRDUP(current_file->name, path, return -1); + current_file->size = archive_entry_size(entry); + current_file->mode = archive_entry_mode(entry); + pkg->files.count++; + return 0; +} + +/** * Load a package and create the corresponding alpm_pkg_t struct. * @param handle the context handle * @param pkgfile path to the package file @@ -446,34 +473,9 @@ alpm_pkg_t *_alpm_pkg_load_internal(alpm_handle_t *handle, } else if(handle_simple_path(newpkg, entry_name)) { continue; } else if(full) { - const size_t files_count = newpkg->files.count; - alpm_file_t *current_file; - /* Keep track of all files for filelist generation */ - if(files_count >= files_size) { - size_t old_size = files_size; - alpm_file_t *newfiles; - if(files_size == 0) { - files_size = 4; - } else { - files_size *= 2; - } - newfiles = realloc(newpkg->files.files, - sizeof(alpm_file_t) * files_size); - if(!newfiles) { - _alpm_alloc_fail(sizeof(alpm_file_t) * files_size); - goto error; - } - /* ensure all new memory is zeroed out, in both the initial - * allocation and later reallocs */ - memset(newfiles + old_size, 0, - sizeof(alpm_file_t) * (files_size - old_size)); - newpkg->files.files = newfiles; + if(add_entry_to_files_list(newpkg, &files_size, entry, entry_name) < 0) { + goto error; } - current_file = newpkg->files.files + files_count; - STRDUP(current_file->name, entry_name, goto error); - current_file->size = archive_entry_size(entry); - current_file->mode = archive_entry_mode(entry); - newpkg->files.count++; } if(archive_read_data_skip(archive)) { -- 1.8.5.3
This greatly speeds up file list generation times by avoiding uncompressing the whole package. pacman -S base with a deliberate file conflict: before: 9.1 seconds after: 2.2 seconds Signed-off-by: Florian Pritz <bluewind@xinu.at> --- lib/libalpm/be_package.c | 149 +++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 145 insertions(+), 4 deletions(-) diff --git a/lib/libalpm/be_package.c b/lib/libalpm/be_package.c index e276f53..3c35484 100644 --- a/lib/libalpm/be_package.c +++ b/lib/libalpm/be_package.c @@ -386,13 +386,36 @@ static int add_entry_to_files_list(alpm_pkg_t *pkg, size_t *files_size, { const size_t files_count = pkg->files.count; alpm_file_t *current_file; + mode_t type; + size_t pathlen; if(!_alpm_greedy_grow((void **)&pkg->files.files, files_size, (files_count + 1) * sizeof(alpm_file_t))) { return -1; } + type = archive_entry_filetype(entry); + + pathlen = strlen(path); + current_file = pkg->files.files + files_count; - STRDUP(current_file->name, path, return -1); + + /* mtree paths don't contain a tailing slash, those we get from + * the archive directly do (expensive way) + * Other code relies on it to detect directories so add it here.*/ + if(type == AE_IFDIR && path[pathlen - 1] != '/') { + /* 2 = 1 for / + 1 for \0 */ + char *newpath = malloc(pathlen + 2); + if (!newpath) { + _alpm_alloc_fail(pathlen + 2); + return -1; + } + strcpy(newpath, path); + newpath[pathlen] = '/'; + newpath[pathlen + 1] = '\0'; + current_file->name = newpath; + } else { + STRDUP(current_file->name, path, return -1); + } current_file->size = archive_entry_size(entry); current_file->mode = archive_entry_mode(entry); pkg->files.count++; @@ -400,6 +423,114 @@ static int add_entry_to_files_list(alpm_pkg_t *pkg, size_t *files_size, } /** + * Generate a new file list from an mtree file and add it to the package. + * An existing file list will be free()d first. + * + * archive should point to an archive struct which is already at the + * position of the mtree's header. + * + * @param handle + * @param pkg package to add the file list to + * @param archive archive containing the mtree + * @return 0 on success, <0 on error + */ +static int build_filelist_from_mtree(alpm_handle_t *handle, alpm_pkg_t *pkg, struct archive *archive) +{ + int ret = 0; + size_t mtree_maxsize = 0; + size_t mtree_cursize = 0; + size_t files_size = 0; /* we clean up the existing array so this is fine */ + char *mtree_data = NULL; + struct archive *mtree; + struct archive_entry *mtree_entry = NULL; + + _alpm_log(handle, ALPM_LOG_DEBUG, + "found mtree for package %s, getting file list\n", pkg->filename); + + /* throw away any files we might have already found */ + for (size_t i = 0; i < pkg->files.count; i++) { + free(pkg->files.files[i].name); + } + free(pkg->files.files); + pkg->files.files = NULL; + pkg->files.count = 0; + + /* create a new archive to parse the mtree and load it from archive into memory */ + /* TODO: split this into a function */ + if((mtree = archive_read_new()) == NULL) { + handle->pm_errno = ALPM_ERR_LIBARCHIVE; + goto error; + } + + _alpm_archive_read_support_filter_all(mtree); + archive_read_support_format_mtree(mtree); + + /* TODO: split this into a function */ + while(1) { + ssize_t size; + + if(!_alpm_greedy_grow((void **)&mtree_data, &mtree_maxsize, mtree_cursize + ALPM_BUFFER_SIZE)) { + goto error; + } + + size = archive_read_data(archive, mtree_data + mtree_cursize, ALPM_BUFFER_SIZE); + + if(size < 0) { + _alpm_log(handle, ALPM_LOG_ERROR, _("error while reading package %s: %s\n"), + pkg->filename, archive_error_string(archive)); + handle->pm_errno = ALPM_ERR_LIBARCHIVE; + goto error; + } + if(size == 0) { + break; + } + + mtree_cursize += size; + } + + if(archive_read_open_memory(mtree, mtree_data, mtree_cursize)) { + _alpm_log(handle, ALPM_LOG_ERROR, + _("error while reading mtree of package %s: %s\n"), + pkg->filename, archive_error_string(mtree)); + handle->pm_errno = ALPM_ERR_LIBARCHIVE; + goto error; + } + + while((ret = archive_read_next_header(mtree, &mtree_entry)) == ARCHIVE_OK) { + const char *path = archive_entry_pathname(mtree_entry); + + /* strip leading "./" from path entries */ + if(path[0] == '.' && path[1] == '/') { + path += 2; + } + + if(handle_simple_path(pkg, path)) { + continue; + } + + if(add_entry_to_files_list(pkg, &files_size, mtree_entry, path) < 0) { + goto error; + } + } + + if(ret != ARCHIVE_EOF && ret != ARCHIVE_OK) { /* An error occurred */ + _alpm_log(handle, ALPM_LOG_ERROR, _("error while reading mtree of package %s: %s\n"), + pkg->filename, archive_error_string(mtree)); + handle->pm_errno = ALPM_ERR_LIBARCHIVE; + goto error; + } + + free(mtree_data); + _alpm_archive_read_free(mtree); + _alpm_log(handle, ALPM_LOG_DEBUG, "finished mtree reading for %s\n", pkg->filename); + return 0; +error: + free(mtree_data); + _alpm_archive_read_free(mtree); + return -1; +} + +/** * Load a package and create the corresponding alpm_pkg_t struct. * @param handle the context handle * @param pkgfile path to the package file @@ -409,7 +540,7 @@ static int add_entry_to_files_list(alpm_pkg_t *pkg, size_t *files_size, alpm_pkg_t *_alpm_pkg_load_internal(alpm_handle_t *handle, const char *pkgfile, int full) { - int ret, fd, config = 0; + int ret, fd, config, hit_mtree = 0; struct archive *archive; struct archive_entry *entry; alpm_pkg_t *newpkg; @@ -470,9 +601,19 @@ alpm_pkg_t *_alpm_pkg_load_internal(alpm_handle_t *handle, } config = 1; continue; + } else if(full && strcmp(entry_name, ".MTREE") == 0) { + /* building the file list: cheap way + * get the filelist from the mtree file rather than scanning + * the whole archive */ + if(build_filelist_from_mtree(handle, newpkg, archive) < 0) { + goto error; + } + hit_mtree = 1; + continue; } else if(handle_simple_path(newpkg, entry_name)) { continue; - } else if(full) { + } else if(full && !hit_mtree) { + /* building the file list: expensive way */ if(add_entry_to_files_list(newpkg, &files_size, entry, entry_name) < 0) { goto error; } @@ -486,7 +627,7 @@ alpm_pkg_t *_alpm_pkg_load_internal(alpm_handle_t *handle, } /* if we are not doing a full read, see if we have all we need */ - if(!full && config) { + if((!full || hit_mtree) && config) { break; } } -- 1.8.5.3
Signed-off-by: Florian Pritz <bluewind@xinu.at> --- lib/libalpm/be_local.c | 18 ++---------------- 1 file changed, 2 insertions(+), 16 deletions(-) diff --git a/lib/libalpm/be_local.c b/lib/libalpm/be_local.c index 0b5b266..d2b0fe3 100644 --- a/lib/libalpm/be_local.c +++ b/lib/libalpm/be_local.c @@ -736,22 +736,8 @@ static int local_db_read(alpm_pkg_t *info, alpm_dbinfrq_t inforeq) while(fgets(line, sizeof(line), fp) && (len = _alpm_strip_newline(line, 0))) { - if(files_count >= files_size) { - size_t old_size = files_size; - if(files_size == 0) { - files_size = 8; - } else { - files_size *= 2; - } - files = realloc(files, sizeof(alpm_file_t) * files_size); - if(!files) { - _alpm_alloc_fail(sizeof(alpm_file_t) * files_size); - goto error; - } - /* ensure all new memory is zeroed out, in both the initial - * allocation and later reallocs */ - memset(files + old_size, 0, - sizeof(alpm_file_t) * (files_size - old_size)); + if(!_alpm_greedy_grow((void **)&files, &files_size, (files_count + 1) * sizeof(alpm_file_t))) { + goto error; } /* since we know the length of the file string already, * we can do malloc + memcpy rather than strdup */ -- 1.8.5.3
On 10/02/14 05:24, Florian Pritz wrote:
Signed-off-by: Florian Pritz <bluewind@xinu.at> --- lib/libalpm/be_local.c | 18 ++---------------- 1 file changed, 2 insertions(+), 16 deletions(-)
diff --git a/lib/libalpm/be_local.c b/lib/libalpm/be_local.c index 0b5b266..d2b0fe3 100644 --- a/lib/libalpm/be_local.c +++ b/lib/libalpm/be_local.c @@ -736,22 +736,8 @@ static int local_db_read(alpm_pkg_t *info, alpm_dbinfrq_t inforeq)
while(fgets(line, sizeof(line), fp) && (len = _alpm_strip_newline(line, 0))) { - if(files_count >= files_size) { - size_t old_size = files_size; - if(files_size == 0) { - files_size = 8; - } else { - files_size *= 2; - } - files = realloc(files, sizeof(alpm_file_t) * files_size); - if(!files) { - _alpm_alloc_fail(sizeof(alpm_file_t) * files_size); - goto error; - } - /* ensure all new memory is zeroed out, in both the initial - * allocation and later reallocs */ - memset(files + old_size, 0, - sizeof(alpm_file_t) * (files_size - old_size)); + if(!_alpm_greedy_grow((void **)&files, &files_size, (files_count + 1) * sizeof(alpm_file_t))) {
This starts the size at 1 then doubles rather than starting at 8 like previously. This does the old way: _alpm_greedy_grow((void **)&files, &files_size, (files_size ? files_size + 1 : 8 * sizeof(alpm_file_t)))
+ goto error; } /* since we know the length of the file string already, * we can do malloc + memcpy rather than strdup */
Signed-off-by: Florian Pritz <bluewind@xinu.at> --- v6: Allocate 8 elements at the beginning instead of simply one. lib/libalpm/be_local.c | 20 ++++---------------- 1 file changed, 4 insertions(+), 16 deletions(-) diff --git a/lib/libalpm/be_local.c b/lib/libalpm/be_local.c index 0b5b266..0a86f11 100644 --- a/lib/libalpm/be_local.c +++ b/lib/libalpm/be_local.c @@ -736,22 +736,10 @@ static int local_db_read(alpm_pkg_t *info, alpm_dbinfrq_t inforeq) while(fgets(line, sizeof(line), fp) && (len = _alpm_strip_newline(line, 0))) { - if(files_count >= files_size) { - size_t old_size = files_size; - if(files_size == 0) { - files_size = 8; - } else { - files_size *= 2; - } - files = realloc(files, sizeof(alpm_file_t) * files_size); - if(!files) { - _alpm_alloc_fail(sizeof(alpm_file_t) * files_size); - goto error; - } - /* ensure all new memory is zeroed out, in both the initial - * allocation and later reallocs */ - memset(files + old_size, 0, - sizeof(alpm_file_t) * (files_size - old_size)); + _alpm_log(db->handle, ALPM_LOG_DEBUG, "allocating %ld\n", (files_size ? files_size + 1 : 8 * sizeof(alpm_file_t))); + if(!_alpm_greedy_grow((void **)&files, &files_size, + (files_size ? files_size + sizeof(alpm_file_t) : 8 * sizeof(alpm_file_t)))) { + goto error; } /* since we know the length of the file string already, * we can do malloc + memcpy rather than strdup */ -- 1.8.5.3
On 10.02.2014 09:54, Florian Pritz wrote:
+ _alpm_log(db->handle, ALPM_LOG_DEBUG, "allocating %ld\n", (files_size ? files_size + 1 : 8 * sizeof(alpm_file_t)));
Debug output removed in my working branch.
On 10/02/14 05:24, Florian Pritz wrote:
I hope I incorporated all suggestions. Most patches didn't change, but I'm submitting them anyway for an easier (hopefully final) review.
Looks good from a quick read to me, apart from that comment I made on the final patch. I'm low on time this week, so will take a better look and pull if all good later. Allan
On 10/02/14 16:04, Allan McRae wrote:
On 10/02/14 05:24, Florian Pritz wrote:
I hope I incorporated all suggestions. Most patches didn't change, but I'm submitting them anyway for an easier (hopefully final) review.
Looks good from a quick read to me, apart from that comment I made on the final patch. I'm low on time this week, so will take a better look and pull if all good later.
Umm... Can someone look into this?
pacman -S filesystem warning: filesystem-2013.05-2 is up to date -- reinstalling resolving dependencies... looking for conflicting packages...
Packages (1) filesystem-2013.05-2 Total Installed Size: 0.01 MiB Net Upgrade Size: 0.00 MiB :: Proceed with installation? [Y/n] (1/1) checking keys in keyring [######################] 100% (1/1) checking package integrity [######################] 100% (1/1) loading package files [######################] 100% error: error while reading mtree of package /home/arch/pkgcache/filesystem-2013.05-2-x86_64.pkg.tar.xz: mtree specification has different type for ./bin error: failed to commit transaction (libarchive error) Errors occurred, no packages were upgraded.
On 14/02/14 10:47, Allan McRae wrote:
On 10/02/14 16:04, Allan McRae wrote:
On 10/02/14 05:24, Florian Pritz wrote:
I hope I incorporated all suggestions. Most patches didn't change, but I'm submitting them anyway for an easier (hopefully final) review.
Looks good from a quick read to me, apart from that comment I made on the final patch. I'm low on time this week, so will take a better look and pull if all good later.
Umm... Can someone look into this?
pacman -S filesystem warning: filesystem-2013.05-2 is up to date -- reinstalling resolving dependencies... looking for conflicting packages...
Packages (1) filesystem-2013.05-2
Total Installed Size: 0.01 MiB Net Upgrade Size: 0.00 MiB
:: Proceed with installation? [Y/n] (1/1) checking keys in keyring [######################] 100% (1/1) checking package integrity [######################] 100% (1/1) loading package files [######################] 100% error: error while reading mtree of package /home/arch/pkgcache/filesystem-2013.05-2-x86_64.pkg.tar.xz: mtree specification has different type for ./bin error: failed to commit transaction (libarchive error) Errors occurred, no packages were upgraded.
And I believe this is a libarchive bug: $ pacman -Qk filesystem filesystem: 90 total files, 0 missing files $ pacman -Qkk filesystem filesystem: 1 total file, 0 altered files It is bailing out in -Qkk too... It turns out that libarchive checks the files on the filesystem to fill in gaps in the mtree data. It uses fstat... Allan
On 14/02/14 13:29, Allan McRae wrote:
On 14/02/14 10:47, Allan McRae wrote:
On 10/02/14 16:04, Allan McRae wrote:
On 10/02/14 05:24, Florian Pritz wrote:
I hope I incorporated all suggestions. Most patches didn't change, but I'm submitting them anyway for an easier (hopefully final) review.
Looks good from a quick read to me, apart from that comment I made on the final patch. I'm low on time this week, so will take a better look and pull if all good later.
Umm... Can someone look into this?
pacman -S filesystem warning: filesystem-2013.05-2 is up to date -- reinstalling resolving dependencies... looking for conflicting packages...
Packages (1) filesystem-2013.05-2
Total Installed Size: 0.01 MiB Net Upgrade Size: 0.00 MiB
:: Proceed with installation? [Y/n] (1/1) checking keys in keyring [######################] 100% (1/1) checking package integrity [######################] 100% (1/1) loading package files [######################] 100% error: error while reading mtree of package /home/arch/pkgcache/filesystem-2013.05-2-x86_64.pkg.tar.xz: mtree specification has different type for ./bin error: failed to commit transaction (libarchive error) Errors occurred, no packages were upgraded.
And I believe this is a libarchive bug:
$ pacman -Qk filesystem filesystem: 90 total files, 0 missing files
$ pacman -Qkk filesystem filesystem: 1 total file, 0 altered files
It is bailing out in -Qkk too...
It turns out that libarchive checks the files on the filesystem to fill in gaps in the mtree data. It uses fstat...
It seems I have misdiagnosed this issue. To replicate the problem I am having add a directory called "bin" in your home directory and do a "pacman -Qkk filesystem" from there. Allan
participants (5)
-
Allan McRae
-
Andrew Gregory
-
Dan McGee
-
Dave Reisner
-
Florian Pritz