[pacman-dev] [PATCH] Enable Perl regular expressions in NoExtract.
Until now it was not easily possible to remove all files but some in a given directory. By enabling Perl regular expressions for NoExtract this is now made possible through negative lookahead. Fixes FS#31749. --- This patch is work in progress. I want to check if there is interest in having this feature available in pacman and how the extra dependency on libpcre is received. If it is well received I might still do some rework for the actual matching of NoExtract items, as currently I'm always recompiling the pattern for each match. configure.ac | 13 ++++++++++++- lib/libalpm/Makefile.am | 6 ++++-- lib/libalpm/add.c | 2 +- lib/libalpm/util.c | 22 ++++++++++++++++++++++ lib/libalpm/util.h | 1 + 5 files changed, 40 insertions(+), 4 deletions(-) diff --git a/configure.ac b/configure.ac index 1ea0fce..29ea101 100644 --- a/configure.ac +++ b/configure.ac @@ -230,6 +230,17 @@ if test "x$with_libcurl" != "xno"; then fi AM_CONDITIONAL(HAVE_LIBCURL, [test "$have_libcurl" = "yes"]) +# Check for PCRE +have_libpcre=no +if test "x$with_libpcre" != "xno"; then + PKG_CHECK_MODULES(LIBPCRE, [libpcre], + [AC_DEFINE(HAVE_LIBPCRE, 1, [Define if libpcre is available]) have_libpcre=yes], have_libpcre=no) + if test "x$have_libpcre" = xno -a "x$with_libpcre" = xyes; then + AC_MSG_ERROR([*** libpcre is required for file matching support]) + fi +fi +AM_CONDITIONAL(HAVE_LIBPCRE, [test "$have_libpcre" = "yes"]) + # Check for gpgme AC_MSG_CHECKING(whether to link with libgpgme) AS_IF([test "x$with_gpgme" != "xno"], @@ -523,7 +534,7 @@ ${PACKAGE_NAME}: compiler : ${CC} preprocessor flags : ${CPPFLAGS} compiler flags : ${WARNING_CFLAGS} ${CFLAGS} - library flags : ${LIBS} ${LIBSSL_LIBS} ${LIBARCHIVE_LIBS} ${LIBCURL_LIBS} ${GPGME_LIBS} + library flags : ${LIBS} ${LIBSSL_LIBS} ${LIBPCRE_LIBS} ${LIBARCHIVE_LIBS} ${LIBCURL_LIBS} ${GPGME_LIBS} linker flags : ${LDFLAGS} Architecture : ${CARCH} diff --git a/lib/libalpm/Makefile.am b/lib/libalpm/Makefile.am index 171bc46..0f71aed 100644 --- a/lib/libalpm/Makefile.am +++ b/lib/libalpm/Makefile.am @@ -75,13 +75,15 @@ libalpm_la_CFLAGS = \ $(GPGME_CFLAGS) \ $(LIBARCHIVE_CFLAGS) \ $(LIBCURL_CFLAGS) \ - $(LIBSSL_CFLAGS) + $(LIBSSL_CFLAGS) \ + $(LIBPCRE_CFLAGS) libalpm_la_LIBADD = \ $(LTLIBINTL) \ $(GPGME_LIBS) \ $(LIBARCHIVE_LIBS) \ $(LIBCURL_LIBS) \ - $(LIBSSL_LIBS) + $(LIBSSL_LIBS) \ + $(LIBPCRE_LIBS) # vim:set ts=2 sw=2 noet: diff --git a/lib/libalpm/add.c b/lib/libalpm/add.c index c20e7c6..d80dbe4 100644 --- a/lib/libalpm/add.c +++ b/lib/libalpm/add.c @@ -183,7 +183,7 @@ static int extract_single_file(alpm_handle_t *handle, struct archive *archive, } /* if a file is in NoExtract then we never extract it */ - if(alpm_list_find(handle->noextract, entryname, _alpm_fnmatch)) { + if(alpm_list_find(handle->noextract, entryname, _alpm_regex)) { _alpm_log(handle, ALPM_LOG_DEBUG, "%s is in NoExtract," " skipping extraction of %s\n", entryname, filename); diff --git a/lib/libalpm/util.c b/lib/libalpm/util.c index f84fb66..5858d28 100644 --- a/lib/libalpm/util.c +++ b/lib/libalpm/util.c @@ -33,6 +33,7 @@ #include <sys/wait.h> #include <locale.h> /* setlocale */ #include <fnmatch.h> +#include <pcre.h> /* libarchive */ #include <archive.h> @@ -1261,6 +1262,27 @@ int _alpm_fnmatch(const void *pattern, const void *string) return fnmatch(pattern, string, 0); } +/** Checks whether a string matches an extended Perl regular expression. + * @param pattern string to match against + * @param string string to check against pattern + * @return 0 if string matches pattern, non-zero if they don't match and on + * error + */ +int _alpm_regex(const void *pattern, const void *string) +{ + const char *error; + int erroroff; + + pcre *pcre = pcre_compile(pattern, PCRE_EXTENDED | PCRE_ANCHORED, &error, &erroroff, NULL); + if (error != NULL) { + return -1; + } + + int ret = pcre_exec(pcre, NULL, string, strlen(string), 0, 0, NULL, 0); + pcre_free(pcre); + return ret < 0; +} + void _alpm_alloc_fail(size_t size) { fprintf(stderr, "alloc failure: could not allocate %zd bytes\n", size); diff --git a/lib/libalpm/util.h b/lib/libalpm/util.h index 56031f3..5ec36b6 100644 --- a/lib/libalpm/util.h +++ b/lib/libalpm/util.h @@ -141,6 +141,7 @@ int _alpm_raw_cmp(const char *first, const char *second); int _alpm_raw_ncmp(const char *first, const char *second, size_t max); int _alpm_access(alpm_handle_t *handle, const char *dir, const char *file, int amode); int _alpm_fnmatch(const void *pattern, const void *string); +int _alpm_regex(const void *pattern, const void *string); #ifndef HAVE_STRSEP char *strsep(char **, const char *); -- 1.8.3
On Tue, Jun 04, 2013 at 05:03:47PM +0200, Patrick Steinhardt wrote:
Until now it was not easily possible to remove all files but some in a given directory. By enabling Perl regular expressions for NoExtract this is now made possible through negative lookahead.
Fixes FS#31749. ---
This patch is work in progress. I want to check if there is interest in having this feature available in pacman and how the extra dependency on libpcre is received. If it is well received I might still do some rework for the actual matching of NoExtract items, as currently I'm always recompiling the pattern for each match.
I've considered moving to PCRE, but I'm curious: 1) Why do you need this for NoExtract? Please be *specific* 2) What is insufficient about POSIX regex? (#include <regex.h>)
configure.ac | 13 ++++++++++++- lib/libalpm/Makefile.am | 6 ++++-- lib/libalpm/add.c | 2 +- lib/libalpm/util.c | 22 ++++++++++++++++++++++ lib/libalpm/util.h | 1 + 5 files changed, 40 insertions(+), 4 deletions(-)
diff --git a/configure.ac b/configure.ac index 1ea0fce..29ea101 100644 --- a/configure.ac +++ b/configure.ac @@ -230,6 +230,17 @@ if test "x$with_libcurl" != "xno"; then fi AM_CONDITIONAL(HAVE_LIBCURL, [test "$have_libcurl" = "yes"])
+# Check for PCRE +have_libpcre=no +if test "x$with_libpcre" != "xno"; then + PKG_CHECK_MODULES(LIBPCRE, [libpcre], + [AC_DEFINE(HAVE_LIBPCRE, 1, [Define if libpcre is available]) have_libpcre=yes], have_libpcre=no) + if test "x$have_libpcre" = xno -a "x$with_libpcre" = xyes; then + AC_MSG_ERROR([*** libpcre is required for file matching support]) + fi +fi +AM_CONDITIONAL(HAVE_LIBPCRE, [test "$have_libpcre" = "yes"]) + # Check for gpgme AC_MSG_CHECKING(whether to link with libgpgme) AS_IF([test "x$with_gpgme" != "xno"], @@ -523,7 +534,7 @@ ${PACKAGE_NAME}: compiler : ${CC} preprocessor flags : ${CPPFLAGS} compiler flags : ${WARNING_CFLAGS} ${CFLAGS} - library flags : ${LIBS} ${LIBSSL_LIBS} ${LIBARCHIVE_LIBS} ${LIBCURL_LIBS} ${GPGME_LIBS} + library flags : ${LIBS} ${LIBSSL_LIBS} ${LIBPCRE_LIBS} ${LIBARCHIVE_LIBS} ${LIBCURL_LIBS} ${GPGME_LIBS} linker flags : ${LDFLAGS}
Architecture : ${CARCH} diff --git a/lib/libalpm/Makefile.am b/lib/libalpm/Makefile.am index 171bc46..0f71aed 100644 --- a/lib/libalpm/Makefile.am +++ b/lib/libalpm/Makefile.am @@ -75,13 +75,15 @@ libalpm_la_CFLAGS = \ $(GPGME_CFLAGS) \ $(LIBARCHIVE_CFLAGS) \ $(LIBCURL_CFLAGS) \ - $(LIBSSL_CFLAGS) + $(LIBSSL_CFLAGS) \ + $(LIBPCRE_CFLAGS)
libalpm_la_LIBADD = \ $(LTLIBINTL) \ $(GPGME_LIBS) \ $(LIBARCHIVE_LIBS) \ $(LIBCURL_LIBS) \ - $(LIBSSL_LIBS) + $(LIBSSL_LIBS) \ + $(LIBPCRE_LIBS)
# vim:set ts=2 sw=2 noet: diff --git a/lib/libalpm/add.c b/lib/libalpm/add.c index c20e7c6..d80dbe4 100644 --- a/lib/libalpm/add.c +++ b/lib/libalpm/add.c @@ -183,7 +183,7 @@ static int extract_single_file(alpm_handle_t *handle, struct archive *archive, }
/* if a file is in NoExtract then we never extract it */ - if(alpm_list_find(handle->noextract, entryname, _alpm_fnmatch)) { + if(alpm_list_find(handle->noextract, entryname, _alpm_regex)) { _alpm_log(handle, ALPM_LOG_DEBUG, "%s is in NoExtract," " skipping extraction of %s\n", entryname, filename); diff --git a/lib/libalpm/util.c b/lib/libalpm/util.c index f84fb66..5858d28 100644 --- a/lib/libalpm/util.c +++ b/lib/libalpm/util.c @@ -33,6 +33,7 @@ #include <sys/wait.h> #include <locale.h> /* setlocale */ #include <fnmatch.h> +#include <pcre.h>
/* libarchive */ #include <archive.h> @@ -1261,6 +1262,27 @@ int _alpm_fnmatch(const void *pattern, const void *string) return fnmatch(pattern, string, 0); }
+/** Checks whether a string matches an extended Perl regular expression. + * @param pattern string to match against + * @param string string to check against pattern + * @return 0 if string matches pattern, non-zero if they don't match and on + * error + */ +int _alpm_regex(const void *pattern, const void *string) +{ + const char *error; + int erroroff; + + pcre *pcre = pcre_compile(pattern, PCRE_EXTENDED | PCRE_ANCHORED, &error, &erroroff, NULL); + if (error != NULL) { + return -1; + } + + int ret = pcre_exec(pcre, NULL, string, strlen(string), 0, 0, NULL, 0); + pcre_free(pcre); + return ret < 0; +} + void _alpm_alloc_fail(size_t size) { fprintf(stderr, "alloc failure: could not allocate %zd bytes\n", size); diff --git a/lib/libalpm/util.h b/lib/libalpm/util.h index 56031f3..5ec36b6 100644 --- a/lib/libalpm/util.h +++ b/lib/libalpm/util.h @@ -141,6 +141,7 @@ int _alpm_raw_cmp(const char *first, const char *second); int _alpm_raw_ncmp(const char *first, const char *second, size_t max); int _alpm_access(alpm_handle_t *handle, const char *dir, const char *file, int amode); int _alpm_fnmatch(const void *pattern, const void *string); +int _alpm_regex(const void *pattern, const void *string);
#ifndef HAVE_STRSEP char *strsep(char **, const char *); -- 1.8.3
On Tue, Jun 04, 2013 at 11:46:50AM -0400, Dave Reisner wrote:
On Tue, Jun 04, 2013 at 05:03:47PM +0200, Patrick Steinhardt wrote:
Until now it was not easily possible to remove all files but some in a given directory. By enabling Perl regular expressions for NoExtract this is now made possible through negative lookahead.
Fixes FS#31749. ---
This patch is work in progress. I want to check if there is interest in having this feature available in pacman and how the extra dependency on libpcre is received. If it is well received I might still do some rework for the actual matching of NoExtract items, as currently I'm always recompiling the pattern for each match.
I've considered moving to PCRE, but I'm curious:
1) Why do you need this for NoExtract? Please be *specific* 2) What is insufficient about POSIX regex? (#include <regex.h>)
My usecase is specifically the directory /usr/share/locale. As I only need two files/directories (locale.alias and en_US) in there I'd need to specify a list of a whopping 106 entries that shouldn't be extracted. POSIX regular expressions do not support negative lookarounds. Thus I might add entries like usr/share/locale{af,am,...} but it is not possible to remove everything but those two entries. With Perl regular expressions this is as easy as usr/share/locale/((?!en_US|locale.alias).+). I'm sure there are other usecases where POSIX regular expressions may be insufficient but can't currently think about any other. In generl PCRE is in the core repo, widely used and very powerful, so I don't see any reason to not use it.
configure.ac | 13 ++++++++++++- lib/libalpm/Makefile.am | 6 ++++-- lib/libalpm/add.c | 2 +- lib/libalpm/util.c | 22 ++++++++++++++++++++++ lib/libalpm/util.h | 1 + 5 files changed, 40 insertions(+), 4 deletions(-)
diff --git a/configure.ac b/configure.ac index 1ea0fce..29ea101 100644 --- a/configure.ac +++ b/configure.ac @@ -230,6 +230,17 @@ if test "x$with_libcurl" != "xno"; then fi AM_CONDITIONAL(HAVE_LIBCURL, [test "$have_libcurl" = "yes"])
+# Check for PCRE +have_libpcre=no +if test "x$with_libpcre" != "xno"; then + PKG_CHECK_MODULES(LIBPCRE, [libpcre], + [AC_DEFINE(HAVE_LIBPCRE, 1, [Define if libpcre is available]) have_libpcre=yes], have_libpcre=no) + if test "x$have_libpcre" = xno -a "x$with_libpcre" = xyes; then + AC_MSG_ERROR([*** libpcre is required for file matching support]) + fi +fi +AM_CONDITIONAL(HAVE_LIBPCRE, [test "$have_libpcre" = "yes"]) + # Check for gpgme AC_MSG_CHECKING(whether to link with libgpgme) AS_IF([test "x$with_gpgme" != "xno"], @@ -523,7 +534,7 @@ ${PACKAGE_NAME}: compiler : ${CC} preprocessor flags : ${CPPFLAGS} compiler flags : ${WARNING_CFLAGS} ${CFLAGS} - library flags : ${LIBS} ${LIBSSL_LIBS} ${LIBARCHIVE_LIBS} ${LIBCURL_LIBS} ${GPGME_LIBS} + library flags : ${LIBS} ${LIBSSL_LIBS} ${LIBPCRE_LIBS} ${LIBARCHIVE_LIBS} ${LIBCURL_LIBS} ${GPGME_LIBS} linker flags : ${LDFLAGS}
Architecture : ${CARCH} diff --git a/lib/libalpm/Makefile.am b/lib/libalpm/Makefile.am index 171bc46..0f71aed 100644 --- a/lib/libalpm/Makefile.am +++ b/lib/libalpm/Makefile.am @@ -75,13 +75,15 @@ libalpm_la_CFLAGS = \ $(GPGME_CFLAGS) \ $(LIBARCHIVE_CFLAGS) \ $(LIBCURL_CFLAGS) \ - $(LIBSSL_CFLAGS) + $(LIBSSL_CFLAGS) \ + $(LIBPCRE_CFLAGS)
libalpm_la_LIBADD = \ $(LTLIBINTL) \ $(GPGME_LIBS) \ $(LIBARCHIVE_LIBS) \ $(LIBCURL_LIBS) \ - $(LIBSSL_LIBS) + $(LIBSSL_LIBS) \ + $(LIBPCRE_LIBS)
# vim:set ts=2 sw=2 noet: diff --git a/lib/libalpm/add.c b/lib/libalpm/add.c index c20e7c6..d80dbe4 100644 --- a/lib/libalpm/add.c +++ b/lib/libalpm/add.c @@ -183,7 +183,7 @@ static int extract_single_file(alpm_handle_t *handle, struct archive *archive, }
/* if a file is in NoExtract then we never extract it */ - if(alpm_list_find(handle->noextract, entryname, _alpm_fnmatch)) { + if(alpm_list_find(handle->noextract, entryname, _alpm_regex)) { _alpm_log(handle, ALPM_LOG_DEBUG, "%s is in NoExtract," " skipping extraction of %s\n", entryname, filename); diff --git a/lib/libalpm/util.c b/lib/libalpm/util.c index f84fb66..5858d28 100644 --- a/lib/libalpm/util.c +++ b/lib/libalpm/util.c @@ -33,6 +33,7 @@ #include <sys/wait.h> #include <locale.h> /* setlocale */ #include <fnmatch.h> +#include <pcre.h>
/* libarchive */ #include <archive.h> @@ -1261,6 +1262,27 @@ int _alpm_fnmatch(const void *pattern, const void *string) return fnmatch(pattern, string, 0); }
+/** Checks whether a string matches an extended Perl regular expression. + * @param pattern string to match against + * @param string string to check against pattern + * @return 0 if string matches pattern, non-zero if they don't match and on + * error + */ +int _alpm_regex(const void *pattern, const void *string) +{ + const char *error; + int erroroff; + + pcre *pcre = pcre_compile(pattern, PCRE_EXTENDED | PCRE_ANCHORED, &error, &erroroff, NULL); + if (error != NULL) { + return -1; + } + + int ret = pcre_exec(pcre, NULL, string, strlen(string), 0, 0, NULL, 0); + pcre_free(pcre); + return ret < 0; +} + void _alpm_alloc_fail(size_t size) { fprintf(stderr, "alloc failure: could not allocate %zd bytes\n", size); diff --git a/lib/libalpm/util.h b/lib/libalpm/util.h index 56031f3..5ec36b6 100644 --- a/lib/libalpm/util.h +++ b/lib/libalpm/util.h @@ -141,6 +141,7 @@ int _alpm_raw_cmp(const char *first, const char *second); int _alpm_raw_ncmp(const char *first, const char *second, size_t max); int _alpm_access(alpm_handle_t *handle, const char *dir, const char *file, int amode); int _alpm_fnmatch(const void *pattern, const void *string); +int _alpm_regex(const void *pattern, const void *string);
#ifndef HAVE_STRSEP char *strsep(char **, const char *); -- 1.8.3
Regards, Patrick
On Tue, Jun 04, 2013 at 06:24:03PM +0200, Patrick Steinhardt wrote:
On Tue, Jun 04, 2013 at 11:46:50AM -0400, Dave Reisner wrote:
On Tue, Jun 04, 2013 at 05:03:47PM +0200, Patrick Steinhardt wrote:
Until now it was not easily possible to remove all files but some in a given directory. By enabling Perl regular expressions for NoExtract this is now made possible through negative lookahead.
Fixes FS#31749. ---
This patch is work in progress. I want to check if there is interest in having this feature available in pacman and how the extra dependency on libpcre is received. If it is well received I might still do some rework for the actual matching of NoExtract items, as currently I'm always recompiling the pattern for each match.
I've considered moving to PCRE, but I'm curious:
1) Why do you need this for NoExtract? Please be *specific* 2) What is insufficient about POSIX regex? (#include <regex.h>)
My usecase is specifically the directory /usr/share/locale. As I only need two files/directories (locale.alias and en_US) in there I'd need to specify a list of a whopping 106 entries that shouldn't be extracted.
POSIX regular expressions do not support negative lookarounds. Thus I might add entries like usr/share/locale{af,am,...} but it is not possible to remove everything but those two entries. With Perl regular expressions this is as easy as usr/share/locale/((?!en_US|locale.alias).+).
I'm sure there are other usecases where POSIX regular expressions may be insufficient but can't currently think about any other. In generl PCRE is in the core repo, widely used and very powerful, so I don't see any reason to not use it.
I think a major hurdle here is going to be that we've already standardized on fnmatch() as the matcher for not only NoExtract, but also IgnorePkg, IgnoreGroup, HoldPkg, and NoUpgrade. Moving to regex based matching flat out breaks expectations since a valid glob will *not* match the same data when interpreted as a regex, and it might not even be a valid regex (consider "*foo"). If you want to propose such a move, you either need a migration path (unlikely), or some sort of flag to determine what style of matching is performed on these config options, something like: MatchStyle = (Glob|Regex) Along with the necessary documentation. I'm not sure this is a road I want to go down.
configure.ac | 13 ++++++++++++- lib/libalpm/Makefile.am | 6 ++++-- lib/libalpm/add.c | 2 +- lib/libalpm/util.c | 22 ++++++++++++++++++++++ lib/libalpm/util.h | 1 + 5 files changed, 40 insertions(+), 4 deletions(-)
diff --git a/configure.ac b/configure.ac index 1ea0fce..29ea101 100644 --- a/configure.ac +++ b/configure.ac @@ -230,6 +230,17 @@ if test "x$with_libcurl" != "xno"; then fi AM_CONDITIONAL(HAVE_LIBCURL, [test "$have_libcurl" = "yes"])
+# Check for PCRE +have_libpcre=no +if test "x$with_libpcre" != "xno"; then + PKG_CHECK_MODULES(LIBPCRE, [libpcre], + [AC_DEFINE(HAVE_LIBPCRE, 1, [Define if libpcre is available]) have_libpcre=yes], have_libpcre=no) + if test "x$have_libpcre" = xno -a "x$with_libpcre" = xyes; then + AC_MSG_ERROR([*** libpcre is required for file matching support]) + fi +fi +AM_CONDITIONAL(HAVE_LIBPCRE, [test "$have_libpcre" = "yes"]) + # Check for gpgme AC_MSG_CHECKING(whether to link with libgpgme) AS_IF([test "x$with_gpgme" != "xno"], @@ -523,7 +534,7 @@ ${PACKAGE_NAME}: compiler : ${CC} preprocessor flags : ${CPPFLAGS} compiler flags : ${WARNING_CFLAGS} ${CFLAGS} - library flags : ${LIBS} ${LIBSSL_LIBS} ${LIBARCHIVE_LIBS} ${LIBCURL_LIBS} ${GPGME_LIBS} + library flags : ${LIBS} ${LIBSSL_LIBS} ${LIBPCRE_LIBS} ${LIBARCHIVE_LIBS} ${LIBCURL_LIBS} ${GPGME_LIBS} linker flags : ${LDFLAGS}
Architecture : ${CARCH} diff --git a/lib/libalpm/Makefile.am b/lib/libalpm/Makefile.am index 171bc46..0f71aed 100644 --- a/lib/libalpm/Makefile.am +++ b/lib/libalpm/Makefile.am @@ -75,13 +75,15 @@ libalpm_la_CFLAGS = \ $(GPGME_CFLAGS) \ $(LIBARCHIVE_CFLAGS) \ $(LIBCURL_CFLAGS) \ - $(LIBSSL_CFLAGS) + $(LIBSSL_CFLAGS) \ + $(LIBPCRE_CFLAGS)
libalpm_la_LIBADD = \ $(LTLIBINTL) \ $(GPGME_LIBS) \ $(LIBARCHIVE_LIBS) \ $(LIBCURL_LIBS) \ - $(LIBSSL_LIBS) + $(LIBSSL_LIBS) \ + $(LIBPCRE_LIBS)
# vim:set ts=2 sw=2 noet: diff --git a/lib/libalpm/add.c b/lib/libalpm/add.c index c20e7c6..d80dbe4 100644 --- a/lib/libalpm/add.c +++ b/lib/libalpm/add.c @@ -183,7 +183,7 @@ static int extract_single_file(alpm_handle_t *handle, struct archive *archive, }
/* if a file is in NoExtract then we never extract it */ - if(alpm_list_find(handle->noextract, entryname, _alpm_fnmatch)) { + if(alpm_list_find(handle->noextract, entryname, _alpm_regex)) { _alpm_log(handle, ALPM_LOG_DEBUG, "%s is in NoExtract," " skipping extraction of %s\n", entryname, filename); diff --git a/lib/libalpm/util.c b/lib/libalpm/util.c index f84fb66..5858d28 100644 --- a/lib/libalpm/util.c +++ b/lib/libalpm/util.c @@ -33,6 +33,7 @@ #include <sys/wait.h> #include <locale.h> /* setlocale */ #include <fnmatch.h> +#include <pcre.h>
/* libarchive */ #include <archive.h> @@ -1261,6 +1262,27 @@ int _alpm_fnmatch(const void *pattern, const void *string) return fnmatch(pattern, string, 0); }
+/** Checks whether a string matches an extended Perl regular expression. + * @param pattern string to match against + * @param string string to check against pattern + * @return 0 if string matches pattern, non-zero if they don't match and on + * error + */ +int _alpm_regex(const void *pattern, const void *string) +{ + const char *error; + int erroroff; + + pcre *pcre = pcre_compile(pattern, PCRE_EXTENDED | PCRE_ANCHORED, &error, &erroroff, NULL); + if (error != NULL) { + return -1; + } + + int ret = pcre_exec(pcre, NULL, string, strlen(string), 0, 0, NULL, 0); + pcre_free(pcre); + return ret < 0; +} + void _alpm_alloc_fail(size_t size) { fprintf(stderr, "alloc failure: could not allocate %zd bytes\n", size); diff --git a/lib/libalpm/util.h b/lib/libalpm/util.h index 56031f3..5ec36b6 100644 --- a/lib/libalpm/util.h +++ b/lib/libalpm/util.h @@ -141,6 +141,7 @@ int _alpm_raw_cmp(const char *first, const char *second); int _alpm_raw_ncmp(const char *first, const char *second, size_t max); int _alpm_access(alpm_handle_t *handle, const char *dir, const char *file, int amode); int _alpm_fnmatch(const void *pattern, const void *string); +int _alpm_regex(const void *pattern, const void *string);
#ifndef HAVE_STRSEP char *strsep(char **, const char *); -- 1.8.3
Regards, Patrick
On Tue, Jun 04, 2013 at 12:34:13PM -0400, Dave Reisner wrote:
On Tue, Jun 04, 2013 at 06:24:03PM +0200, Patrick Steinhardt wrote:
On Tue, Jun 04, 2013 at 11:46:50AM -0400, Dave Reisner wrote:
On Tue, Jun 04, 2013 at 05:03:47PM +0200, Patrick Steinhardt wrote:
Until now it was not easily possible to remove all files but some in a given directory. By enabling Perl regular expressions for NoExtract this is now made possible through negative lookahead.
Fixes FS#31749. ---
This patch is work in progress. I want to check if there is interest in having this feature available in pacman and how the extra dependency on libpcre is received. If it is well received I might still do some rework for the actual matching of NoExtract items, as currently I'm always recompiling the pattern for each match.
I've considered moving to PCRE, but I'm curious:
1) Why do you need this for NoExtract? Please be *specific* 2) What is insufficient about POSIX regex? (#include <regex.h>)
My usecase is specifically the directory /usr/share/locale. As I only need two files/directories (locale.alias and en_US) in there I'd need to specify a list of a whopping 106 entries that shouldn't be extracted.
POSIX regular expressions do not support negative lookarounds. Thus I might add entries like usr/share/locale{af,am,...} but it is not possible to remove everything but those two entries. With Perl regular expressions this is as easy as usr/share/locale/((?!en_US|locale.alias).+).
I'm sure there are other usecases where POSIX regular expressions may be insufficient but can't currently think about any other. In generl PCRE is in the core repo, widely used and very powerful, so I don't see any reason to not use it.
I think a major hurdle here is going to be that we've already standardized on fnmatch() as the matcher for not only NoExtract, but also IgnorePkg, IgnoreGroup, HoldPkg, and NoUpgrade. Moving to regex based matching flat out breaks expectations since a valid glob will *not* match the same data when interpreted as a regex, and it might not even be a valid regex (consider "*foo").
If you want to propose such a move, you either need a migration path (unlikely), or some sort of flag to determine what style of matching is performed on these config options, something like:
MatchStyle = (Glob|Regex)
Along with the necessary documentation. I'm not sure this is a road I want to go down.
I think though that feeding PCRE with fnmatch-style globs won't produce (much) different results. At least I made sure that regexps are anchored (must match right at the beginning) and globbing-operator from fnmatch (which has no FNM_PATHNAME set) and star-operator from PCRE behave the same. But probably this is too short-sighted and there will be loads of problems. Why are you opposed to the MatchStyle option? I've thought about such a flag as well (as a build flag though, disabling the PCRE dependency, even though pacman.conf variables are better suited) and can't see much hassle with that. If we decide to make that move one should consider changing behaviour of the other variables as well to achieve better consistency. No use in having similar things act differently. I'd be more than willing to implement both, MatchStyle and behaviour changes of IgnorePkg, IgnoreGroup, HoldPkg, NoUpgrade.
configure.ac | 13 ++++++++++++- lib/libalpm/Makefile.am | 6 ++++-- lib/libalpm/add.c | 2 +- lib/libalpm/util.c | 22 ++++++++++++++++++++++ lib/libalpm/util.h | 1 + 5 files changed, 40 insertions(+), 4 deletions(-)
diff --git a/configure.ac b/configure.ac index 1ea0fce..29ea101 100644 --- a/configure.ac +++ b/configure.ac @@ -230,6 +230,17 @@ if test "x$with_libcurl" != "xno"; then fi AM_CONDITIONAL(HAVE_LIBCURL, [test "$have_libcurl" = "yes"])
+# Check for PCRE +have_libpcre=no +if test "x$with_libpcre" != "xno"; then + PKG_CHECK_MODULES(LIBPCRE, [libpcre], + [AC_DEFINE(HAVE_LIBPCRE, 1, [Define if libpcre is available]) have_libpcre=yes], have_libpcre=no) + if test "x$have_libpcre" = xno -a "x$with_libpcre" = xyes; then + AC_MSG_ERROR([*** libpcre is required for file matching support]) + fi +fi +AM_CONDITIONAL(HAVE_LIBPCRE, [test "$have_libpcre" = "yes"]) + # Check for gpgme AC_MSG_CHECKING(whether to link with libgpgme) AS_IF([test "x$with_gpgme" != "xno"], @@ -523,7 +534,7 @@ ${PACKAGE_NAME}: compiler : ${CC} preprocessor flags : ${CPPFLAGS} compiler flags : ${WARNING_CFLAGS} ${CFLAGS} - library flags : ${LIBS} ${LIBSSL_LIBS} ${LIBARCHIVE_LIBS} ${LIBCURL_LIBS} ${GPGME_LIBS} + library flags : ${LIBS} ${LIBSSL_LIBS} ${LIBPCRE_LIBS} ${LIBARCHIVE_LIBS} ${LIBCURL_LIBS} ${GPGME_LIBS} linker flags : ${LDFLAGS}
Architecture : ${CARCH} diff --git a/lib/libalpm/Makefile.am b/lib/libalpm/Makefile.am index 171bc46..0f71aed 100644 --- a/lib/libalpm/Makefile.am +++ b/lib/libalpm/Makefile.am @@ -75,13 +75,15 @@ libalpm_la_CFLAGS = \ $(GPGME_CFLAGS) \ $(LIBARCHIVE_CFLAGS) \ $(LIBCURL_CFLAGS) \ - $(LIBSSL_CFLAGS) + $(LIBSSL_CFLAGS) \ + $(LIBPCRE_CFLAGS)
libalpm_la_LIBADD = \ $(LTLIBINTL) \ $(GPGME_LIBS) \ $(LIBARCHIVE_LIBS) \ $(LIBCURL_LIBS) \ - $(LIBSSL_LIBS) + $(LIBSSL_LIBS) \ + $(LIBPCRE_LIBS)
# vim:set ts=2 sw=2 noet: diff --git a/lib/libalpm/add.c b/lib/libalpm/add.c index c20e7c6..d80dbe4 100644 --- a/lib/libalpm/add.c +++ b/lib/libalpm/add.c @@ -183,7 +183,7 @@ static int extract_single_file(alpm_handle_t *handle, struct archive *archive, }
/* if a file is in NoExtract then we never extract it */ - if(alpm_list_find(handle->noextract, entryname, _alpm_fnmatch)) { + if(alpm_list_find(handle->noextract, entryname, _alpm_regex)) { _alpm_log(handle, ALPM_LOG_DEBUG, "%s is in NoExtract," " skipping extraction of %s\n", entryname, filename); diff --git a/lib/libalpm/util.c b/lib/libalpm/util.c index f84fb66..5858d28 100644 --- a/lib/libalpm/util.c +++ b/lib/libalpm/util.c @@ -33,6 +33,7 @@ #include <sys/wait.h> #include <locale.h> /* setlocale */ #include <fnmatch.h> +#include <pcre.h>
/* libarchive */ #include <archive.h> @@ -1261,6 +1262,27 @@ int _alpm_fnmatch(const void *pattern, const void *string) return fnmatch(pattern, string, 0); }
+/** Checks whether a string matches an extended Perl regular expression. + * @param pattern string to match against + * @param string string to check against pattern + * @return 0 if string matches pattern, non-zero if they don't match and on + * error + */ +int _alpm_regex(const void *pattern, const void *string) +{ + const char *error; + int erroroff; + + pcre *pcre = pcre_compile(pattern, PCRE_EXTENDED | PCRE_ANCHORED, &error, &erroroff, NULL); + if (error != NULL) { + return -1; + } + + int ret = pcre_exec(pcre, NULL, string, strlen(string), 0, 0, NULL, 0); + pcre_free(pcre); + return ret < 0; +} + void _alpm_alloc_fail(size_t size) { fprintf(stderr, "alloc failure: could not allocate %zd bytes\n", size); diff --git a/lib/libalpm/util.h b/lib/libalpm/util.h index 56031f3..5ec36b6 100644 --- a/lib/libalpm/util.h +++ b/lib/libalpm/util.h @@ -141,6 +141,7 @@ int _alpm_raw_cmp(const char *first, const char *second); int _alpm_raw_ncmp(const char *first, const char *second, size_t max); int _alpm_access(alpm_handle_t *handle, const char *dir, const char *file, int amode); int _alpm_fnmatch(const void *pattern, const void *string); +int _alpm_regex(const void *pattern, const void *string);
#ifndef HAVE_STRSEP char *strsep(char **, const char *); -- 1.8.3
Regards, Patrick
On Tue, Jun 04, 2013 at 06:46:01PM +0200, Patrick Steinhardt wrote:
On Tue, Jun 04, 2013 at 12:34:13PM -0400, Dave Reisner wrote:
On Tue, Jun 04, 2013 at 06:24:03PM +0200, Patrick Steinhardt wrote:
On Tue, Jun 04, 2013 at 11:46:50AM -0400, Dave Reisner wrote:
On Tue, Jun 04, 2013 at 05:03:47PM +0200, Patrick Steinhardt wrote:
Until now it was not easily possible to remove all files but some in a given directory. By enabling Perl regular expressions for NoExtract this is now made possible through negative lookahead.
Fixes FS#31749. ---
This patch is work in progress. I want to check if there is interest in having this feature available in pacman and how the extra dependency on libpcre is received. If it is well received I might still do some rework for the actual matching of NoExtract items, as currently I'm always recompiling the pattern for each match.
I've considered moving to PCRE, but I'm curious:
1) Why do you need this for NoExtract? Please be *specific* 2) What is insufficient about POSIX regex? (#include <regex.h>)
My usecase is specifically the directory /usr/share/locale. As I only need two files/directories (locale.alias and en_US) in there I'd need to specify a list of a whopping 106 entries that shouldn't be extracted.
POSIX regular expressions do not support negative lookarounds. Thus I might add entries like usr/share/locale{af,am,...} but it is not possible to remove everything but those two entries. With Perl regular expressions this is as easy as usr/share/locale/((?!en_US|locale.alias).+).
I'm sure there are other usecases where POSIX regular expressions may be insufficient but can't currently think about any other. In generl PCRE is in the core repo, widely used and very powerful, so I don't see any reason to not use it.
I think a major hurdle here is going to be that we've already standardized on fnmatch() as the matcher for not only NoExtract, but also IgnorePkg, IgnoreGroup, HoldPkg, and NoUpgrade. Moving to regex based matching flat out breaks expectations since a valid glob will *not* match the same data when interpreted as a regex, and it might not even be a valid regex (consider "*foo").
If you want to propose such a move, you either need a migration path (unlikely), or some sort of flag to determine what style of matching is performed on these config options, something like:
MatchStyle = (Glob|Regex)
Along with the necessary documentation. I'm not sure this is a road I want to go down.
I think though that feeding PCRE with fnmatch-style globs won't produce (much) different results. At least I made sure that regexps are anchored (must match right at the beginning) and globbing-operator from fnmatch (which has no FNM_PATHNAME set) and star-operator from PCRE behave the same. But probably this is too short-sighted and there will be loads of problems.
Forget that paragraph. Already forgot about the difference between star operators in fnmatch and PCRE, where it is a quantificator in PCRE.
Why are you opposed to the MatchStyle option? I've thought about such a flag as well (as a build flag though, disabling the PCRE dependency, even though pacman.conf variables are better suited) and can't see much hassle with that.
If we decide to make that move one should consider changing behaviour of the other variables as well to achieve better consistency. No use in having similar things act differently.
I'd be more than willing to implement both, MatchStyle and behaviour changes of IgnorePkg, IgnoreGroup, HoldPkg, NoUpgrade.
configure.ac | 13 ++++++++++++- lib/libalpm/Makefile.am | 6 ++++-- lib/libalpm/add.c | 2 +- lib/libalpm/util.c | 22 ++++++++++++++++++++++ lib/libalpm/util.h | 1 + 5 files changed, 40 insertions(+), 4 deletions(-)
diff --git a/configure.ac b/configure.ac index 1ea0fce..29ea101 100644 --- a/configure.ac +++ b/configure.ac @@ -230,6 +230,17 @@ if test "x$with_libcurl" != "xno"; then fi AM_CONDITIONAL(HAVE_LIBCURL, [test "$have_libcurl" = "yes"])
+# Check for PCRE +have_libpcre=no +if test "x$with_libpcre" != "xno"; then + PKG_CHECK_MODULES(LIBPCRE, [libpcre], + [AC_DEFINE(HAVE_LIBPCRE, 1, [Define if libpcre is available]) have_libpcre=yes], have_libpcre=no) + if test "x$have_libpcre" = xno -a "x$with_libpcre" = xyes; then + AC_MSG_ERROR([*** libpcre is required for file matching support]) + fi +fi +AM_CONDITIONAL(HAVE_LIBPCRE, [test "$have_libpcre" = "yes"]) + # Check for gpgme AC_MSG_CHECKING(whether to link with libgpgme) AS_IF([test "x$with_gpgme" != "xno"], @@ -523,7 +534,7 @@ ${PACKAGE_NAME}: compiler : ${CC} preprocessor flags : ${CPPFLAGS} compiler flags : ${WARNING_CFLAGS} ${CFLAGS} - library flags : ${LIBS} ${LIBSSL_LIBS} ${LIBARCHIVE_LIBS} ${LIBCURL_LIBS} ${GPGME_LIBS} + library flags : ${LIBS} ${LIBSSL_LIBS} ${LIBPCRE_LIBS} ${LIBARCHIVE_LIBS} ${LIBCURL_LIBS} ${GPGME_LIBS} linker flags : ${LDFLAGS}
Architecture : ${CARCH} diff --git a/lib/libalpm/Makefile.am b/lib/libalpm/Makefile.am index 171bc46..0f71aed 100644 --- a/lib/libalpm/Makefile.am +++ b/lib/libalpm/Makefile.am @@ -75,13 +75,15 @@ libalpm_la_CFLAGS = \ $(GPGME_CFLAGS) \ $(LIBARCHIVE_CFLAGS) \ $(LIBCURL_CFLAGS) \ - $(LIBSSL_CFLAGS) + $(LIBSSL_CFLAGS) \ + $(LIBPCRE_CFLAGS)
libalpm_la_LIBADD = \ $(LTLIBINTL) \ $(GPGME_LIBS) \ $(LIBARCHIVE_LIBS) \ $(LIBCURL_LIBS) \ - $(LIBSSL_LIBS) + $(LIBSSL_LIBS) \ + $(LIBPCRE_LIBS)
# vim:set ts=2 sw=2 noet: diff --git a/lib/libalpm/add.c b/lib/libalpm/add.c index c20e7c6..d80dbe4 100644 --- a/lib/libalpm/add.c +++ b/lib/libalpm/add.c @@ -183,7 +183,7 @@ static int extract_single_file(alpm_handle_t *handle, struct archive *archive, }
/* if a file is in NoExtract then we never extract it */ - if(alpm_list_find(handle->noextract, entryname, _alpm_fnmatch)) { + if(alpm_list_find(handle->noextract, entryname, _alpm_regex)) { _alpm_log(handle, ALPM_LOG_DEBUG, "%s is in NoExtract," " skipping extraction of %s\n", entryname, filename); diff --git a/lib/libalpm/util.c b/lib/libalpm/util.c index f84fb66..5858d28 100644 --- a/lib/libalpm/util.c +++ b/lib/libalpm/util.c @@ -33,6 +33,7 @@ #include <sys/wait.h> #include <locale.h> /* setlocale */ #include <fnmatch.h> +#include <pcre.h>
/* libarchive */ #include <archive.h> @@ -1261,6 +1262,27 @@ int _alpm_fnmatch(const void *pattern, const void *string) return fnmatch(pattern, string, 0); }
+/** Checks whether a string matches an extended Perl regular expression. + * @param pattern string to match against + * @param string string to check against pattern + * @return 0 if string matches pattern, non-zero if they don't match and on + * error + */ +int _alpm_regex(const void *pattern, const void *string) +{ + const char *error; + int erroroff; + + pcre *pcre = pcre_compile(pattern, PCRE_EXTENDED | PCRE_ANCHORED, &error, &erroroff, NULL); + if (error != NULL) { + return -1; + } + + int ret = pcre_exec(pcre, NULL, string, strlen(string), 0, 0, NULL, 0); + pcre_free(pcre); + return ret < 0; +} + void _alpm_alloc_fail(size_t size) { fprintf(stderr, "alloc failure: could not allocate %zd bytes\n", size); diff --git a/lib/libalpm/util.h b/lib/libalpm/util.h index 56031f3..5ec36b6 100644 --- a/lib/libalpm/util.h +++ b/lib/libalpm/util.h @@ -141,6 +141,7 @@ int _alpm_raw_cmp(const char *first, const char *second); int _alpm_raw_ncmp(const char *first, const char *second, size_t max); int _alpm_access(alpm_handle_t *handle, const char *dir, const char *file, int amode); int _alpm_fnmatch(const void *pattern, const void *string); +int _alpm_regex(const void *pattern, const void *string);
#ifndef HAVE_STRSEP char *strsep(char **, const char *); -- 1.8.3
Regards, Patrick
On 05/06/13 02:34, Dave Reisner wrote:
On Tue, Jun 04, 2013 at 06:24:03PM +0200, Patrick Steinhardt wrote:
On Tue, Jun 04, 2013 at 11:46:50AM -0400, Dave Reisner wrote:
On Tue, Jun 04, 2013 at 05:03:47PM +0200, Patrick Steinhardt wrote:
Until now it was not easily possible to remove all files but some in a given directory. By enabling Perl regular expressions for NoExtract this is now made possible through negative lookahead.
Fixes FS#31749. ---
This patch is work in progress. I want to check if there is interest in having this feature available in pacman and how the extra dependency on libpcre is received. If it is well received I might still do some rework for the actual matching of NoExtract items, as currently I'm always recompiling the pattern for each match.
I've considered moving to PCRE, but I'm curious:
1) Why do you need this for NoExtract? Please be *specific* 2) What is insufficient about POSIX regex? (#include <regex.h>)
My usecase is specifically the directory /usr/share/locale. As I only need two files/directories (locale.alias and en_US) in there I'd need to specify a list of a whopping 106 entries that shouldn't be extracted.
POSIX regular expressions do not support negative lookarounds. Thus I might add entries like usr/share/locale{af,am,...} but it is not possible to remove everything but those two entries. With Perl regular expressions this is as easy as usr/share/locale/((?!en_US|locale.alias).+).
I'm sure there are other usecases where POSIX regular expressions may be insufficient but can't currently think about any other. In generl PCRE is in the core repo, widely used and very powerful, so I don't see any reason to not use it.
I think a major hurdle here is going to be that we've already standardized on fnmatch() as the matcher for not only NoExtract, but also IgnorePkg, IgnoreGroup, HoldPkg, and NoUpgrade. Moving to regex based matching flat out breaks expectations since a valid glob will *not* match the same data when interpreted as a regex, and it might not even be a valid regex (consider "*foo").
If you want to propose such a move, you either need a migration path (unlikely), or some sort of flag to determine what style of matching is performed on these config options, something like:
MatchStyle = (Glob|Regex)
Along with the necessary documentation. I'm not sure this is a road I want to go down.
I am not sure what I think about this. So here are a collection of thoughts: 1) Being able to ignore e.g. all locales apart from the one you want would be a good thing. 2) PCRE is a widely used library and I doubt many Linux distributions do not have it installed (and have grep linked to it) 3) This would need to be entirely optional at configure time, much like gpgme is. 4) I really do not want another configuration option for this. It would have to be glob matches when built without pcre and regex when built with it. But then how would the user know which pacman is using? 5) The upgrade path is particularly important given pacman is mostly (only) used on rolling release distros. But would the distribution adding a note in pacman.conf that the user will merge be enough? So... that has me leaning towards this being a good idea. @Dave: would suggesting the distros handle the "migration path" be fine with you? I'm not sure wildcards in any of those configuration options are widely used. Allan
On 06/05/13 at 02:35pm, Allan McRae wrote:
On 05/06/13 02:34, Dave Reisner wrote:
On Tue, Jun 04, 2013 at 06:24:03PM +0200, Patrick Steinhardt wrote:
On Tue, Jun 04, 2013 at 11:46:50AM -0400, Dave Reisner wrote:
On Tue, Jun 04, 2013 at 05:03:47PM +0200, Patrick Steinhardt wrote:
Until now it was not easily possible to remove all files but some in a given directory. By enabling Perl regular expressions for NoExtract this is now made possible through negative lookahead.
Fixes FS#31749. ---
This patch is work in progress. I want to check if there is interest in having this feature available in pacman and how the extra dependency on libpcre is received. If it is well received I might still do some rework for the actual matching of NoExtract items, as currently I'm always recompiling the pattern for each match.
I've considered moving to PCRE, but I'm curious:
1) Why do you need this for NoExtract? Please be *specific* 2) What is insufficient about POSIX regex? (#include <regex.h>)
My usecase is specifically the directory /usr/share/locale. As I only need two files/directories (locale.alias and en_US) in there I'd need to specify a list of a whopping 106 entries that shouldn't be extracted.
POSIX regular expressions do not support negative lookarounds. Thus I might add entries like usr/share/locale{af,am,...} but it is not possible to remove everything but those two entries. With Perl regular expressions this is as easy as usr/share/locale/((?!en_US|locale.alias).+).
I'm sure there are other usecases where POSIX regular expressions may be insufficient but can't currently think about any other. In generl PCRE is in the core repo, widely used and very powerful, so I don't see any reason to not use it.
I think a major hurdle here is going to be that we've already standardized on fnmatch() as the matcher for not only NoExtract, but also IgnorePkg, IgnoreGroup, HoldPkg, and NoUpgrade. Moving to regex based matching flat out breaks expectations since a valid glob will *not* match the same data when interpreted as a regex, and it might not even be a valid regex (consider "*foo").
If you want to propose such a move, you either need a migration path (unlikely), or some sort of flag to determine what style of matching is performed on these config options, something like:
MatchStyle = (Glob|Regex)
Along with the necessary documentation. I'm not sure this is a road I want to go down.
I am not sure what I think about this. So here are a collection of thoughts:
1) Being able to ignore e.g. all locales apart from the one you want would be a good thing.
2) PCRE is a widely used library and I doubt many Linux distributions do not have it installed (and have grep linked to it)
3) This would need to be entirely optional at configure time, much like gpgme is.
4) I really do not want another configuration option for this. It would have to be glob matches when built without pcre and regex when built with it. But then how would the user know which pacman is using?
5) The upgrade path is particularly important given pacman is mostly (only) used on rolling release distros. But would the distribution adding a note in pacman.conf that the user will merge be enough?
So... that has me leaning towards this being a good idea.
@Dave: would suggesting the distros handle the "migration path" be fine with you? I'm not sure wildcards in any of those configuration options are widely used.
Allan
I tend to think that the particular problem at issue here would be better solved by a negation operator ala gitignore: NoExtract = usr/share/locale/* !usr/share/locale/en_US/* If we do add pcre support, could we include the type of pattern being used in the pattern itself? Something like: NoExtract = glob::usr/share/locale/* NoExtract = pcre::usr/share/locale/(?!en_US).* That way there would be no ambiguity about which type of pattern is being used and we could default to a glob if no type is specified, allowing existing configs to continue to work. apg
On Wed, Jun 05, 2013 at 02:16:22AM -0400, Andrew Gregory wrote:
On 06/05/13 at 02:35pm, Allan McRae wrote:
On 05/06/13 02:34, Dave Reisner wrote:
On Tue, Jun 04, 2013 at 06:24:03PM +0200, Patrick Steinhardt wrote:
On Tue, Jun 04, 2013 at 11:46:50AM -0400, Dave Reisner wrote:
On Tue, Jun 04, 2013 at 05:03:47PM +0200, Patrick Steinhardt wrote:
Until now it was not easily possible to remove all files but some in a given directory. By enabling Perl regular expressions for NoExtract this is now made possible through negative lookahead.
Fixes FS#31749. ---
This patch is work in progress. I want to check if there is interest in having this feature available in pacman and how the extra dependency on libpcre is received. If it is well received I might still do some rework for the actual matching of NoExtract items, as currently I'm always recompiling the pattern for each match.
I've considered moving to PCRE, but I'm curious:
1) Why do you need this for NoExtract? Please be *specific* 2) What is insufficient about POSIX regex? (#include <regex.h>)
My usecase is specifically the directory /usr/share/locale. As I only need two files/directories (locale.alias and en_US) in there I'd need to specify a list of a whopping 106 entries that shouldn't be extracted.
POSIX regular expressions do not support negative lookarounds. Thus I might add entries like usr/share/locale{af,am,...} but it is not possible to remove everything but those two entries. With Perl regular expressions this is as easy as usr/share/locale/((?!en_US|locale.alias).+).
I'm sure there are other usecases where POSIX regular expressions may be insufficient but can't currently think about any other. In generl PCRE is in the core repo, widely used and very powerful, so I don't see any reason to not use it.
I think a major hurdle here is going to be that we've already standardized on fnmatch() as the matcher for not only NoExtract, but also IgnorePkg, IgnoreGroup, HoldPkg, and NoUpgrade. Moving to regex based matching flat out breaks expectations since a valid glob will *not* match the same data when interpreted as a regex, and it might not even be a valid regex (consider "*foo").
If you want to propose such a move, you either need a migration path (unlikely), or some sort of flag to determine what style of matching is performed on these config options, something like:
MatchStyle = (Glob|Regex)
Along with the necessary documentation. I'm not sure this is a road I want to go down.
I am not sure what I think about this. So here are a collection of thoughts:
1) Being able to ignore e.g. all locales apart from the one you want would be a good thing.
2) PCRE is a widely used library and I doubt many Linux distributions do not have it installed (and have grep linked to it)
3) This would need to be entirely optional at configure time, much like gpgme is.
4) I really do not want another configuration option for this. It would have to be glob matches when built without pcre and regex when built with it. But then how would the user know which pacman is using?
5) The upgrade path is particularly important given pacman is mostly (only) used on rolling release distros. But would the distribution adding a note in pacman.conf that the user will merge be enough?
So... that has me leaning towards this being a good idea.
@Dave: would suggesting the distros handle the "migration path" be fine with you? I'm not sure wildcards in any of those configuration options are widely used.
Allan
I tend to think that the particular problem at issue here would be better solved by a negation operator ala gitignore:
NoExtract = usr/share/locale/* !usr/share/locale/en_US/*
If we do add pcre support, could we include the type of pattern being used in the pattern itself? Something like:
NoExtract = glob::usr/share/locale/* NoExtract = pcre::usr/share/locale/(?!en_US).*
That way there would be no ambiguity about which type of pattern is being used and we could default to a glob if no type is specified, allowing existing configs to continue to work.
This way of handling would work well with the build time option Allan wants: if pcre support is not builtin all pattern starting with pcre:: might just be skipped. Still, it doesn't feel right to prefix every regex and it's unlikely users will mix glob and pcre. Because of this I'd prefer the MatchStyle option. If pacman wasn't built with pcre it might throw a warning/error at startup when MatchStyle is set to pcre and inform the user that it cannot handle that because of missing libpcre support.
apg
On 05.06.2013 08:23, Patrick Steinhardt wrote:
On Wed, Jun 05, 2013 at 02:16:22AM -0400, Andrew Gregory wrote:
NoExtract = glob::usr/share/locale/* NoExtract = pcre::usr/share/locale/(?!en_US).*
Still, it doesn't feel right to prefix every regex and it's unlikely users will mix glob and pcre.
Because of this I'd prefer the MatchStyle option.
I'm leaning towards MatchStyle here too. fnmatch's features (probably * being the only one used, if any at all) can be rewritten in pcre rather simply.
On 05/06/13 16:16, Andrew Gregory wrote:
On 06/05/13 at 02:35pm, Allan McRae wrote:
On 05/06/13 02:34, Dave Reisner wrote:
On Tue, Jun 04, 2013 at 06:24:03PM +0200, Patrick Steinhardt wrote:
On Tue, Jun 04, 2013 at 11:46:50AM -0400, Dave Reisner wrote:
On Tue, Jun 04, 2013 at 05:03:47PM +0200, Patrick Steinhardt wrote:
Until now it was not easily possible to remove all files but some in a given directory. By enabling Perl regular expressions for NoExtract this is now made possible through negative lookahead.
Fixes FS#31749. ---
This patch is work in progress. I want to check if there is interest in having this feature available in pacman and how the extra dependency on libpcre is received. If it is well received I might still do some rework for the actual matching of NoExtract items, as currently I'm always recompiling the pattern for each match.
I've considered moving to PCRE, but I'm curious:
1) Why do you need this for NoExtract? Please be *specific* 2) What is insufficient about POSIX regex? (#include <regex.h>)
My usecase is specifically the directory /usr/share/locale. As I only need two files/directories (locale.alias and en_US) in there I'd need to specify a list of a whopping 106 entries that shouldn't be extracted.
POSIX regular expressions do not support negative lookarounds. Thus I might add entries like usr/share/locale{af,am,...} but it is not possible to remove everything but those two entries. With Perl regular expressions this is as easy as usr/share/locale/((?!en_US|locale.alias).+).
I'm sure there are other usecases where POSIX regular expressions may be insufficient but can't currently think about any other. In generl PCRE is in the core repo, widely used and very powerful, so I don't see any reason to not use it.
I think a major hurdle here is going to be that we've already standardized on fnmatch() as the matcher for not only NoExtract, but also IgnorePkg, IgnoreGroup, HoldPkg, and NoUpgrade. Moving to regex based matching flat out breaks expectations since a valid glob will *not* match the same data when interpreted as a regex, and it might not even be a valid regex (consider "*foo").
If you want to propose such a move, you either need a migration path (unlikely), or some sort of flag to determine what style of matching is performed on these config options, something like:
MatchStyle = (Glob|Regex)
Along with the necessary documentation. I'm not sure this is a road I want to go down.
I am not sure what I think about this. So here are a collection of thoughts:
1) Being able to ignore e.g. all locales apart from the one you want would be a good thing.
2) PCRE is a widely used library and I doubt many Linux distributions do not have it installed (and have grep linked to it)
3) This would need to be entirely optional at configure time, much like gpgme is.
4) I really do not want another configuration option for this. It would have to be glob matches when built without pcre and regex when built with it. But then how would the user know which pacman is using?
5) The upgrade path is particularly important given pacman is mostly (only) used on rolling release distros. But would the distribution adding a note in pacman.conf that the user will merge be enough?
So... that has me leaning towards this being a good idea.
@Dave: would suggesting the distros handle the "migration path" be fine with you? I'm not sure wildcards in any of those configuration options are widely used.
Allan
I tend to think that the particular problem at issue here would be better solved by a negation operator ala gitignore:
NoExtract = usr/share/locale/* !usr/share/locale/en_US/*
I like this idea.
If we do add pcre support, could we include the type of pattern being used in the pattern itself? Something like:
NoExtract = glob::usr/share/locale/* NoExtract = pcre::usr/share/locale/(?!en_US).*
I don't like this idea.
On Wed, Jun 05, 2013 at 10:28:25PM +1000, Allan McRae wrote:
On 05/06/13 16:16, Andrew Gregory wrote:
On 06/05/13 at 02:35pm, Allan McRae wrote:
On 05/06/13 02:34, Dave Reisner wrote:
On Tue, Jun 04, 2013 at 06:24:03PM +0200, Patrick Steinhardt wrote:
On Tue, Jun 04, 2013 at 11:46:50AM -0400, Dave Reisner wrote:
On Tue, Jun 04, 2013 at 05:03:47PM +0200, Patrick Steinhardt wrote: > Until now it was not easily possible to remove all files but some > in a given directory. By enabling Perl regular expressions for > NoExtract this is now made possible through negative lookahead. > > Fixes FS#31749. > --- > > This patch is work in progress. I want to check if there is > interest in having this feature available in pacman and how the > extra dependency on libpcre is received. If it is well received I > might still do some rework for the actual matching of NoExtract > items, as currently I'm always recompiling the pattern for each > match.
I've considered moving to PCRE, but I'm curious:
1) Why do you need this for NoExtract? Please be *specific* 2) What is insufficient about POSIX regex? (#include <regex.h>)
My usecase is specifically the directory /usr/share/locale. As I only need two files/directories (locale.alias and en_US) in there I'd need to specify a list of a whopping 106 entries that shouldn't be extracted.
POSIX regular expressions do not support negative lookarounds. Thus I might add entries like usr/share/locale{af,am,...} but it is not possible to remove everything but those two entries. With Perl regular expressions this is as easy as usr/share/locale/((?!en_US|locale.alias).+).
I'm sure there are other usecases where POSIX regular expressions may be insufficient but can't currently think about any other. In generl PCRE is in the core repo, widely used and very powerful, so I don't see any reason to not use it.
I think a major hurdle here is going to be that we've already standardized on fnmatch() as the matcher for not only NoExtract, but also IgnorePkg, IgnoreGroup, HoldPkg, and NoUpgrade. Moving to regex based matching flat out breaks expectations since a valid glob will *not* match the same data when interpreted as a regex, and it might not even be a valid regex (consider "*foo").
If you want to propose such a move, you either need a migration path (unlikely), or some sort of flag to determine what style of matching is performed on these config options, something like:
MatchStyle = (Glob|Regex)
Along with the necessary documentation. I'm not sure this is a road I want to go down.
I am not sure what I think about this. So here are a collection of thoughts:
1) Being able to ignore e.g. all locales apart from the one you want would be a good thing.
2) PCRE is a widely used library and I doubt many Linux distributions do not have it installed (and have grep linked to it)
3) This would need to be entirely optional at configure time, much like gpgme is.
4) I really do not want another configuration option for this. It would have to be glob matches when built without pcre and regex when built with it. But then how would the user know which pacman is using?
5) The upgrade path is particularly important given pacman is mostly (only) used on rolling release distros. But would the distribution adding a note in pacman.conf that the user will merge be enough?
So... that has me leaning towards this being a good idea.
@Dave: would suggesting the distros handle the "migration path" be fine with you? I'm not sure wildcards in any of those configuration options are widely used.
Allan
I tend to think that the particular problem at issue here would be better solved by a negation operator ala gitignore:
NoExtract = usr/share/locale/* !usr/share/locale/en_US/*
I like this idea.
How would you ignore all but two files in this directory then? NoExtract = !usr/share/locale/en_US/* !usr/share/locale/locale.alias One problem that comes to my mind: the first inverse match already includes locale.alias and as such it would not be extracted. The second term would exclude locale.alias but include en_US/*. Currently we abort as soon as the first expression (whether it is PCRE or fnmatch) matches. If we do it like that we would need to always iterate over all entries in NoExtract and check if a later occurence of the same file exists that overwrites previous occurences. _If_ a later term matches again it is unclear as to what to do, as stated above.
If we do add pcre support, could we include the type of pattern being used in the pattern itself? Something like:
NoExtract = glob::usr/share/locale/* NoExtract = pcre::usr/share/locale/(?!en_US).*
I don't like this idea.
On 06/05/13 at 03:10pm, Patrick Steinhardt wrote:
On Wed, Jun 05, 2013 at 10:28:25PM +1000, Allan McRae wrote:
On 05/06/13 16:16, Andrew Gregory wrote:
I tend to think that the particular problem at issue here would be better solved by a negation operator ala gitignore:
NoExtract = usr/share/locale/* !usr/share/locale/en_US/*
I like this idea.
How would you ignore all but two files in this directory then?
NoExtract = !usr/share/locale/en_US/* !usr/share/locale/locale.alias
One problem that comes to my mind: the first inverse match already includes locale.alias and as such it would not be extracted. The second term would exclude locale.alias but include en_US/*.
Currently we abort as soon as the first expression (whether it is PCRE or fnmatch) matches. If we do it like that we would need to always iterate over all entries in NoExtract and check if a later occurence of the same file exists that overwrites previous occurences. _If_ a later term matches again it is unclear as to what to do, as stated above.
I would simply have the last match win. It's simple, flexible, and we can still abort as soon as we find a match if we just check them in reverse order. So you could do what you want with: NoExtract = usr/share/locale/* NoExtract = !usr/share/locale/en_US/* NoExtract = !usr/share/locale/locale.alias apg
On Wed, Jun 05, 2013 at 03:19:18PM -0400, Andrew Gregory wrote:
On 06/05/13 at 03:10pm, Patrick Steinhardt wrote:
On Wed, Jun 05, 2013 at 10:28:25PM +1000, Allan McRae wrote:
On 05/06/13 16:16, Andrew Gregory wrote:
I tend to think that the particular problem at issue here would be better solved by a negation operator ala gitignore:
NoExtract = usr/share/locale/* !usr/share/locale/en_US/*
I like this idea.
How would you ignore all but two files in this directory then?
NoExtract = !usr/share/locale/en_US/* !usr/share/locale/locale.alias
One problem that comes to my mind: the first inverse match already includes locale.alias and as such it would not be extracted. The second term would exclude locale.alias but include en_US/*.
Currently we abort as soon as the first expression (whether it is PCRE or fnmatch) matches. If we do it like that we would need to always iterate over all entries in NoExtract and check if a later occurence of the same file exists that overwrites previous occurences. _If_ a later term matches again it is unclear as to what to do, as stated above.
I would simply have the last match win. It's simple, flexible, and we can still abort as soon as we find a match if we just check them in reverse order. So you could do what you want with:
NoExtract = usr/share/locale/* NoExtract = !usr/share/locale/en_US/* NoExtract = !usr/share/locale/locale.alias
usr/share/locale/locale.alias already matches en_US/*.
apg
On 06/06/13 05:11, Patrick Steinhardt wrote:
On Wed, Jun 05, 2013 at 03:19:18PM -0400, Andrew Gregory wrote:
On 06/05/13 at 03:10pm, Patrick Steinhardt wrote:
On Wed, Jun 05, 2013 at 10:28:25PM +1000, Allan McRae wrote:
On 05/06/13 16:16, Andrew Gregory wrote:
I tend to think that the particular problem at issue here would be better solved by a negation operator ala gitignore:
NoExtract = usr/share/locale/* !usr/share/locale/en_US/*
I like this idea.
How would you ignore all but two files in this directory then?
NoExtract = !usr/share/locale/en_US/* !usr/share/locale/locale.alias
One problem that comes to my mind: the first inverse match already includes locale.alias and as such it would not be extracted. The second term would exclude locale.alias but include en_US/*.
Currently we abort as soon as the first expression (whether it is PCRE or fnmatch) matches. If we do it like that we would need to always iterate over all entries in NoExtract and check if a later occurence of the same file exists that overwrites previous occurences. _If_ a later term matches again it is unclear as to what to do, as stated above.
I would simply have the last match win. It's simple, flexible, and we can still abort as soon as we find a match if we just check them in reverse order. So you could do what you want with:
NoExtract = usr/share/locale/* NoExtract = !usr/share/locale/en_US/* NoExtract = !usr/share/locale/locale.alias
usr/share/locale/locale.alias already matches en_US/*.
Huh? Also, what does gitignore do in this case. Anyway, it should be fairly easy to implement this. 1) exact file matches follow their rule 2) more specific globs take precedence over less specific globs. Done...
participants (5)
-
Allan McRae
-
Andrew Gregory
-
Dave Reisner
-
Florian Pritz
-
Patrick Steinhardt