[pacman-dev] [PATCH] Enable Perl regular expressions in NoExtract.
Dave Reisner
d at falconindy.com
Tue Jun 4 12:34:13 EDT 2013
On Tue, Jun 04, 2013 at 06:24:03PM +0200, Patrick Steinhardt wrote:
> On Tue, Jun 04, 2013 at 11:46:50AM -0400, Dave Reisner wrote:
> > On Tue, Jun 04, 2013 at 05:03:47PM +0200, Patrick Steinhardt wrote:
> > > Until now it was not easily possible to remove all files but some
> > > in a given directory. By enabling Perl regular expressions for
> > > NoExtract this is now made possible through negative lookahead.
> > >
> > > Fixes FS#31749.
> > > ---
> > >
> > > This patch is work in progress. I want to check if there is
> > > interest in having this feature available in pacman and how the
> > > extra dependency on libpcre is received. If it is well received I
> > > might still do some rework for the actual matching of NoExtract
> > > items, as currently I'm always recompiling the pattern for each
> > > match.
> >
> > I've considered moving to PCRE, but I'm curious:
> >
> > 1) Why do you need this for NoExtract? Please be *specific*
> > 2) What is insufficient about POSIX regex? (#include <regex.h>)
>
> My usecase is specifically the directory /usr/share/locale. As I
> only need two files/directories (locale.alias and en_US) in there
> I'd need to specify a list of a whopping 106 entries that
> shouldn't be extracted.
>
> POSIX regular expressions do not support negative lookarounds.
> Thus I might add entries like usr/share/locale{af,am,...} but it
> is not possible to remove everything but those two entries. With
> Perl regular expressions this is as easy as
> usr/share/locale/((?!en_US|locale.alias).+).
>
> I'm sure there are other usecases where POSIX regular expressions
> may be insufficient but can't currently think about any other. In
> generl PCRE is in the core repo, widely used and very powerful,
> so I don't see any reason to not use it.
I think a major hurdle here is going to be that we've already
standardized on fnmatch() as the matcher for not only NoExtract, but
also IgnorePkg, IgnoreGroup, HoldPkg, and NoUpgrade. Moving to regex
based matching flat out breaks expectations since a valid glob will
*not* match the same data when interpreted as a regex, and it might not
even be a valid regex (consider "*foo").
If you want to propose such a move, you either need a migration path
(unlikely), or some sort of flag to determine what style of matching is
performed on these config options, something like:
MatchStyle = (Glob|Regex)
Along with the necessary documentation. I'm not sure this is a road I
want to go down.
>
> >
> > > configure.ac | 13 ++++++++++++-
> > > lib/libalpm/Makefile.am | 6 ++++--
> > > lib/libalpm/add.c | 2 +-
> > > lib/libalpm/util.c | 22 ++++++++++++++++++++++
> > > lib/libalpm/util.h | 1 +
> > > 5 files changed, 40 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/configure.ac b/configure.ac
> > > index 1ea0fce..29ea101 100644
> > > --- a/configure.ac
> > > +++ b/configure.ac
> > > @@ -230,6 +230,17 @@ if test "x$with_libcurl" != "xno"; then
> > > fi
> > > AM_CONDITIONAL(HAVE_LIBCURL, [test "$have_libcurl" = "yes"])
> > >
> > > +# Check for PCRE
> > > +have_libpcre=no
> > > +if test "x$with_libpcre" != "xno"; then
> > > + PKG_CHECK_MODULES(LIBPCRE, [libpcre],
> > > + [AC_DEFINE(HAVE_LIBPCRE, 1, [Define if libpcre is available]) have_libpcre=yes], have_libpcre=no)
> > > + if test "x$have_libpcre" = xno -a "x$with_libpcre" = xyes; then
> > > + AC_MSG_ERROR([*** libpcre is required for file matching support])
> > > + fi
> > > +fi
> > > +AM_CONDITIONAL(HAVE_LIBPCRE, [test "$have_libpcre" = "yes"])
> > > +
> > > # Check for gpgme
> > > AC_MSG_CHECKING(whether to link with libgpgme)
> > > AS_IF([test "x$with_gpgme" != "xno"],
> > > @@ -523,7 +534,7 @@ ${PACKAGE_NAME}:
> > > compiler : ${CC}
> > > preprocessor flags : ${CPPFLAGS}
> > > compiler flags : ${WARNING_CFLAGS} ${CFLAGS}
> > > - library flags : ${LIBS} ${LIBSSL_LIBS} ${LIBARCHIVE_LIBS} ${LIBCURL_LIBS} ${GPGME_LIBS}
> > > + library flags : ${LIBS} ${LIBSSL_LIBS} ${LIBPCRE_LIBS} ${LIBARCHIVE_LIBS} ${LIBCURL_LIBS} ${GPGME_LIBS}
> > > linker flags : ${LDFLAGS}
> > >
> > > Architecture : ${CARCH}
> > > diff --git a/lib/libalpm/Makefile.am b/lib/libalpm/Makefile.am
> > > index 171bc46..0f71aed 100644
> > > --- a/lib/libalpm/Makefile.am
> > > +++ b/lib/libalpm/Makefile.am
> > > @@ -75,13 +75,15 @@ libalpm_la_CFLAGS = \
> > > $(GPGME_CFLAGS) \
> > > $(LIBARCHIVE_CFLAGS) \
> > > $(LIBCURL_CFLAGS) \
> > > - $(LIBSSL_CFLAGS)
> > > + $(LIBSSL_CFLAGS) \
> > > + $(LIBPCRE_CFLAGS)
> > >
> > > libalpm_la_LIBADD = \
> > > $(LTLIBINTL) \
> > > $(GPGME_LIBS) \
> > > $(LIBARCHIVE_LIBS) \
> > > $(LIBCURL_LIBS) \
> > > - $(LIBSSL_LIBS)
> > > + $(LIBSSL_LIBS) \
> > > + $(LIBPCRE_LIBS)
> > >
> > > # vim:set ts=2 sw=2 noet:
> > > diff --git a/lib/libalpm/add.c b/lib/libalpm/add.c
> > > index c20e7c6..d80dbe4 100644
> > > --- a/lib/libalpm/add.c
> > > +++ b/lib/libalpm/add.c
> > > @@ -183,7 +183,7 @@ static int extract_single_file(alpm_handle_t *handle, struct archive *archive,
> > > }
> > >
> > > /* if a file is in NoExtract then we never extract it */
> > > - if(alpm_list_find(handle->noextract, entryname, _alpm_fnmatch)) {
> > > + if(alpm_list_find(handle->noextract, entryname, _alpm_regex)) {
> > > _alpm_log(handle, ALPM_LOG_DEBUG, "%s is in NoExtract,"
> > > " skipping extraction of %s\n",
> > > entryname, filename);
> > > diff --git a/lib/libalpm/util.c b/lib/libalpm/util.c
> > > index f84fb66..5858d28 100644
> > > --- a/lib/libalpm/util.c
> > > +++ b/lib/libalpm/util.c
> > > @@ -33,6 +33,7 @@
> > > #include <sys/wait.h>
> > > #include <locale.h> /* setlocale */
> > > #include <fnmatch.h>
> > > +#include <pcre.h>
> > >
> > > /* libarchive */
> > > #include <archive.h>
> > > @@ -1261,6 +1262,27 @@ int _alpm_fnmatch(const void *pattern, const void *string)
> > > return fnmatch(pattern, string, 0);
> > > }
> > >
> > > +/** Checks whether a string matches an extended Perl regular expression.
> > > + * @param pattern string to match against
> > > + * @param string string to check against pattern
> > > + * @return 0 if string matches pattern, non-zero if they don't match and on
> > > + * error
> > > + */
> > > +int _alpm_regex(const void *pattern, const void *string)
> > > +{
> > > + const char *error;
> > > + int erroroff;
> > > +
> > > + pcre *pcre = pcre_compile(pattern, PCRE_EXTENDED | PCRE_ANCHORED, &error, &erroroff, NULL);
> > > + if (error != NULL) {
> > > + return -1;
> > > + }
> > > +
> > > + int ret = pcre_exec(pcre, NULL, string, strlen(string), 0, 0, NULL, 0);
> > > + pcre_free(pcre);
> > > + return ret < 0;
> > > +}
> > > +
> > > void _alpm_alloc_fail(size_t size)
> > > {
> > > fprintf(stderr, "alloc failure: could not allocate %zd bytes\n", size);
> > > diff --git a/lib/libalpm/util.h b/lib/libalpm/util.h
> > > index 56031f3..5ec36b6 100644
> > > --- a/lib/libalpm/util.h
> > > +++ b/lib/libalpm/util.h
> > > @@ -141,6 +141,7 @@ int _alpm_raw_cmp(const char *first, const char *second);
> > > int _alpm_raw_ncmp(const char *first, const char *second, size_t max);
> > > int _alpm_access(alpm_handle_t *handle, const char *dir, const char *file, int amode);
> > > int _alpm_fnmatch(const void *pattern, const void *string);
> > > +int _alpm_regex(const void *pattern, const void *string);
> > >
> > > #ifndef HAVE_STRSEP
> > > char *strsep(char **, const char *);
> > > --
> > > 1.8.3
> > >
>
> Regards,
> Patrick
>
More information about the pacman-dev
mailing list