[pacman-dev] [PATCH] Enable Perl regular expressions in NoExtract.

Patrick Steinhardt steinhardt.ptk at gmail.com
Tue Jun 4 12:55:02 EDT 2013


On Tue, Jun 04, 2013 at 06:46:01PM +0200, Patrick Steinhardt wrote:
> On Tue, Jun 04, 2013 at 12:34:13PM -0400, Dave Reisner wrote:
> > On Tue, Jun 04, 2013 at 06:24:03PM +0200, Patrick Steinhardt wrote:
> > > On Tue, Jun 04, 2013 at 11:46:50AM -0400, Dave Reisner wrote:
> > > > On Tue, Jun 04, 2013 at 05:03:47PM +0200, Patrick Steinhardt wrote:
> > > > > Until now it was not easily possible to remove all files but some
> > > > > in a given directory. By enabling Perl regular expressions for
> > > > > NoExtract this is now made possible through negative lookahead.
> > > > > 
> > > > > Fixes FS#31749.
> > > > > ---
> > > > > 
> > > > > This patch is work in progress. I want to check if there is
> > > > > interest in having this feature available in pacman and how the
> > > > > extra dependency on libpcre is received. If it is well received I
> > > > > might still do some rework for the actual matching of NoExtract
> > > > > items, as currently I'm always recompiling the pattern for each
> > > > > match.
> > > > 
> > > > I've considered moving to PCRE, but I'm curious:
> > > > 
> > > > 1) Why do you need this for NoExtract? Please be *specific*
> > > > 2) What is insufficient about POSIX regex? (#include <regex.h>)
> > > 
> > > My usecase is specifically the directory /usr/share/locale. As I
> > > only need two files/directories (locale.alias and en_US) in there
> > > I'd need to specify a list of a whopping 106 entries that
> > > shouldn't be extracted.
> > > 
> > > POSIX regular expressions do not support negative lookarounds.
> > > Thus I might add entries like usr/share/locale{af,am,...} but it
> > > is not possible to remove everything but those two entries. With
> > > Perl regular expressions this is as easy as
> > > usr/share/locale/((?!en_US|locale.alias).+).
> > > 
> > > I'm sure there are other usecases where POSIX regular expressions
> > > may be insufficient but can't currently think about any other. In
> > > generl PCRE is in the core repo, widely used and very powerful,
> > > so I don't see any reason to not use it.
> > 
> > I think a major hurdle here is going to be that we've already
> > standardized on fnmatch() as the matcher for not only NoExtract, but
> > also IgnorePkg, IgnoreGroup, HoldPkg, and NoUpgrade. Moving to regex
> > based matching flat out breaks expectations since a valid glob will
> > *not* match the same data when interpreted as a regex, and it might not
> > even be a valid regex (consider "*foo").
> > 
> > If you want to propose such a move, you either need a migration path
> > (unlikely), or some sort of flag to determine what style of matching is
> > performed on these config options, something like:
> > 
> > MatchStyle = (Glob|Regex)
> > 
> > Along with the necessary documentation. I'm not sure this is a road I
> > want to go down.
> 
> I think though that feeding PCRE with fnmatch-style globs won't
> produce (much) different results. At least I made sure that
> regexps are anchored (must match right at the beginning) and
> globbing-operator from fnmatch (which has no FNM_PATHNAME set)
> and star-operator from PCRE behave the same. But probably this is
> too short-sighted and there will be loads of problems.

Forget that paragraph. Already forgot about the difference
between star operators in fnmatch and PCRE, where it is a
quantificator in PCRE.

> 
> Why are you opposed to the MatchStyle option? I've thought about
> such a flag as well (as a build flag though, disabling the PCRE
> dependency, even though pacman.conf variables are better suited)
> and can't see much hassle with that.
> 
> If we decide to make that move one should consider changing
> behaviour of the other variables as well to achieve better
> consistency. No use in having similar things act differently.
> 
> I'd be more than willing to implement both, MatchStyle and
> behaviour changes of IgnorePkg, IgnoreGroup, HoldPkg, NoUpgrade.
> 
> > 
> > > 
> > > > 
> > > > >  configure.ac            | 13 ++++++++++++-
> > > > >  lib/libalpm/Makefile.am |  6 ++++--
> > > > >  lib/libalpm/add.c       |  2 +-
> > > > >  lib/libalpm/util.c      | 22 ++++++++++++++++++++++
> > > > >  lib/libalpm/util.h      |  1 +
> > > > >  5 files changed, 40 insertions(+), 4 deletions(-)
> > > > > 
> > > > > diff --git a/configure.ac b/configure.ac
> > > > > index 1ea0fce..29ea101 100644
> > > > > --- a/configure.ac
> > > > > +++ b/configure.ac
> > > > > @@ -230,6 +230,17 @@ if test "x$with_libcurl" != "xno"; then
> > > > >  fi
> > > > >  AM_CONDITIONAL(HAVE_LIBCURL, [test "$have_libcurl" = "yes"])
> > > > >  
> > > > > +# Check for PCRE
> > > > > +have_libpcre=no
> > > > > +if test "x$with_libpcre" != "xno"; then
> > > > > +	PKG_CHECK_MODULES(LIBPCRE, [libpcre],
> > > > > +		[AC_DEFINE(HAVE_LIBPCRE, 1, [Define if libpcre is available]) have_libpcre=yes], have_libpcre=no)
> > > > > +	if test "x$have_libpcre" = xno -a "x$with_libpcre" = xyes; then
> > > > > +		AC_MSG_ERROR([*** libpcre is required for file matching support])
> > > > > +	fi
> > > > > +fi
> > > > > +AM_CONDITIONAL(HAVE_LIBPCRE, [test "$have_libpcre" = "yes"])
> > > > > +
> > > > >  # Check for gpgme
> > > > >  AC_MSG_CHECKING(whether to link with libgpgme)
> > > > >  AS_IF([test "x$with_gpgme" != "xno"],
> > > > > @@ -523,7 +534,7 @@ ${PACKAGE_NAME}:
> > > > >      compiler               : ${CC}
> > > > >      preprocessor flags     : ${CPPFLAGS}
> > > > >      compiler flags         : ${WARNING_CFLAGS} ${CFLAGS}
> > > > > -    library flags          : ${LIBS} ${LIBSSL_LIBS} ${LIBARCHIVE_LIBS} ${LIBCURL_LIBS} ${GPGME_LIBS}
> > > > > +    library flags          : ${LIBS} ${LIBSSL_LIBS} ${LIBPCRE_LIBS} ${LIBARCHIVE_LIBS} ${LIBCURL_LIBS} ${GPGME_LIBS}
> > > > >      linker flags           : ${LDFLAGS}
> > > > >  
> > > > >      Architecture           : ${CARCH}
> > > > > diff --git a/lib/libalpm/Makefile.am b/lib/libalpm/Makefile.am
> > > > > index 171bc46..0f71aed 100644
> > > > > --- a/lib/libalpm/Makefile.am
> > > > > +++ b/lib/libalpm/Makefile.am
> > > > > @@ -75,13 +75,15 @@ libalpm_la_CFLAGS = \
> > > > >  	$(GPGME_CFLAGS) \
> > > > >  	$(LIBARCHIVE_CFLAGS) \
> > > > >  	$(LIBCURL_CFLAGS) \
> > > > > -	$(LIBSSL_CFLAGS)
> > > > > +	$(LIBSSL_CFLAGS) \
> > > > > +	$(LIBPCRE_CFLAGS)
> > > > >  
> > > > >  libalpm_la_LIBADD = \
> > > > >  	$(LTLIBINTL) \
> > > > >  	$(GPGME_LIBS) \
> > > > >  	$(LIBARCHIVE_LIBS) \
> > > > >  	$(LIBCURL_LIBS) \
> > > > > -	$(LIBSSL_LIBS)
> > > > > +	$(LIBSSL_LIBS) \
> > > > > +	$(LIBPCRE_LIBS)
> > > > >  
> > > > >  # vim:set ts=2 sw=2 noet:
> > > > > diff --git a/lib/libalpm/add.c b/lib/libalpm/add.c
> > > > > index c20e7c6..d80dbe4 100644
> > > > > --- a/lib/libalpm/add.c
> > > > > +++ b/lib/libalpm/add.c
> > > > > @@ -183,7 +183,7 @@ static int extract_single_file(alpm_handle_t *handle, struct archive *archive,
> > > > >  	}
> > > > >  
> > > > >  	/* if a file is in NoExtract then we never extract it */
> > > > > -	if(alpm_list_find(handle->noextract, entryname, _alpm_fnmatch)) {
> > > > > +	if(alpm_list_find(handle->noextract, entryname, _alpm_regex)) {
> > > > >  		_alpm_log(handle, ALPM_LOG_DEBUG, "%s is in NoExtract,"
> > > > >  				" skipping extraction of %s\n",
> > > > >  				entryname, filename);
> > > > > diff --git a/lib/libalpm/util.c b/lib/libalpm/util.c
> > > > > index f84fb66..5858d28 100644
> > > > > --- a/lib/libalpm/util.c
> > > > > +++ b/lib/libalpm/util.c
> > > > > @@ -33,6 +33,7 @@
> > > > >  #include <sys/wait.h>
> > > > >  #include <locale.h> /* setlocale */
> > > > >  #include <fnmatch.h>
> > > > > +#include <pcre.h>
> > > > >  
> > > > >  /* libarchive */
> > > > >  #include <archive.h>
> > > > > @@ -1261,6 +1262,27 @@ int _alpm_fnmatch(const void *pattern, const void *string)
> > > > >  	return fnmatch(pattern, string, 0);
> > > > >  }
> > > > >  
> > > > > +/** Checks whether a string matches an extended Perl regular expression.
> > > > > + * @param pattern string to match against
> > > > > + * @param string string to check against pattern
> > > > > + * @return 0 if string matches pattern, non-zero if they don't match and on
> > > > > + * error
> > > > > + */
> > > > > +int _alpm_regex(const void *pattern, const void *string)
> > > > > +{
> > > > > +	const char *error;
> > > > > +	int erroroff;
> > > > > +
> > > > > +	pcre *pcre = pcre_compile(pattern, PCRE_EXTENDED | PCRE_ANCHORED, &error, &erroroff, NULL);
> > > > > +	if (error != NULL) {
> > > > > +		return -1;
> > > > > +	}
> > > > > +
> > > > > +	int ret = pcre_exec(pcre, NULL, string, strlen(string), 0, 0, NULL, 0);
> > > > > +	pcre_free(pcre);
> > > > > +	return ret < 0;
> > > > > +}
> > > > > +
> > > > >  void _alpm_alloc_fail(size_t size)
> > > > >  {
> > > > >  	fprintf(stderr, "alloc failure: could not allocate %zd bytes\n", size);
> > > > > diff --git a/lib/libalpm/util.h b/lib/libalpm/util.h
> > > > > index 56031f3..5ec36b6 100644
> > > > > --- a/lib/libalpm/util.h
> > > > > +++ b/lib/libalpm/util.h
> > > > > @@ -141,6 +141,7 @@ int _alpm_raw_cmp(const char *first, const char *second);
> > > > >  int _alpm_raw_ncmp(const char *first, const char *second, size_t max);
> > > > >  int _alpm_access(alpm_handle_t *handle, const char *dir, const char *file, int amode);
> > > > >  int _alpm_fnmatch(const void *pattern, const void *string);
> > > > > +int _alpm_regex(const void *pattern, const void *string);
> > > > >  
> > > > >  #ifndef HAVE_STRSEP
> > > > >  char *strsep(char **, const char *);
> > > > > -- 
> > > > > 1.8.3
> > > > > 
> > > 
> > > Regards,
> > > Patrick
> > 
> > 
> > 
> > > 
> > 
> > 


-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 836 bytes
Desc: not available
URL: <http://mailman.archlinux.org/pipermail/pacman-dev/attachments/20130604/4f3881b4/attachment.asc>


More information about the pacman-dev mailing list