[pacman-dev] [PATCH] Enable Perl regular expressions in NoExtract.

Andrew Gregory andrew.gregory.8 at gmail.com
Wed Jun 5 02:16:22 EDT 2013

On 06/05/13 at 02:35pm, Allan McRae wrote:
> On 05/06/13 02:34, Dave Reisner wrote:
> > On Tue, Jun 04, 2013 at 06:24:03PM +0200, Patrick Steinhardt wrote:
> >> On Tue, Jun 04, 2013 at 11:46:50AM -0400, Dave Reisner wrote:
> >>> On Tue, Jun 04, 2013 at 05:03:47PM +0200, Patrick Steinhardt wrote:
> >>>> Until now it was not easily possible to remove all files but some
> >>>> in a given directory. By enabling Perl regular expressions for
> >>>> NoExtract this is now made possible through negative lookahead.
> >>>>
> >>>> Fixes FS#31749.
> >>>> ---
> >>>>
> >>>> This patch is work in progress. I want to check if there is
> >>>> interest in having this feature available in pacman and how the
> >>>> extra dependency on libpcre is received. If it is well received I
> >>>> might still do some rework for the actual matching of NoExtract
> >>>> items, as currently I'm always recompiling the pattern for each
> >>>> match.
> >>>
> >>> I've considered moving to PCRE, but I'm curious:
> >>>
> >>> 1) Why do you need this for NoExtract? Please be *specific*
> >>> 2) What is insufficient about POSIX regex? (#include <regex.h>)
> >>
> >> My usecase is specifically the directory /usr/share/locale. As I
> >> only need two files/directories (locale.alias and en_US) in there
> >> I'd need to specify a list of a whopping 106 entries that
> >> shouldn't be extracted.
> >>
> >> POSIX regular expressions do not support negative lookarounds.
> >> Thus I might add entries like usr/share/locale{af,am,...} but it
> >> is not possible to remove everything but those two entries. With
> >> Perl regular expressions this is as easy as
> >> usr/share/locale/((?!en_US|locale.alias).+).
> >>
> >> I'm sure there are other usecases where POSIX regular expressions
> >> may be insufficient but can't currently think about any other. In
> >> general PCRE is in the core repo, widely used and very powerful,
> >> so I don't see any reason to not use it.
> > 
> > I think a major hurdle here is going to be that we've already
> > standardized on fnmatch() as the matcher for not only NoExtract, but
> > also IgnorePkg, IgnoreGroup, HoldPkg, and NoUpgrade. Moving to regex
> > based matching flat out breaks expectations since a valid glob will
> > *not* match the same data when interpreted as a regex, and it might not
> > even be a valid regex (consider "*foo").
> > 
> > If you want to propose such a move, you either need a migration path
> > (unlikely), or some sort of flag to determine what style of matching is
> > performed on these config options, something like:
> > 
> > MatchStyle = (Glob|Regex)
> > 
> > Along with the necessary documentation. I'm not sure this is a road I
> > want to go down.
> 
> I am not sure what I think about this.  So here are a collection of
> thoughts:
> 
> 1) Being able to ignore e.g. all locales apart from the one you want
> would be a good thing.
> 
> 2) PCRE is a widely used library and I doubt many Linux distributions do
> not have it installed (and have grep linked to it)
> 
> 3) This would need to be entirely optional at configure time, much like
> gpgme is.
> 
> 4) I really do not want another configuration option for this.  It would
> have to be glob matches when built without pcre and regex when built
> with it.  But then how would the user know which pacman is using?
> 
> 5) The upgrade path is particularly important given pacman is mostly
> (only) used on rolling release distros.  But would the distribution
> adding a note in pacman.conf that the user will merge be enough?
> 
> So...  that has me leaning towards this being a good idea.
> 
> @Dave: would suggesting the distros handle the "migration path" be fine
> with you?  I'm not sure wildcards in any of those configuration options
> are widely used.
> 
> Allan

I tend to think that the particular problem at issue here would be
better solved by a negation operator à la gitignore:

 NoExtract = usr/share/locale/* !usr/share/locale/en_US/*
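
To make the intended semantics concrete, here is a rough Python sketch of
how such a negation operator could be evaluated. It is not pacman code:
the function name should_skip is hypothetical, Python's fnmatch module
stands in for fnmatch(), "*" is assumed to match across "/", and the
gitignore-like "last matching pattern wins" rule is an assumption rather
than anything settled in this thread.

```python
from fnmatch import fnmatchcase


def should_skip(path, patterns):
    """Return True if `path` should not be extracted.

    Patterns are tried in order and the last one that matches decides.
    A leading '!' (assumed syntax) turns a match into "extract after all".
    """
    skip = False
    for pattern in patterns:
        negated = pattern.startswith("!")
        if negated:
            pattern = pattern[1:]
        if fnmatchcase(path, pattern):
            skip = not negated
    return skip


# The NoExtract value from the example above, split on whitespace.
patterns = "usr/share/locale/* !usr/share/locale/en_US/*".split()

for path in (
    "usr/share/locale/de/LC_MESSAGES/pacman.mo",
    "usr/share/locale/en_US/LC_MESSAGES/pacman.mo",
    "usr/bin/pacman",
):
    print(path, "->", "skip" if should_skip(path, patterns) else "extract")
```

With this ordering rule, the second pattern only re-includes paths that an
earlier pattern excluded, so existing configs without "!" behave as before.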

If we do add pcre support, could we include the type of pattern being
used in the pattern itself?  Something like:

 NoExtract = glob::usr/share/locale/*
 NoExtract = pcre::usr/share/locale/(?!en_US).*

That way there would be no ambiguity about which type of pattern is
being used and we could default to a glob if no type is specified,
allowing existing configs to continue to work.
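
A minimal Python sketch of that dispatch, for illustration only. The
helper names parse_pattern and matches are hypothetical, Python's re
module stands in for libpcre (both support (?!...) lookahead), and
requiring the regex to match the whole path is an assumption, as is
treating an unrecognized prefix as part of a plain glob.

```python
import re
from fnmatch import fnmatchcase

KNOWN_TYPES = ("glob", "pcre")


def parse_pattern(spec):
    """Split 'type::pattern' into (type, pattern).

    Specs without a recognized 'type::' prefix are treated as globs,
    so existing untyped config entries keep their current meaning.
    """
    kind, sep, rest = spec.partition("::")
    if sep and kind in KNOWN_TYPES:
        return kind, rest
    return "glob", spec


def matches(spec, path):
    """Return True if `path` matches the (possibly typed) pattern `spec`."""
    kind, pattern = parse_pattern(spec)
    if kind == "pcre":
        # Assumption: the regex must match the entire path.
        return re.fullmatch(pattern, path) is not None
    return fnmatchcase(path, pattern)


de = "usr/share/locale/de/LC_MESSAGES/pacman.mo"
en = "usr/share/locale/en_US/LC_MESSAGES/pacman.mo"

for spec in (
    "usr/share/locale/*",
    "glob::usr/share/locale/*",
    "pcre::usr/share/locale/(?!en_US).*",
):
    print(spec, matches(spec, de), matches(spec, en))
```

Because the type travels with each pattern, a glob such as "*foo" is never
handed to the regex engine unless the user explicitly asks for it.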

