[pacman-dev] [PATCH] Enable Perl regular expressions in NoExtract.

Patrick Steinhardt steinhardt.ptk at gmail.com
Wed Jun 5 09:10:50 EDT 2013


On Wed, Jun 05, 2013 at 10:28:25PM +1000, Allan McRae wrote:
> On 05/06/13 16:16, Andrew Gregory wrote:
> > On 06/05/13 at 02:35pm, Allan McRae wrote:
> >> On 05/06/13 02:34, Dave Reisner wrote:
> >>> On Tue, Jun 04, 2013 at 06:24:03PM +0200, Patrick Steinhardt wrote:
> >>>> On Tue, Jun 04, 2013 at 11:46:50AM -0400, Dave Reisner wrote:
> >>>>> On Tue, Jun 04, 2013 at 05:03:47PM +0200, Patrick Steinhardt wrote:
> >>>>>> Until now it was not easily possible to remove all files but some
> >>>>>> in a given directory. By enabling Perl regular expressions for
> >>>>>> NoExtract this is now made possible through negative lookahead.
> >>>>>>
> >>>>>> Fixes FS#31749.
> >>>>>> ---
> >>>>>>
> >>>>>> This patch is work in progress. I want to check if there is
> >>>>>> interest in having this feature available in pacman and how the
> >>>>>> extra dependency on libpcre is received. If it is well received I
> >>>>>> might still do some rework for the actual matching of NoExtract
> >>>>>> items, as currently I'm always recompiling the pattern for each
> >>>>>> match.
> >>>>>
> >>>>> I've considered moving to PCRE, but I'm curious:
> >>>>>
> >>>>> 1) Why do you need this for NoExtract? Please be *specific*
> >>>>> 2) What is insufficient about POSIX regex? (#include <regex.h>)
> >>>>
> >>>> My use case is specifically the directory /usr/share/locale. As I
> >>>> only need two files/directories (locale.alias and en_US) in there
> >>>> I'd need to specify a list of a whopping 106 entries that
> >>>> shouldn't be extracted.
> >>>>
> >>>> POSIX regular expressions do not support negative lookaround.
> >>>> Thus I could add entries like usr/share/locale/{af,am,...}, but it
> >>>> is not possible to exclude everything except those two entries.
> >>>> With Perl regular expressions this is as easy as
> >>>> usr/share/locale/((?!en_US|locale.alias).+).
> >>>>
> >>>> I'm sure there are other use cases where POSIX regular expressions
> >>>> are insufficient, but I can't think of any right now. In general,
> >>>> PCRE is in the core repo, widely used and very powerful, so I
> >>>> don't see any reason not to use it.
> >>>
> >>> I think a major hurdle here is going to be that we've already
> >>> standardized on fnmatch() as the matcher for not only NoExtract, but
> >>> also IgnorePkg, IgnoreGroup, HoldPkg, and NoUpgrade. Moving to regex
> >>> based matching flat out breaks expectations since a valid glob will
> >>> *not* match the same data when interpreted as a regex, and it might not
> >>> even be a valid regex (consider "*foo").
> >>>
> >>> If you want to propose such a move, you either need a migration path
> >>> (unlikely), or some sort of flag to determine what style of matching is
> >>> performed on these config options, something like:
> >>>
> >>> MatchStyle = (Glob|Regex)
> >>>
> >>> Along with the necessary documentation. I'm not sure this is a road I
> >>> want to go down.
> >>
> >> I am not sure what I think about this, so here is a collection of
> >> thoughts:
> >>
> >> 1) Being able to ignore e.g. all locales apart from the one you want
> >> would be a good thing.
> >>
> >> 2) PCRE is a widely used library and I doubt many Linux distributions do
> >> not have it installed (and have grep linked to it)
> >>
> >> 3) This would need to be entirely optional at configure time, much like
> >> gpgme is.
> >>
> >> 4) I really do not want another configuration option for this. It would
> >> have to be glob matching when built without PCRE and regex matching when
> >> built with it. But then how would the user know which one pacman is using?
> >>
> >> 5) The upgrade path is particularly important given that pacman is mostly
> >> (only) used on rolling-release distros. But would it be enough for the
> >> distribution to add a note to pacman.conf, which the user will merge?
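Points 3 and 4 could be sketched as a build-time switch in the same spirit as the optional gpgme support: a configure-time macro picks the matcher, and the build can report which style it uses. Everything named here is hypothetical: the HAVE_LIBPCRE2 macro, noextract_match() and noextract_style(). The PCRE branch uses the PCRE2 API (the successor to the libpcre discussed in this thread) and anchors patterns at the start of the path; without the macro, plain fnmatch() is used.

```c
#include <assert.h>
#include <stddef.h>

#ifdef HAVE_LIBPCRE2            /* hypothetical configure-time macro */
#define PCRE2_CODE_UNIT_WIDTH 8
#include <pcre2.h>
#else
#include <fnmatch.h>
#endif

/* Hypothetical hook so a user can ask which pattern style this build uses. */
const char *noextract_style(void)
{
#ifdef HAVE_LIBPCRE2
    return "pcre";
#else
    return "glob";
#endif
}

/* Returns 1 on match, 0 on no match, -1 on an invalid pattern or error. */
int noextract_match(const char *pattern, const char *path)
{
    assert(pattern != NULL && path != NULL);
#ifdef HAVE_LIBPCRE2
    int errcode;
    PCRE2_SIZE erroffset;
    pcre2_code *re = pcre2_compile((PCRE2_SPTR)pattern, PCRE2_ZERO_TERMINATED,
                                   PCRE2_ANCHORED, &errcode, &erroffset, NULL);
    pcre2_match_data *md;
    int rc;

    if (re == NULL)
        return -1;                      /* invalid pattern */
    md = pcre2_match_data_create_from_pattern(re, NULL);
    if (md == NULL) {
        pcre2_code_free(re);
        return -1;
    }
    rc = pcre2_match(re, (PCRE2_SPTR)path, PCRE2_ZERO_TERMINATED, 0, 0, md, NULL);
    pcre2_match_data_free(md);
    pcre2_code_free(re);
    if (rc == PCRE2_ERROR_NOMATCH)
        return 0;
    return rc >= 0 ? 1 : -1;
#else
    return fnmatch(pattern, path, 0) == 0;
#endif
}
```

With PCRE enabled, Patrick's pattern usr/share/locale/((?!en_US|locale.alias).+) would skip usr/share/locale/de/... but not the en_US entries or locale.alias; a glob build would read the same string as a glob that matches no real path, which is the ambiguity point 4 raises.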
> >>
> >> So...  that has me leaning towards this being a good idea.
> >>
> >>
> >> @Dave: would suggesting the distros handle the "migration path" be fine
> >> with you?  I'm not sure wildcards in any of those configuration options
> >> are widely used.
> >>
> >> Allan
> >>
> > 
> > I tend to think that the particular problem at issue here would be
> > better solved by a negation operator à la gitignore:
> > 
> >  NoExtract = usr/share/locale/* !usr/share/locale/en_US/*
> 
> 
> I like this idea.

How would you ignore all but two files in this directory then?

NoExtract = !usr/share/locale/en_US/* !usr/share/locale/locale.alias

One problem comes to mind: the first negated term matches
everything except en_US/*, including locale.alias, so
locale.alias would not be extracted. Likewise, the second term
matches everything except locale.alias, including en_US/*.

Currently we stop as soon as the first expression (whether PCRE
or fnmatch) matches. With negation, we would instead always need
to iterate over all entries in NoExtract and check whether a
later pattern matching the same file overrides earlier ones.
_If_ a later term matches again, it is unclear what to do, as
stated above.
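One way to resolve the ambiguity is the rule gitignore itself uses: evaluate every pattern and let the last one that matches decide, with a leading "!" turning a match into "extract after all". A minimal sketch follows; noextract_skip() and locale_rules are hypothetical, and fnmatch() flags of 0 are assumed. It also makes the cost mentioned above explicit: the loop can no longer stop at the first match.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>

/* Hypothetical NoExtract entries, in config order. */
static const char *const locale_rules[] = {
    "usr/share/locale/*",
    "!usr/share/locale/en_US/*",
    "!usr/share/locale/locale.alias",
};

/* gitignore-style evaluation: every pattern is checked and the LAST one
 * that matches decides. Returns 1 if path should not be extracted. */
int noextract_skip(const char *const *patterns, size_t count, const char *path)
{
    int skip = 0;

    assert(path != NULL);
    for (size_t i = 0; i < count; i++) {
        const char *pat = patterns[i];
        int negated = (pat[0] == '!');

        if (negated)
            pat++;                      /* match against the text after '!' */
        if (fnmatch(pat, path, 0) == 0)
            skip = !negated;            /* later matches override earlier ones */
    }
    return skip;
}
```

Under this rule the negation-only line above skips nothing, because no positive pattern ever matches; putting usr/share/locale/* first, as in locale_rules, skips everything under usr/share/locale except en_US/* and locale.alias.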

> 
> > If we do add pcre support, could we include the type of pattern being
> > used in the pattern itself?  Something like:
> > 
> >  NoExtract = glob::usr/share/locale/*
> >  NoExtract = pcre::usr/share/locale/(?!en_US).*
> 
> I don't like this idea.
> 
> 
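For comparison, the per-pattern prefix idea would mostly come down to parsing the prefix before choosing a matcher. A minimal sketch; parse_pattern_kind() and enum pattern_kind are hypothetical, and treating unprefixed entries as globs is an assumed design choice so that existing configuration files keep their current meaning.

```c
#include <assert.h>
#include <string.h>

enum pattern_kind { PATTERN_GLOB, PATTERN_PCRE };

/* Hypothetical parser for "glob::" / "pcre::" prefixes. Unprefixed
 * entries default to glob. Returns the pattern text after any prefix
 * and stores the detected kind in *kind. */
const char *parse_pattern_kind(const char *entry, enum pattern_kind *kind)
{
    static const char glob_prefix[] = "glob::";
    static const char pcre_prefix[] = "pcre::";

    assert(entry != NULL && kind != NULL);
    if (strncmp(entry, pcre_prefix, sizeof pcre_prefix - 1) == 0) {
        *kind = PATTERN_PCRE;
        return entry + sizeof pcre_prefix - 1;
    }
    *kind = PATTERN_GLOB;
    if (strncmp(entry, glob_prefix, sizeof glob_prefix - 1) == 0)
        return entry + sizeof glob_prefix - 1;
    return entry;                       /* no prefix: keep existing meaning */
}
```

Existing entries such as usr/share/locale/* would therefore parse as globs unchanged, and only entries explicitly marked pcre:: would need the regex engine.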