[pacman-dev] [PATCH] Enable Perl regular expressions in NoExtract.

Allan McRae allan at archlinux.org
Wed Jun 5 08:28:25 EDT 2013

On 05/06/13 16:16, Andrew Gregory wrote:
> On 06/05/13 at 02:35pm, Allan McRae wrote:
>> On 05/06/13 02:34, Dave Reisner wrote:
>>> On Tue, Jun 04, 2013 at 06:24:03PM +0200, Patrick Steinhardt wrote:
>>>> On Tue, Jun 04, 2013 at 11:46:50AM -0400, Dave Reisner wrote:
>>>>> On Tue, Jun 04, 2013 at 05:03:47PM +0200, Patrick Steinhardt wrote:
>>>>>> Until now it was not easily possible to remove all files but some
>>>>>> in a given directory. By enabling Perl regular expressions for
>>>>>> NoExtract this is now made possible through negative lookahead.
>>>>>> Fixes FS#31749.
>>>>>> ---
>>>>>> This patch is work in progress. I want to check if there is
>>>>>> interest in having this feature available in pacman and how the
>>>>>> extra dependency on libpcre is received. If it is well received I
>>>>>> might still do some rework for the actual matching of NoExtract
>>>>>> items, as currently I'm always recompiling the pattern for each
>>>>>> match.
>>>>> I've considered moving to PCRE, but I'm curious:
>>>>> 1) Why do you need this for NoExtract? Please be *specific*
>>>>> 2) What is insufficient about POSIX regex? (#include <regex.h>)
>>>> My usecase is specifically the directory /usr/share/locale. As I
>>>> only need two files/directories (locale.alias and en_US) in there
>>>> I'd need to specify a list of a whopping 106 entries that
>>>> shouldn't be extracted.
>>>> POSIX regular expressions do not support negative lookarounds.
>>>> Thus I might add entries like usr/share/locale/{af,am,...} but it
>>>> is not possible to remove everything but those two entries. With
>>>> Perl regular expressions this is as easy as
>>>> usr/share/locale/((?!en_US|locale.alias).+).
>>>> I'm sure there are other usecases where POSIX regular expressions
>>>> may be insufficient but can't currently think about any other. In
>>>> generl PCRE is in the core repo, widely used and very powerful,
>>>> so I don't see any reason to not use it.
>>> I think a major hurdle here is going to be that we've already
>>> standardized on fnmatch() as the matcher for not only NoExtract, but
>>> also IgnorePkg, IgnoreGroup, HoldPkg, and NoUpgrade. Moving to regex
>>> based matching flat out breaks expectations since a valid glob will
>>> *not* match the same data when interpreted as a regex, and it might not
>>> even be a valid regex (consider "*foo").
>>> If you want to propose such a move, you either need a migration path
>>> (unlikely), or some sort of flag to determine what style of matching is
>>> performed on these config options, something like:
>>> MatchStyle = (Glob|Regex)
>>> Along with the necessary documentation. I'm not sure this is a road I
>>> want to go down.
>> I am not sure what I think about this.  So here is a collection of
>> thoughts:
>> 1) Being able to ignore e.g. all locales apart from the one you want
>> would be a good thing.
>> 2) PCRE is a widely used library and I doubt many Linux distributions do
>> not have it installed (and have grep linked to it)
>> 3) This would need to be entirely optional at configure time, much like
>> gpgme is.
>> 4) I really do not want another configuration option for this.  It would
>> have to be glob matches when built without pcre and regex when built
>> with it.  But then how would the user know which one pacman is using?
>> 5) The upgrade path is particularly important given pacman is mostly
>> (only) used on rolling release distros.  But would the distribution
>> adding a note in pacman.conf that the user will merge be enough?
>> So...  that has me leaning towards this being a good idea.
>> @Dave: would suggesting the distros handle the "migration path" be fine
>> with you?  I'm not sure wildcards in any of those configuration options
>> are widely used.
>> Allan
> I tend to think that the particular problem at issue here would be
> better solved by a negation operator à la gitignore:
>  NoExtract = usr/share/locale/* !usr/share/locale/en_US/*

I like this idea.
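
A minimal sketch of how such a negation could sit on top of the existing
fnmatch() matching follows. This is not pacman's actual code: the helper
name noextract_skips(), the "last matching pattern wins" rule (borrowed
from gitignore), and calling fnmatch() with flags set to 0 are all
assumptions made for illustration.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>

/* Hypothetical helper (not part of pacman): returns 1 if `path` should be
 * skipped during extraction, 0 if it should be extracted.
 * Patterns are tried in order and the last one that matches decides the
 * result; a leading '!' turns a match into "extract after all". */
static int noextract_skips(const char *path,
                           const char *const *patterns, size_t count)
{
    int skip = 0; /* default: extract everything */

    assert(path != NULL);

    for (size_t i = 0; i < count; i++) {
        const char *pat = patterns[i];
        int negated = (pat[0] == '!');

        if (negated) {
            pat++; /* match against the pattern without the '!' */
        }
        /* Assumption: flags == 0, so '*' also matches '/' */
        if (fnmatch(pat, path, 0) == 0) {
            skip = !negated;
        }
    }
    return skip;
}
```

With the two patterns from the example above, a path such as
usr/share/locale/de/LC_MESSAGES/pacman.mo would be skipped, while
anything under usr/share/locale/en_US/ would still be extracted. Existing
glob-only configurations keep their meaning, since a pattern without a
leading '!' behaves as before.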

> If we do add pcre support, could we include the type of pattern being
> used in the pattern itself?  Something like:
>  NoExtract = glob::usr/share/locale/*
>  NoExtract = pcre::usr/share/locale/(?!en_US).*

I don't like this idea.
