[arch-dev-public] Fwd: Re: Fwd: Perl packaging guidelines.

Tue Feb 9 16:11:07 EST 2010

On 09/02/2010 19:34, Dan McGee wrote:
> On Tue, Feb 9, 2010 at 12:26 PM, Aaron Griffin<aaronmgriffin at gmail.com>  wrote:
>    
>> On Tue, Feb 9, 2010 at 3:32 AM, Firmicus<Firmicus at gmx.net>  wrote:
>>      
>>> [Forwarding this from Xyne]
>>>
>>>        
>>>>   This is an area where I have pretty solid experience, as a perl dev,
>>>>   arch user, and maintainer of several perl packages in extra and
>>>>   community. I tend to agree with Allan here. We should only package and
>>>>   take into consideration what on CPAN is called a distribution, i.e. a
>>>>   tarball containing a bunch of related perl modules. The mapping between
>>>>   modules and distributions is available in a plain text database on each
>>>>   CPAN mirror, and can be figured out by the common tools used to install
>>>>   cpan stuff from the command-line (cpan and cpanp). I think it is quite a
>>>>   BAD idea to put all module names in the provides array, as this can
>>>>   easily yield hundreds of elements without obvious advantage I can think
>>>>   of. Our poor lil Pacman has better things to do than take that overload
>>>>   into account.
>>>>          
>>>
>>> Is the overhead even significant? How expensive are PROVIDES lookups?
>>> (Coincidentally, I've thought for a while now that there should be a
>>> single PROVIDES list for the local database to avoid opening hundreds
>>> of files unnecessarily, even if that is quite fast).
>>>
>>> What about when distributions change their module roster? This has
>>> happened before and will happen again. I see robust dependency
>>> resolution as an "obvious advantage". If any important modules ever get
>>> moved around (e.g. subsumed into the base distribution from a popular
>>> module), then all packages which depend on that module would need to
>>> update their depends array.
>>>        
Yes, but see further below.

>>> The fact that META.yml files for distributions on CPAN specify modules
>>> and not distributions shows that the dependencies are actually the
>>> modules themselves. Ignoring upstream convention and Pacman's built-in
>>> capabilities to shave a little bit of time off of a PROVIDES lookup
>>> doesn't seem right to me, especially when you factor in the potential
>>> dependency breakage, however unlikely that is to affect large
>>> distributions.
>>>        
This makes sense from a developer point of view. What you use are the 
modules, not the tarballs within which they were distributed. But from 
the point of view of the creators of third-party packages (or the 
"vendors", to adopt perl terminology), the modules are not really 
relevant. The user should use search.cpan.org to find out the 
distribution which a particular module belongs to (this is obvious in 
well over 95% of cases though). This is immediately translatable to a 
package name: "Some-Silly-Name" becomes perl-some-silly-name. (There are 
a few exceptions, especially when "perl" is part of the name of the 
distribution, then "perl" is not prefixed, as with glade-perl, modperl, 
etc.).
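
A minimal sketch of that naming rule, for illustration only. The function 
name cpan_to_pkgname is hypothetical, and the "contains perl" test is an 
assumed approximation of the exceptions mentioned above, not an official 
rule; real exceptions still need a human decision.

```shell
# Hypothetical helper: map a CPAN distribution name to an Arch package name.
cpan_to_pkgname() {
    # Lowercase the distribution name, e.g. "Some-Silly-Name" -> "some-silly-name".
    _lower=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case "$_lower" in
        # Assumed exception: names that already contain "perl" get no prefix.
        *perl*) printf '%s\n' "$_lower" ;;
        # Default: prefix the lowercased name with "perl-".
        *)      printf 'perl-%s\n' "$_lower" ;;
    esac
}

cpan_to_pkgname Some-Silly-Name   # prints perl-some-silly-name
cpan_to_pkgname Foo-Perl          # prints foo-perl (made-up name, exception case)
```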

>>> I've looked at perl-cpanplus-dist-arch while rewriting the pacpan
>>> backend. I found the entire CPANPLUS backend to be overkill for
>>> creating Pacman packages. Pacpan uses the same files and gets the same
>>> results, but without the CPAN shell and other bells and whistles that
>>> have no significance for Pacman packaging.
>>>        
I agree, but the API is already there: it is part of the core perl 
package, so you can use it for free :)

>>> That said, if there are particular features that
>>> perl-cpanplus-dist-arch has that pacpan lacks, let me know which and I
>>> will consider adding them. If that project does show itself to be
>>> superior for Pacman packaging then I will probably switch the backend over
>>> to it, but so far the current backend works as expected and is fairly
>>> simple with no external dependencies.
>>>
>>>        
That was not the meaning of my comment, as I meant the resulting 
PKGBUILD, not the backend. But it was based on a rather superficial look 
at both tools, and perl-cpanplus-dist-arch appealed more to me. I don't 
use it though. Since you wrote your own backend from scratch, you should 
continue using it. I am sure it works fine with minimal coding in 
comparison with CPANPLUS, which is a bit bloated, I agree.

>>> Two follow-up questions:
>>> Do you at least support the idea of renaming CPAN packages with
>>> non-standard names to their standard names?
>>>        
Example?
>>> If not, what about including the standard name in the provides array?
>>>        
Makes sense.
>>> If pacpan didn't include the individual modules in the arrays but
>>> instead mapped them all to their distribution, how would you see the
>>> matter then?
>>>        
Not sure I understand...

>> I have to say that I'm not really up to speed on the perl module
>> stuff, but I can see the benefit of adding the actual CPAN name into
>> the provides array. I have wasted time in the past looking for some
>> "Random::FooBar" type module, only to find that it's in perl-libfoobar
>> or something silly
>>      
The Arch packages are always named after the name of the distribution 
(the tarball), not the module(s) they contain. Besides, it is very easy 
to find which distribution a particular module belongs to by using 
search.cpan.org :)
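
The same lookup can also be sketched offline against the plain-text index 
that CPAN mirrors ship as modules/02packages.details.txt.gz, where each 
entry has three columns: module, version, and tarball path. The function 
name module_to_dist and the sample lines below are made up for 
illustration, and the version-stripping pattern is an assumption that will 
not cover every tarball naming scheme.

```shell
# Hypothetical helper: read index lines on stdin and print the distribution
# name that contains the module given as the first argument.
module_to_dist() {
    awk -v mod="$1" '
        $1 == mod {
            n = split($3, parts, "/")    # keep only the tarball file name
            file = parts[n]
            sub(/\.(tar\.gz|tgz|tar\.bz2|zip)$/, "", file)   # drop the archive suffix
            sub(/-v?[0-9][0-9._]*$/, "", file)               # drop the trailing version
            print file
            exit
        }'
}

# Sample lines in the index format (illustrative contents, not real data).
printf '%s\n' \
    'Foo::Bar        1.02  A/AU/AUTHOR/Foo-Bar-1.02.tar.gz' \
    'Foo::Bar::Util  0.50  A/AU/AUTHOR/Foo-Bar-1.02.tar.gz' |
    module_to_dist Foo::Bar::Util   # prints Foo-Bar
```

Against a real mirror the input would come from the decompressed index file 
instead of the printf above.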

BTW for my personal cpan repo and for the perl stuff I maintain for 
extra and community, I have adopted this convention in the PKGBUILD:
     pkgname=perl-foo-bar
     _cpanname=Foo-Bar
which is quite helpful. Other packagers use _realname instead.

Back to the issue of the provides array: I do see the advantage. My 
point is that in many cases, including ALL modules in there leads to 
things like this:
http://aur.archlinux.org/packages/perl-kiokudb/perl-kiokudb/PKGBUILD
which looks insane to me! And do we really want the PKGBUILD of 
perl-datetime-timezone to provide all modules listed here: 
http://search.cpan.org/~drolsky/DateTime-TimeZone-1.10/ ?

In comparison, the traditional approach is cleaner, for instance here:
http://aur.archlinux.org/packages/perl-catalyst-runtime/perl-catalyst-runtime/PKGBUILD
(for which pacpan would have created a very long string of modules in 
the provides array).

Note that in the above two examples all dependencies are versioned, 
which makes Allan fume :) More often than not, this introduces needless 
problems, so yes, Allan is right on this. OTOH there are many 
"bleeding-edge" modules on CPAN that do require very recent versions of 
other modules to work properly, so there would be a clear downside in 
getting rid of versioned dependencies...

Perhaps I am just being too conservative... I do understand Xyne's 
point: if the module Catalyst::Foo::Bar in the distribution 
Catalyst-FooStuff which is packaged as perl-catalyst-foostuff were 
eventually to become part of perl-catalyst-runtime, then having it in 
the provides array would indeed be of some help. But in real life such 
situations occur very rarely. And this extra metadata in the PKGBUILD is 
convenient mainly for the ideal situation where everything is fully 
automated, which is not very realistic, as errors also creep into the CPAN 
metadata and human beings still have to fix those things manually. The 
human packager should review the generated PKGBUILDs anyway.

F