[arch-dev-public] Fwd: Re: Fwd: Perl packaging guidelines.
[Forwarding this from Xyne]
This is an area where I have pretty solid experience, as a perl dev, Arch user, and maintainer of several perl packages in extra and community. I tend to agree with Allan here. We should only package and take into consideration what CPAN calls a distribution, i.e. a tarball containing a bunch of related perl modules. The mapping between modules and distributions is available in a plain text database on each CPAN mirror, and can be figured out by the common tools used to install CPAN stuff from the command line (cpan and cpanp). I think it is quite a BAD idea to put all module names in the provides array, as this can easily yield hundreds of elements without any obvious advantage I can think of. Our poor lil Pacman has better things to do than take that overload into account.
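On CPAN mirrors, the plain text module-to-distribution database is the package index `modules/02packages.details.txt.gz`. Here is a minimal Python sketch that builds the mapping from that file's uncompressed text. It assumes the index layout: a header of `Key: value` lines, then a blank line, then one `Module::Name  version  A/AU/AUTHOR/Dist-Name-1.23.tar.gz` record per line. The version and archive suffix are stripped with a simple regex heuristic, not with CPAN's own parser. The two-record sample is illustrative, not real index contents.

```python
import re

# Archive suffixes commonly seen on CPAN tarballs (heuristic, not exhaustive).
_ARCHIVE_RE = re.compile(r"\.(tar\.gz|tgz|tar\.bz2|zip)$")
# A trailing "-<version>" such as "-1.10", "-v2.0.1" or "-0.93_01".
_VERSION_RE = re.compile(r"-v?\d[\w.]*$")


def dist_name(path):
    """Turn 'D/DR/DROLSKY/DateTime-TimeZone-1.10.tar.gz' into 'DateTime-TimeZone'."""
    base = path.rsplit("/", 1)[-1]
    base = _ARCHIVE_RE.sub("", base)
    return _VERSION_RE.sub("", base)


def module_to_dist(index_text):
    """Map module names to distribution names from 02packages-style text."""
    mapping = {}
    lines = iter(index_text.splitlines())
    # Skip the header block, which ends at the first blank line.
    for line in lines:
        if not line.strip():
            break
    # Every remaining record is "module version path".
    for line in lines:
        fields = line.split()
        if len(fields) == 3:
            module, _version, path = fields
            mapping[module] = dist_name(path)
    return mapping


# Illustrative sample in the index format (not copied from a real mirror).
sample = """File:         02packages.details.txt
Line-Count:   2

DateTime::TimeZone         1.10   D/DR/DROLSKY/DateTime-TimeZone-1.10.tar.gz
DateTime::TimeZone::Local  undef  D/DR/DROLSKY/DateTime-TimeZone-1.10.tar.gz
"""

print(module_to_dist(sample))  # both modules map to 'DateTime-TimeZone'
```

Many modules collapse onto one distribution, which is why a distribution-level package name is enough to locate them.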
Is the overhead even significant? How expensive are PROVIDES lookups? (Coincidentally, I've thought for a while now that there should be a single PROVIDES list for the local database to avoid opening hundreds of files unnecessarily, even if that is quite fast.)
What about when distributions change their module roster? This has happened before and will happen again. I see robust dependency resolution as an "obvious advantage". If any important modules ever get moved around (e.g. subsumed into the base distribution from a popular module), then all packages which depend on that module would need to update their depends array.
The fact that META.yml files for distributions on CPAN specify modules and not distributions shows that the dependencies are actually the modules themselves. Ignoring upstream convention and Pacman's built-in capabilities to shave a little bit of time off a PROVIDES lookup doesn't seem right to me, especially when you factor in the potential dependency breakage, however unlikely that is to affect large distributions.
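Here is a toy Python model of the two points above. It is not how pacman or libalpm actually store their databases. All packages' provides arrays are inverted once into a single dictionary, so each lookup is a hash probe rather than a scan of every package. It also shows the robustness argument: a dependency written against a module-level name keeps resolving after the module moves to another distribution. The package names and the `perl-<module>` spelling of provides entries are hypothetical, and versioned provides are ignored.

```python
from collections import defaultdict


def build_provides_index(packages):
    """Invert {pkgname: [provided names]} into {provided name: set of pkgnames}.

    Every package is also treated as providing its own name.
    """
    index = defaultdict(set)
    for pkgname, provides in packages.items():
        index[pkgname].add(pkgname)
        for name in provides:
            index[name].add(pkgname)
    return index


def satisfiers(index, dep):
    """Return the sorted list of packages that satisfy dependency `dep`."""
    return sorted(index.get(dep, ()))


# Hypothetical repo: perl-catalyst-foostuff ships the Catalyst::Foo::Bar module.
before = {
    "perl-catalyst-runtime": [],
    "perl-catalyst-foostuff": ["perl-catalyst-foo-bar"],
}
# Later the module is absorbed into perl-catalyst-runtime.
after = {
    "perl-catalyst-runtime": ["perl-catalyst-foo-bar"],
}

for repo in (before, after):
    index = build_provides_index(repo)
    # A depends entry naming the module resolves in both states; one naming
    # the old distribution package finds nothing after the move.
    print(satisfiers(index, "perl-catalyst-foo-bar"),
          satisfiers(index, "perl-catalyst-foostuff"))
```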
That said, I do acknowledge Xyne's effort! I wrote a very similar tool years ago, which is still available in AUR (perl-cpanplus-pacman), but to which I have alas not dedicated as much effort and commitment as Xyne did with pacpan. My own approach has, however, inspired another project, called perl-cpanplus-dist-arch (also in AUR), which IMHO is superior to both my own cpan4pacman and Xyne's pacpan. (That said, I still use cpan4pacman, together with a few helper shell scripts and the devtools, to maintain my own local repository of 550 CPAN packages, all of which I keep up to date with relative ease.)
I've looked at perl-cpanplus-dist-arch while rewriting the pacpan backend. I found the entire CPANPLUS backend to be overkill for creating Pacman packages. Pacpan uses the same files and gets the same results, but without the CPAN shell and other bells and whistles that have no significance for Pacman packaging.
That said, if there are particular features that perl-cpanplus-dist-arch has that pacpan lacks, let me know which and I will consider adding them. If that project does prove to be superior for Pacman packaging, then I will probably switch the backend over to it, but so far the current backend works as expected and is fairly simple, with no external dependencies.
Two follow-up questions: Do you at least support the idea of renaming CPAN packages with non-standard names to their standard names?
If not, what about including the standard name in the provides array?
If pacpan didn't include the individual modules in the arrays but instead mapped them all to their distribution, how would you see the matter then?
Regards,
Xyne
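Xyne's last question proposes mapping every required module to its distribution instead of listing modules. Here is a rough Python sketch of that approach. It assumes a module-to-distribution map is already available, for example one built from the CPAN index. It also assumes the `perl-<lowercased distribution>` package naming used in this thread. The `requires` dictionary imitates a META.yml `requires` section but is hypothetical, and version constraints are simply dropped.

```python
def depends_from_requires(requires, module_to_dist):
    """Collapse META.yml-style module requirements into distribution depends.

    requires:       {module name: minimum version} (versions ignored here)
    module_to_dist: {module name: CPAN distribution name}
    Returns (sorted depends list, list of modules that could not be mapped).
    """
    depends = set()
    unknown = []
    for module in requires:
        if module == "perl":
            # META.yml may list the perl interpreter itself as a requirement.
            depends.add("perl")
            continue
        dist = module_to_dist.get(module)
        if dist is None:
            unknown.append(module)  # left for the human packager to sort out
            continue
        depends.add("perl-" + dist.lower())
    return sorted(depends), unknown


# Hypothetical input: two modules from the same distribution plus one unknown.
requires = {"perl": "5.008", "DateTime::TimeZone": "1.09",
            "DateTime::TimeZone::Local": "0", "Some::Unknown::Module": "0"}
mapping = {"DateTime::TimeZone": "DateTime-TimeZone",
           "DateTime::TimeZone::Local": "DateTime-TimeZone"}

print(depends_from_requires(requires, mapping))
```

Both DateTime modules collapse into a single `perl-datetime-timezone` entry. That is the compactness this approach buys, at the cost of the module-level robustness discussed above.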
On Tue, Feb 9, 2010 at 3:32 AM, Firmicus <Firmicus@gmx.net> wrote:
[...]
I have to say that I'm not really up to speed on the perl module stuff, but I can see the benefit of adding the actual CPAN name to the provides array. I have wasted time in the past looking for some "Random::FooBar"-type module, only to find that it's in perl-libfoobar or something silly.
On Tue, Feb 9, 2010 at 12:26 PM, Aaron Griffin <aaronmgriffin@gmail.com> wrote:
[...]
provides are searched by -Ss as well, so I also agree here. -Dan
On 09/02/2010 19:34, Dan McGee wrote:
On Tue, Feb 9, 2010 at 12:26 PM, Aaron Griffin <aaronmgriffin@gmail.com> wrote:
On Tue, Feb 9, 2010 at 3:32 AM, Firmicus <Firmicus@gmx.net> wrote:
[...]
What about when distributions change their module roster? This has happened before and will happen again. I see robust dependency resolution as an "obvious advantage". If any important modules ever get moved around (e.g. subsumed into the base distribution from a popular module), then all packages which depend on that module would need to update their depends array.
Yes, but see further below.
The fact that META.yml files for distributions on CPAN specify modules and not distributions shows that the dependencies are actually the modules themselves. Ignoring upstream convention and Pacman's built-in capabilities to shave a little bit of time off a PROVIDES lookup doesn't seem right to me, especially when you factor in the potential dependency breakage, however unlikely that is to affect large distributions.
This makes sense from a developer point of view. What you use are the modules, not the tarballs within which they were distributed. But from the point of view of the creators of third-party packages (or the "vendors", to adopt perl terminology), the modules are not really relevant. The user should use search.cpan.org to find out the distribution which a particular module belongs to (this is obvious in well over 95% of cases though). This is immediately translatable to a package name: "Some-Silly-Name" becomes perl-some-silly-name. (There are a few exceptions, especially when "perl" is part of the name of the distribution, then "perl" is not prefixed, as with glade-perl, modperl, etc.).
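Here is the naming rule above written as a small Python function. It follows the convention as stated in this message: lowercase the distribution name and prefix `perl-`, unless `perl` already appears as one of its hyphen-separated words. Exceptions that fit neither pattern still need a hand-maintained override table. The `OVERRIDES` dictionary is a hypothetical placeholder, left empty rather than guessing at real package names.

```python
# Hand-maintained exceptions (hypothetical; entries would be added by the packager).
OVERRIDES = {}


def arch_pkgname(dist):
    """Map a CPAN distribution name to an Arch package name."""
    if dist in OVERRIDES:
        return OVERRIDES[dist]
    name = dist.lower()
    if "perl" in name.split("-"):
        return name  # e.g. glade-perl keeps its name, without a perl- prefix
    return "perl-" + name


for dist in ("Some-Silly-Name", "Catalyst-Runtime", "glade-perl"):
    print(dist, "->", arch_pkgname(dist))
```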
I've looked at perl-cpanplus-dist-arch while rewriting the pacpan backend. I found the entire CPANPLUS backend to be overkill for creating Pacman packages. Pacpan uses the same files and gets the same results, but without the CPAN shell and other bells and whistles that have no significance for Pacman packaging.
I agree. But the API is already there: it is part of the core perl package, so you can use it for free :)
That said, if there are particular features that perl-cpanplus-dist-arch has that pacpan lacks, let me know which and I will consider adding them. If that project does prove to be superior for Pacman packaging, then I will probably switch the backend over to it, but so far the current backend works as expected and is fairly simple, with no external dependencies.
That was not the meaning of my comment: I meant the resulting PKGBUILD, not the backend. But it was based on a rather superficial look at both tools, and perl-cpanplus-dist-arch appealed more to me. I don't use it, though. Since you wrote your own backend from scratch, you should continue using it. I am sure it works fine with minimal code in comparison with CPANPLUS, which is a bit bloated, I agree.
Two follow-up questions: Do you at least support the idea of renaming CPAN packages with non-standard names to their standard names?
Example?
If not, what about including the standard name in the provides array?
Makes sense.
If pacpan didn't include the individual modules in the arrays but instead mapped them all to their distribution, how would you see the matter then?
Not sure I understand...
I have to say that I'm not really up to speed on the perl module stuff, but I can see the benefit of adding the actual CPAN name to the provides array. I have wasted time in the past looking for some "Random::FooBar"-type module, only to find that it's in perl-libfoobar or something silly.
The Arch packages are always named after the name of the distribution (the tarball), not the module(s) they contain. Besides, it is very easy to find which distribution a particular module belongs to by using search.cpan.org :)
BTW, for my personal CPAN repo and for the perl stuff I maintain for extra and community, I have adopted this convention in the PKGBUILD:
name=perl-foo-bar
_cpanname=Foo-Bar
which is quite helpful. Other packagers use _realname instead.
Back to the issue of the provides array: I do see the advantage. My point is that in many cases, including ALL modules in there leads to things like this: http://aur.archlinux.org/packages/perl-kiokudb/perl-kiokudb/PKGBUILD which looks insane to me! And do we really want the PKGBUILD of perl-datetime-timezone to provide all modules listed here: http://search.cpan.org/~drolsky/DateTime-TimeZone-1.10/ ?
In comparison, the traditional approach is cleaner, for instance here: http://aur.archlinux.org/packages/perl-catalyst-runtime/perl-catalyst-runtim... (for which pacpan would have created a very long string of modules in the provides array).
Note that in the above two examples all dependencies are versioned, which makes Allan fume :) More often than not, this introduces needless problems, so yes, Allan is right on this. OTOH, there are many "bleeding-edge" modules on CPAN that do require very recent versions of other modules to work properly, so there would be a clear downside in getting rid of them... Perhaps I am just being too conservative...
I do understand Xyne's point: if the module Catalyst::Foo::Bar in the distribution Catalyst-FooStuff, which is packaged as perl-catalyst-foostuff, were eventually to become part of perl-catalyst-runtime, then having it in the provides array would indeed be of some help. But in real life such situations occur very rarely. And this extra metadata in the PKGBUILD is convenient mainly in the ideal situation where everything is fully automated, which is not very realistic, as errors also creep into the CPAN metadata and human beings still have to fix those things manually. The human packager should review the generated PKGBUILDs anyway.
F
On 10/02/10 07:11, Firmicus wrote:
Back to the issue of the provides array: I do see the advantage. My point is that in many cases, including ALL modules in there leads to things like this: http://aur.archlinux.org/packages/perl-kiokudb/perl-kiokudb/PKGBUILD which looks insane to me! And do we really want the PKGBUILD of perl-datetime-timezone to provide all modules listed here: http://search.cpan.org/~drolsky/DateTime-TimeZone-1.10/ ?
In comparison, the traditional approach is cleaner, for instance here: http://aur.archlinux.org/packages/perl-catalyst-runtime/perl-catalyst-runtim...
(for which pacpan would have created a very long string of modules in the provides array).
This is my main concern with this provides stuff. The perl provides array is already ridiculous at 112 packages, but this will increase it to a stonking great 445. With discussions like this, I always wonder if we even have someone prepared to update the 300+ perl packages in the repos to a new standard. If not, then there is little point discussing this further as nothing will actually get done... Allan
participants (4)
- Aaron Griffin
- Allan McRae
- Dan McGee
- Firmicus