[pacman-dev] Better handling for domain-specific packages (rubygems, cpan, ...)

Wed Jun 26 17:26:56 EDT 2013

Hi pacman & Arch devs,

Pacman, like many other Linux package managers, was created with C
projects in mind. Those projects usually release their sources as
tarballs, and the user is expected to install the dependencies and
compile the sources. Pacman takes over those responsibilities and
handles them very well.

The situation is a little bit different for packages from language
ecosystems that have their own language-specific package manager (let
me call them domain-specific package managers). Those package managers
also aim to install system libraries/binaries (very similar to what
pacman does). Let me give you a few examples of such package managers:

perl - cpan
js/nodejs - npm
ruby - rubygems
haskell - cabal
.....

I have modified a bunch of Arch packages recently, and every time I
changed/updated a perl-* or nodejs-* package I felt like I was
wasting my time. Think about it. Those CPAN/rubygems/... packages
already carry all the required information: author name, license,
latest released version, dependencies, post-install instructions,
instructions for running tests, ... This means that whoever adds a
perl-* package has to copy/paste a bunch of info from the CPAN package
metadata.

Could this situation be improved? Could Arch generate packages
automatically or semi-automatically from domain-specific packages?
Something like a *_PACKAGE macro for PKGBUILD:

$ cat PKGBUILD
# maintainer -- foo at bar.com
CPAN_PACKAGE('net-ssl')

Then, when the maintainer runs "makepkg --regenerate" (or something
like this), the tool will:
 - download the CPAN index
 - check whether the index has a newer version of the package than we
have locally
 - extract all package information from the CPAN package
 - add/update that info below the CPAN_PACKAGE macro
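
To make the steps above more concrete, here is a rough Python sketch of
the check-and-update part. Everything in it is an assumption for
illustration: the DomainMeta shape, the regenerate() helper and the
sample index entry are made up, no real CPAN API is called, and the
version comparison is deliberately naive (a real tool would rather
follow pacman's vercmp rules).

```python
from dataclasses import dataclass, field


@dataclass
class DomainMeta:
    """Metadata a domain index is assumed to expose (hypothetical shape)."""
    name: str
    version: str
    license: str
    depends: list = field(default_factory=list)


def version_key(version):
    # Naive comparison: split on dots and compare the parts numerically.
    # Real tooling would defer to pacman's vercmp rules instead.
    return tuple(int(part) for part in version.split("."))


def regenerate(local_version, meta):
    """Return new PKGBUILD field lines, or None if the index has nothing newer."""
    if local_version is not None and version_key(meta.version) <= version_key(local_version):
        return None
    pkgname = "perl-" + meta.name.lower()
    depends = " ".join("'perl-%s'" % d.lower() for d in meta.depends)
    return [
        "pkgname=%s" % pkgname,
        "pkgver=%s" % meta.version,
        "license=('%s')" % meta.license,
        "depends=(%s)" % depends,
    ]


# Stand-in for a downloaded index entry; the values are made up for illustration.
index = {"net-ssl": DomainMeta("net-ssl", "1.10", "PerlArtistic", ["io-socket"])}
lines = regenerate("1.9", index["net-ssl"])
```

With the sample entry, regenerate() returns the field lines because
1.10 is newer than the local 1.9; calling it again with "1.10" as the
local version returns None, i.e. nothing to update.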

The reasons why automatic generation/update is better than tracking
packages manually:
 - less chance of errors, e.g. the tool will never forget to add a
correct dependency or the tests
 - saves maintainer time. It is better if maintainers work on real
bugs rather than waste time on updating package fields.
 - more consistent packages. They will all use the same
compilation/installation flags.
 - easier to make large changes, e.g. haskell packages need the
"!strip" option - now we can change the macro itself instead of
manually changing PKGBUILD files one by one.
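
As a small illustration of the last point, the per-domain defaults
could live in one table that the generator consults. This is only a
sketch in Python: the DOMAIN_DEFAULTS table and render_options() are
hypothetical names, and the only entry taken from the text above is
"!strip" for haskell.

```python
# Hypothetical per-domain defaults; only the haskell '!strip' entry
# comes from the text above.
DOMAIN_DEFAULTS = {
    "haskell": {"options": ["!strip"]},
    "perl": {"options": []},
}


def render_options(domain, extra=()):
    """Render the PKGBUILD options line for a domain, plus per-package extras."""
    merged = list(DOMAIN_DEFAULTS[domain]["options"])
    for opt in extra:
        # Keep the domain defaults first and skip duplicates.
        if opt not in merged:
            merged.append(opt)
    return "options=(%s)" % " ".join("'%s'" % o for o in merged)
```

Changing the "haskell" entry once would then affect every generated
haskell package on the next regeneration, while the extra argument
leaves room for per-package tweaks.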

What do you think?

There are a few questions that should be resolved:
 - There should be a way to customize generated packages, e.g. some
domain packages require 'native' packages as dependencies.
 - Some packages are distributed both as a domain package and as a
source tarball, which might cause confusion. An example: many
javascript packages used to be tarballs, but recently a lot of them
moved to npm (the node.js package manager). We probably want to stick
with the domain package.
 - Naming convention. Generated domain packages should be named
'PREFIX-DOMAINPKGNAME', e.g. perl-net-ssl. Non-domain packages cannot
use that prefix.
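
The naming rule from the last point could be checked mechanically. A
minimal Python sketch, assuming a fixed table of reserved prefixes:
the PREFIXES mapping and both helper names are hypothetical, and the
ruby/haskell prefixes are my guess by analogy with perl-* and nodejs-*.

```python
import re

# Hypothetical mapping from domain to the reserved pacman name prefix.
PREFIXES = {"cpan": "perl", "npm": "nodejs", "rubygems": "ruby", "cabal": "haskell"}


def pacman_name(domain, domain_pkg):
    """Build 'PREFIX-DOMAINPKGNAME' from a domain package name."""
    # After lowercasing, collapse every run of characters other than
    # letters, digits, '.' and '+' into a single '-'.
    slug = re.sub(r"[^a-z0-9.+]+", "-", domain_pkg.lower()).strip("-")
    return "%s-%s" % (PREFIXES[domain], slug)


def uses_reserved_prefix(pkgname):
    """True if a package name starts with a prefix reserved for generated packages."""
    return any(pkgname.startswith(p + "-") for p in PREFIXES.values())
```

For example, pacman_name("cpan", "Net::SSL") gives "perl-net-ssl", and
uses_reserved_prefix() could be used to reject a hand-written package
that tries to claim a name such as perl-foo.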

Arch could go even further: for example, import *all* domain
packages and automatically create PKGBUILD files for them. Wouldn't it
be cool to disable CPAN/rubygems/... and install everything via pacman?
After all, having the same software installed via different package
managers (e.g. pacman & CPAN) can cause confusion like this [1].

[1] https://mailman.archlinux.org/pipermail/arch-general/2013-May/033599.html