[pacman-dev] code reuse for PKGBUILD

Thu Dec 6 18:33:06 EST 2012

On 7 December 2012 09:30, LANGLOIS Olivier PIS -EXT <
olivier.pis.langlois at transport.alstom.com> wrote:

> Kent,
>
> 1. Avoiding dependency issues in a system that manage dependencies is a
> good thing. With your suggestion, you would eventually end up with PKGBUILD
> not immediately usable because you need to update your reusable perl module
>
>
Indeed, but this is why it's best to have this reusable "module" (not a
Perl module, just bash really) distributed through some more authoritative
channel, so that although it will update periodically, people who write
code against these "modules" can rely on them existing.

On Gentoo, this guarantee comes from the fact that these modules reside in
the main repository tree, so for official packages, all that's required is
to make sure the requisite "modules" (we call them "eclasses", but they're
basically just specialised bash scripts) are committed to the tree.

This way, although there is a degree of dependency, it's about as moot a
point as PKGBUILDs shipping with patches, which are also "dependencies".

>
> 2. "no code reuse" also implicitly enforce KISS
>

I've seen anything but KISS in the PKGBUILDs I've looked at. Many of them
have cargo-culted code all over them.

There is also the idea that complexity is conserved: you can only manage
it by moving it around; you can't eliminate it entirely.

And packaging Perl modules in particular comes with many forms of
unavoidable complexity.

The fact that upstream depends on *modules*, not packages, and that
packages contain multiple modules, creates a whole host of problems. (This
is especially bad when a package depends on a specific version of a module,
and that module's version is not correlated in any way with the version of
the package it ships in.)

And then you have package splits and merges, where the package a module
ships in can change over time.

These two problems are, at present, not solvable in any realistic way.

There are, however, a few cases we can easily solve, and much of the
boilerplate code copied into each and every PKGBUILD does just this.

e.g.: Upstream sometimes opts to change their distribution toolkit, and
might switch from EUMM (ExtUtils::MakeMaker) to MB (Module::Build) without
any other obvious changes. It's quite easy to codify behaviour to handle
this case, so you don't have to worry about it.
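
A minimal sketch of how that EUMM/MB case could be codified, assuming a
hypothetical helper called perl_detect_toolkit (the name is made up for
illustration; nothing like it ships with pacman today). It also assumes
that Build.PL should win when a distribution ships both files:

```shell
# Hypothetical helper, not part of pacman or makepkg.
# Prints which Perl build toolkit the source tree in $1 (default: .) uses.
perl_detect_toolkit() {
    if [ -f "${1:-.}/Build.PL" ]; then
        # Module::Build; assumed to take priority when both files exist
        echo "MB"
    elif [ -f "${1:-.}/Makefile.PL" ]; then
        # ExtUtils::MakeMaker
        echo "EUMM"
    else
        echo "unknown"
        return 1
    fi
}
```

A PKGBUILD that relied on something like this would not need touching when
upstream swaps toolkits; only the detected value changes.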

But I'm getting sidetracked.

The reasons this helps are:

1. Maintaining a module often requires you to change many fields
2. The more fields you have to change, the more likely you'll make a
mistake and something will go wrong.
3. Even the recommended behaviours in the Perl Guide for Arch are pretty
arbitrary and confusing, and require every maintainer of every Perl
module to pay very close attention to the guide to keep their installation
in line with "best practices", e.g.:
    - having to make sure they install to vendor
    - having to make sure they set the right ENV values so that EUMM won't
explode if test-deps aren't satisfied during build
    - knowing what toolkit they're using and what the right invocations are
for that toolkit
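
The environment part of that checklist is the sort of thing a shared
library could own. A rough sketch, assuming a hypothetical function named
perl_module_setup_env; the variables are ones that ExtUtils::MakeMaker,
Module::AutoInstall and Module::Build read, but whether this is exactly the
set the Arch guide wants is an assumption on my part:

```shell
# Hypothetical shared helper; the function name is illustrative only.
perl_module_setup_env() {
    # Drop per-user settings that could redirect the build or the install.
    unset PERL5LIB PERL_MM_OPT PERL_MB_OPT PERL_LOCAL_LIB_ROOT
    # Make EUMM's prompt() take its default answers instead of blocking.
    export PERL_MM_USE_DEFAULT=1
    # Stop Module::AutoInstall from pulling dependencies off CPAN mid-build.
    export PERL_AUTOINSTALL=--skipdeps
    # Ignore any per-user Module::Build rc file.
    export MODULEBUILDRC=/dev/null
}
```

Every PKGBUILD would then call the one function instead of each maintainer
remembering the list.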

And some of these are policy-based, e.g.: what happens if, 6 months from
now, whoever is head of Arch Perl stuff decides the "install to vendor"
behaviour is wrong? Instead of just changing the install mechanic for *all*
Perl module installations and being done with it, they have to
 a. Document the change
 b. Inform developers of the change
 c. Wait 12 - 36 months for the changes to propagate ...
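
With a shared library, that policy could live in exactly one place. A
sketch under assumptions: PERL_MODULE_INSTALLDIRS and
perl_module_configure_args are hypothetical names, while INSTALLDIRS= and
--installdirs are the options EUMM and Module::Build accept for this:

```shell
# Hypothetical policy knob, defined once in the shared library.
# Changing "vendor" here would change it for every package using the library.
: "${PERL_MODULE_INSTALLDIRS:=vendor}"

# Print the configure argument for a toolkit, given as "EUMM" or "MB".
perl_module_configure_args() {
    case "$1" in
        EUMM) echo "INSTALLDIRS=$PERL_MODULE_INSTALLDIRS" ;;
        MB)   echo "--installdirs $PERL_MODULE_INSTALLDIRS" ;;
        *)    echo "unknown toolkit: $1" >&2; return 1 ;;
    esac
}
```

If the policy changed, the change would be one edited line in the library
plus a rebuild, rather than an edit to every PKGBUILD.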

In essence, the benefits of having a reusable, shareable blob of managed
code are real, for exactly the same reason that "using a library" is a
superior choice to "blindly copy-pasting the code" in *every programming
language I know of*.

Meanwhile, the negatives are hard to quantify, and are possibly only
*perceived* negatives, not *actual* negatives.

e.g.: We do exactly this in Gentoo; we do it all the time, and we leverage
it extensively. All the paranoid reasons for why it is bad: they never
happen.

> While I love Perl for some tasks such as processing nm output to create
> automatically bps in gdb, I feel that some languages weren't meant for
> reusability, with all respect for Perl, I believe that it is one them.
>
> My past experience for having work a year in a Perl shop is that, give
> just a couple of months to a reusable Perl module and it will at some point
> use OO Perl with Moose that needs a special module to highjack the import
> statement so that lookups in a serie of yaml files are performed to build
> dynamically the module search path env for the hundreds of required Perl
> module and the beast will grow to several hundred of MBs during runtime and
> will become slow like a camel with a moose head. It will become so complex
> that no mere mortals will understand what is going on, when the thing
> breaks.
>
>
I'm not sure what you're trying to say here. If you're saying "In Perl, you
shouldn't use modules, you should just inline all the code", then I'll have
to politely disagree.

If you inline the code, then *you* have to maintain all the bugs in that
code, and *all code has bugs hiding in it*. All you achieve by inlining it
is that when upstream discovers the bugs and fixes them, your code will
still be vulnerable!

And this is basically the foundation of this suggestion: provide a way to
optimise the use cases for various classes of problems and streamline
development, so that the time spent maintaining a package is substantially
reduced.

And consider, from a security perspective, which is safer:

1. Having dozens of PKGBUILDs contributed by users, each with *many* *many*
lines of arbitrary executable code that end users will have neither the
time, knowledge, nor patience to manually vet

or

2. Equally many PKGBUILDs contributed by many users, but most of them have
little to no executable code to review, because they simply use default
behaviours borrowed from libraries published by a well trusted authority,
and these libraries have each been reviewed thousands of times by people
who know what they're doing.

Or, to put it simply, which is easier to review for security risks?

https://gist.github.com/4229354  # where
"/usr/lib/pacman/extensions/perl-module.shlib" is produced by a trusted
authority ... or

https://gist.github.com/4229359   # this big mess of code which could
easily hide any number of security flaws.

As it is, a large number of people just upload stuff made by generation
tools, but it would be so easy to slip bad code in amongst that,
exploiting people's sense of security in the assumption that the generated
code was safe, while hiding a nasty 'sudo rm -rf /' in one of the phases.

By using the library, and removing the need for these blocks of code, you
reduce the number of places bad code can lurk.

And somebody reviewing the first of those two GitHub links can *quickly*
confirm the absence of any nefarious code.