[pacman-dev] [RFC] Package parser in python
Sebastian Nowicki
sebnow at gmail.com
Sun Dec 13 02:59:53 EST 2009
On 12/12/2009, at 10:11 PM, Allan McRae wrote:
> Sebastian Nowicki wrote:
>> As you may have heard, I started a proper PKGBUILD parser[1], which
>> parses according to shell semantics and does a little interpreting.
>> I just released the first version, which doesn't handle errors, or
>> multi-line values (like arrays or escaped newlines) very well. It
>> does however support split packages. I'm in the process of
>> modifying parched to essentially turn it into python bindings[2]
>> for pkgparse.
>> You probably already have a parser at this point, so I'm not sure
>> how useful this would be to you (it might be overkill anyway), I
>> just thought I'd let you know.
>> [1]: http://github.com/sebnow/pkgparse
>> [2]: http://github.com/sebnow/parched/tree/pkgparse_pyrex
>
> Looks interesting. I will take it for a spin later. I assume this
> is going towards AUR2?
>
> I had not done any further work on my parser as I was uncertain what
> was the best way to go in developing a makepkg test suite. Given
> the makepkg test suite will use a safe set of PKGBUILDs, I was
> thinking of just using bash to parse them.
That would probably be the simplest and most accurate way (you don't
want bugs in the parser to fail tests). The namcap method is a good
way to go. You could look at the current AUR2 implementation to get
information (it sources the PKGBUILD and spits out python code)[1].
It is indeed for AUR2. I thought that since you were contemplating a
PKGBUILD parser, this might be of use to you.
On 12/12/2009, at 11:44 PM, Xavier wrote:
> I can't help but think this whole situation is stupid.
>
> I would suppose that PKGBUILDs were written in bash for simplicity
> reason : makepkg just needs to source them, and that's it. Whole
> parsing done for free.
> And now we realize that when using untrusted source, we cannot do that
> anymore. And now we basically have to rewrite a bash parser from
> scratch. I mean, it's hard to imagine a more flawed design, and more
> complex solution to a simple problem.
It is, but as Allan pointed out, makepkg's goals are different. AUR
effectively uses PKGBUILDs in an unconventional way.
> Somehow we manage to go from a very KISS solution to a completely
> anti-KISS one.
>
> I only see two solutions :
> - we keep using bash, but try to do that in the most restricted
> environment possible (e.g. namcap way , or maybe there is something
> even more restrictive and secure ?)
I don't think it's possible to make the bash environment any more
restrictive than namcap's way. There are fundamental things about bash
that make it insecure for blind parsing. The first example that comes
to mind is infinite loops. They may not be deliberate, or occur
often, but they are a possibility. The others involve various
utilities which are required for a sane build environment (mv,
install, cat, etc.) but could be dangerous when used maliciously. I
recall Callan (or someone else) provided a nice example of this on the
aur-dev mailing list. It's probably lost now.
A custom parser avoids these problems by not executing commands (at
the cost of possibly inaccurate metadata, e.g. when a value is
computed with sed) and by restricting language constructs like loops
(e.g. a maximum number of iterations before an error is raised). There
are of course pros and cons to both approaches.
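To illustrate the "no execution" side of that trade-off, here is a
minimal Python sketch. It is not how pkgparse or parched work; the
names parse_scalars and ASSIGNMENT are made up for this example, and it
only understands single-line scalar assignments. Anything that would
need a command to be run is skipped rather than evaluated, which is
exactly where the inaccurate metadata comes from.

```python
import re
import shlex

# Hypothetical helper, for illustration only: matches `name=value` lines.
ASSIGNMENT = re.compile(r'^([A-Za-z_][A-Za-z0-9_]*)=(.*)$')


def parse_scalars(text):
    """Collect simple `name=value` assignments without running anything."""
    values = {}
    for line in text.splitlines():
        match = ASSIGNMENT.match(line.strip())
        if not match:
            continue
        name, raw = match.groups()
        if raw.startswith('(') or '$(' in raw or '`' in raw:
            # Arrays and command substitutions are skipped, never executed.
            continue
        try:
            # shlex handles quoting and trailing comments; variable
            # references such as $pkgname are kept as literal text.
            words = shlex.split(raw, comments=True)
        except ValueError:
            continue  # unbalanced quotes, e.g. a multi-line value
        values[name] = words[0] if words else ''
    return values


sample = 'pkgname=foo\npkgver="1.0"\n_date=$(date +%Y)\n'
print(parse_scalars(sample))
```

The `_date` line is dropped because its value depends on running
`date`, so a real parser would have to report it as unknown or fall
back to something else.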
>> It is not particularly possible given all the bash "tricks" used in
>> PKGBUILDs.
>>
>
> Why is that ? Things like calling external commands and such ?
> And would the alternative parser support these bash tricks anyway ?
>
> So why don't we just forbid / not support them ?
Some "tricks" are required for the PKGBUILDs to be sane and remove
duplication. One particular trick that comes to mind is:
if [ "$CARCH" = "x86_64" ]; then ... fi. This is very common with
binary sources, and the PKGBUILD format has no native support for it.
Removing support for it would require two almost identical PKGBUILDs.
Due to things like this, a bit of interpretation is required for the
parser to work properly. I plan on supporting conditionals like these
in pkgparse.
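As a rough idea of what "a bit of interpretation" could mean for that
conditional, here is a small Python sketch. It is an assumption about
one possible approach, not pkgparse's implementation; eval_test and
TEST are hypothetical names, and only the string comparisons
[ "$VAR" = "literal" ] and [ "$VAR" != "literal" ] are recognised.

```python
import re

# Hypothetical pattern, for illustration only. Like bash, it requires
# whitespace after `[` and before `]`.
TEST = re.compile(
    r'^\[\s+"?\$\{?(\w+)\}?"?\s+(=|!=)\s+"?([^"\s]*)"?\s+\]$'
)


def eval_test(expr, env):
    """Evaluate a `[ "$VAR" = "literal" ]` style test against env."""
    match = TEST.match(expr.strip())
    if match is None:
        # Anything else is rejected instead of being handed to a shell.
        raise ValueError("unsupported test: %r" % expr)
    name, op, literal = match.groups()
    # An unset variable compares as the empty string.
    equal = env.get(name, '') == literal
    return equal if op == '=' else not equal


for arch in ('i686', 'x86_64'):
    print(arch, eval_test('[ "$CARCH" = "x86_64" ]', {'CARCH': arch}))
```

The parser would evaluate the test once per supported architecture and
keep the values from whichever branch applies, so the metadata can be
recorded per architecture without sourcing the file.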
I have also seen for loops populating fields (I think in the kernel
package). Tricks like that should not be supported, as they are only
there due to laziness on the maintainer's part.
Essentially, for building packages, or extracting metadata from safe
packages, using bash directly is the way to go. For environments where
arbitrary packages are parsed and security is an issue, a parser is
currently required.
> - we decide that pkgbuild format is a flawed design, and was too
> limited for our needs, and switch to a new one (in which case Xyne's
> brainstorming could help : http://xyne.archlinux.ca/ideas/pkgmeta )
I have been thinking of designing a new package metadata format
myself, one that is more universal (i.e. not specific to pacman). I
doubt this is achievable, but it's an interesting experiment. The idea
is mostly inspired by the rockspec format for luarocks[2]. It seems
very flexible and provides lots of data. The Lua syntax resembles
JSON, so that format can be used instead. YAML could also be used to
make it "prettier" and even more flexible. Anyway that's a discussion
for another thread. A new format is unlikely to be used by Archlinux
any time soon.
Sorry for going a little off-topic there.
[1]: http://github.com/sebnow/aur2/blob/1ad21387a58eab0d3cf4296f08b6edee0a1e27d2/archlinux/aur/Package/parsepkgbuild.sh
[2]: http://www.luarocks.org/en/Creating_a_rock