On Mon, Sep 16, 2013 at 03:07:19PM +0000, Xyne wrote:
Chris “Kwpolska” Warrick wrote:
Why not adapt the actual Bash parser (in C) to only read and do stuff safely? In most cases, this would be enough. In the others, we already have a mess in those fields in the AUR. (my C skills are not appropriate for this)
That is basically what needs to be done but it is a difficult task. Even if you can adapt the Bash source code to return the AST, you would still need to create an extensive whitelist of executables (both internal and external) that may be run in order to interpolate all of the variables. The code must be able to detect variable settings nested in the package functions, skip commands that do not affect variables (which may require it to work backwards), count loop cycles to prevent infinite loops, track time to prevent timeouts, etc.
And you'd need to do all this work at a level lower than the parser itself to avoid subversion via aliases, functions, and scripts which mask the actual operation's nature...

I think I've mentioned this a few times, but I think there are two options if you want better parsing on the AUR:

1) Extend .AURINFO and implement it as .SRCINFO in makepkg proper. To date, there have been a number of issues which no one has been willing to address to make this a reality.

2) Use a VM (e.g. http://www.vidarholen.net/contents/evalbot/) to evaluate the code. This would require something very similar to the guts of makepkg which understands per-package overrides. The output would be something similar to #1, so really... interested parties should just work on that.
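To make the appeal of option 1 concrete: once the metadata lives in a flat file, a consumer never has to run Bash at all. Below is a minimal sketch in Python. It assumes a hypothetical "key = value" line format in which array fields are expressed by repeating the key; the actual .AURINFO/.SRCINFO layout may well differ, and parse_info and SAMPLE are names made up for illustration.

```python
from collections import defaultdict


def parse_info(text):
    """Parse 'key = value' lines into a dict mapping each key to a list of values.

    Assumed (hypothetical) format: one pair per line, repeated keys for
    array fields, '#' for comments. Nothing here is ever executed.
    """
    fields = defaultdict(list)
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Split on the first '=' only, so values such as 'foo>=1.0' survive.
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a key = value line: {raw!r}")
        fields[key.strip()].append(value.strip())
    return dict(fields)


# Hand-written sample input, not taken from a real package.
SAMPLE = """\
pkgname = example
pkgver = 1.0
depends = glibc
depends = bash>=4.2
"""
```

Calling parse_info(SAMPLE) gives every field as a list, so scalar and array fields can be handled uniformly by whatever consumes the result.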
I have thought about this before when I wrote the Bauerbill PKGBUILD parser, but I gave up trying to find a way to extract the AST using the Bash code. In the end my code would simply wrap the PKGBUILD in a function, source the file, spit it out with "set" to homogenize the syntax, and then parse it with regexes.
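For anyone who has not seen that trick, the last step (regexes over the output of "set") looks roughly like the sketch below. This is not the Bauerbill code; it is a simplified illustration in Python that only covers the regex step and works on a hand-written sample of what "set" prints after sourcing. It assumes scalars appear as name=value or name='value' and arrays as name=([0]="a" [1]="b"), and it ignores harder cases such as embedded single quotes or $'...' quoting. Sourcing the PKGBUILD in the first place still executes arbitrary code, which is the whole problem under discussion.

```python
import re

# Patterns for the normalized forms assumed above.
ARRAY = re.compile(r"^(\w+)=\((.*)\)$")
SCALAR = re.compile(r"^(\w+)=(?!\()(.*)$")
ELEMENT = re.compile(r'\[\d+\]="((?:[^"\\]|\\.)*)"')


def parse_set_output(text, wanted):
    """Extract the variables named in `wanted` from `set`-style output."""
    result = {}
    for line in text.splitlines():
        m = ARRAY.match(line)
        if m:
            if m.group(1) in wanted:
                # Collect the quoted elements, dropping the [index]= prefixes.
                result[m.group(1)] = ELEMENT.findall(m.group(2))
            continue
        m = SCALAR.match(line)
        if m and m.group(1) in wanted:
            value = m.group(2)
            # Strip one layer of surrounding single quotes, if present.
            if len(value) >= 2 and value[0] == value[-1] == "'":
                value = value[1:-1]
            result[m.group(1)] = value
    return result


# Hand-written sample, standing in for real `set` output.
SAMPLE = """\
depends=([0]="glibc" [1]="bash")
pkgdesc='An example package'
pkgname=example
pkgver=1.0
"""
```

Passing a set of wanted names keeps the many unrelated shell variables that "set" also prints out of the result.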
I started writing a Bash parser in Haskell with Parsec but my free time ran out and I had to move on to other things. I think that approach would work quite well if the Bash sources are too tangled to extract the parser, but it is a huge task for one person (word expansion, string manipulation, all of the built-ins, etc.). I would be willing to collaborate on that as well, if there is any interest.
You'd probably be interested in shellcheck: http://www.shellcheck.net/ It's written in Haskell, and while it doesn't execute anything, it does understand a large amount of bash syntax. I found an obscure bug in it recently which was quickly fixed by the author (he's a denizen of #bash on freenode).