[aur-dev] [PATCH] Use bash script to parse pkgbuilds

older
[aur-dev] [PATCH] Remove all vim...

Callan Barrett

8 Jun 2008 8 Jun '08

11:19 a.m.

Ok this is a test to get a bash script to parse pkgbuilds rather than the current crazy way it's done now. The pkgbuild is sent through a bash script (thanks Xilon) and the vars are output so the result can be evaluated by PHP, this way nothing can get missed and there are no bugs with what's parsed. This also gets rid of the package contents stuff, didn't really have a choice but it seems it's not used for anything, only wasted space. As far as I can tell this fixes about 4 bugs currently open on the bug tracker so aside from being a huge improvement over the current parsing stuff (has anyone ever looked at what that parses, wow) it also fixes a lot of bugs. The only downside to it is we can't tell whether a variable actually exists or is just empty so no more hand holding and doing some faux validation, we parse what we need and the rest of the pkgbuild is up to the maintainer. Although I'm not suggesting we apply *this* patch to the AUR I am suggesting we apply some cleaned up form of it since it would be a great improvement. This patch does work fine by the way it's just a bit messy. -- Callan Barrett

Attachments:

0002-Use-bash-script-to-parse-pkgbuilds.patch (application/octet-stream — 19.8 KB)

Show replies by date

Callan Barrett

10 Jun 10 Jun

7:16 a.m.

On Sun, Jun 8, 2008 at 7:19 PM, Callan Barrett <wizzomafizzo@gmail.com> wrote:

...

Ok this is a test to get a bash script to parse pkgbuilds rather than the current crazy way it's done now.

The pkgbuild is sent through a bash script (thanks Xilon) and the vars are output so the result can be evaluated by PHP, this way nothing can get missed and there are no bugs with what's parsed. This also gets rid of the package contents stuff, didn't really have a choice but it seems it's not used for anything, only wasted space. As far as I can tell this fixes about 4 bugs currently open on the bug tracker so aside from being a huge improvement over the current parsing stuff (has anyone ever looked at what that parses, wow) it also fixes a lot of bugs. The only downside to it is we can't tell whether a variable actually exists or is just empty so no more hand holding and doing some faux validation, we parse what we need and the rest of the pkgbuild is up to the maintainer.

Although I'm not suggesting we apply *this* patch to the AUR I am suggesting we apply some cleaned up form of it since it would be a great improvement. This patch does work fine by the way it's just a bit messy.

-- Callan Barrett

Note to self: restricted mode is apparently not as restricted as it sounds, need to look into that and probably how namcap deals with it. Possibly a chroot? Also I'm assuming everyone is fine and dandy with this behavior so when it's not terribly insecure anymore I'll apply this. -- Callan Barrett

Loui

2:45 p.m.

On Tue, 10 Jun 2008 15:16:29 +0800 "Callan Barrett" <wizzomafizzo@gmail.com> wrote:

...

Note to self: restricted mode is apparently not as restricted as it sounds, need to look into that and probably how namcap deals with it.

Possibly a chroot?

Also I'm assuming everyone is fine and dandy with this behavior so when it's not terribly insecure anymore I'll apply this.

Yeah I'm concerned about how secure it is to actually source the PKGBUILD in bash. I don't know if AUR really needs to cover every exotic PKGBUILD out there. Having to set up a chroot seems like a bit much. It would be nice to have all variables resolved though, and bash could make it a lot simpler.

Cilyan Olowen

6:25 p.m.

Is it really such a hard work to parse PKGBUILD and simulate a bash behaviour only on variables preceding the build() function ? The parser needs to be able to : - affect and replace variables - support arrays - support the simple ${x//find/replace} syntax - what else ? If this can be tedious with PHP, is it that difficult to realize in python ? Cilyan 2008/6/10 Loui <louipc.ist@gmail.com>:

...

On Tue, 10 Jun 2008 15:16:29 +0800 "Callan Barrett" <wizzomafizzo@gmail.com> wrote:

...
Note to self: restricted mode is apparently not as restricted as it sounds, need to look into that and probably how namcap deals with it.

Possibly a chroot?

Also I'm assuming everyone is fine and dandy with this behavior so when it's not terribly insecure anymore I'll apply this.

Yeah I'm concerned about how secure it is to actually source the PKGBUILD in bash. I don't know if AUR really needs to cover every exotic PKGBUILD out there. Having to set up a chroot seems like a bit much. It would be nice to have all variables resolved though, and bash could make it a lot simpler.

Callan Barrett

11 Jun 11 Jun

5:13 a.m.

On Wed, Jun 11, 2008 at 2:25 AM, Cilyan Olowen <gaknar@gmail.com> wrote:

...

Is it really such a hard work to parse PKGBUILD and simulate a bash behaviour only on variables preceding the build() function ? The parser needs to be able to : - affect and replace variables - support arrays - support the simple ${x//find/replace} syntax - what else ?

If this can be tedious with PHP, is it that difficult to realize in python ?

It's not that it's tedious in PHP, it would be equally tedious and difficult in any other language to write something to parse bash as nicely as bash does. At this point although the bash parser works and there's no way for any really malicious stuff to happen this is really only a good solution on a local machine, that's why namcap is fine with it. There's nothing stopping a user from, say, embedding an infinite loop which would be evaluated by the server or at very least listing any files in the FS (thanks to the globbing). It's not bad but it's more than enough to stop its use on a server. I think at this point we'll just have to write a better parser for pkgbuilds than what we have now (we really need to, the current one is pretty wishy washy and it amazes me how long it's stood up). -- Callan Barrett

Sebastian Nowicki

12 Jun 12 Jun

6:17 p.m.

On 10/06/2008, at 12:16 AM, Callan Barrett wrote:

...

On Sun, Jun 8, 2008 at 7:19 PM, Callan Barrett <wizzomafizzo@gmail.com> wrote:

...
Ok this is a test to get a bash script to parse pkgbuilds rather than the current crazy way it's done now.

The pkgbuild is sent through a bash script (thanks Xilon) and the vars are output so the result can be evaluated by PHP, this way nothing can get missed and there are no bugs with what's parsed. This also gets rid of the package contents stuff, didn't really have a choice but it seems it's not used for anything, only wasted space. As far as I can tell this fixes about 4 bugs currently open on the bug tracker so aside from being a huge improvement over the current parsing stuff (has anyone ever looked at what that parses, wow) it also fixes a lot of bugs. The only downside to it is we can't tell whether a variable actually exists or is just empty so no more hand holding and doing some faux validation, we parse what we need and the rest of the pkgbuild is up to the maintainer.

Although I'm not suggesting we apply *this* patch to the AUR I am suggesting we apply some cleaned up form of it since it would be a great improvement. This patch does work fine by the way it's just a bit messy.

-- Callan Barrett

Note to self: restricted mode is apparently not as restricted as it sounds, need to look into that and probably how namcap deals with it.

What do you mean? If the path is set to nothing, executables sare not found, so any sort of "rm -rf /" spits out an error. I tested it quite a bit with various commands like that - they don't work. Btw, credit should really go to namcap, I just modified the script that's used there. ps. WWDC is awesome!

Callan Barrett

13 Jun 13 Jun

10:11 a.m.

On Fri, Jun 13, 2008 at 2:17 AM, Sebastian Nowicki <sebnow@gmail.com> wrote:

...

On 10/06/2008, at 12:16 AM, Callan Barrett wrote:

...
Note to self: restricted mode is apparently not as restricted as it sounds, need to look into that and probably how namcap deals with it.

What do you mean? If the path is set to nothing, executables sare not found, so any sort of "rm -rf /" spits out an error. I tested it quite a bit with various commands like that - they don't work.

Btw, credit should really go to namcap, I just modified the script that's used there.

I understand this all and I've tried it all out too but I'm talking about the stuff that can get evaluated that's just pure bash now. As far as I can tell stuff like infinite loops can really screw us over and it's possible to do things like get a directories contents using relative paths. It's not really lethal like rm but it's not particularly good for a server either. (I'm not against this idea idea, please prove me wrong if you can)

...

ps. WWDC is awesome!

Get on Jabber :( -- Callan Barrett

Callan Barrett

19 Jun 19 Jun

4:54 p.m.

Here's another iteration of this patch, I'm still looking for as much input as possible but this is basically what I would push to testing at this point. The script now outputs in a different format to be parsed and there is some cleanup done in pkgsubmit.php to get it working more cleanly with the script. -- Callan Barrett

Loui

20 Jun 20 Jun

5:36 a.m.

On Fri, Jun 20, 2008 at 12:54:29AM +0800, Callan Barrett wrote:

...

Here's another iteration of this patch, I'm still looking for as much input as possible but this is basically what I would push to testing at this point. The script now outputs in a different format to be parsed and there is some cleanup done in pkgsubmit.php to get it working more cleanly with the script.

Kewl.

Simo Leone

8:22 a.m.

On Fri, Jun 20, 2008 at 12:54:29AM +0800, Callan Barrett wrote:

...

Here's another iteration of this patch, I'm still looking for as much input as possible but this is basically what I would push to testing at this point. The script now outputs in a different format to be parsed and there is some cleanup done in pkgsubmit.php to get it working more cleanly with the script.

Unfortunately Callan and I found a way to easily defeat this tonight, the proof-of-concept is attached, the attack is based on this little bit about restricted shells (from the manpage): --- When a command that is found to be a shell script is executed (see COM- MAND EXECUTION above), rbash turns off any restrictions in the shell spawned to execute the script. --- Too bad too, real bash parsing would have been nice :/ -S

Sebastian Nowicki

5:04 p.m.

On 20/06/2008, at 4:22 PM, Simo Leone wrote:

...

On Fri, Jun 20, 2008 at 12:54:29AM +0800, Callan Barrett wrote:

...
Here's another iteration of this patch, I'm still looking for as much input as possible but this is basically what I would push to testing at this point. The script now outputs in a different format to be parsed and there is some cleanup done in pkgsubmit.php to get it working more cleanly with the script.

Unfortunately Callan and I found a way to easily defeat this tonight, the proof-of-concept is attached, the attack is based on this little bit about restricted shells (from the manpage): --- When a command that is found to be a shell script is executed (see COM- MAND EXECUTION above), rbash turns off any restrictions in the shell spawned to execute the script. ---

Too bad too, real bash parsing would have been nice :/

-S <script.txt><PKGBUILD.txt><fucked.sh>

Perhaps we should write up our own tiny little bash parser. I've never really done anything like that before, but after a little googling I found two tools that could simplify the whole process a lot; Lex and YACC. Lex tokenizes the source code and YACC recognizes higher-level patterns (expressions, assignments, etc). I believe the output of YACC is compiled as a C program, so we could run that and use the output somehow. We mostly only deal with assignments, which should be fairly straight forward. I have seen a few (official) PKGBUILDs that used loops to generate the source and md5sums arrays, so we might want to support that as well. Of course doing so would introduce the problem with infinite loops. "If" statements (both the traditional syntax and the shorthand) would probably be necessary, since checksums may be different for different architectures (different blob for each architecture). I don't see any harm in supporting those. Of course going down this path means a LOT more work and most likely a lot more bugs and hair pulling. On the other hand we do have very tight control over the parsing and we can modify it at will.

6192

Age (days ago)

6204

Last active (days ago)

List overview

Download

10 comments

5 participants

participants (5)

Callan Barrett
Cilyan Olowen
Loui
Sebastian Nowicki
Simo Leone