[pacman-dev] RFC: Really running package_* functions in rbash during printsrcinfo
Howdy --
I recently had need to dig into the implementation of makepkg --printsrcinfo, and ran into the "running regular expressions against source code" operations in the backend.
Obviously, this is not ideal. Indeed, I've previously written packages (doing unusual and typically-undesirable things, granted) with conditional logic *assuming* that actual execution would be taking place.
I fully appreciate the decision not to try to go with more expansive attempts at emulating bash parsing/execution in the future, but do folks have any thoughts on **really** executing PKGBUILDs in a restricted environment, including execution of the package_* functions?
See a simple sandboxed parser for config files implemented as bash code, which I've written for NixOS, at https://github.com/charles-dyfis-net/nixpkgs/blob/f50bfe267a312515d88e86c12a.... We might need a little more complexity here (using DEBUG traps to keep "|| exit" logic from aborting, for example), but my initial impression is that "more accurate than the current implementation" (and maybe a fair bit faster, if we extract all variables in one subshell per function) is not a hard goal to achieve.
Thoughts?
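A minimal sketch of the proposal above, assuming bash's `extdebug` behaviour in which a DEBUG trap that returns non-zero makes bash skip the command about to run. It sources a PKGBUILD and runs one package_* function inside `bash -r` with an empty PATH, skips any `exit`/`return` so that `|| exit` cannot abort extraction, and dumps the variables with `declare -p`. The helper `extract_vars`, the variable `inner_script` and the demo PKGBUILD are hypothetical illustrations, not makepkg code.

```shell
#!/bin/sh
# Sketch only, not makepkg code. extract_vars, inner_script and the demo
# PKGBUILD are hypothetical. Requires bash; this outer script is plain sh.
bash_path=$(command -v bash)

# Script run by the restricted bash; $1 names the package_* function to run.
inner_script=$(cat <<'EOF'
shopt -s extdebug
# With extdebug set, a DEBUG trap that returns non-zero makes bash skip the
# command about to run. Skipping exit/return keeps "|| exit" from aborting.
trap 'case $BASH_COMMAND in exit|"exit "*|return|"return "*) false ;; esac' DEBUG
eval "$(</dev/stdin)"
"$1"
declare -p pkgdesc depends
EOF
)

# Usage: extract_vars FUNCTION < PKGBUILD
# Empty PATH plus bash -r: external commands are not found, cd is refused,
# and output redirection to files is blocked. Diagnostics are discarded.
extract_vars() {
    env -i PATH=/var/empty "$bash_path" -r -c "$inner_script" extract "$1" 2>/dev/null
}

demo_pkgbuild=$(cat <<'EOF'
pkgname=demo
pkgver=1.0
depends=(glibc)
package_demo() {
    pkgdesc="Demo package"
    depends+=(zlib)
    cd "$pkgdir" || exit 1          # cd is refused in rbash; exit is skipped
    make DESTDIR="$pkgdir" install  # "command not found" on stderr
}
EOF
)

printf '%s\n' "$demo_pkgbuild" | extract_vars package_demo
```

Without the trap, the skipped `exit 1` would terminate the shell before `declare -p` ran, so nothing would be extracted for this package.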
On 1/6/19 1:58 PM, Charles Duffy wrote:
I fully appreciate the decision not to try to go with more expansive attempts at emulating bash parsing/execution in the future, but do folks have any thoughts on **really** executing PKGBUILDs in a restricted environment, including execution of the package_* functions?
How would this work considering that it would have to actually do things like cd into $pkgdir, attempt to run /usr/bin/make, and so on?
Setting the PATH to something empty won't help with what I'd guess is the primary use of complex functions in the wild, as discussed here: https://bugs.archlinux.org/task/58776
Namely, executing /usr/bin/perl in order to discover its version and implement dependency ranges.
--
Eli Schwartz
Bug Wrangler and Trusted User
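The objection can be made concrete with a hypothetical PKGBUILD fragment in the style described (illustrative only, not taken from the linked bug report) that derives a perl version range while the PKGBUILD is being sourced. Restricted bash refuses any command name containing a slash, so the substitution quietly expands to an empty string: the extracted `depends` array is valid bash, but its version bounds are wrong. The names `pkgbuild` and `show_depends` are hypothetical.

```shell
#!/bin/sh
# Hypothetical PKGBUILD fragment (illustrative; not from the linked bug
# report): the dependency range is computed by running perl at source time.
pkgbuild=$(cat <<'EOF'
pkgname=perl-demo
_perlver=$(/usr/bin/perl -e 'printf "%vd", $^V')
depends=("perl>=${_perlver%.*}" "perl<${_perlver%.*}.99")
EOF
)

bash_path=$(command -v bash)

# Source the fragment in restricted bash and dump depends. rbash refuses
# any command name containing "/", so the substitution yields an empty
# string and the version bounds silently lose their numbers.
show_depends() {
    printf '%s\n' "$pkgbuild" |
        env -i PATH=/var/empty "$bash_path" -r -c \
            'eval "$(</dev/stdin)"; declare -p depends' 2>/dev/null
}

show_depends
```

The result is a `depends` array of `perl>=` and `perl<.99`: nothing fails loudly, which is the risk with letting such commands fail silently.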
How it would work is that those operations fail, and we'd let them fail -- we don't need them to succeed for the (global) variables we're there for to be set. As a proof-of-concept example (obviously, we'd want to suppress all the "command not found"s, "cd"s, etc. unless the user has turned up the verbosity level a bit), see the data correctly extracted at the end of the output below:
$ env -i PATH=/var/empty ENV='' "$(type -P bash)" -r -c 'eval "$(</dev/stdin)" >&2; package_postgresql >&2; declare -p pkgdesc backup depends optdepends options install' <postgresql/PKGBUILD
environment: line 125: cd: restricted
environment: line 128: make: command not found
environment: line 129: make: command not found
environment: line 130: make: command not found
environment: line 134: make: command not found
environment: line 134: make: command not found
environment: line 134: make: command not found
environment: line 134: make: command not found
environment: line 134: make: command not found
environment: line 138: rm: command not found
environment: line 138: rm: command not found
environment: line 138: rm: command not found
environment: line 138: rm: command not found
environment: line 138: rm: command not found
environment: line 138: rm: command not found
environment: line 138: rm: command not found
environment: line 138: rm: command not found
environment: line 138: rm: command not found
environment: line 138: rm: command not found
environment: line 138: rm: command not found
environment: line 138: rm: command not found
environment: line 138: rm: command not found
environment: line 138: rm: command not found
environment: line 138: rm: command not found
environment: line 142: install: command not found
environment: line 145: rm: command not found
environment: line 146: rm: command not found
environment: line 147: find: command not found
environment: line 148: rmdir: command not found
environment: line 150: install: command not found
environment: line 152: install: command not found
environment: line 155: install: command not found
environment: line 158: install: command not found
declare -- pkgdesc="Sophisticated object-relational DBMS"
declare -a backup=([0]="etc/pam.d/postgresql" [1]="etc/logrotate.d/postgresql")
declare -a depends=([0]="postgresql-libs>=9.6.10" [1]="krb5" [2]="libxml2" [3]="readline>=6.0" [4]="openssl>=1.0.0" [5]="pam")
declare -a optdepends=([0]="python2: for PL/Python support" [1]="perl: for PL/Perl support" [2]="tcl: for PL/Tcl support" [3]="postgresql-old-upgrade: upgrade from previous major version using pg_upgrade")
declare -a options=([0]="staticlibs")
declare -- install="postgresql.install"
On Sun, Jan 6, 2019 at 1:09 PM Eli Schwartz <eschwartz@archlinux.org> wrote:
How would this work considering that it would have to actually do things like cd into $pkgdir, attempt to run /usr/bin/make, and so on?
Setting the PATH to something empty won't help with what I'd guess is the primary use of complex functions in the wild, as discussed here: https://bugs.archlinux.org/task/58776
Namely, executing /usr/bin/perl in order to discover its version and implement dependency ranges.
--
Eli Schwartz
Bug Wrangler and Trusted User
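The proof-of-concept command in Charles's reply sends everything the PKGBUILD and `package_postgresql` print to stderr with `>&2`, leaving stdout to the `declare -p` output. That makes the noise suppression he mentions a matter of where stderr goes. A sketch, assuming a hypothetical `VERBOSE` environment variable as the verbosity switch and hypothetical helper names `run_sandbox` and `quiet_extract`:

```shell
#!/bin/sh
# Sketch only; run_sandbox, quiet_extract and VERBOSE are hypothetical names.
bash_path=$(command -v bash)

# Usage: run_sandbox FUNCTION VAR... < PKGBUILD
# Inside the restricted bash, $1 is the package_* function to run and the
# remaining arguments are the variable names passed to declare -p.
run_sandbox() {
    env -i PATH=/var/empty "$bash_path" -r -c '
        eval "$(</dev/stdin)" >&2   # anything the PKGBUILD prints -> stderr
        "$1" >&2                    # same for the package_* function
        shift
        declare -p "$@"             # stdout carries only the extracted data
    ' sandbox "$@"
}

# Hide the "command not found" / "cd: restricted" noise unless VERBOSE is set.
quiet_extract() {
    if [ -n "${VERBOSE:-}" ]; then
        run_sandbox "$@"
    else
        run_sandbox "$@" 2>/dev/null
    fi
}

demo_pkgbuild=$(cat <<'EOF'
pkgname=demo
package_demo() {
    pkgdesc="Demo package"
    echo "==> installing"
    make DESTDIR="$pkgdir" install
}
EOF
)

printf '%s\n' "$demo_pkgbuild" | quiet_extract package_demo pkgdesc
```

With `VERBOSE` unset only the `pkgdesc` line reaches the terminal; setting `VERBOSE=1` also shows the `echo` output and the missing-`make` message on stderr.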
Apologies for not having fully internalized your post before responding -- you make a good point that there's functionality for which we need real, unrestricted evaluation. Whether that functionality is worthwhile is a different matter -- my immediate use case is one where I care about extracting accurate-as-possible data for a large number of packages *quickly*, and I'm actually somewhat unhappy with how expensive the current approach taken by makepkg is (it spawns considerably more subprocesses than would strictly be needed if it were streamlined).
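One way to cut the process count, sketched under the assumption that restricted execution is acceptable for the packages in question: evaluate the PKGBUILD once in a single restricted bash, then run each package_* function in its own `( )` subshell, along the lines of the "one subshell per function" idea earlier in the thread. That is one bash process plus one fork per split package, and the subshell keeps one package's overrides from leaking into the next. The helper `split_extract` and the demo PKGBUILD are hypothetical.

```shell
#!/bin/sh
# Sketch only; split_extract and the demo PKGBUILD are hypothetical.
bash_path=$(command -v bash)

# Runs inside one restricted bash: the PKGBUILD is evaluated once, then each
# package_* function runs in a ( ) subshell so that its assignments (for
# example "depends+=(...)") cannot leak into the next split package.
inner_script=$(cat <<'EOF'
eval "$(</dev/stdin)" >&2
printf '== %s\n' global
declare -p pkgdesc depends
# "declare -F" prints one "declare -f NAME" line per function; the loop
# word-splits that output and keeps only the package_* names.
for fn in $(declare -F); do
    case $fn in
        package_*)
            printf '== %s\n' "$fn"
            ( "$fn" >&2; declare -p pkgdesc depends )
            ;;
    esac
done
EOF
)

# Usage: split_extract < PKGBUILD   (diagnostics are discarded)
split_extract() {
    env -i PATH=/var/empty "$bash_path" -r -c "$inner_script" 2>/dev/null
}

demo_pkgbuild=$(cat <<'EOF'
pkgbase=demo
pkgname=(demo demo-docs)
pkgdesc="Demo base"
depends=(glibc)
package_demo() {
    depends+=(zlib)
    make DESTDIR="$pkgdir" install
}
package_demo-docs() {
    pkgdesc="Demo documentation"
    depends=()
}
EOF
)

printf '%s\n' "$demo_pkgbuild" | split_extract
```

Because every package_* call runs in a subshell, `package_demo` reports `depends` as glibc plus zlib regardless of whether `package_demo-docs` (which empties the array) ran first.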
participants (2)
- Charles Duffy
- Eli Schwartz