[pacman-dev] [RFC] pactree rewrite in C
The recent patchwork over the weekend inspired me to take a closer look at pactree. My first impression was that it would be a great candidate for a rewrite in C. http://www.github.com/falconindy/pactree Results are entertaining: # bash version $ time pactree -r glib2 >/dev/null real 0m2.538s user 0m1.593s sys 0m0.435s # C rewrite $ time ./pactree -r glib2 >/dev/null real 0m0.016s user 0m0.007s sys 0m0.009s I'm finding that it's even slightly more accurate than the bash version WRT to reverse dependency tracking as it's hooking into alpm's alpm_pkg_compute_required_by() instead of wandering through the file hierarchy. It's feature complete compared to the Bash version, aside from the export to graphviz. The only functional change I've made is: when walking reverse depends, don't re-walk already visited dependencies. I thought this was a logical change given the behavior of the standard dependency tracking. I'm happy to offer this up for merging if desirable. There's a few ugly bits in the code I still want to refactor -- if anyone has any comments or criticisms, they're very welcome. d
On Mon, Oct 11, 2010 at 8:06 AM, Dave Reisner <d@falconindy.com> wrote:
The recent patchwork over the weekend inspired me to take a closer look at pactree. My first impression was that it would be a great candidate for a rewrite in C.
http://www.github.com/falconindy/pactree
Results are entertaining:
# bash version $ time pactree -r glib2 >/dev/null
real 0m2.538s user 0m1.593s sys 0m0.435s
# C rewrite $ time ./pactree -r glib2 >/dev/null
real 0m0.016s user 0m0.007s sys 0m0.009s
I'm finding that it's even slightly more accurate than the bash version WRT to reverse dependency tracking as it's hooking into alpm's alpm_pkg_compute_required_by() instead of wandering through the file hierarchy.
It's feature complete compared to the Bash version, aside from the export to graphviz. The only functional change I've made is: when walking reverse depends, don't re-walk already visited dependencies. I thought this was a logical change given the behavior of the standard dependency tracking.
I'm happy to offer this up for merging if desirable. There's a few ugly bits in the code I still want to refactor -- if anyone has any comments or criticisms, they're very welcome.
Didn't look at the code yet, but: 1. Tools that look at the database directly are bound to be broken in the near future anyway- some of this stuff is going to get redone, so we should be doing all interaction via libalpm. 2. As far as graphviz/dot, I think the feature would be more interesting anyway if it just generated the graphviz input rather than actually invoking graphviz. That should be easy to implement. So I at least give a +1 to this, and we can move it into src/util/ like vercmp to make it easier to build and such. -Dan
On Mon, Oct 11, 2010 at 3:32 PM, Dan McGee <dpmcgee@gmail.com> wrote:
On Mon, Oct 11, 2010 at 8:06 AM, Dave Reisner <d@falconindy.com> wrote:
The recent patchwork over the weekend inspired me to take a closer look at pactree. My first impression was that it would be a great candidate for a rewrite in C.
http://www.github.com/falconindy/pactree
Results are entertaining:
# bash version $ time pactree -r glib2 >/dev/null
real 0m2.538s user 0m1.593s sys 0m0.435s
# C rewrite $ time ./pactree -r glib2 >/dev/null
real 0m0.016s user 0m0.007s sys 0m0.009s
I'm finding that it's even slightly more accurate than the bash version WRT to reverse dependency tracking as it's hooking into alpm's alpm_pkg_compute_required_by() instead of wandering through the file hierarchy.
It's feature complete compared to the Bash version, aside from the export to graphviz. The only functional change I've made is: when walking reverse depends, don't re-walk already visited dependencies. I thought this was a logical change given the behavior of the standard dependency tracking.
I'm happy to offer this up for merging if desirable. There's a few ugly bits in the code I still want to refactor -- if anyone has any comments or criticisms, they're very welcome.
Didn't look at the code yet, but: 1. Tools that look at the database directly are bound to be broken in the near future anyway- some of this stuff is going to get redone, so we should be doing all interaction via libalpm. 2. As far as graphviz/dot, I think the feature would be more interesting anyway if it just generated the graphviz input rather than actually invoking graphviz. That should be easy to implement.
So I at least give a +1 to this, and we can move it into src/util/ like vercmp to make it easier to build and such.
This looks awesome. Here is my benchmark : pactree -r glibc : 1m37.494s -> 0m0.329s Also the -r output deals correctly with providers. The only drawback I could find is that arguments cannot be providers. [xavier@xps-m1530 pactree (master)]$ pactree cron +--dcron provides cron |--glibc |--linux-api-headers |--tzdata [xavier@xps-m1530 pactree (master)]$ ./pactree cron error: package 'cron' not found Anyway Dave, it would be great if you could integrate this in src/utils, have a look at Makefile.am for the defines you can re-use (dbpath and/or config). Maybe you can keep it simple and just use DBPATH as I did in testdb. And as a bonus, keep generation of graphviz input file (or similar graph tool) and provider feature :)
On Mon, Oct 11, 2010 at 04:49:00PM +0200, Xavier Chantry wrote:
On Mon, Oct 11, 2010 at 3:32 PM, Dan McGee <dpmcgee@gmail.com> wrote:
On Mon, Oct 11, 2010 at 8:06 AM, Dave Reisner <d@falconindy.com> wrote:
The recent patchwork over the weekend inspired me to take a closer look at pactree. My first impression was that it would be a great candidate for a rewrite in C.
http://www.github.com/falconindy/pactree
Results are entertaining:
# bash version $ time pactree -r glib2 >/dev/null
real 0m2.538s user 0m1.593s sys 0m0.435s
# C rewrite $ time ./pactree -r glib2 >/dev/null
real 0m0.016s user 0m0.007s sys 0m0.009s
I'm finding that it's even slightly more accurate than the bash version WRT to reverse dependency tracking as it's hooking into alpm's alpm_pkg_compute_required_by() instead of wandering through the file hierarchy.
It's feature complete compared to the Bash version, aside from the export to graphviz. The only functional change I've made is: when walking reverse depends, don't re-walk already visited dependencies. I thought this was a logical change given the behavior of the standard dependency tracking.
I'm happy to offer this up for merging if desirable. There's a few ugly bits in the code I still want to refactor -- if anyone has any comments or criticisms, they're very welcome.
Didn't look at the code yet, but: 1. Tools that look at the database directly are bound to be broken in the near future anyway- some of this stuff is going to get redone, so we should be doing all interaction via libalpm. 2. As far as graphviz/dot, I think the feature would be more interesting anyway if it just generated the graphviz input rather than actually invoking graphviz. That should be easy to implement.
So I at least give a +1 to this, and we can move it into src/util/ like vercmp to make it easier to build and such.
This looks awesome.
Here is my benchmark : pactree -r glibc : 1m37.494s -> 0m0.329s
Also the -r output deals correctly with providers.
The only drawback I could find is that arguments cannot be providers. [xavier@xps-m1530 pactree (master)]$ pactree cron +--dcron provides cron |--glibc |--linux-api-headers |--tzdata [xavier@xps-m1530 pactree (master)]$ ./pactree cron error: package 'cron' not found
Anyway Dave, it would be great if you could integrate this in src/utils, have a look at Makefile.am for the defines you can re-use (dbpath and/or config). Maybe you can keep it simple and just use DBPATH as I did in testdb.
And as a bonus, keep generation of graphviz input file (or similar graph tool) and provider feature :)
Oooh, thanks for the reminder -- I had meant to resolve command line args as providers and never got around to it. I'll need to read up on dot's graph specification language. My next project needs to revolve around increasing the number of hours in a given weekend. d
On Mon, Oct 11, 2010 at 5:10 PM, Dave Reisner <d@falconindy.com> wrote:
Oooh, thanks for the reminder -- I had meant to resolve command line args as providers and never got around to it. I'll need to read up on dot's graph specification language. My next project needs to revolve around increasing the number of hours in a given weekend.
Don't worry, take as many weekends as you want (well not too much :D), as this will be planned for the next major release 3.5.0 (minor releases == bugfixes). Let's hope this will happen in a few months, but it often gets delayed. In any cases, it would be awesome if you could complete this.
On Mon, Oct 11, 2010 at 11:12:47PM +0200, Xavier Chantry wrote:
On Mon, Oct 11, 2010 at 5:10 PM, Dave Reisner <d@falconindy.com> wrote:
Oooh, thanks for the reminder -- I had meant to resolve command line args as providers and never got around to it. I'll need to read up on dot's graph specification language. My next project needs to revolve around increasing the number of hours in a given weekend.
Don't worry, take as many weekends as you want (well not too much :D), as this will be planned for the next major release 3.5.0 (minor releases == bugfixes). Let's hope this will happen in a few months, but it often gets delayed. In any cases, it would be awesome if you could complete this.
Not a problem. I merged it into a branch off of my pacman repo today (hurray for learning about git-filter-branch) and weaved it into Makefile.am. This should be feature complete in a few weeks at most. One issue -- I'm not very familiar with autotools. When I build, pactree in src/util is "compiled" into a shell script. The real binary seems to be left behind src/util/.libs, and this doesn't seem to be the case for the other binaries in the directory. If you get a chance, I've got a pacman branch at: http://github.com/falconindy/pacman with my work in the pactree-c branch. Could you take a glance at it (no rush) and let me know if I've done something wrong with Makefile.am or neglected to mention the binary elsewhere for libtool? thanks, dave
On Mon, Oct 11, 2010 at 6:44 PM, Dave Reisner <d@falconindy.com> wrote:
One issue -- I'm not very familiar with autotools. When I build, pactree in src/util is "compiled" into a shell script. The real binary seems to be left behind src/util/.libs, and this doesn't seem to be the case for the other binaries in the directory.
No, this is exactly what is supposed to happen. This is actually libtool's doing. Otherwise, the binary would use the library from the system rather than the one built in the same source tree. Try an ldd on the generated lt-pactree binary after you run the shell script at least once and you'll see what is going on. Really, you got lucky and looked at the one binary that is *not* a shell script (vercmp). cleanupdelta, testdb and testpkg are definitely shell scripts, along with the pacman executable itself in src/pacman/. -Dan
On Mon, Oct 11, 2010 at 07:08:28PM -0500, Dan McGee wrote:
On Mon, Oct 11, 2010 at 6:44 PM, Dave Reisner <d@falconindy.com> wrote:
One issue -- I'm not very familiar with autotools. When I build, pactree in src/util is "compiled" into a shell script. The real binary seems to be left behind src/util/.libs, and this doesn't seem to be the case for the other binaries in the directory.
No, this is exactly what is supposed to happen. This is actually libtool's doing. Otherwise, the binary would use the library from the system rather than the one built in the same source tree. Try an ldd on the generated lt-pactree binary after you run the shell script at least once and you'll see what is going on.
Really, you got lucky and looked at the one binary that is *not* a shell script (vercmp). cleanupdelta, testdb and testpkg are definitely shell scripts, along with the pacman executable itself in src/pacman/.
-Dan
Oh wow, don't I feel like a goof. Makes sense since vercmp is handled differently in the Makefile as well. And this is all taken care of in the install target. Awesome. /me chugs on... d
On Mon, Oct 11, 2010 at 6:44 PM, Dave Reisner <d@falconindy.com> wrote:
On Mon, Oct 11, 2010 at 11:12:47PM +0200, Xavier Chantry wrote:
On Mon, Oct 11, 2010 at 5:10 PM, Dave Reisner <d@falconindy.com> wrote:
Oooh, thanks for the reminder -- I had meant to resolve command line args as providers and never got around to it. I'll need to read up on dot's graph specification language. My next project needs to revolve around increasing the number of hours in a given weekend.
Don't worry, take as many weekends as you want (well not too much :D), as this will be planned for the next major release 3.5.0 (minor releases == bugfixes). Let's hope this will happen in a few months, but it often gets delayed. In any cases, it would be awesome if you could complete this.
Not a problem. I merged it into a branch off of my pacman repo today (hurray for learning about git-filter-branch) and weaved it into Makefile.am. This should be feature complete in a few weeks at most.
One issue -- I'm not very familiar with autotools. When I build, pactree in src/util is "compiled" into a shell script. The real binary seems to be left behind src/util/.libs, and this doesn't seem to be the case for the other binaries in the directory. If you get a chance, I've got a pacman branch at:
http://github.com/falconindy/pacman
with my work in the pactree-c branch. Could you take a glance at it (no rush) and let me know if I've done something wrong with Makefile.am or neglected to mention the binary elsewhere for libtool?
Comments on things: 1. The tertiary patch (b6a6f1ee8421) honestly just makes things unnecessarily compact. We aren't having a fewest line number contest, so I'd drop this completely. 2. For commit messages, I like to have signoffs (-s option to commit or a few other commands), as well as descriptive messages that use full sentences and proper capitalization and punctuation. For these, prefixing them with "pactree:" might be a nice idea. 3. With your permission, adding the standard copyright line we have in the rest of our scripts would be nice if you don't have a problem with it. You can just start the year with 2010. I'm also not sure why you changed the case of those "Authored" and "Original" lines in a random commit. 4. Please follow the include/header guidelines in HACKING. system headers, followed by libalpm headers (and probably a blank line in between). 5. I'm not a huge fan of the STREQ macro; it is only used 3 times and that isn't a really crazy construct anyway. 6. None of your function definitions fit our normal style. All on one line, opening brace next line. 7. Once again, HACKING- we use tabs and not spaces, so no et. The standard vim modeline there should be used. 8. The opening "/**" might confuse Doxygen, we should probably use "/*". 9. We are now reaching three versions of strtrim in the codebase- it would at least be good if they were all the same. 10. All functions (except main) can be marked static; this would allow inlining when compiling. 11. return x --> return(x), HACKING. 12. you are mixing return codes and pm_errno codes in alpm_local_init, and I don't like the "only error" assumption there. 14. I don't like the exit() call in cleanup- just return and let main return(0), or call exit() after a "bad" call to cleanup. I'm not sure why we do this in pacman...I don't like it there either. 15. I think it would be best to avoid parsing pacman.conf at all (too many variables there, since you gave no way of even specifying a different config file). I'd go the testdb route and permit a command line -b flag to specify a different DB location, and that is it. 16. I'd prefer Doxygen comments before the function declaration as opposed to inside, to match our precedent elsehwere. 17. For finding provides, you need to use alpm_depcmp() and not strcmp() as provides can have versions and such tacked on. See lib/libalpm/deps.c for some example usages. 18. char **argv --> char *argv[] for consistency. 19. print_provider and print_pkg look awfully similar. Can we put these together and do some smarts to decide which format string and such to pick? You can also use C string concatenation to your advantage perhaps to make this more understandable. "foo" CONSTANT "bar" CONSTANT "baz" will all be folded into one string. 20. For version, I'd just use the pacman version (see version(void) in pacman.c). 21. No-arg functions should really be written foo(void) according to the C spec, looks like we missed that in a few places in other util programs but might as well get it right here. 22. We try to wrap lines before they hit 80 characters. 23. Please always include {} in conditionals (e.g. walk_deps, walk_reverse_deps), even if the body is just one statement. HACKING 24. I wouldn't bail on a non-existant provider, I'd just print "provider <can't resolve>" or something sensical and move on (and not return). And again, returning a libalpm error code doesn't make a whole lot of sense here. 25. I know it isn't required by C99, but we tend to use forward variable declaration elsewhere in our codebase. 26. The magic argv[optind] would be nice to pull into a local variable as you reference it 5 times. 13 is unlucky so I skipped it. :P -Dan
On Mon, Oct 11, 2010 at 07:49:57PM -0500, Dan McGee wrote:
Comments on things:
1. The tertiary patch (b6a6f1ee8421) honestly just makes things unnecessarily compact. We aren't having a fewest line number contest, so I'd drop this completely. 2. For commit messages, I like to have signoffs (-s option to commit or a few other commands), as well as descriptive messages that use full sentences and proper capitalization and punctuation. For these, prefixing them with "pactree:" might be a nice idea. 3. With your permission, adding the standard copyright line we have in the rest of our scripts would be nice if you don't have a problem with it. You can just start the year with 2010. I'm also not sure why you changed the case of those "Authored" and "Original" lines in a random commit. 4. Please follow the include/header guidelines in HACKING. system headers, followed by libalpm headers (and probably a blank line in between). 5. I'm not a huge fan of the STREQ macro; it is only used 3 times and that isn't a really crazy construct anyway. 6. None of your function definitions fit our normal style. All on one line, opening brace next line. 7. Once again, HACKING- we use tabs and not spaces, so no et. The standard vim modeline there should be used. 8. The opening "/**" might confuse Doxygen, we should probably use "/*". 9. We are now reaching three versions of strtrim in the codebase- it would at least be good if they were all the same. 10. All functions (except main) can be marked static; this would allow inlining when compiling. 11. return x --> return(x), HACKING. 12. you are mixing return codes and pm_errno codes in alpm_local_init, and I don't like the "only error" assumption there. 14. I don't like the exit() call in cleanup- just return and let main return(0), or call exit() after a "bad" call to cleanup. I'm not sure why we do this in pacman...I don't like it there either. 15. I think it would be best to avoid parsing pacman.conf at all (too many variables there, since you gave no way of even specifying a different config file). I'd go the testdb route and permit a command line -b flag to specify a different DB location, and that is it. 16. I'd prefer Doxygen comments before the function declaration as opposed to inside, to match our precedent elsehwere. 17. For finding provides, you need to use alpm_depcmp() and not strcmp() as provides can have versions and such tacked on. See lib/libalpm/deps.c for some example usages. 18. char **argv --> char *argv[] for consistency. 19. print_provider and print_pkg look awfully similar. Can we put these together and do some smarts to decide which format string and such to pick? You can also use C string concatenation to your advantage perhaps to make this more understandable. "foo" CONSTANT "bar" CONSTANT "baz" will all be folded into one string. 20. For version, I'd just use the pacman version (see version(void) in pacman.c). 21. No-arg functions should really be written foo(void) according to the C spec, looks like we missed that in a few places in other util programs but might as well get it right here. 22. We try to wrap lines before they hit 80 characters. 23. Please always include {} in conditionals (e.g. walk_deps, walk_reverse_deps), even if the body is just one statement. HACKING 24. I wouldn't bail on a non-existant provider, I'd just print "provider <can't resolve>" or something sensical and move on (and not return). And again, returning a libalpm error code doesn't make a whole lot of sense here. 25. I know it isn't required by C99, but we tend to use forward variable declaration elsewhere in our codebase. 26. The magic argv[optind] would be nice to pull into a local variable as you reference it 5 times.
13 is unlucky so I skipped it. :P
-Dan
Awesome got my work cut out for me. Quick notes on your comments: 1-8, 10-12, 14-16, 18, 20-24, 26: i agree, i'll fix it. 9: nuts to that, i'm just cutting it out since i'm not parsing config. 17: this requires some reworking, but i think i like where this goes. 19: i may need some help with this one. pushing it towards the bottom of the list for now. 25: not sure i understand what a forward declaration means with regard to variables. do you just want all variables declared up front, at the start of each function? As I mentioned on IRC, I'll present a rewrite instead of trying to rebase this out. Trying to rebase this would be fairly insidious. d
On Mon, Oct 11, 2010 at 07:49:57PM -0500, Dan McGee wrote:
<snip awesome enormous code review>
13 is unlucky so I skipped it. :P
-Dan
Round two! I've merged pactree into src/util and it's living in my pacman tree: http://github.com/falconindy/pacman/tree/pactree-c/src/util/pactree.c Dan: I think I tackled your list with a great amount of success. With some good additional advice from Xavier and yourself (plus the proposed API addition), I'm feeling that it's greatly improved all around. Concerns: 1) Commenting. Everything seems fairly self explanatory to me, but you all may want me to 'splain myself. 2) I really hate seeing the enormous nests of if/else going on when it comes to output. If anyone has any advice about how I can clean this up, I'm all ears. d
participants (3)
-
Dan McGee
-
Dave Reisner
-
Xavier Chantry