[pacman-dev] [PATCH 0/3] makepkg: Alternate implementation of VCS URLs in sources array.

Luke T.Shumaker lukeshu at sbcglobal.net
Mon Aug 27 09:55:57 EDT 2012

At Sat, 25 Aug 2012 14:43:32 -0400,
Dave Reisner wrote:
> On Sat, Aug 25, 2012 at 01:36:41PM -0400, Luke Shumaker wrote:
> > A while ago I started working on a derivative of makepkg to support
> > having 'git://...' type urls in the sources=() array.  When preparing
> > to file this patch, I did a `git rebase`, and noticed that Allan McRae
> > began working on a similar feature. Our implementations are in many
> > ways similar. Hopefully mine will be useful.
> An interesting approach. As you've noticed, I think we're fairly
> committed to the implementation that Allan has provided. If you have
> specific concerns, maybe we can work to fix those.

 * Allan's URL parsing has some issues, see point 2 below. This is an
   easy fix.
 * Composability. It's not really a "problem", but it seems to me that
   It's better to create a new general-purpose tool, instead of
   shoving a bunch of "special case" behavior into makepkg. Allan's
   implementation could even fairly easily be pulled out of makepkg.

And really, does it make sense to have these URL schemes hardcoded
into makepkg? Why have DLAGENTS at all, if we're going to hard-code
schemes into makepkg? Again, I'm not against Allan's imlementation,
but I am for moving it out of makepkg.

> > My implementation makes minimal changes to makepkg itself (only adding
> > blob expansion to DLAGENTS, allowing for things like
> > "git+*::""). Instead I added a `vcsget` tool which generates a tarball
> > from the VCS repo, in a very similar manner to the way Allan's
> > implementation does so within makepkg.
> I'm not thrilled with the shell I saw in that patch -- there's extensive
> use of external commands when simple bash features would have sufficed.

I assume you're speaking of my
 1. use of cut in get_field
 2. use of sed to parse URLs
 3. use of sed to parse URL authorities
 4. use of readlink to establish the tarball name
 5. use of sed to create a basedir name from the tarball name
 6. use of '[' with if statements

My defense of each:

1. `cut` in get_field

This was simply the simplest solution that was obviously
correct. Another solution would have been to manipulate IFS, which has
all kinds of funny side effects and fringe behaviors.

That was a weak argument, and as a coder, I would be thrilled to find
out if there is a better solution.

2. `sed` to parse URLs

I wanted full URL parsing, for a few cases when the translation is
less than than straight-forward. For example, including authentication
info in svn URLs.

Further, I wanted to make sure that *any* URL parsing done would be
robust. Given that the regex used was taken straight from the relevent
RFC, I knew I could count on it to work, even in fringe cases. For
example a "fragment" *should* be able to contain a '#', but Allan's
implementation discards anything before the last '#' in the
fragment. This is a problem for (at least) git, as tags and branches
may contain the symbol.

3. `sed` to parse URL authorities

I believe that a robust (not having the same problems as fragments for
URLs) implementation in bash would be non-trivial.

4. `readlink` to establish the tarball name

I used `readlink -m` to turn the tarball name into an absolute
path. This was not an absolute must, but it allowed me to avoid
worrying about keeping track of the current directory. It would be
fairly easy to remove this by useing pushd/popd instead of cd. If you
would like this changed, I can do that.

5. `sed` to create a basedir name from the tarball name

I'll admit, this was me getting a little lazy. A pure bash
implementation is:


6. `[` with if statements

Can I call stylistic choice on this one? It is trivial to replace this
with bash's '[['.

> > It looks as if Allan's download_*() functions are more verbose than
> > mine about what failed when there is an error. His svn and hg handlers
> > are likely more robust--though my git is pretty solid. I also have
> > a half-written handler for for bzr.
> > 
> > An advantage of my design is that it does allow for integrity checks
> > of VCS packages, rather than inserting 'SKIP' into the md5sums
> > array. This is very important to the derivative distribution Parabola.
> > (However, the 'SKIP' option is still valuable for URLs that track a
> > branch)
> I don't see this as an advantage so much as a duplication. The backing
> VCS takes care of integrity checks. They're only necessary with tarballs
> because there is no "builtin" checksum to reference.

This is only partially true of the non-distributed VCSs.

There was concern from other Parabola developers about being able to
use the checksum to unambiguously be able to confirm that a source
tree is the correct version, but Allan's implementation seems
acceptable on that front.

Happy hacking,
~ Luke Shumaker

More information about the pacman-dev mailing list