[arch-dev-public] Killing CVS [was: Status Report 2007-10-15]

Mon Oct 22 22:32:44 EDT 2007

On 10/19/07, Aaron Griffin <aaronmgriffin at gmail.com> wrote:
> I'd still love to see a git or even mercurial implementation (bonus
> points: mercurial works on Win32 better, so we could ease into a
> 'cygwin' arch, which is something I wanted to try out for a while,
> hah).
> When I have time and finish up some things I wanted to finish, I will
> see about a git implementation.
> I believe Dan already has something, so he may beat me.
>
> Anyone else have something to add?

And now for part 2 of 2 of my email. This one may be a bit long so
hold on to your hats and try to stay with me.

Links:
Gitweb: http://www.archlinux.org/~dan/newrepo/gitweb.cgi?p=repo.git;a=summary
Download bare repo: http://www.archlinux.org/~dan/newrepo/repo.git.tar.gz
Clone repo address: http://www.archlinux.org/~dan/newrepo/repo.git

First off, I'm going to step through the examples in the repository.
They are much easier to visualize if you have a local clone and are
using gitk, qgit, or a similar tool where you can see the branches and
merges.

The examples, in commit order:
Example 1- simple commit. Should be self explanatory. Given a package
already in the repo, update the PKGBUILD and any auxiliary files, and
make the commit straight to the master branch. Any SCM can do this.
Example 2- addition of a new package. As simple as git add
<packagedir>, commit, and boom- we are in business.
Example 3- single package bump through testing. Here is where the GIT
approach starts to diverge from a centralized SCM. Because of
lightweight branching, we have no issues with creating a lot of
branches. For my proposal, each independent "testing group" gets its
own branch. In this commit's case, we are creating a group for just
one package, udev. The steps here were "git checkout -b test_udev;
<edit files>; git commit". Note the 'test_' prefix- this will be used
by repo-generation scripts to know that a package is in what we
currently call the testing repository.
Example 4- rebuild a set of packages. Here we see an example of
rebuilding multiple packages. This happens frequently for us with
things like db. I did the rebuild here in two separate commits-
whether it is one commit or 100 it really doesn't matter as long as
they are all done on the same branch.
Example 5- kernel RC development. This is something I have yet to see
CVS do well at. We haven't ever tracked this in a version control
system simply because our current framework won't let us. Here, we
simply have a long running branch with the 'devel_' prefix used to
indicate it is something special. There will be a few more commits on
this branch, but that won't hurt us later when we...

< take a breather, still a few examples to go here :) >

(unnumbered example)- oops, forgot to number a commit, but the one
labeled "Kernel pkg bump" was a commit to testing for a new minor
kernel release. Note that we can successfully use our VCS to track
both a kernel RC and the mainline release.
Example 6- merging test back into "current". Here we merge our pacman
branch back into current, pulling all of the changes made on the
branch in with us. This is a simple "git checkout master; git merge
test_pacman". To keep the history readable for you, I did not delete
the test_pacman branch at this point, although in real usage of this
system we would delete it so that the DB scripts drop the pacman
package out of testing.
Example 7- merge multiple test branches at once. Here we pulled both
the udev and kernel branches back into master. (once again, we would
delete the 'test_' branches.)
Example 8- The new kernel is released. It's simple to base this branch
off of the RC PKGBUILDs we have already been tracking, so we do this.
In this sense, we are successfully tracking three different kernel
PKGBUILDs if we need that capability.
Example 9- Another merge, this time the kernel. We would delete the
test_kernel24 branch, but the devel_kernel branch could live on to
support the next round of RCs.
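
To make the branch lifecycle concrete, here is a minimal sketch of
examples 3 and 6 as a throwaway script. The directory layout, file
contents, and committer identity are made up for illustration; only the
branch naming and the checkout/commit/merge steps come from the
walkthrough above.

```shell
# Walk one package through a 'test_' topic branch and back into master.
# Runs in a subshell so the caller's working directory is left alone.
demo_test_branch_flow() (
    set -e
    repo_dir=$1
    mkdir -p "$repo_dir/udev"
    cd "$repo_dir"

    git init -q .
    # Pin the initial branch name so the sketch does not depend on local defaults.
    git symbolic-ref HEAD refs/heads/master
    # Placeholder identity, local to this scratch repository.
    git config user.name "Example Dev"
    git config user.email "dev@example.invalid"

    # A package that already exists on master (stand-in PKGBUILD content).
    echo "pkgrel=1" > udev/PKGBUILD
    git add udev
    git commit -q -m "Add udev"

    # Example 3: bump the package on its own testing branch.
    git checkout -q -b test_udev
    echo "pkgrel=2" > udev/PKGBUILD
    git commit -q -a -m "udev pkg bump"

    # Example 6: merge the branch into master, then delete it so that
    # nothing marks the package as being in testing any more.
    git checkout -q master
    git merge -q test_udev
    git branch -q -d test_udev
)
```

Running "demo_test_branch_flow /tmp/newrepo-demo" and then opening the
result in gitk shows the same shape as the single-package case in the
example repo, just with one package.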

Wow, sorry about that overload there.

Paul's list of problems to address:
> a) Moving packages from one repo to another is hard.
In this case, the definition of a repository will be outside of the
actual PKGBUILD tree. Similar to Jason's flat-tree layout, the actual
pushing of packages into repos wouldn't be determined by where the
package files live, but by an external configuration. Thus all you
would need to move a package is modify the config.
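
As a rough sketch of what "modify the config" could look like: the file
format below (one "<package> <repo>" pair per line), the function names,
and the repo names in the usage note are all assumptions for
illustration, not something that exists in the example repo.

```shell
# Hypothetical config format: one "<package> <repo>" pair per line.

# repo_of CONFIG PKG -> print the repo a package is published in.
repo_of() {
    awk -v pkg="$2" '$1 == pkg { print $2 }' "$1"
}

# move_pkg CONFIG PKG NEWREPO -> reassign a package to another repo.
# Only the config changes; the PKGBUILD tree is not touched.
move_pkg() {
    tmp=$(mktemp) || return 1
    awk -v pkg="$2" -v repo="$3" \
        '$1 == pkg { print $1, repo; next } { print }' "$1" > "$tmp" &&
        mv "$tmp" "$1"
}
```

With a config containing "udev extra", "move_pkg repos.conf udev current"
rewrites that one line and leaves every other package where it was.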

> b) Placing packages in multiple repos is hard.
I'm going to interpret this as the current/testing idea. With this
system, placing a package in testing should be as easy as making the
changes on a 'test_' topic branch.

> c) Continued separate-track development on a package while in
> testing is hard.
Addressed above.

> d) Tracking multiple binary repos for different architectures is hard.
This is something that needs to be addressed, and I do need to figure
it out. The issue can be generalized more- how do we track what is
actually released vs. what is in the repository? (The split between
CURRENT and HEAD tags in our current CVS repositories comes to mind.)
With GIT, we surely have the tools to solve this issue, but it needs
some further thinking.

> e) Maintenance of a package by more than one person is hard.
Developers on this system can dive right into the full GIT toolset and
leverage it to do what they need to do. If two people are working on a
rebuild, they could easily push and pull trees between each other
before making the final commit to the main repository. In addition, GIT
merges rather than overwrites by default, so it is hard to stomp on
changes by other developers.

Things that need to be done:
1. Write a toolkit. Not looking for anything huge or crazy here (just
take a look at Jason's SVN-based tools), but usage of this system by a
developer should be easy. Typing one command instead of having to
string together multiple GIT commands is a plus, and allows for more
uniform procedure-following by all devs.
2. Examine feasibility of automated DB scripts. I think if conventions
like the 'test_' prefix are followed, we can automate db building
whenever the master repository is pushed to. However, I don't know this
for sure, and it would be wise to check what happens with edge cases.
3. Figure out best way to track binary repos and changes that haven't
yet been pushed to a built package.
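
For item 2, the prefix convention is easy to sketch. The function below
is only an assumption about how a DB script might classify branches:
the 'test_' and 'devel_' prefixes and the master-to-current mapping come
from the examples above, while the function name and the "devel" and
"ignored" labels are made up here.

```shell
# repo_for_branch BRANCH -> print the repo class a branch would feed.
repo_for_branch() {
    case $1 in
        master)  echo current ;;   # merged, released packages
        test_*)  echo testing ;;   # one branch per testing group
        devel_*) echo devel ;;     # long-running branches such as kernel RCs
        *)       echo ignored ;;   # anything else is left alone
    esac
}
```

A script run after a push could feed it the output of
"git for-each-ref --format='%(refname:short)' refs/heads" and rebuild
only the databases whose branches changed. Edge cases such as the same
package appearing on two 'test_' branches would still need a policy.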

Please comment on anything and everything above and I'll respond. And
as I said in my first email tonight- so far Jason's SVN solution is
the only other one that looks good (besides hopefully this one). Let's
keep the ball rolling here.

-Dan