[pacman-dev] Git's views versus parallel branches (was Re: [PATCH 1/2] Enabled ...)

Tue Feb 24 01:26:57 EST 2009

On 24/02/2009, at 1:45 PM, Bryan Ischo wrote:

> An analogy: using git is kind of like keeping every directory in  
> your home directory in a separate tar file, except for one untarred  
> working directory.  Whenever you want to cd into a new directory,  
> you have to tar up the directory you were just in, and then untar  
> the directory you want to cd into.

Although I understand what you're getting at, the analogy is an
exaggeration. The workflow you described requires archiving the
directory, storing the archive somewhere, removing the directory,
un-archiving a different archive, and then changing to that directory.
Compare that with a simple "git checkout branch" (possibly preceded by
a commit or stash). That is five operations as opposed to one or two.
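A minimal sketch of that switch, run in a throwaway repository. The branch `old`, the file `notes.txt` and the committer identity are placeholders made up for the example:

```shell
set -eu
# Placeholder identity so commits work in a fresh environment.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master   # pin the branch name regardless of git's default
echo "base" > notes.txt                   # hypothetical file
git add notes.txt
git commit -q -m "initial"
git branch old                            # hypothetical second branch

echo "half-finished idea" >> notes.txt    # uncommitted work on master

git stash -q                              # shelve the uncommitted change
git checkout -q old                       # switch branches in place
# ... work on 'old' ...
git checkout -q master                    # switch back
git stash pop -q                          # restore the shelved change
```

The stash/checkout pair is the "one or two" operations; the working directory itself never moves.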

> An example of what I can do with parallel branches: if my branch  
> 'new' and 'old' were stored in two separate subdirectories, then I  
> could grep through all of the files in both branches with one  
> command and see collated results; or could diff only files in which  
> a given identifier appeared only in a file on one branch but not the  
> corresponding file on another.

Given git's flexibility and its UNIX philosophy, I'm sure it would be
possible to create tools which did this. Most of my development thus
far has been either adding features or modifying small things, so I
haven't had a need for this. In general the changes were not
conflicting between branches, and I did not need to compare my
branches with an upstream master branch. Perhaps this is a workflow
issue (doing too many things at once)?

> Another problem with git: I have to constantly rebuild stuff when I  
> stash and unstash because my build directory doesn't stash.

I don't see why anyone would want to switch branches (or "change
directories", in your analogy) and recompile so frequently. When I
switch to a branch I tend to work on it for a while before switching
to another one.

> I'd rather use a paradigm that thousands of tools already depend on,  
> than the special case paradigm that is git.

I don't see how this would improve your workflow in any way
whatsoever, but it is possible to simply keep multiple trees, each with
a specific branch checked out. For instance, instead of having one git
repository "foobar", you could have a project directory "foobar"
containing two repositories, "eggs" and "spam", cloned from the same
"master" repository but with the "eggs" and "spam" branches checked
out, respectively. You could still do all the other operations with
git if you add the other repositories as remotes, but this adds
unnecessary maintenance. I see Dan beat me to pointing this out
already.
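Roughly, that layout could be set up as follows. This is only a sketch: the upstream directory is literally named `master` to match the description, the committer identity is a placeholder, and the final `remote add`/`fetch` pair is the extra maintenance mentioned above.

```shell
set -eu
# Placeholder identity so commits work in a fresh environment.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

top=$(mktemp -d)
cd "$top"

# Hypothetical stand-in for the shared upstream repository, with branches eggs and spam.
git init -q master
cd master
git symbolic-ref HEAD refs/heads/master
echo "shared" > README
git add README
git commit -q -m "initial"
git branch eggs
git branch spam
cd ..

# Project directory holding one clone per branch.
mkdir foobar
git clone -q master foobar/eggs
git clone -q master foobar/spam
git -C foobar/eggs checkout -q eggs       # creates a local eggs tracking origin/eggs
git -C foobar/spam checkout -q spam

# Optional: let one clone see the other's commits directly.
git -C foobar/eggs remote add spam ../spam
git -C foobar/eggs fetch -q spam
```

After this, ordinary tools see both branches as plain directories, e.g. `grep -r shared foobar`.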

> If anything, git seems messier to me because some files get changed  
> in-place as you switch branches in git, and some files are ignored  
> and left as they are (those that aren't actually tracked by git),  
> and distinguishing between the two types requires git commands and  
> lots of mental notes.

As I mentioned in my previous post, the files simply shouldn't be
ignored; they should be added to a temporary commit before switching,
otherwise you get confused. This is the downside of switching branches
in-place. I don't really have a preference between a directory-based
branch structure and an in-place structure. However, git's branches
let you have a single tree (working directory), which seems simpler
and cleaner.
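A sketch of that temporary-commit approach. The files `main.c` and `new.c`, the branch `other`, the "WIP" message and the identity are all placeholders; `git add -A` also picks up untracked files, so nothing is left lying around when the tree switches.

```shell
set -eu
# Placeholder identity so commits work in a fresh environment.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
echo "v1" > main.c                      # hypothetical tracked file
git add main.c
git commit -q -m "initial"
git branch other                        # hypothetical branch to switch to

echo "work in progress" >> main.c       # modified tracked file
echo "scratch" > new.c                  # hypothetical untracked file

git add -A                              # stage everything, untracked files included
git commit -q -m "WIP"                  # temporary commit on master
git checkout -q other                   # tree now matches 'other': no new.c, main.c back to v1
# ... work on 'other' ...
git checkout -q master
git reset -q HEAD^                      # drop the WIP commit, keep its changes in the working tree
```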

> you can do a grep over multiple branches at once to find identifiers

git grep (not sure if it can actually grep over multiple branches)
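For the record, `git grep` does accept one or more branch names (any tree-ish) and searches them without checking anything out, prefixing each hit with the branch it came from. A small sketch with hypothetical branches `old` and `new` and a made-up identifier:

```shell
set -eu
# Placeholder identity so commits work in a fresh environment.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
printf 'int old_name;\n' > a.c          # hypothetical identifier
git add a.c
git commit -q -m "initial"
git branch old
git checkout -q -b new
printf 'int new_name;\n' > a.c
git commit -q -am "rename identifier"

# One command, both branches; output lines look like "<branch>:<path>:<line>:<text>".
git grep -n _name old new
```

Because every line carries its branch prefix, the combined output can be sorted or filtered further with ordinary text tools.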

> , you can count line numbers for entire branches if you like to see  
> how much bigger your code base is in one branch than another

git diff my_branch..other_branch --stat
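Note that `--stat` summarizes the difference between two branches rather than their total size. For totals, one possible approach (a sketch; the helper `count_lines` and the branch contents are made up) reads each branch directly with `git grep`, counting every line of every tracked text file:

```shell
set -eu
# Placeholder identity so commits work in a fresh environment.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
printf 'a\nb\n' > f.txt
git add f.txt
git commit -q -m "initial"
git checkout -q -b my_branch
printf 'c\nd\ne\n' > g.txt
git add g.txt
git commit -q -m "grow"

# Hypothetical helper: total line count of a branch without checking it out.
# '^' matches every line, -I skips binary files, -c prints "<branch>:<path>:<count>".
count_lines() {
    git grep -I -c '^' "$1" | awk -F: '{ total += $NF } END { print total + 0 }'
}

echo "master:    $(count_lines master) lines"
echo "my_branch: $(count_lines my_branch) lines"

# Per-file summary of what changed between the two branches.
git diff --stat master..my_branch
```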

> you have to run many git commands

Not really. Git is highly scriptable: if you do something often, you
can script the git commands involved (if a command for it doesn't
already exist) and just run the script. This is what you'd call the
UNIX philosophy.
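As a small illustration, here is a hypothetical shell function that runs a git command once per local branch. The name `for_each_branch` and the branches are invented; `git for-each-ref` is the real plumbing it wraps.

```shell
set -eu
# Placeholder identity so commits work in a fresh environment.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
echo x > f
git add f
git commit -q -m "initial"
git branch eggs                          # hypothetical branches
git branch spam

# Hypothetical helper: run "$@ <branch>" for every local branch, with a header per branch.
for_each_branch() {
    git for-each-ref --format='%(refname:short)' refs/heads/ |
    while read -r branch; do
        echo "== $branch"
        "$@" "$branch"
    done
}

# Example: the latest commit subject on each branch.
for_each_branch git log -1 --format=%s
```

Saved as an executable named `git-<name>` somewhere on `$PATH`, a script like this can also be invoked as `git <name>`.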

> git seems to require keeping a mental model of branches that you  
> can't even "see" because they aren't in your filesystem anywhere.

Indeed, git doesn't track files; it tracks file content. Branches are
just labels for a sequence of changes. I'm sure this gives git many
advantages over other SCMs, but I'm not familiar enough with the
underlying implementation to give any insight.
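Two of those properties are easy to check from the command line, sketched here in a scratch repository with made-up file and branch names: identical content is stored once (both paths resolve to the same blob id), and a branch is only a name pointing at a commit.

```shell
set -eu
# Placeholder identity so commits work in a fresh environment.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
echo "same content" > a.txt              # hypothetical files with identical content
cp a.txt b.txt
git add a.txt b.txt
git commit -q -m "initial"

# Both paths name the same stored object, because storage is keyed by content.
git rev-parse HEAD:a.txt HEAD:b.txt

# A new branch is just another label for the current commit.
git branch eggs
git rev-parse master eggs
```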

> Except that not only am I getting confused about the state of my  
> branches as I commit partially complete changes to them for the  
> purpose of saving state as I switch between branches, the tools that  
> I use are getting confused as well.

This seems to be a consistent theme in your workflow - "I switch  
branches frequently". This is probably why you're having so much  
trouble with git, and I suggest you ask yourself _why_ you switch  
branches so frequently. It seems to me like you're treating branches  
more like commits - each branch is a single logical change to the  
tree. Even though branches are cheap, I still only have a few around,  
and rarely switch between them. They are there to separate related  
commits, and to provide isolation from other branches. It may be that  
you're a "hacker" and simply work on unrelated code arbitrarily. I do  
that a lot too, but I somehow don't have the same problems you're  
facing.

> Do some google searches.  File renaming in version control systems  
> is a big deal, and for good reason.

Like this? ;)
http://article.gmane.org/gmane.comp.version-control.git/217

> Yes; it's called refactoring.  On a well managed project it doesn't  
> happen often, but it does happen.  Reorganization of subtrees of  
> code is something that source control systems should support well.   
> It's the merging after such reorganization that tools that don't  
> track renames have problems with.  For examples of git failing in  
> this, take a look at the simple scripts I sent out to the list  
> earlier today.

I don't see how renames are relevant in this situation. In fact,  
rename information would probably cause more problems. When you  
refactor, you're moving content, not files. Typically you also change  
that content significantly. I don't see why git would have any problem  
with this - this is actually where git's content tracking shines.

> Git doesn't allow you to rename and change contents of a file at the  
> same time:
>
> [snip]
>
> If I 'git commit' it will take the rename of file to file2, but not  
> the modification of file2.  If I try to add file2 before committing,  
> git status now shows:
>
> [snip]
>
> If you check this in, git will not be able to merge changes to file2  
> into a branch taken before this change.

I just tried a more complex example that I was sure would result in
a merge conflict. I created "file" and committed it. Then I created
the "move" branch, moved "file" to "file2", and edited it. At this
point, `git status` showed something different from your output:

$ git mv file file2
$ vi file2
$ git add file2
$ git status
# On branch move
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#	renamed:    file -> file2
#
$ git commit -m "changed file and renamed as file2"
[move]: created 1fe9963: "changed file and renamed as file2"
  1 files changed, 6 insertions(+), 8 deletions(-)
  rename file => file2 (58%)

I committed the change and created a branch "change" from master. I
edited the same parts of the file and committed. Then I merged branch
"move" into this branch. Naturally I got a conflict:

$ git merge move
Renaming file => file2
Auto-merging file2
CONFLICT (rename/modify): Merge conflict in file2
Automatic merge failed; fix conflicts and then commit the result.
bash-3.2$ git status
file2: needs merge
# On branch changes
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#	deleted:    file
#
# Changed but not updated:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#
#	unmerged:   file2
#

Even though the file was moved and a significant part of it was
changed (42%), git recognized the move and still merged "file2" from
the "move" branch with "file" from the "change" branch. The conflict
would have occurred regardless of the move operation. Perhaps in a
less contrived example the result would be different, but only in the
edge case where the renamed file's content really doesn't resemble
that of the original file.
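The mechanism can be seen without a merge: a commit stores no rename record, and git infers the rename by comparing content when it diffs. A sketch with a made-up ten-line file; `-M` uses git's default 50% similarity threshold, while `-M100%` accepts only exact renames.

```shell
set -eu
# Placeholder identity so commits work in a fresh environment.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
printf '%s\n' 1 2 3 4 5 6 7 8 9 10 > file          # hypothetical content
git add file
git commit -q -m "initial"

git mv file file2
printf '%s\n' 1 2 three 4 5 6 7 8 9 10 > file2     # rename plus an edit
git add file2
git commit -q -m "changed file and renamed as file2"

# Inferred at diff time: an "R<similarity>" line pairing file with file2.
git diff --name-status -M HEAD^ HEAD
# Requiring an exact match, the same commit shows up as a delete plus an add.
git diff --name-status -M100% HEAD^ HEAD
```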

> My impression of git is that it feels very much like what its  
> history suggests that it is: a tool for managing patches that grew  
> into a source control system.  For better or worse, git feels  
> 'messy' to me, like it wasn't thought out ahead of time but kind of  
> organically grew

I'm sure it was planned out quite well. Linus knew what he hated about  
other SCMs, he had some good ideas about how to improve those areas,  
and he did. Git does everything very well so far, and it's faster than  
any SCM I know about.

> Git has dozens of commands, each with dozens of subtle and tricky  
> options; that seems like needless complexity to me.

Git was made by a developer for developers. Of course the interface
won't be nice and shiny. The difference between Git's interface and
other SCMs' interfaces is that Git has its guts exposed. Fortunately
there is nice porcelain now.

I suggest that you discuss these problems with the people at #git.  
They seem friendly and they know a _lot_ about git. I'm sure they  
could either explain how to use git to accommodate your workflow, or  
perhaps expose "flaws" in your workflow. At the very least you will  
know if git is for you or not.

Sorry for going so off-topic here. This isn't even about pacman  
anymore :P.