[pacman-dev] Git's views versus parallel branches (was Re: [PATCH 1/2] Enabled ...)

Mon Feb 23 23:45:40 EST 2009

Thank you for your excellent comments, Sebastian.  I hope I can do them 
justice with my responses, inlined below ...

Sebastian Nowicki wrote:
>
> On 24/02/2009, at 6:10 AM, Bryan Ischo wrote:
>
>> It's not the ability to modify a single file in two different places 
>> at once.  It's the ability to keep changes logically separated by 
>> directory, in a persistent manner that doesn't require git commands 
>> to put changes away and bring them back, that I care about.  I find 
>> it infinitely easier to keep track of what I am doing by persistently 
>> retaining directory contents than by having a single working view and 
>> everything else being stashed away to be retrieved later.
>
> I apologise if I'm missing something, but what's the difference 
> between "cd ../branch" and "git checkout branch"? The state of your 
> "working directory" is changed. You still have only one view of the 
> directory. You can obviously view files in the other directory, but 
> that's also possible with git, albeit somewhat harder if using a file 
> manager. There are many GUI front-ends that allow you to quickly look 
> at other branches and commits. What's the difference between "ls project-root" 
> and "git branch"? Both list the "branches", both allow you to switch 
> to that branch (cd, git checkout). I really don't see the difference, 
> besides a clean project directory, in git's case.

The difference is subtle.  Obviously you can work in both ways, and 
apparently, many people on this list like using git commands instead of 
'normal' filesystem navigation for visiting files in their branches.  I 
can think of a few ways to expand upon my thoughts about the 'git way' 
versus the 'Perforce way':

An analogy: using git is kind of like keeping every directory in your 
home directory in a separate tar file, except for one untarred working 
directory.  Whenever you want to cd into a new directory, you have to 
tar up the directory you were just in, and then untar the directory you 
want to cd into.  Only the current working directory (and everything 
under it) can be untarred; everything else has to be tarred up if you're 
not in it.  Would you like to maintain your home directory this way?  It 
sounds like a major pain to me.  Although having to issue tar and untar 
commands constantly while you are working with files in your home 
directory doesn't sound all that bad, in practice, it would be so much 
less convenient than if all of your files were untarred all the time and 
you could just look through them without having to manage the tar 
files.  I suppose if someone has only ever used a system where they had 
to constantly tar up and untar directories, they wouldn't think anything 
of it (and would think that a command like 'cd-stash' which tars up your 
cwd and untars some other tarred directory and cd's into it would be 
really cool), but if you've had the 'freedom' of just working with your 
files without such encumbrances, you'll really hate having to do it.

An example of what I can do with parallel branches: if my branches 'new' 
and 'old' were stored in two separate subdirectories, then I could grep 
through all of the files in both branches with one command and see 
collated results; or I could diff only those files in which a given 
identifier appears on one branch but not in the corresponding file on 
the other.
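
A rough sketch of what I mean, assuming a made-up layout with the two 
branches checked out side by side as old/ and new/ (the file util.h and 
the identifiers parse_header and parse_body are invented purely for 
illustration):

```shell
# Hypothetical layout: two branch checkouts side by side in a scratch dir.
root=$(mktemp -d)
mkdir -p "$root/old/src" "$root/new/src"
printf 'int parse_header(void);\n' > "$root/old/src/util.h"
printf 'int parse_header(void);\nint parse_body(void);\n' > "$root/new/src/util.h"

# One command, collated results from both branches.
hits=$(cd "$root" && grep -rn 'parse_' old new)
printf '%s\n' "$hits"

# Files on 'new' that mention the identifier while the corresponding
# file on 'old' does not (or does not exist).
only_new=$(cd "$root/new" && grep -rl 'parse_body' . | while read -r f; do
    grep -q 'parse_body' "../old/$f" 2>/dev/null || printf '%s\n' "$f"
done)
printf '%s\n' "$only_new"
```

The file names in the second list can be handed straight to diff, since 
both copies of each file are sitting on disk.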

Another example is the compiled results of each tree, as I mentioned 
before, which I can see and compare in place if my branches are in 
different subtrees, but which requires extra copying around of files and 
other management if I am using git.
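
For instance, with both trees on disk the build outputs can be compared 
directly.  This is only a sketch; the build/ subdirectory and the 
artifact names app.bin and lib.a are assumptions made up for the example:

```shell
# Hypothetical layout: each branch directory has its own build/ output.
root=$(mktemp -d)
mkdir -p "$root/old/build" "$root/new/build"
printf 'v1\n' > "$root/old/build/app.bin"
printf 'v2\n' > "$root/new/build/app.bin"
printf 'same\n' > "$root/old/build/lib.a"
printf 'same\n' > "$root/new/build/lib.a"

# List the build artifacts that differ between the two trees.
# diff exits non-zero when it finds differences, hence the '|| true'.
changed=$(cd "$root" && diff -rq old/build new/build || true)
printf '%s\n' "$changed"
```

Nothing has to be copied aside or rebuilt first; each tree keeps its own 
build directory.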

Another problem with git: I have to constantly rebuild stuff when I 
stash and unstash because my build directory doesn't stash.

It's interesting that you note that viewing files in other branches with 
git is harder if you are using a file manager.  That's exactly the point 
I am trying to make: file managers and other tools (scripting languages, 
diffing tools, text searching and processing tools, etc) all work based 
on the standard Unix paradigm of "everything is a file".  Git works on a 
paradigm of "everything in your current branch is a file, everything 
else is accessible only as the output of a complex git command".  I'd 
rather use a paradigm that thousands of tools already depend on, than 
the special case paradigm that is git.

A few more points: I don't think that "many GUI front-ends" being 
available to help me manage my branches is better than a system that 
doesn't need any GUI front-ends to make the process palatable.  And, I'm 
not sure why what git does is any 'cleaner' than keeping branches in 
separate directories.  If anything, git seems messier to me because some 
files get changed in-place as you switch branches in git, and some files 
are ignored and left as they are (those that aren't actually tracked by 
git), and distinguishing between the two types requires git commands and 
lots of mental notes.

>> Parallel branch directories have an advantage over git's branch views 
>> whenever you need to compare the contents of branches.
>
> False. As mentioned earlier there are GUI tools which make this 
> simple. If you don't like GUIs, you can use the command line 
> equivalents (most tools execute git commands anyway). I don't know 
> what these are since I've never had the need to compare two branches 
> beyond `git diff`.

Well, I think that the fact that you have to qualify your statement by 
saying that it's easy if you use special GUIs, and otherwise doable with 
command line equivalents, exactly makes my point.

There are many more ways to compare branches than just 'git diff'.  You 
can compare the result of building both branches, you can do a grep over 
multiple branches at once to find identifiers, you can count the lines 
in entire branches if you'd like to see how much bigger your code 
base is in one branch than another ... these things can all be done with 
git too, but you have to run many git commands to get the views of the 
branches that you need when you need them, whereas if they all live in 
separate subdirectories, there are no commands to run at all to get the 
files ... they're just there.
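
The line-count case, as a minimal sketch (the directories old/ and new/, 
the file main.c, and the helper count_lines are all hypothetical names 
chosen for the example):

```shell
# Hypothetical layout: two branch checkouts side by side in a scratch dir.
root=$(mktemp -d)
mkdir -p "$root/old" "$root/new"
printf 'a\nb\n' > "$root/old/main.c"
printf 'a\nb\nc\nd\n' > "$root/new/main.c"

# Total number of lines in all regular files under directory $1.
count_lines() {
    find "$1" -type f -exec cat {} + | wc -l | tr -d ' '
}

old_lines=$(count_lines "$root/old")
new_lines=$(count_lines "$root/new")
echo "old=$old_lines new=$new_lines delta=$((new_lines - old_lines))"
```

The same helper works on any number of branch directories, with no 
checkout step in between.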

>
>> Maybe it's because I'm an emacs [...] [and] keeping track of [...] 
>> what sequence of [...] commands I need [...] is just more mental 
>> effort than I want to undertake.
>
> *cheap shot alert*
> You use emacs, yet remembering commands is too much of an effort? I 
> know I twisted your words a lot, and I'm not hating on emacs, but you 
> have to admit that that is somewhat hypocritical.

It wasn't meant to be hypocritical.  I was trying to allude to the fact 
that when using vi, you have to remember much more state about what you 
are doing (what mode am I in? insert mode? delete mode? what file am I 
working on? what line am I on? etc etc) than with emacs; I think this is 
one of the fundamental differences between vi and emacs.  I could be 
wrong, though; I haven't used vi extensively, just enough to make minor 
edits to files on the way to getting emacs installed :)  But assuming 
that this is true, my point was just that vi users are used to keeping 
more state about what they are doing in their heads, and that may be 
what makes git seem natural to them.  Perhaps I should have said 'ed' 
instead of 'vi' ...

Note that I'm not talking about remembering what commands do what 
(certainly emacs has tons of commands to remember), I'm talking about 
keeping track of working state as you are using the tool.  git seems to 
require keeping a mental model of branches that you can't even "see" 
because they aren't in your filesystem anywhere.

>
>>  I find it so much easier to just leave a branch subdirectory and 
>> when I return to it later, it is guaranteed to be exactly as it was 
>> when I left, without any effort on my part.  If I am working on 4 or 
>> 5 bugs in parallel (which I have certainly done at work, where 
>> working on just 1 or 2 bugs at once would be inefficient because of 
>> the downtime associated with building each tree) I can't even imagine 
>> using git stashes to sanely keep track of everything.
>
> This is exactly what branches are for. The exact same thing can be 
> said for git branches. It's guaranteed to be exactly as it was when 
> you switched to another branch. `git stash` should only be used when 
> something is not ready to be committed, but you _urgently_ need to do 
> something else, like a bug fix on the maintenance branch.

Except that not only am I getting confused about the state of my 
branches as I commit partially complete changes to them for the purpose 
of saving state as I switch between branches, the tools that I use are 
getting confused as well.  I may have editors and other tools open for 
files whose contents suddenly change when I git checkout to a different 
branch. For many tools, this is not a big deal, but I think it 
illustrates the subtle problems that such an approach introduces.  And 
since I encapsulate part of the state of "what I'm doing" in the state 
of the tools that I am using, confused state in those tools can often 
confuse me as well.

With git, I can't switch to another branch unless I either a) commit the 
changes (which I may not be ready to commit yet), or b) stash the 
changes.  Committing or stashing takes extra work on my part.  Why should 
I have to do this work?

>
>>>> - Lack of rename tracking.  Yeah, I know, git claims that it can do 
>>>> it after
>>>> the fact when examining change histories but I've tried various 
>>>> scenarios
>>>> and it just doesn't work very well, and even when it does, requires 
>>>> stupidly
>>>> complex options to git commands to enable git to discover renames 
>>>> in the
>>>> history correctly
>
> I can't think of a situation where the file name is relevant. Even 
> when renaming...

Do some google searches.  File renaming in version control systems is a 
big deal, and for good reason.

>
>> The problem comes when someone, in a branch, renames a file, and then 
>> tries to merge their changes into another branch in which the file 
>> was not renamed.
>
> This would only be a problem if the file was not only renamed, but 
> also _changed_, and significantly at that. In this case git would only 
> see that, say, 60% of the file content was moved. I'm not sure how 
> merging would work, since I have never worked on a branch when a file 
> was moved (and changed) in another.

Yes; it's called refactoring.  On a well managed project it doesn't 
happen often, but it does happen.  Reorganization of subtrees of code is 
something that source control systems should support well.  It's the 
merging after such reorganization that tools that don't track renames 
have problems with.  For examples of git failing in this, take a look at 
the simple scripts I sent out to the list earlier today.

>
>>  Unless file renames are tracked, the merge becomes very difficult.
>
> Not at all. If git sees that the file content was _moved_ (not 
> changed), it should be able to figure that out easily. Again, I 
> haven't actually done this, but I don't see why it wouldn't work. I 
> would suggest asking about this on #git (or the git ML). If it is 
> indeed a problem then file a bug. I'm sure Linus would be happy to 
> comment on it ;).

It's not just moving.  It's moving and changing that git has a problem with.

>
>>  Refactoring a subsystem on a 'workbranch' is something that is done 
>> sometimes on large projects,  and with git, I would expect that to be 
>> basically impossible to do sanely.  Even if git's 'detect renames 
>> while examining history' technique did work, it still makes renames 
>> cumbersome, because you can't rename a file and change its contents 
>> at the same time or else git has almost no chance of detecting the 
>> rename via history.  And if you can't change a file and rename it at 
>> the same time, then you can't, for example, properly rename a Java 
>> class, because the class name and file name have to be the same.
>
> Why not? If you change the file contents and rename it, then obviously 
> you'd also change the class name. Why else would you rename it?

Git doesn't allow you to rename and change contents of a file at the 
same time:

$ git mv file file2
$ echo "changed file" > file2
$ git status
# On branch master
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#       renamed:    file -> file2
#
# Changed but not updated:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#
#       modified:   file2
#

If I 'git commit' it will take the rename of file to file2, but not the 
modification of file2.  If I try to add file2 before committing, git 
status now shows:

# On branch master
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#       deleted:    file
#       new file:   file2

If you check this in, git will not be able to merge changes to file2 
into a branch taken before this change.

>
>> It just shouldn't be that hard.
>
> Why not? I can't imagine other SCMs doing this any better. If a file 
> contents changes drastically, it doesn't matter if the name of the 
> file is tracked. The name of the file is irrelevant. A merge conflict 
> would arise even if the file was never renamed.

Perforce does it better.  It is certainly possible to:

* Rename bunches of files on a branch as part of a code refactoring 
effort and change parts of those files to match (such as class names, etc)
* Make bug fixes to the original files in a different branch
* Merge those changes together on either branch in a way that makes 
sense and doesn't produce conflicts (assuming that the individual 
changes were not conflicting, which is often the case when one branch is 
doing minor bugfixes and the other is doing more structural work)

>
> I don't mean to contradict everything you say, it's just that I 
> haven't had the same experience with git as you. Using git has been 
> amazing. It does everything I want, it's sophisticated, it merges code 
> well, and it has some very powerful features (like rebase).

I'm glad you like git so much; a lot of people do.  I'm not saying I 
don't like git, I'm just saying that there are a few things that I think 
suck about git.  That's how this discussion got started.  But a lot of 
people defend git with great vigor if anything critical is said of it, 
and I don't understand the fervor.

My impression of git is that it feels very much like what its history 
suggests that it is: a tool for managing patches that grew into a source 
control system.  For better or worse, git feels 'messy' to me, like it 
wasn't thought out ahead of time but kind of organically grew as people 
realized that certain basic features could be twisted this way or that 
way to add the equivalent of standard source control functionality.  Git 
has dozens of commands, each with dozens of subtle and tricky options; 
that seems like needless complexity to me.  That's just my impression, 
take it for what it's worth, which is not much.

>
> By the way, Mercurial seems faster than Bazaar (though I haven't used 
> either much), and both are written in Python. Mercurial might not be 
> pure python though, I am unsure.

I really like what I read on the Bazaar web pages; it feels more 
coherently designed than git, and much simpler to use.  But it worries 
me that it's had significant performance problems on larger projects.

Thanks,
Bryan

>