[pacman-dev] Git's views versus parallel branches (was Re: [PATCH 1/2] Enabled ...)
Bryan Ischo
bji-keyword-pacman.3644cb at www.ischo.com
Mon Feb 23 23:45:40 EST 2009
Thank you for your excellent comments, Sebastian. I hope I can do them
justice with my responses, inlined below ...
Sebastian Nowicki wrote:
>
> On 24/02/2009, at 6:10 AM, Bryan Ischo wrote:
>
>> It's not the ability to modify a single file in two different places
>> at once. It's the ability to keep changes logically separated by
>> directory, in a persistent manner that doesn't require git commands
>> to put changes away and bring them back, that I care about. I find
>> it infinitely easier to keep track of what I am doing by persistently
>> retaining directory contents than by having a single working view and
>> everything else being stashed away to be retrieved later.
>
> I apologise if I'm missing something, but what's the difference
> between "cd ../branch" and "git checkout branch"? The state of your
> "working directory" is changed. You still have only one view of the
> directory. You can obviously view files in the other directory, but
> that's also possible with git, albeit somewhat harder if using a file
> manager. There are many GUI front-ends that allow you to quickly look
> at other branches and commits. What's the difference between "ls project-root"
> and "git branch"? Both list the "branches", both allow you to switch
> to that branch (cd, git checkout). I really don't see the difference,
> besides a clean project directory, in git's case.
The difference is subtle. Obviously you can work in both ways, and
apparently, many people on this list like using git commands instead of
'normal' filesystem navigation for visiting files in their branches. I
can think of a few ways to expand upon my thoughts about the 'git way'
versus the 'Perforce way':
An analogy: using git is kind of like keeping every directory in your
home directory in a separate tar file, except for one untarred working
directory. Whenever you want to cd into a new directory, you have to
tar up the directory you were just in, and then untar the directory you
want to cd into. Only the current working directory (and everything
under it) can be untarred, everything else has to be tarred up if you're
not in it. Would you like to maintain your home directory this way? It
sounds like a major pain to me. Although having to issue tar and untar
commands constantly while you are working with files in your home
directory doesn't sound all that bad, in practice, it would be so much
less convenient than if all of your files were untarred all the time and
you could just look through them without having to manage the tar
files. I suppose if someone has only ever used a system where they had
to constantly tar up and untar directories, they wouldn't think anything
of it (and would think that a command like 'cd-stash' which tars up your
cwd and untars some other tarred directory and cd's into it would be
really cool), but if you've had the 'freedom' of just working with your
files without such encumbrances, you'll really hate having to do it.
An example of what I can do with parallel branches: if my branches 'new'
and 'old' were stored in two separate subdirectories, then I could grep
through all of the files in both branches with one command and see
collated results; or I could diff only those files in which a given
identifier appears on one branch but not in the corresponding file on
the other.
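A minimal sketch of that workflow, assuming the two branches sit side by
side in hypothetical directories old/ and new/ and that the identifier of
interest is the made-up name 'frobnicate'. The files created at the top
are only stand-ins so the commands can be run as written:

```shell
#!/bin/sh
# Stand-in layout: two branches checked out side by side (hypothetical names).
work=$(mktemp -d)
mkdir -p "$work/old/src" "$work/new/src"
printf 'int frobnicate(void);\n' > "$work/old/src/api.h"
printf 'int frobnicate(void);\n' > "$work/new/src/api.h"
printf 'int helper(void);\n'     > "$work/old/src/util.h"
printf 'int helper(void);\nint frobnicate_all(void);\n' > "$work/new/src/util.h"
cd "$work" || exit 1

# One grep over both branches; every hit is reported with its path.
grep -rn 'frobnicate' old new

# Files that mention the identifier, as paths relative to each branch root.
(cd old && grep -rl 'frobnicate' . | sort) > old.hits
(cd new && grep -rl 'frobnicate' . | sort) > new.hits

# Paths where the identifier shows up on exactly one of the two branches.
# comm -3 drops the lines common to both lists; tr removes its column tab.
only_one=$(comm -3 old.hits new.hits | tr -d '\t')

# Diff just those files against their counterparts on the other branch.
# diff exits non-zero when the files differ, which is expected here.
for f in $only_one; do
    diff -u "old/$f" "new/$f" || true
done
```

The loop assumes paths without spaces and that each file exists on both
branches; a real script would want to handle both cases.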
Another example is the compiled results of each tree, as I mentioned
before, which I can see and compare in place if my branches are in
different subtrees, but which requires extra copying around of files and
other management if I am using git.
Another problem with git: I have to constantly rebuild stuff when I
stash and unstash because my build directory doesn't stash.
It's interesting that you note that viewing files in other branches with
git is harder if you are using a file manager. That's exactly the point
I am trying to make: file managers and other tools (scripting languages,
diffing tools, text searching and processing tools, etc) all work based
on the standard Unix paradigm of "everything is a file". Git works on a
paradigm of "everything in your current branch is a file, everything
else is accessible only as the output of a complex git command". I'd
rather use a paradigm that thousands of tools already depend on, than
the special case paradigm that is git.
A few more points: I don't think that "many GUI front-ends" being
available to help me manage my branches is better than a system that
doesn't need any GUI front-ends to make the process palatable. And, I'm
not sure why what git does is any 'cleaner' than keeping branches in
separate directories. If anything, git seems messier to me because some
files get changed in-place as you switch branches in git, and some files
are ignored and left as they are (those that aren't actually tracked by
git), and distinguishing between the two types requires git commands and
lots of mental notes.
>> Parallel branch directories have an advantage over git's branch views
>> whenever you need to compare the contents of branches.
>
> False. As mentioned earlier there are GUI tools which make this
> simple. If you don't like GUIs, you can use the command line
> equivalents (most tools execute git commands anyway). I don't know
> what these are since I've never had the need to compare two branches
> beyond `git diff`.
Well, I think that the fact that you have to qualify your statement by
saying that it's easy if you use special GUIs, and otherwise doable with
command line equivalents, exactly makes my point.
There are many more ways to compare branches than just 'git diff'. You
can compare the result of building both branches, you can do a grep over
multiple branches at once to find identifiers, you can count lines
for entire branches if you like to see how much bigger your code
base is in one branch than another ... these things can all be done with
git too, but you have to run many git commands to get the views of the
branches that you need when you need them, whereas if they all live in
separate subdirectories, there are no commands to run at all to get the
files ... they're just there.
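For instance, the line-count comparison might look roughly like this,
again assuming hypothetical old/ and new/ directories and counting only
'*.c' files (the helper name count_lines and the stand-in files are
invented for the sketch):

```shell
#!/bin/sh
# Stand-in layout so the sketch runs on its own (hypothetical names).
work=$(mktemp -d)
mkdir -p "$work/old" "$work/new"
printf 'a\nb\n'       > "$work/old/main.c"
printf 'a\nb\nc\nd\n' > "$work/new/main.c"
cd "$work" || exit 1

# Total number of lines in all .c files under the given branch directory.
count_lines() {
    find "$1" -type f -name '*.c' -exec cat {} + | wc -l | tr -d ' '
}

old_lines=$(count_lines old)
new_lines=$(count_lines new)
echo "old: $old_lines lines, new: $new_lines lines, growth: $((new_lines - old_lines))"
```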
>
>> Maybe it's because I'm an emacs [...] [and] keeping track of [...]
>> what sequence of [...] commands I need [...] is just more mental
>> effort than I want to undertake.
>
> *cheap shot alert*
> You use emacs, yet remembering commands is too much of an effort? I
> know I twisted your words a lot, and I'm not hating on emacs, but you
> have to admit that that is somewhat hypocritical.
It wasn't meant to be hypocritical. I was trying to allude to the fact
that when using vi, you have to remember much more state about what you
are doing (what mode am I in? insert mode? delete mode? what file am I
working on? what line am I on? etc etc) than with emacs; I think this is
one of the fundamental differences between vi and emacs. I could be
wrong though, I haven't used vi extensively, just enough to make minor
edits to files on the way to getting emacs installed :) But assuming
that this is true, my point was just that vi users are used to
keeping more state about what they are doing in their heads, which may
make git seem natural. Perhaps I should have said 'ed' instead of 'vi' ...
Note that I'm not talking about remembering what commands do what
(certainly emacs has tons of commands to remember), I'm talking about
keeping track of working state as you are using the tool. git seems to
require keeping a mental model of branches that you can't even "see"
because they aren't in your filesystem anywhere.
>
>> I find it so much easier to just leave a branch subdirectory and
>> when I return to it later, it is guaranteed to be exactly as it was
>> when I left, without any effort on my part. If I am working on 4 or
>> 5 bugs in parallel (which I have certainly done at work, where
>> working on just 1 or 2 bugs at once would be inefficient because of
>> the downtime associated with building each tree) I can't even imagine
>> using git stashes to sanely keep track of everything.
>
> This is exactly what branches are for. The exact same thing can be
> said for git branches. It's guaranteed to be exactly as it was when
> you switched to another branch. `git stash` should only be used when
> something is not ready to be committed, but you _urgently_ need to do
> something else, like a bug fix on the maintenance branch.
Except that not only am I getting confused about the state of my
branches as I commit partially complete changes to them for the purpose
of saving state as I switch between branches, the tools that I use are
getting confused as well. I may have editors and other tools open for
files whose contents suddenly change when I git checkout to a different
branch. For many tools, this is not a big deal, but I think it
illustrates the subtle problems that such an approach introduces. And
since I encapsulate part of the state of "what I'm doing" in the state
of the tools that I am using, confused state in those tools can often
confuse me as well.
With git, I can't switch to another branch unless I either a) commit the
changes (which I may not be ready to commit yet), or b) stash the
changes. Committing or stashing takes extra work on my part. Why should
I have to do this work?
>
>>>> - Lack of rename tracking. Yeah, I know, git claims that it can do
>>>> it after
>>>> the fact when examining change histories but I've tried various
>>>> scenarios
>>>> and it just doesn't work very well, and even when it does, requires
>>>> stupidly
>>>> complex options to git commands to enable git to discover renames
>>>> in the
>>>> history correctly
>
> I can't think of a situation where the file name is relevant. Even
> when renaming...
Do some google searches. File renaming in version control systems is a
big deal, and for good reason.
>
>> The problem comes when someone, in a branch, renames a file, and then
>> tries to merge their changes into another branch in which the file
>> was not renamed.
>
> This would only be a problem if the file was not only renamed, but
> also _changed_, and significantly at that. In this case git would only
> see that, say, 60% of the file content was moved. I'm not sure how
> merging would work, since I have never worked on a branch when a file
> was moved (and changed) in another.
Yes; it's called refactoring. On a well managed project it doesn't
happen often, but it does happen. Reorganization of subtrees of code is
something that source control systems should support well. It's the
merging after such reorganization that tools that don't track renames
have problems with. For examples of git failing in this, take a look at
the simple scripts I sent out to the list earlier today.
>
>> Unless file renames are tracked, the merge becomes very difficult.
>
> Not at all. If git sees that the file content was _moved_ (not
> changed), it should be able to figure that out easily. Again, I
> haven't actually done this, but I don't see why it wouldn't work. I
> would suggest asking about this on #git (or the git ML). If it is
> indeed a problem then file a bug. I'm sure Linus would be happy to
> comment on it ;).
It's not just moving. It's moving and changing that git has a problem with.
>
>> Refactoring a subsystem on a 'workbranch' is something that is done
>> sometimes on large projects, and with git, I would expect that to be
>> basically impossible to do sanely. Even if git's 'detect renames
>> while examining history' technique did work, it still makes renames
>> cumbersome, because you can't rename a file and change its contents
>> at the same time or else git has almost no chance of detecting the
>> rename via history. And if you can't change a file and rename it at
>> the same time, then you can't, for example, properly rename a Java
>> class, because the class name and file name have to be the same.
>
> Why not? If you change the file contents and rename it, then obviously
> you'd also change the class name. Why else would you rename it?
Git doesn't allow you to rename and change contents of a file at the
same time:
$ git mv file file2
$ echo "changed file" > file2
$ git status
# On branch master
# Changes to be committed:
# (use "git reset HEAD <file>..." to unstage)
#
# renamed: file -> file2
#
# Changed but not updated:
# (use "git add <file>..." to update what will be committed)
# (use "git checkout -- <file>..." to discard changes in working directory)
#
# modified: file2
#
If I 'git commit' it will take the rename of file to file2, but not the
modification of file2. If I try to add file2 before committing, git
status now shows:
# On branch master
# Changes to be committed:
# (use "git reset HEAD <file>..." to unstage)
#
# deleted: file
# new file: file2
If you check this in, git will not be able to merge changes to file2
into a branch taken before this change.
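The scenario can be reproduced in a scratch repository along these
lines. This is only a sketch: the repository location, the commit
messages, the helper name 'commit' and the demo identity are made up,
and what the last command reports (a rename entry, or a separate delete
and add) depends on how similar the old and new contents are and on the
git version, so no expected output is shown:

```shell
#!/bin/sh
# Scratch repository in a temporary directory (hypothetical setup).
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .

# Identity is passed inline so the sketch does not rely on global git config.
commit() { git -c user.name=demo -c user.email=demo@example.invalid commit -q "$@"; }

printf 'original contents\n' > file
git add file
commit -m 'add file'

# Rename the file and replace its contents within the same commit.
git mv file file2
echo "changed file" > file2
git add file2
commit -m 'rename and rewrite file'

# -M asks git to look for renames when comparing the two commits.
status=$(git diff -M --name-status HEAD~1 HEAD)
echo "$status"
```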
>
>> It just shouldn't be that hard.
>
> Why not? I can't imagine other SCMs doing this any better. If a file
> contents changes drastically, it doesn't matter if the name of the
> file is tracked. The name of the file is irrelevant. A merge conflict
> would arise even if the file was never renamed.
Perforce does it better. It is certainly possible to:
* Rename bunches of files on a branch as part of a code refactoring
effort and change parts of those files to match (such as class names, etc)
* Make bug fixes to the original files in a different branch
* Merge those changes together on either branch in a way that makes
sense and doesn't produce conflicts (assuming that the individual
changes were not conflicting, which is often the case when one branch is
doing minor bugfixes and the other is doing more structural work)
>
> I don't mean to contradict everything you say, it's just that I
> haven't had the same experience with git as you. Using git has been
> amazing. It does everything I want, it's sophisticated, it merges code
> well, and it has some very powerful features (like rebase).
I'm glad you like git so much; a lot of people do. I'm not saying I
don't like git, I'm just saying that there are a few things that I think
suck about git. That's how this discussion got started. But a lot of
people defend git with great vigor if anything critical is said of it,
and I don't understand the fervor.
My impression of git is that it feels very much like what its history
suggests that it is: a tool for managing patches that grew into a source
control system. For better or worse, git feels 'messy' to me, like it
wasn't thought out ahead of time but kind of organically grew as people
realized that certain basic features could be twisted this way or that
way to add the equivalent of standard source control functionality. Git
has dozens of commands, each with dozens of subtle and tricky options;
that seems like needless complexity to me. That's just my impression,
take it for what it's worth, which is not much.
>
> By the way, Mercurial seems faster than Bazaar (though I haven't used
> either much), and both are written in Python. Mercurial might not be
> pure python though, I am unsure.
I really like what I read on the bazaar web pages; it feels more
coherently designed than git, and much simpler to use. But it worries
me that it's had significant performance problems on larger projects.
Thanks,
Bryan
>