Two Commits That Wrecked the User Experience of Git

Git didn’t have to be so obnoxious, but these two commits set a terrible, unfixable precedent

Dan Fabulich is a Principal Engineer at Redfin. (We’re hiring!)

Git takes time to learn.

Some of the challenge in learning Git is inherent to distributed version control systems, and a lot of it has to do with Git’s “stage” feature (also known as the “index” or “cache”), which some people hate, but a lot of people really like.

But some of the scars on Git’s user experience are self-inflicted. Today, I specifically want to call attention to two commits, by the lead maintainer of Git, that left the deepest scars.

image credit: https://www.theodo.fr/blog/2016/11/revert-the-revert-and-avoid-conflicts/

git checkout does too many things

The “Synopsis” section of the git checkout man page lists six distinct "forms" for the command.

git-checkout - Switch branches or restore working tree files

SYNOPSIS

git checkout [-q] [-f] [-m] [<branch>]
git checkout [-q] [-f] [-m] --detach [<branch>]
git checkout [-q] [-f] [-m] [--detach] <commit>
git checkout [-q] [-f] [-m] [[-b|-B|--orphan] <new_branch>] [<start_point>]
git checkout [-f|--ours|--theirs|-m|--conflict=<style>] [<tree-ish>] [--] <paths> ...
git checkout [-p|--patch] [<tree-ish>] [--] [<paths> ...]

The first rule of Doug McIlroy’s UNIX Philosophy says:

Make each program do one thing well. To do a new job, build afresh rather than complicate old programs by adding new “features”.

git checkout has not followed this philosophy very closely.

In fairness, the first four forms of git checkout all kinda do the same thing: they "switch branches," moving the HEAD ref to either a branch or a "detached HEAD" commit. You also have the option to create a branch while switching to it, and that's just fine.

But there is a huge difference between the forms of git checkout that accept <paths> and those that don't. Even worse, when you specify <paths>, there's a major difference in behavior depending on whether you actually supply a <tree-ish> or not.

(Note that the one-line description of git checkout needs to use an "or." "Switch branches OR restore working tree files." One command does two very different things.)

It wasn’t always like this

October 18, 2005: git-checkout: revert specific paths to either index or a given tree-ish

Prior to this commit, git checkout only switched branches. You could optionally pass -b to create the branch while switching to it. It was a nice, tightly scoped command. There was also another tightly scoped command, checkout-index, which you could use to copy files from the index (aka stage/cache) to your working tree.

But in commit 4aaa702, Git’s maintainer gave git checkout a new feature, or perhaps two or three new features, depending on your point of view.

Here, in this commit, git checkout learned to accept <paths>, which would copy files from the stage, exactly like checkout-index. Since checkout-index already existed at this point, if that were all it did, I'd like to think nobody would have bothered to implement this command.

But this commit also taught git checkout a truly new feature: you could now pass both <paths> and a "tree-ish", allowing you to copy files from any branch or commit directly into your working tree without switching branches.

This was when Git started to overload checkout. Git already had a separate command, checkout-index, and I argue that it would have been more appropriate to create another new command for this new behavior. I would have called it checkout-tree.

So git checkout after this commit is really three features, which I'll give names to for the sake of discussion:

checkout-branch: switch branches
checkout-index: copy files from the stage to the working tree
checkout-tree: copy files from a tree-ish to the working tree

Note that when this commit combined all three modes into one command, checkout, it necessitated using -- to disambiguate parameters. When you git checkout x, do you want to switch to a branch named x or copy a file named x from the stage to the working tree? git checkout x y is also ambiguous. Are both x and y paths that should be copied from the stage, or is x a tree-ish and y the only path?

git checkout x --      switch to a branch x
git checkout -- x      copy x from the stage to the working tree
git checkout -- x y    copy files x and y from the stage to the working tree
git checkout x -- y    copy the file y from the branch x to the working tree
git checkout x y --    is illegal

If instead checkout-tree were a separate command, no -- would be required; the first argument would be the tree-ish, and all remaining arguments would be paths.
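To see the three modes side by side, here's a minimal sketch you can run in a throwaway repository. The branch names (main, x), the file name (y), and the demo identity are all made up for illustration; the only thing it relies on is the -- placement described above.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main   # pin the initial branch name for this demo
git config user.name "Demo"
git config user.email "demo@example.com"

echo "main version" > y
git add y
git commit -q -m "add y on main"

git checkout -q -b x                    # create branch x and switch to it
echo "x version of y" > y
git commit -q -a -m "change y on x"

# Trailing --: "main" can only be a branch, so this switches branches.
git checkout -q main --
branch_after_switch=$(git rev-parse --abbrev-ref HEAD)

# Leading --: "y" can only be a path, so this copies y from the stage.
echo "scratch edit" > y
git checkout -q -- y
from_stage=$(cat y)

# tree-ish, then --, then path: copy y from branch x without switching.
git checkout -q x -- y
from_tree=$(cat y)
branch_after_tree=$(git rev-parse --abbrev-ref HEAD)

echo "$branch_after_switch | $from_stage | $from_tree | $branch_after_tree"
```

The last line reports which branch you ended up on and what y contained after each step; in particular, the third form changes y while leaving you on main.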

But wait, there’s more! checkout-tree has another hidden feature: it modifies paths in the stage as well as the working tree. Note that in 2005, git reset didn't yet accept <paths> to unstage individual changed files. To unstage anything, you had to unstage everything. Since checkout-tree does modify paths in the stage, you could use git checkout HEAD myfile to discard myfile changes in both the working tree and the stage.

The problem is that these modes for git checkout look too similar. When you're just learning Git, it's challenging to remember the difference between git checkout mybranch, git checkout myfile and git checkout HEAD myfile. Only the last command clobbers the stage, but it doesn't look like that; it looks almost identical to the checkout-index command. There's no --keep-index argument for git checkout.
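If you want to convince yourself that only the HEAD form touches the stage, a small experiment like the following should do it. This is a sketch in a scratch repository; myfile and its contents are placeholders, and git diff --cached --quiet is used here only as a convenient way to ask "is anything staged?"

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Demo"
git config user.email "demo@example.com"

echo "committed" > myfile
git add myfile
git commit -q -m "initial"

echo "staged" > myfile
git add myfile                 # the stage now holds "staged"
echo "unstaged" > myfile       # the working tree holds a further edit

# checkout-index mode: working tree is restored from the stage.
git checkout -q myfile
after_index_mode=$(cat myfile)
if git diff --cached --quiet; then staged_after_index_mode=no; else staged_after_index_mode=yes; fi

# checkout-tree mode: working tree AND stage are reset to HEAD.
git checkout -q HEAD myfile
after_tree_mode=$(cat myfile)
if git diff --cached --quiet; then staged_after_tree_mode=no; else staged_after_tree_mode=yes; fi

echo "index mode: file=$after_index_mode staged=$staged_after_index_mode"
echo "tree mode:  file=$after_tree_mode staged=$staged_after_tree_mode"
```

After the first command the staged change survives; after the second, both the file and the stage are back to the committed version.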

When learning Git, you just have to remember this stuff.

Aside: This is when Git started performing destructive actions without warning

Today, Git has a reputation for being a tool that erases your working tree without warning if you use the wrong command. ("It's like Unix!" people say. "rm doesn't warn you, either!")

But up to this point, the checkout-branch mode of checkout had always been really careful about this. Even back in 2005, if Git detected that switching branches would overwrite local changes in your working tree or your index, it would halt with a useful error.

error: Your local changes to the following files would be overwritten by checkout:
my_file
Please commit your changes or stash them before you switch branches.
Aborting

Even git checkout-index would refuse to do the only thing it's intended to do, copying files from the index onto the working tree, unless you passed a -f parameter to force it. git reset required an extra --hard parameter to clobber your working tree.

Git’s reputation for tough love started in this very commit. Both the checkout-index mode and the checkout-tree mode of checkout overwrite files in your working tree without a warning and without requiring -f; if you misuse them, there's no way to recover lost uncommitted work.
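Here's a rough sketch of that contrast, assuming a scratch repository with one tracked file (my_file and its contents are invented for the demo). The exit status of checkout-index when it declines to overwrite may differ between Git versions, so the sketch ignores it and just looks at the file.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Demo"
git config user.email "demo@example.com"

echo "committed" > my_file
git add my_file
git commit -q -m "initial"

echo "hours of uncommitted work" > my_file

# Plumbing: without -f, checkout-index leaves the modified file alone.
git checkout-index my_file 2>/dev/null || true
after_checkout_index=$(cat my_file)

# Porcelain: the same copy-from-stage operation, with no -f and no warning.
git checkout -q -- my_file
after_checkout=$(cat my_file)

echo "after checkout-index: $after_checkout_index"
echo "after checkout:       $after_checkout"
```

The local edit survives the plumbing command and is gone after the porcelain one.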

Et tu, git reset?

checkout-tree was the only way to unstage individual files from the index until Dec 14, 2006, when a <paths> option was added to git reset. It was documented a couple of weeks later on Dec 26.

git-reset [--mixed] <tree> [--] <paths>...
Sometimes it is asked on the list how to revert selected path in
the index from a tree, most often HEAD, without affecting the
files in the working tree. A similar operation that also
affects the working tree files has been available in the form of
"git checkout <tree> -- <paths>...".
By definition --soft would never affect either the index nor the
working tree files, and --hard is the way to make the working
tree files as close to pristine, so this new option is available
only for the default --mixed case.

This was a genuinely useful new feature. It’s great that it became possible to unstage files without clobbering the working tree.

But this commit just overloaded another command!

reset already had a well-defined function, to move a branch ref to a specified commit, rewriting the branch's history. It had an optional side effect to clobber the stage and working tree to match, but from the beginning, reset was a command intended for history rewriting, which should be done with great care.

Again, if Git were following the UNIX philosophy, unstaging files would have been a new command, perhaps git unadd (or, now that people call the cache/index a "stage," perhaps it would be git unstage). Even if it used a more jargony name like reset-index, it would still be an improvement over today's overloaded command.

reset <paths> is an everyday activity. If you run git help with no arguments, you'll find reset in the list of "everyday" commands, which makes sense, because reset is the unstage command. But if you accidentally pass reset a branch name, surprise, it's going to start rewriting the history of your branch!
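To make the difference concrete, here's a small sketch in a throwaway repository. The file notes.txt and the branch old are hypothetical names; old is just a branch that happens to point at an earlier commit.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main   # pin the initial branch name for this demo
git config user.name "Demo"
git config user.email "demo@example.com"

echo "one" > notes.txt
git add notes.txt
git commit -q -m "first"
git branch old                          # a branch left behind at the first commit

echo "two, a longer line" > notes.txt
git commit -q -a -m "second"
head_before=$(git rev-parse HEAD)

# Everyday use: unstage a file. HEAD and the working tree are untouched.
echo "three, an even longer line" > notes.txt
git add notes.txt
git reset -q -- notes.txt
head_after_unstage=$(git rev-parse HEAD)
worktree_after_unstage=$(cat notes.txt)

# Pass a branch name instead of a path: main is moved back to where old points.
git reset -q old
head_after_branch_reset=$(git rev-parse HEAD)
old_commit=$(git rev-parse old)
current_branch=$(git rev-parse --abbrev-ref HEAD)

echo "still on $current_branch, but HEAD moved from $head_before to $head_after_branch_reset"
```

Both invocations are spelled git reset plus one word, yet the first only edits the stage while the second drops the "second" commit from main's history.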

You or I might have done the same thing

Why did Git’s maintainer, Junio Hamano, decide to combine wildly divergent modes into one command, not once but twice?

Well, I posted a question on the Git mailing list to ask, but the Git mailing list is so noisy that I doubt that anybody will ever answer my question, least of all Hamano. But I can speculate as to why; if I were Hamano, I probably would have made the same mistake.

For what it’s worth, here’s Jeremy Allison’s interview with Junio Hamano back in 2011, explaining why Git is so hard to use.

Hamano: Probably it comes from two reasons. One is, it’s designed to be flexible so that you can use it in any way you want and we don’t give you guidance, good guidance, “this is the one true way of using Git,” because the system is young so we don’t have one. That’s one thing.
Another thing is because the system wasn’t really designed, but grew organically. So somebody came up with an idea of doing one thing. “Oh, this is a good idea, a good feature; let’s add it to this command as this option name.” And the option name he chooses just gets stuck, but after a few months, somebody else notices, “Oh, this is a similar mode of operation with that existing command. Why are they named differently?” That kind of thing happens all the time.
Allison: The person who wrote it didn’t even know about the other command probably.
Hamano: Yeah.

Nobody really took the time to design Git’s user experience, so simplicity of implementation is probably a significant reason for any given design choice. If you read the diff of the two commits I’ve called out in this post, I think you’ll see that the implementation was pretty straightforward for each of these new features.

But I suspect that there’s another reason Git’s maintainer may have wanted to combine these features into one command.

If you run git help --all, you'll see a message describing the “available git commands in /usr/lib/git-core” (or wherever you have installed Git). If you open that directory, you'll find a lot of files.

For a number of years, when you installed Git, you'd get each and every one of those files on your $PATH. You could type git checkout, but git would just look for git-checkout on the $PATH and run that.

(In fact, this feature still exists today. Create an executable script on your path called git-hello: you can type git hello, and it will run your script. If you want to cook up a git-checkout-tree script, just drop it in your $PATH and Git will run it for you. Have fun!)
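If you'd like to try it, something like this should work. It's a sketch that puts a hypothetical git-hello script in a temporary directory and prepends that directory to $PATH for the current shell only.

```shell
set -e
bindir=$(mktemp -d)

# A made-up subcommand: any executable named git-<name> on $PATH will do.
cat > "$bindir/git-hello" <<'EOF'
#!/bin/sh
echo "Hello from git-hello, args: $*"
EOF
chmod +x "$bindir/git-hello"

PATH="$bindir:$PATH"
export PATH

# Git doesn't know a built-in "hello", so it runs git-hello from $PATH.
hello_output=$(git hello world)
echo "$hello_output"
```

Arguments after the subcommand name are passed straight through to the script.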

I surmise that in 2005 and 2006, there was subtle pressure to avoid creating new commands when each command is a new script, “cluttering” your $PATH. “I have over 100 Git commands in my $PATH! Do I have to deal with all of this crap?!”

But, as Hamano says, once the damage was done, it couldn’t be undone, for two reasons.

  1. Removing features like these would break backwards compatibility.
  2. Git has too many ways to do the same thing. (When should I git checkout HEAD . and when should I git reset --hard?) Creating new “better” commands would make the problem worse.

Sadly, these commits set a tone for the rest of Git's development: packing new features in old commands is A-OK. Since then, every major feature of Git has grown a zillion variations with mutually incompatible command-line arguments. The "normal" thing to do is add new features to an existing command. Thanks to these two commits, it's now a tradition.

That’s the way we’ve always done it.

There’s still hope: EasyGit

EasyGit (eg) is a single-file wrapper for Git, designed to make Git easy to learn and use. At Redfin, we train new developers to use an internal fork of eg, which makes it quicker and easier to get started and productive using Git.

With EasyGit, you can run eg switch to switch branches; eg switch uses git checkout under the hood, but it can never clobber local modified files. You can use eg stage and eg unstage to manage files in your stage; eg unstage uses git reset but it can never rewrite history or clobber your working tree. When you do want to discard local modifications, you can use eg revert.

EasyGit has many more usability features. It’s not just a set of training wheels for Git; it’s in many ways better than raw Git. I dream that one day Git itself will adopt concepts from EasyGit.

(I have weird dreams sometimes.)


P.S. Redfin is hiring.