Visualize Merge History with git log --graph, --first-parent, and --no-merges

Git merges can be complicated, but these arcane parameters can help.

Dan Fabulich
Published in Code Red, Aug 30, 2017

Dan Fabulich is a Principal Engineer at Redfin. (We’re hiring!)

git log can display surprisingly confusing results when the history contains merges. In this post, I’ll walk you through a few parameters that can help clear things up.

I’ll start by showing how the default “chronological” order of git log can mislead you when the history contains merge commits. Then I’ll show how you can visualize Git merge history using git log --graph, and how to see the “true” history of a single branch using --first-parent. I’ll end by giving an example where --first-parent doesn’t do what you’d want. In those cases, --no-merges may yield better results.

By the time I’m done, I hope not only to teach you about a few useful parameters to git log, but to deepen your understanding of Git as a whole. In my experience at Redfin, developers frequently reach out to a local Git expert when they have a confusing experience with a Git merge. (“OMG I messed up my merge and now everything’s broken!!!1!!”)

In troubled times like these, Git wizards can use advanced git log parameters to cast Magic Missile at the darkness.

Yes, I know what a rebase is: a defensive disclaimer

Before I continue, I should point out that Git makes it possible to eliminate merge commits with git rebase, and somebody’s going to read this post and run to Hacker News to say how stupid I am, because you should only use git rebase and you should never have merge commits.

Not you, dear reader, but I think we all know who it is I’m talking about.

OK, it might be you. If it is you, please try to hold your breath until the end of this section.

This post is long enough as it is, so I don’t want to use space in this blog post to discuss when to rebase and when to merge. There are a million blog posts about that already. There are some smart people who say you should never have merge commits, and some other smart people who do lots of merge commits. For example, the Linux git repo has lots of interesting merges in its history.

IMO, the general consensus is that sometimes we should merge, and sometimes we should rebase, but there isn’t always a good consensus on which cases are which. Participating effectively in this discussion requires a good understanding of how Git handles merges.

So, for the purposes of this post, I’d like you to imagine that a group of developers might want to do a lot of merges, even though they know what rebasing means, and that sometimes they’d want to analyze some complex merge commits.

“Chronological” ordering: time is an illusion; log time, doubly so

Let’s start with an example repository. You can generate a similar repo with this script.

The script creates a repository with just a few merges, like this:

  1. We start by creating three branches off of the master branch: branch1, branch2, and branch3.
  2. Make a commit directly on master.
  3. Switch to branch1, and commit.
  4. Switch to branch2, and commit.
  5. Merge branch2 back to the master branch.
  6. Switch to branch3 and commit.
  7. Merge branch1 to master.
  8. Merge branch3 to master last.

(We sleep for one second before each commit, so each commit gets a visibly different timestamp.)
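The steps above can be sketched as a shell script along these lines. This is a reconstruction from the numbered list, not necessarily the original script: the temporary directory, the file names, the placeholder identity, and the `commit` helper are all assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: build an example repo with three merges into master.
set -e

cd "$(mktemp -d)"                          # throwaway directory (assumption)
git init -q .
git symbolic-ref HEAD refs/heads/master    # call the first branch "master" whatever the local default is
git config user.name "Example User"        # placeholder identity so commits work anywhere
git config user.email "user@example.com"

# Hypothetical helper: sleep one second, then commit a new file.
# $1 is the commit message, $2 is the file to create.
commit() {
  sleep 1
  echo "$1" > "$2"
  git add "$2"
  git commit -q -m "$1"
}

commit "initial" initial.txt

# 1. Create three branches off master.
git branch branch1
git branch branch2
git branch branch3

# 2-4. Commit on master, then on branch1, then on branch2.
commit "commit directly on master" master.txt
git checkout -q branch1
commit "branch 1" branch1.txt
git checkout -q branch2
commit "branch 2" branch2.txt

# 5. Merge branch2 back to master.
git checkout -q master
sleep 1
git merge -q --no-edit branch2

# 6. Commit on branch3.
git checkout -q branch3
commit "branch 3" branch3.txt

# 7-8. Merge branch1, then branch3, into master.
git checkout -q master
sleep 1
git merge -q --no-edit branch1
sleep 1
git merge -q --no-edit branch3

git --no-pager log --graph --pretty="format:%h %ar %s"
echo
```

Each branch touches its own file, so none of the merges can conflict. Your commit hashes will differ from the ones shown in this post.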

Here’s what we see when we run git log --pretty="format:%h %ar %s" (that “pretty” argument says to show the commit hash, the relative timestamp, and the commit message, all on one line per commit; despite the name, it’s not that pretty).

8aec370 0 seconds ago Merge branch 'branch3'
b7b4b7c 1 second ago Merge branch 'branch1'
f88c7ba 2 seconds ago branch 3
7b79ec5 3 seconds ago Merge branch 'branch2'
accf1ce 4 seconds ago branch 2
974b6d7 5 seconds ago branch 1
a26aed9 6 seconds ago commit directly on master
2d56476 7 seconds ago initial

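As you can see, git log shows the commits in reverse chronological order by default, newest first. Those are the dates that the commits were created. (The --date-order option is similar, but it also guarantees that no parent is shown before all of its children.)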

But, even though we're running git log on the master branch, this is not the chronological history of master. If these timestamps were days ago instead of seconds ago, you might mistakenly believe that the “branch 1” commit 974b6d7 was in master five days ago, when in fact it only merged in yesterday. You might also think that the “branch 1” commit 974b6d7 landed on master before the “branch 2” commit accf1ce, but the reverse is true; accf1ce merged to master before the “branch 1” commit 974b6d7.

Merge commits: when one parent commit loves another parent commit very, very much

git log has a tool you can use to visualize all of this merging, --graph. The output looks like this:

*   8aec370 0 seconds ago Merge branch 'branch3'
|\
| * f88c7ba 2 seconds ago branch 3
* | b7b4b7c 1 second ago Merge branch 'branch1'
|\ \
| * | 974b6d7 5 seconds ago branch 1
| |/
* | 7b79ec5 3 seconds ago Merge branch 'branch2'
|\ \
| * | accf1ce 4 seconds ago branch 2
| |/
* | a26aed9 6 seconds ago commit directly on master
|/
* 2d56476 7 seconds ago initial

This graph shows not only the commits (as asterisks *) but also their “parent” commits. Most commits (“ordinary” commits) have only one parent: the last commit of the branch you were on when the commit was created. In the above example, the initial commit 2d56476 is the only parent of commit a26aed9.

(The initial commit in a repository, the “root commit,” has no parents. It’s possible for git repositories to have multiple root commits, typically due to errors rewriting Git history. The moral of this story is to avoid time travel whenever possible.)

When you merge two branches, you’re creating a “merge” commit with two parents: the last commit of the branch you’re on, and the last commit of the branch you’re merging in. In the graph above, 8aec370 is a merge commit with two parents: b7b4b7c (the last commit on master at the time) and f88c7ba (the last commit on branch3). See how the merge commit 8aec370 has two lines sticking out of the bottom, whereas f88c7ba has only one? No? Well, scroll up and look at the graph again. This is important!
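If you'd rather not count lines in the graph, you can ask Git for a commit's parents directly. Here's a minimal sketch, assuming a throwaway repository with a single merge; the branch name `topic`, the commit messages, and the placeholder identity are made up for illustration.

```shell
set -e
cd "$(mktemp -d)"                          # throwaway directory (assumption)
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

# Build the smallest history that needs a real merge commit.
git commit -q --allow-empty -m "initial"
git branch topic
git commit -q --allow-empty -m "commit directly on master"
git checkout -q topic
git commit -q --allow-empty -m "commit on topic"
git checkout -q master
git merge -q --no-edit topic

# %p lists abbreviated parent hashes: two for the merge, one for an ordinary commit.
git --no-pager log -2 --first-parent --pretty="tformat:%h <- %p  (%s)"

# The raw commit object records one "parent" line per parent, in order.
git cat-file -p HEAD
```

The merge commit shows two parent hashes after the arrow; the ordinary commit below it shows one.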

It’s also possible to perform “octopus” merges in git, which have more than two parents. There’s a commit in the Linux repo with 66 parents. Linus Torvalds said, when discussing that merge,

It’s pulled, and it’s fine, but there’s clearly a balance between “octopus merges are fine” and “Christ, that’s not an octopus, that’s a Cthulhu merge”.


Note that git log --graph does not show the commits in chronological order. The git help man pages call it --topo-order, topological ordering. “Show no parents before all of its children are shown, and avoid showing commits on multiple lines of history intermixed.”

(Did you know that “topological sorting” has essentially nothing to do with the modern mathematical definition of “topology”? I bet you do know that the definition of “topological sorting” has nothing to do with any Git problem you’re trying to solve. Git’s help pages are full of technical jargon like this, terms that are technically correct but obscure the meaning of the text rather than enlightening the reader.)

Use --graph as little as possible

Using git log --graph can help, if you know about parent commits and you know how to read it, but it’s still not very easy to understand the visualization as a whole; it would be a completely illegible mess with just a few more merges in it.

Don’t even try to visualize the entire messy history of master when there are a bunch of merges on it. Visualization is a powerful mental technique, but visual aids can only really represent a few dozen things before they become as complicated as the thing you were trying to understand in the first place.

And when analyzing messy merges, it’s not just the commits we’re trying to visualize, but the lines connecting the commits (the “edges” connecting the “nodes,” in graph-theory terms). We can only visualize a few dozen of those, and that typically means we can visualize only a handful of merge commits at a time.

If you have a bunch of merges into the master branch, you’ll find that the history of master isn’t a straight line of history; it’s more like one of those slashy fanfics in which the Amazing Spider-Man and Dr. Octopus have cybernetic octo-spider babies.

Instead of trying to understand the entire graph, it’s better to look at the history of master itself, in isolation. That’s what we’ll do in the next section.

The “first parent” is the true lineage of master (usually)

In an ideal world, you’d be able to say to Git, “Show me just the commits that were created on the master branch.” But, for legacy reasons, Git commits don’t record the name of the branch on which commits are created. (In my example, I embedded that information in the commit message to make it easier to understand.)

From Git’s perspective, by the time all of those merges are done, all of those commits are “on” the master branch. That’s why it has to show you all of them when you ask for the history of master.

But look at the graph again.

*   8aec370 0 seconds ago Merge branch 'branch3'
|\
| * f88c7ba 2 seconds ago branch 3
* | b7b4b7c 1 second ago Merge branch 'branch1'
|\ \
| * | 974b6d7 5 seconds ago branch 1
| |/
* | 7b79ec5 3 seconds ago Merge branch 'branch2'
|\ \
| * | accf1ce 4 seconds ago branch 2
| |/
* | a26aed9 6 seconds ago commit directly on master
|/
* 2d56476 7 seconds ago initial

See the commit asterisks that appear on the left-hand rail, in the very first column? Those commits are the ones that were “really” on the master branch, aren’t they? How did Git know to put those commits all in a straight line like that, if all of the commits are equal in the eyes of the master branch?

It turns out that, just like real children, Git doesn’t treat a merge commit’s two parents equally; merge commits have a “first parent” and a “second parent.” The “first parent” is the last commit of the branch you were already on when you typed git merge (or git pull or whatever caused the merge). The “second parent” is the last commit of the branch that you were pulling in.

Here’s what it says when you git show 8aec370 in our example repository.

commit 8aec37089204c7ec5d280779cdcfe5e378026c65
Merge: b7b4b7c f88c7ba
Author: Dan Fabulich <dan.fabulich@redfin.com>
Date:   Wed Mar 15 22:37:25 2017 -0700

    Merge branch 'branch3'

See that “Merge” line? It’s showing you the two parents of the merge commit, in order. The first parent was b7b4b7c and the second parent was f88c7ba.

Here’s what we see when we git log --first-parent.

8aec370 0 seconds ago Merge branch 'branch3'
b7b4b7c 1 second ago Merge branch 'branch1'
7b79ec5 3 seconds ago Merge branch 'branch2'
a26aed9 6 seconds ago commit directly on master
2d56476 7 seconds ago initial

--first-parent instructs git log to log only the first parent of each commit, ignoring all other parents and their parents (their “ancestors”). Since the first parent is the parent that was already on master at the time the merge was performed, looking at the first parent can reveal the “true history” of the master branch. The first-parent lineage shows you what you would have gotten if you'd peeked at the master branch at a particular point in time.
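To see that the first parent really is "what master was" just before the merge, you can record master's tip, merge, and then compare. This is a small sketch under the same kind of assumptions as before: a throwaway repository, a made-up `topic` branch, and a hypothetical `before` variable.

```shell
set -e
cd "$(mktemp -d)"                          # throwaway directory (assumption)
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

git commit -q --allow-empty -m "initial"
git branch topic
git commit -q --allow-empty -m "commit directly on master"
git checkout -q topic
git commit -q --allow-empty -m "topic work"
git checkout -q master

before=$(git rev-parse HEAD)               # master's tip just before the merge
git merge -q --no-edit topic

# HEAD^1 is the first parent: the commit master pointed at before merging.
if [ "$(git rev-parse HEAD^1)" = "$before" ]; then
  echo "first parent is the old tip of master"
fi

echo "everything reachable from master:"
git --no-pager log --oneline
echo "first-parent history only:"
git --no-pager log --oneline --first-parent
```

The second listing omits "topic work", because that commit is reachable only through the merge's second parent.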

That’s about as close as you can get to viewing the history of the master branch in isolation. (But, as we’ll see in a moment, there are problems with using --first-parent this way.)

Here’s how the Git help page describes the --first-parent parameter:

Follow only the first parent commit upon seeing a merge commit. This option can give a better overview when viewing the evolution of a particular topic branch, because merges into a topic branch tend to be only about adjusting to updated upstream from time to time, and this option allows you to ignore the individual commits brought in to your history by such a merge.

As always, Git’s help page is technically correct, but useless without a thousand words of context.

Beware: fast-forward merges can mix up parent order

Consider this sample script.

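# Create a shared "origin" repository and a first clone.
# (This assumes your default branch is named master.)
git init --bare origin
git clone origin clone1
cd clone1
echo 0 > file.txt
git add file.txt
git commit -am "initial commit"
git push origin master
git checkout -b branch1
git push origin branch1
cd ..
# Create a second clone, as if it belonged to a second engineer.
git clone origin clone2
# In clone1, commit on branch1 and push.
cd clone1
git checkout branch1
echo 1 > file.txt
git commit -am "1 (in clone1)"
git push origin branch1
# In clone2, commit on branch1, pull (creating a merge commit), and push.
cd ../clone2
git checkout branch1
echo 2 > file2.txt
git add file2.txt
git commit -m "2 (in clone2)"
# --no-rebase asks for a merge explicitly; recent Git versions otherwise
# refuse to pull when the two branches have diverged.
git pull --no-rebase --no-edit origin branch1
git push origin branch1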

In this sample, we create an origin repository and create two clones clone1 and clone2. (Imagine that these repositories each belonged to a different engineer.)

In clone1, we create the initial commit and then an additional commit on branch1 and push it to the origin repository. clone2 only contains the initial commit, at first.

When we create a commit on branch1 in clone2, we pull from the origin repository, creating a merge commit, and then immediately push our repository. (This is a very common approach for teams of engineers working directly on the same feature branch, aka topic branch.)

The pull creates a merge commit with a confusing message:

Merge branch 'branch1' of /tmp/origin into branch1

Yes, we’re merging branch1 into branch1! Specifically, we’re merging origin’s branch1 into clone2’s branch1.

The push from clone2’s branch1 to origin’s branch1 is a “fast-forward” merge. In this type of merge, Git skips creating a merge commit, and instead moves origin’s branch1 pointer to point directly to the latest commit from clone2.
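A push can fast-forward whenever the remote branch's tip is an ancestor of the commit you're pushing. Here's a rough sketch of how you might check that yourself with git merge-base --is-ancestor; the repository names `origin.git` and `work` and the placeholder identity are assumptions for illustration.

```shell
set -e
cd "$(mktemp -d)"                          # throwaway directory (assumption)
git init -q --bare origin.git
git clone -q origin.git work               # Git may warn that the repository is empty
cd work
git symbolic-ref HEAD refs/heads/branch1   # work directly on branch1
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

git commit -q --allow-empty -m "initial commit"
git push -q origin branch1
git commit -q --allow-empty -m "local work"

# The remote tip is an ancestor of the local tip, so pushing only moves a pointer.
if git merge-base --is-ancestor origin/branch1 branch1; then
  echo "push will fast-forward"
fi

git push -q origin branch1

# No new commit was created: origin's branch1 is now exactly our local commit.
git ls-remote origin refs/heads/branch1
git rev-parse branch1
```

The last two commands print the same hash, which is the whole point: a fast-forward records nothing about which side was "already there."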

Now what happens when we git log --first-parent?

$ git log --first-parent --oneline
83ac7af Merge branch 'branch1' of /tmp/origin into branch1
493e104 2 (in clone2)
8a5e558 initial commit

Uh oh! That’s not the true history of origin’s branch1. Why not? Let’s look at the --graph output:

$ git log --graph --oneline
* 83ac7af Merge branch 'branch1' of /tmp/origin into branch1
|\
| * 572edba 1 (in clone1)
* | 493e104 2 (in clone2)
|/
* 8a5e558 initial commit

The first parent of the merge commit was clone2’s commit; origin’s branch1’s last commit is the second parent. That’s not what we wanted.

Typically when this happens, --graph makes it pretty clear what’s happening, if you can recognize it. (Any time I see a commit message about merging a branch into itself, e.g. Merge branch X into X, I remember “oh, yeah, that.”)

When fast-forward merges result in mixed-up parents, git log --graph may be the simplest accurate view of the history. If the team working on the branch is small, the graph output should be pretty manageable.

Alternately, if the team working on your branch doesn’t care about merges to the branch, and especially if you don’t care about the order of the commits, you might prefer to ignore merges completely with git log --no-merges.

$ git log --no-merges --oneline
493e104 2 (in clone2)
572edba 1 (in clone1)
8a5e558 initial commit

It’s in chronological order, which obscures the “true” history of the branch, but if your team doesn’t care about that, then neither do I!

In my experience, --no-merges works well only in small, shared branches. But what if your team shares master in this way, with lots of engineers regularly pulling and pushing directly to master?

As I understand it, the general consensus is that in that case, your history will be inherently hard to read, and so you should not use Git in that way. There are two better approaches, depending on whether you prefer to merge or rebase:

  1. In one approach, every change going into the master branch should use a pull request, which creates a well-formed merge commit.
  2. In another approach, engineers should pull with git pull --rebase.
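If your team goes with the second approach, you don't have to remember the flag every time. A minimal sketch, assuming you want the setting for one repository only (a throwaway one here, for demonstration):

```shell
set -e
cd "$(mktemp -d)"                # throwaway repository (assumption)
git init -q .

# Make a plain "git pull" in this repository behave like "git pull --rebase".
git config pull.rebase true
git config --get pull.rebase

# A team in the pull-request camp might instead refuse any pull that would
# need a merge commit, with: git config pull.ff only
```

Add --global to the git config command if you'd like the setting to apply to all of your repositories.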

I don’t want to argue here for either one of these approaches, except to say that the two sides in this war both agree that the natural, obvious pattern of having everyone simply git pull and git push directly to master is, in fact, an anti-pattern. Yay Git!

Congratulations! You’re a Git wizard!

Well, maybe not.

This post is too long as it is, but there are even more tricks that a Git expert can use to analyze merge history. In an upcoming post, I’ll show how you can analyze and visualize even more complicated merge history by excluding commits from the visualization.

For now, I hope that this post has at least taught you enough Git wizardry to argue with your teammates about how to use Git.

Discussion on Hacker News
Discussion on Reddit

P.S. Redfin is hiring.
