Git Rebasing Public Branches Works Much Better Than You’d Think

If you know what you’re doing, feel free to break the “Golden Rule of Rebasing”

Dan Fabulich
Code Red

Dan Fabulich is a Principal Engineer at Redfin. (We’re hiring!)

There’s a team at Redfin that rebases and force pushes constantly, even on their shared/public branches. It works fine, or at least, way better than you’d think, if you’ve read the overblown warnings in popular Git tutorials.

Rebasing Makes History Nice and Linear

There’s a cottage industry of bloggers writing about when to rebase and when to merge. Seriously, there’s like a million blog posts about this. As far as I can tell, the general consensus is that sometimes we should merge and sometimes we should rebase, but there’s no real agreement on which cases are which.

One big advantage of git rebase is the ability to eliminate unnecessary merge commits. Merge commits make it harder to use tools like git bisect and git log. When your history has a lot of merge commits, you’ll need to learn how to visualize merge history with git log --graph, git log --first-parent, and git log --no-merges. When you rebase, your history looks smooth and clean; you don’t need any git log tricks.
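To make those git log tricks concrete, here’s a minimal sketch that builds a throwaway repository with a single merge commit and then views it three ways. The file names, commit messages, and the feature branch are made up for illustration.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is named master
git config user.name "Example"
git config user.email "example@example.com"

echo a > a.txt; git add a.txt; git commit -q -m "A"

# Diverge: one commit on a hypothetical feature branch, one on master.
git checkout -q -b feature
echo b > b.txt; git add b.txt; git commit -q -m "B"
git checkout -q master
echo c > c.txt; git add c.txt; git commit -q -m "C"

# Merging creates a merge commit with two parents.
git merge -q --no-ff -m "Merge feature" feature

git log --graph --oneline          # draws the fork and the join
git log --first-parent --oneline   # follows only master's own line of commits
git log --no-merges --oneline      # hides the merge commit itself
```

With a rebased, linear history, all three commands would show the same list of commits.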

If you value linear history, you might want to rebase as much as possible. If you do, you’ll run smack into the “Golden Rule of Rebasing.”

The So-Called “Golden Rule”

Every major Git tutorial states in no uncertain terms that rebasing shared branches is a bad idea.

The Git book (Pro Git by Scott Chacon and Ben Straub) has a clear message for anyone considering rebasing public branches. Here’s the opening of the third subsection of section 3.6, “Git Branching - Rebasing.”

“Ahh, but the bliss of rebasing isn’t without its drawbacks, which can be summed up in a single line:

“Do not rebase commits that exist outside your repository.

“If you follow that guideline, you’ll be fine. If you don’t, people will hate you, and you’ll be scorned by friends and family.”

“Scorned by friends and family”?!

Atlassian’s Merging vs. Rebasing guide puts it a little more professionally, calling it “The Golden Rule of Rebasing.”

“Once you understand what rebasing is, the most important thing to learn is when not to do it. The golden rule of git rebase is to never use it on public branches.”

That sounds pretty serious. What does the git rebase man page have to say?

“Rebasing (or any other form of rewriting) a branch that others have based work on is a bad idea: anyone downstream of it is forced to manually fix their history. This section explains how to do the fix from the downstream’s point of view. The real fix, however, would be to avoid rebasing the upstream in the first place.”

There you have it! Three canonical resources unanimously agree that anybody who rebases a shared public branch is a bad engineer.

No respectable engineers would break the Golden Rule of Rebasing, would they?

Breaking the Law! Breaking the Law!

The git rebase man page explains in detail what goes wrong when you rebase a public/shared “upstream” branch and how to fix it. (It’s surprisingly readable compared to most of the other Git man pages!)

We start with a master branch, a subsystem branch that forks off of master and contains commits E, F, and G, and a topic branch that forks off of subsystem.

Then we rebase the subsystem branch onto the master branch, without touching the topic branch. Rebasing rewrites commits E, F, and G on top of the latest commits on master, creating new commits E’, F’, and G’. The topic branch still sits on top of the original E, F, and G.

Now the trouble comes in when we git merge the topic branch into the rebased subsystem branch.

The final merge commit M now contains both the old commits E, F, and G and the rewritten/rebased commits E’, F’ and G’ in its history. Those duplicate commits are a bit confusing.
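If you’d like to see the duplicates for yourself, the following sketch reproduces the scenario in a throwaway repository. The branch names and commits E, F, and G follow the description above; commits A, B, and H, the file names, and the commit helper are assumptions added for illustration.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

# Hypothetical helper: one commit per call, each touching its own file.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit A
git checkout -q -b subsystem
commit E; commit F; commit G
git checkout -q -b topic          # topic forks off of subsystem
commit H
git checkout -q master
commit B                          # master moves on

# Rebase subsystem onto master without touching topic: E, F, G become E', F', G'.
git rebase -q master subsystem

# Now merge topic (which still carries the old E, F, G) into subsystem.
git merge -q --no-ff -m "M" topic

# E, F, and G each show up twice in the history of M.
git log --graph --oneline
```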

That’s it!

That’s It?!

Nothing else bad results from this. There’s no data loss, and no true corruption of history; the big bad “problem” is just a handful of duplicated commits. It’s a bit of a mess, certainly, but I wouldn’t say “people will hate you, and you’ll be scorned by friends and family” over such a trivial matter. (Perhaps Scott Chacon just needs to make better friends!)

Plus, this totally overblown “problem” only happened because someone decided to merge the topic branch into the rebased subsystem branch. If we git rebase the topic branch onto subsystem instead, the old commits E, F, and G disappear, and the history will turn out nice and linear!
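Here’s a sketch of that alternative: rebasing topic instead of merging it. As before, commits A, B, and H, the file names, and the commit helper are made up for illustration. Git skips the old E, F, and G because their patches already exist upstream as E’, F’, and G’.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

# Hypothetical helper: one commit per call, each touching its own file.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit A
git checkout -q -b subsystem
commit E; commit F; commit G
git checkout -q -b topic
commit H
git checkout -q master
commit B

# The upstream rebase: E, F, G become E', F', G' on top of B.
git rebase -q master subsystem

# Rebase topic onto the rewritten subsystem instead of merging.
# Only H is actually replayed; the old E, F, G are dropped as already applied.
git rebase -q subsystem topic

git log --graph --oneline topic   # a straight line: A, B, E, F, G, H
```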

We Break the “Golden Rule” Often, and It’s Fine

Redfin has a team whose formal name is the Search Experience team, but internally, everyone calls them the Seekers. (They’re big Harry Potter fans! Also, we, ah, considered a few other plausible abbreviations for the team name, and swiftly rejected them.)

The Seekers have decided that the “Golden Rule” doesn’t really apply to their branches, and so they rebase constantly.

Here’s what they do: they all use git pull --rebase. (It might be better to use git pull --rebase=preserve, or --rebase=merges in newer versions of Git, where the preserve mode has been removed. But it makes no difference, because they almost never do a git merge; they never create a merge commit.)

In fact, a number of them have set git config pull.rebase true or git config pull.rebase preserve, so they can type git pull and rebase by default. Others skip git pull entirely and run git fetch and git rebase as separate commands.
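Both habits can be tried out in a sandbox. This is a minimal sketch, assuming a throwaway bare repository standing in for the shared remote; the directory names (seed, origin.git, clone), file names, and commit messages are all hypothetical.

```shell
set -e
root=$(mktemp -d)

# A seed repository with one commit, plus a bare copy acting as the shared remote.
git init -q "$root/seed"
cd "$root/seed"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"
echo one > file.txt; git add file.txt; git commit -q -m "one"
git clone -q --bare "$root/seed" "$root/origin.git"

# A teammate's clone with one unpushed local commit.
git clone -q "$root/origin.git" "$root/clone"
cd "$root/clone"
git config user.name "Example"
git config user.email "example@example.com"
echo local > local.txt; git add local.txt; git commit -q -m "local work"

# Upstream moves.
(cd "$root/seed" && echo two >> file.txt && git commit -q -am "two" \
  && git push -q "$root/origin.git" master)

# Habit 1: fetch and rebase as two separate commands.
git fetch -q origin
git rebase -q origin/master

# Upstream moves again.
(cd "$root/seed" && echo three >> file.txt && git commit -q -am "three" \
  && git push -q "$root/origin.git" master)

# Habit 2: configure pull to rebase, then a plain git pull does the same job.
git config pull.rebase true
git pull -q

git log --graph --oneline   # linear: one, two, three, local work
```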

  1. They do a rebase-pull when they switch to the master branch and pull the latest code there. (It’s normally a fast-forward merge, so it’s no different from the standard merge-based pull.)
  2. When one engineer (let’s call him Dobby) wants to make a small bug-fix branch, he creates his branch off of the latest master, commits, and pushes, creating a pull request. When Dobby’s mentor approves the PR, Dobby does a git pull --rebase origin master in his bug-fix branch, to rebase the PR onto master. Then he does a fast-forward merge into master and pushes to master without forcing.
  3. When Dobby needs to cooperate on a shared feature branch with another engineer, Kreacher, each engineer does a git pull --rebase origin socks whenever he wants to access the code the other engineer has written. Then they do a non-force push to the branch to add new commits.
  4. When Dobby and Kreacher decide that they need to pull in new fixes from master, Kreacher does a git pull --rebase origin master in the socks branch, rebasing the entire shared branch, including some of Dobby’s commits, onto the latest master branch. Kreacher then force-pushes the branch to the shared repo. Meanwhile, Dobby adds a few more local commits and does a git pull --rebase origin socks the way he always does. This rebases Dobby’s new commits onto the rebased socks branch. It works just fine.
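Steps 3 and 4 can be replayed end to end in a sandbox. The sketch below is an approximation, not the Seekers’ actual setup: it uses a local bare repository as the shared repo, and the directory layout, file names, commit messages, and the ident and commit helpers are all made up for illustration.

```shell
set -e
root=$(mktemp -d)
git init -q --bare "$root/origin.git"
git --git-dir="$root/origin.git" symbolic-ref HEAD refs/heads/master

# Hypothetical helpers: set a local identity, and make one commit per call.
ident()  { git config user.name "$1"; git config user.email "$1@example.com"; }
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

# Kreacher seeds master and starts the shared socks branch.
git init -q "$root/kreacher"; cd "$root/kreacher"
git symbolic-ref HEAD refs/heads/master
ident kreacher
git remote add origin "$root/origin.git"
commit base;  git push -q origin master
git checkout -q -b socks
commit sock1; git push -q origin socks

# Step 3: Dobby joins the branch and adds a commit with a normal push.
git clone -q "$root/origin.git" "$root/dobby"; cd "$root/dobby"
ident dobby
git checkout -q socks
commit sock2; git push -q origin socks

# Meanwhile, a fix lands on master.
cd "$root/kreacher"
git checkout -q master
commit hotfix; git push -q origin master

# Step 4: Kreacher rebases the whole shared branch onto master and force-pushes.
git checkout -q socks
git pull -q --rebase origin socks    # pick up Dobby's sock2 first
git pull -q --rebase origin master   # rewrites sock1 and sock2 on top of hotfix
git push -q --force origin socks

# Dobby has new local work and pulls the way he always does.
cd "$root/dobby"
commit sock3
git pull -q --rebase origin socks

git log --graph --oneline   # linear: base, hotfix, sock1, sock2, sock3
```

In this run nobody hit a conflict, so it’s the well-behaved version of step 4.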

Note that in steps 1, 2, and 3, nobody broke the “Golden Rule of Rebasing.” Nobody needed to force push until Kreacher broke the rule in step 4. And when he did, there were no real problems, because both participating engineers had already agreed to only rebase and never to merge the branch!

Breaking the “Golden Rule” allows the Seekers team to maintain a linear history 100% of the time, even when they’re sharing a feature branch.

We Do It, but You Might Not Want To

The steps described above were pretty straightforward, but it turns out that this strategy of sharing branches can sometimes go awry. As a result, not every team at Redfin is as enthusiastic about rebasing as the Seekers are.

In step 4, when Dobby rebased onto Kreacher’s upstream-rebased socks branch, Dobby was “recovering” from an upstream rebase, in Git’s terminology.

The git rebase man page explains that there are two cases when recovering from an upstream rebase. There’s the “easy” case, where the new commits E’, F’, and G’ make exactly the same changes as the originals (that is, the “patch IDs based on the diff contents” match); in that case, Dobby can recover just by running a simple git rebase, which is what he was planning to do anyway.

When the patch IDs don’t match, that’s the “hard case.” This can happen if Kreacher encounters a conflict when rebasing socks onto master. In that case, Git doesn’t automatically know which revisions to rebase, so when Dobby does a naive git rebase, it can generate unnecessary duplicate commits. That’s not the end of the world, but it is messier than we might like. The git rebase man page suggests in that case manually constructing a revision range of commits to rebase, which is a hassle.
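For a feel of what constructing that range looks like, here’s a hedged sketch using git rebase --onto. To keep it short, it fakes the “hard case” by amending commit E with different contents, as a stand-in for a conflict resolution that changed the diff. The old_tip variable, file names, and commit helper are hypothetical; the man page’s own example reaches for the reflog (subsystem@{1}) to find the old tip.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

# Hypothetical helper: commit NAME CONTENT writes CONTENT to NAME.txt.
commit() { echo "$2" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit A "a"
git checkout -q -b subsystem
commit E "original E"
git checkout -q -b topic
commit H "h"

# Upstream rewrite that changes E's diff, so the patch IDs no longer match.
git checkout -q subsystem
old_tip=$(git rev-parse subsystem)        # same commit as subsystem@{1} afterwards
echo "E after conflict resolution" > E.txt
git commit -q --amend -a -m "E"

# Replay only the commits after the OLD subsystem tip (just H) onto the NEW tip.
git rebase -q --onto subsystem "$old_tip" topic

git log --graph --oneline topic   # linear: A, E (rewritten), H
```

A naive git rebase subsystem here would try to replay the old E as well and stop on a conflict in E.txt.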

Investigation After the Fact Is Much Harder

In my opinion, the biggest problem isn’t the duplicated commits. The biggest problem is analyzing what happened after someone makes a mistake.

For example, git rebase often starts flagging a bunch of random old commits as conflicted when handling “hard case” rebases, confusing Dobby. (Dobby is easily confused, and when he gets confused, he bangs his head against the wall, creating a horrible ruckus.)

When you see a large number of confusing rebase conflicts, that’s a sign that somebody has made a mistake. When Dobby and Kreacher break the Golden Rule together, it can be difficult or even impossible to figure out who made a mistake, when they made it, and how to fix it, because it doesn’t show up in Git’s log. (The whole purpose of rebase is to rewrite history.)

The history of the mistake might show up in git reflog on Kreacher’s laptop, at least until Kreacher’s Git garbage collector erases the alternate timeline. But analyzing the problem after the fact by inspecting Kreacher’s laptop may be difficult or even impossible, if Kreacher is out running errands for Bellatrix Lestrange.
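If you do get access to the laptop in time, the pre-rebase tip is still reachable through the branch’s reflog. A minimal sketch, assuming a local repository with reflogs enabled (the default for non-bare repositories); the socks-before-rebase branch name, file names, and commit helper are made up for illustration.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

# Hypothetical helper: one commit per call, each touching its own file.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit base
git checkout -q -b socks
commit sock1
before=$(git rev-parse socks)   # remember the tip, only to compare afterwards
git checkout -q master
commit hotfix

# The rebase rewrites socks; the old tip no longer appears in git log.
git rebase -q master socks

# The branch's reflog still records where socks pointed before the rebase.
git reflog show socks

# Pin the pre-rebase history to a branch so garbage collection can't erase it.
git branch socks-before-rebase 'socks@{1}'
git log --oneline socks-before-rebase
```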

Plus, it requires knowing that you have to check Kreacher’s laptop, which Dobby may not even realize when faced with a bunch of weird rebase conflicts.

Break the Golden Rule, but Carefully!

I don’t recommend breaking the Golden Rule 100% of the time. Only break the Golden Rule with a few of your closest teammates whom you know and trust.

Even the Seekers don’t force push to master, and we’re probably going to keep it that way.

But hey, it’s your team, and it’s your code. If you want to break the Golden Rule of Rebasing, go knock yourself out.


Dobby loves to learn more about Git!

P.S. Redfin is hiring.
