How to solve Linus Torvalds' issues about GitHub

Julien Danjou

While Linus Torvalds is well known for Linux, he's also known for speaking his mind. He often complains loudly about what he thinks is wrong or damaging to the industry.

Since he created Git 15 years ago, it's interesting to hear what he recently said about the GitHub development workflow. There is plenty to criticize about it, and Linus picked one point that's quite interesting:

Also, I notice that you have a github merge commit in there.

That's another of those things that I *really* don't want to see - github creates absolutely useless garbage merges, and you should never ever use the github interfaces to merge anything. This is the complete commit message of that merge:

Merge branch 'torvalds:master' into master

Yeah, that's not an acceptable message. [...] github is a perfectly fine hosting site, and it does a number of other things well too, but merges is not one of those things.

What's the problem?

The root of the problem is that GitHub merge commit messages can be too concise and do not provide enough context about what's being merged and why. If you take a look at how Linus merges tags, you'll understand what he means by detailed merge background.

Linus is not just pushing a merge button. He's writing a proper merge commit message for the tag he is pulling.

He also takes care not to have cruft in his Git branch history. He wants a clean Git history, which is why he's unhappy seeing a merge commit that shouldn't be there.

How to manage a proper clean Git history

Managing a good Git history can be quite a daunting task when you're using GitHub. While the platform makes Git accessible to newcomers, it tends to stand in the way of so-called power users.

Not everybody agrees on exactly what a good, clean Git history is. For the sake of this example, we'll use Linus' rules here:

  1. Have clean, logical and atomic commits in your merged branches;
  2. Do not merge branches (or tags) without meaningful and well-crafted commit messages;
  3. Do not clutter your Git history with merges from your base branch.

Having Clean Commits in your Branches

This is usually the trickiest part when using GitHub. Most developers have the habit of starting a pull request with a branch and a first commit.

They then update their branch by using git commit and git push, adding more commits to the branch. This works fine, and you can follow the developer's progress on the pull request by using the Commits tab.

However, GitHub will merge the branch and its commit history as-is if you use the rebase or merge methods. Cluttering your project's Git history with multiple commits whose only description is Fix tests does not make that history enjoyable to browse.

There are mainly two ways of solving the issue:

  1. Require your contributors to only send logical commits to their pull request's branch. That means using git commit --amend or git rebase --interactive extensively. However, this is not something every developer is comfortable with.
  2. Use the squash merge method, which squashes all the commits from the pull request into a single commit on top of the base branch. This is usually the preferred method as it avoids asking the contributor to rework their branch.
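For contributors who choose the first option, one way to keep a branch down to logical commits is git's fixup and autosquash support. The following is a minimal sketch in a throwaway repository; the file name, branch names, and commit messages are made up for illustration.

```shell
#!/bin/sh
set -eu

# Throwaway repository so the sketch never touches a real project.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "base" > app.txt
git add app.txt
git commit -q -m "Initial commit"

# A typical pull request branch: one logical commit, then a follow-up fix.
git checkout -q -b feature
echo "feature" >> app.txt
git commit -q -a -m "Add feature"
echo "fix" >> app.txt
# --fixup marks this commit as a correction of the previous one.
git commit -q -a --fixup=HEAD

# Fold the fixup into its target. GIT_SEQUENCE_EDITOR=true accepts the
# generated todo list unchanged, so no editor opens.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash main

# Show what the pull request would now contain.
git log --oneline main..feature
```

After the rebase, the branch holds a single commit on top of main, so merging it with any method leaves no "Fix tests"-style noise behind.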

Using Good (Merge) Commit Messages

Having good merge commit messages should be easily achievable when using GitHub. When you press the merge button using the merge or squash method, GitHub actually asks for a message. By default, it will provide one which contains your pull request title and commit messages.

This is usually far from being as clean as the changelog that Linus would write, but it's a good start. If your commits are well organized, it can be quite easy to write a good summary of what you're merging and why. Our guidelines about writing a good commit message apply here: be nice to future-you.

That being said, this does not apply if you're using the rebase merge method.
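For the merge method, the local equivalent of filling in that message box can be sketched with git merge and explicit -m options. This is only an illustration in a throwaway repository; the branch name and message text are placeholders, not a recommended wording.

```shell
#!/bin/sh
set -eu

# Throwaway repository so the sketch never touches a real project.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "Initial commit"

git checkout -q -b feature
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Add feature"

git checkout -q main
# --no-ff forces a real merge commit; each -m adds one paragraph to its message.
git merge -q --no-ff feature \
  -m "Merge branch 'feature': add the example feature" \
  -m "Explain here what is being merged and why, in the maintainer's own words."

# Print the full message of the merge commit.
git log -1 --format=%B
```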

Do not Merge your Base Branch

When a pull request is opened, it can get out of sync with its base branch pretty quickly. As the project continues to progress, there's a chance the CI report on your pull request no longer reflects reality.

It's tempting to solve this problem by merging the base branch into the pull request. There's even an Update branch button, if you use GitHub branch protection, that allows you to do this.

The best way of updating your pull request is actually to rebase it using git rebase. As every commit in your pull request should be logical and atomic, every commit should be reapplied on top of the new version of the base branch, and each conflict should be resolved in the commit that causes it, rather than in one global conflict resolution during a merge.
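A minimal sketch of that update, run in a throwaway repository so it is safe to try. The branch and file names are invented for the example, and the final push is left as a comment because the sketch has no remote.

```shell
#!/bin/sh
set -eu

# Throwaway repository so the sketch never touches a real project.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "Initial commit"

# The pull request branch starts here.
git checkout -q -b feature
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Add feature"

# Meanwhile, the base branch moves on.
git checkout -q main
echo "more" >> base.txt
git commit -q -a -m "Update base"

# Bring the pull request up to date by replaying its commits on top of the
# new base, instead of merging main into it.
git checkout -q feature
git rebase -q main

# The rewritten branch then has to be force-pushed to the pull request:
# git push --force-with-lease origin feature

git log --oneline --graph feature
```

The resulting graph is a straight line: the base branch commits first, then the pull request's commit, with no merge commit in between.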

Unfortunately, GitHub does not provide a way to do a simple rebase using its UI. You can, however, leverage Mergify to do this for you.

By simply enabling the Mergify application on your repository, you can use any of its commands, including rebase. It will execute git rebase on the branch and force-push it to the pull request.

Solving This with Consistency

While everything we discussed above makes sense, it can actually be quite hard for an engineering team to apply this consistently. If you have no formal check on every step of this workflow, it's easy to fall into the trap where a pull request gets merged with the wrong method or a messy history.

For that problem too, it is a good idea to leverage Mergify and let it merge the pull request automatically for you.

  pull_request_rules:
    - name: automatic merge
      conditions:
        - "#approved-reviews-by>=1"
        - check-success=test
        - linear-history
      actions:
        merge:
          method: merge
          commit_message: title+body

This is a straightforward configuration which only merges a pull request when it:

  • is approved by at least one person;
  • is validated by the CI named test;
  • has a linear history (i.e., no merge).
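To get a feel for the last condition locally, counting the merge commits that a branch adds on top of its base gives a rough approximation. This is an assumption about what such a check looks for, not Mergify's implementation; the repository, branch names, and the merges variable are made up for the sketch.

```shell
#!/bin/sh
set -eu

# Throwaway repository so the sketch never touches a real project.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "Initial commit"

git checkout -q -b feature
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Add feature"

git checkout -q main
echo "more" >> base.txt
git commit -q -a -m "Update base"

# Reproduce the pattern to avoid: merging the base branch into the
# pull request branch.
git checkout -q feature
git merge -q --no-edit main

# Count the merge commits that exist on feature but not on main.
merges=$(git rev-list --merges --count main..feature)
if [ "$merges" -eq 0 ]; then
  echo "linear"
else
  echo "not linear: $merges merge commit(s)"
fi
```

Rebasing the branch instead of merging main into it would bring the count back to zero.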

Those rules solve points #1 and #3 of Linus' requirements. What about point #2?

Our suggestion here is to use the pull request title and body as the merge commit message, which is what the commit_message setting does. That makes it easy for the repository maintainer to identify and edit what will end up in the Git history.

With all of this, there's no good excuse for not having a marvelous Git history!

Next Step

As a next step, it could be quite tempting to automate other actions, such as rebasing the pull request before it's ready to get merged. This is actually what a merge queue is for: read this blog post if you're curious.