Why You Can Sometimes Use git push -f: Rewriting Code Repository History

by Alexey Shepelev, December 24th, 2020

Too Long; Didn't Read

The git push -f command removes from the server branch all commits that are not in your local version and writes the new ones. The same commit can appear in several branches at the same time. When we try to merge a branch with rewritten history into branches where the history has been preserved, we get a large number of conflicts (roughly one per rewritten commit). Rewriting history also has an unpleasant side: commits that seem to be removed from the branch do not actually disappear and keep hanging around in the repo.


One of the first admonitions that a young Padawan gets together with access to git repositories is: “never use git push -f”. Since this is one of the hundreds of maxims that a novice software engineer needs to learn, no one takes the time to clarify why this should not be done. It’s like with babies and fire: “matches are not toys for children”, and that’s it. But we grow and develop both as people and as professionals, and one day the question “actually, why?” may arise.

I’ve heard that in some companies the ability to answer this question at an interview is a criterion for hiring for senior positions.

But to understand the answer, you first need to find out why rewriting history is considered bad.

To do this, in turn, we need a quick excursion into the physical structure of a git repository. If you are pretty sure that you know everything about the repo structure, you can skip this part. But as for me, I learned a lot of new things in the process of clarification, and some old knowledge turned out to be not quite relevant.

At the lowest level, a git repo is a collection of objects and pointers to them. Each object has its own unique 40-character hexadecimal hash (a 20-byte SHA-1 digest), which is calculated from the contents of the object.

The main object types are a blob (just the contents of a file), a tree (a collection of pointers to blobs and other trees), and a commit. A commit object holds only a pointer to a tree, a pointer to the previous (parent) commit, and service information: date/time, author, and comment.
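The three object types can be inspected directly with git cat-file. This is a minimal sketch in a throwaway repository; the file name, identity, and commit message are made up for illustration.

```shell
set -e
cd "$(mktemp -d)"
# Hypothetical identity so the sketch does not depend on global git config.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

git init -q .
echo "hello" > greeting.txt
git add greeting.txt
git commit -q -m "Add greeting"

# Resolve the three objects that this single commit created.
commit_id=$(git rev-parse HEAD)
tree_id=$(git rev-parse 'HEAD^{tree}')
blob_id=$(git rev-parse HEAD:greeting.txt)

# cat-file -t prints the object type, -p pretty-prints its content.
for id in "$commit_id" "$tree_id" "$blob_id"; do
  echo "$id is a $(git cat-file -t "$id")"
done
git cat-file -p "$commit_id"   # tree pointer, author, committer, message
git cat-file -p "$tree_id"     # one entry pointing at the blob
git cat-file -p "$blob_id"     # the file contents
```

Since this is the very first commit, its printed content has no parent line at all.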

Where are the branches and tags that we are used to working with? They are not objects, they are just pointers: a branch points to the last commit in it, and a tag points to an arbitrary commit in the repo. This means those beautifully drawn branches with commit circles on them in the IDE or GUI client are built on the fly while running along the commit chains from the ends of the branches down to the “root”. The very first commit in the repo has no previous one, so it simply carries no parent pointer.

An important point to understand: the same commit can appear in several branches at the same time. The commits are not copied when a new branch is created. It just starts “growing” from where HEAD was when the git checkout -b <branch-name> command was issued.
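A quick way to convince yourself that a branch is only a pointer is to create one and compare. The sketch below assumes the default file-based ref storage (one small file per branch under .git/refs/heads); the names are illustrative.

```shell
set -e
cd "$(mktemp -d)"
# Hypothetical identity so the sketch does not depend on global git config.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

git init -q .
git symbolic-ref HEAD refs/heads/master   # force the branch name used in the article
echo one > a.txt && git add a.txt && git commit -q -m "C1"
echo two > b.txt && git add b.txt && git commit -q -m "C2"

objects_before=$(git count-objects)
git checkout -q -b feature                # the new branch starts where HEAD was
objects_after=$(git count-objects)

master_tip=$(git rev-parse master)
feature_tip=$(git rev-parse feature)
ref_file=$(cat .git/refs/heads/feature)   # the whole branch is this one hash

echo "master:   $master_tip"
echo "feature:  $feature_tip"
echo "ref file: $ref_file"
echo "objects before/after: $objects_before / $objects_after"
```

Both branches resolve to the same commit, and creating the branch adds no objects to the repository.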

So why is rewriting the history of a repository harmful?

First, and this is obvious, when you upload a new history to the repository that an engineering team is working with, other people might just lose their changes. The git push -f command removes from the server branch all commits that are not in your local version and writes the new ones.

For some reason, few people know that the git push command has long had a “safer” option, --force-with-lease, which makes the push fail if the remote branch has moved since you last fetched it, that is, if it contains commits from other users that you have not seen. I always recommend using it instead of -f/--force.
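The difference is easiest to see with two clones of one server repository. The following is a sketch under assumptions: the "server" is a local bare repository, and alice and bob are hypothetical directory names.

```shell
set -e
work=$(mktemp -d)
cd "$work"
# Hypothetical identity so the sketch does not depend on global git config.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

# A seed repository, a bare "server" copy of it, and two working clones.
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
echo "v1" > seed/app.txt
git -C seed add app.txt
git -C seed commit -q -m "Initial commit"
git clone -q --bare seed remote.git
git clone -q remote.git alice
git clone -q remote.git bob

# Bob publishes new work.
echo "bob" > bob/bob.txt
git -C bob add bob.txt
git -C bob commit -q -m "Bob's work"
git -C bob push -q origin master
bob_commit=$(git -C bob rev-parse HEAD)

# Alice rewrites her history without fetching first, then tries to push.
git -C alice commit -q --amend -m "Initial commit (reworded)"
if git -C alice push -q --force-with-lease origin master 2>/dev/null; then
  lease_result=accepted
else
  lease_result=rejected   # the server branch moved since Alice's last fetch
fi
remote_tip=$(git --git-dir=remote.git rev-parse master)
echo "push was $lease_result; server still has Bob's commit: $remote_tip"
```

A plain git push -f in Alice's place would have been accepted and Bob's commit would have vanished from the server branch. Note that the check compares against your remote-tracking branch, so anything that fetches in the background (some IDEs do) weakens the protection.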

The second reason the git push -f command is considered harmful is that when we try to merge a branch with rewritten history into branches where the history has been preserved (or more precisely, where the commits removed from the rewritten history have been preserved), we will get a great number of conflicts (roughly one per rewritten commit). There is a simple answer to this: if you carefully follow Gitflow or GitLab Flow, then such situations most likely will not even arise.

And finally, there is an unpleasant side of rewriting history: those commits that seem to be removed from the branch do not actually disappear anywhere. They simply stay in the repo, no longer reachable from any branch.

This is a little frustrating. Fortunately, git developers have also foreseen this problem and introduced the garbage collection command git gc --prune. Most git hosting services, at least GitHub and GitLab, sometimes run this operation in the background.
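Here is a small sketch of that life cycle in a local repository. Expiring the reflog and using --prune=now are choices made for the demo so that the effect is immediate; by default git keeps such objects for a grace period.

```shell
set -e
cd "$(mktemp -d)"
# Hypothetical identity so the sketch does not depend on global git config.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

git init -q .
echo "v1" > file.txt
git add file.txt
git commit -q -m "Original commit"
old_commit=$(git rev-parse HEAD)

# Rewrite history: the branch now points at a brand-new commit.
git commit -q --amend -m "Rewritten commit"

# The old commit is no longer on any branch, yet the object still exists.
before_gc=$(git cat-file -t "$old_commit")

# Drop the reflog entries that still reference it, then collect garbage.
git reflog expire --expire=now --all
git gc --quiet --prune=now

if git cat-file -e "$old_commit" 2>/dev/null; then
  after_gc=present
else
  after_gc=gone
fi
echo "before gc: $before_gc, after gc: $after_gc"
```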

So, after we’ve dispelled the fear of changing the repository history, we can finally move on to the main question: why is it needed and when is it reasonable to use?

In fact, I’m sure that almost every more or less active git user has changed history at least once when it suddenly turned out that something went wrong in the last commit: an embarrassing typo crept into the code, or you made a commit from a different user (from your personal e-mail instead of the work one, or vice versa), or you forgot to add a new file (if you like to use git commit -a as I do). Even changing the description of a commit leads to the need to rewrite it, because the description is also part of what gets hashed!
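To see that the description really is part of the hashed content, you can recompute a commit's hash from its raw text. A sketch, with an illustrative file and message:

```shell
set -e
cd "$(mktemp -d)"
# Hypothetical identity so the sketch does not depend on global git config.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

git init -q .
echo "hello" > greeting.txt
git add greeting.txt
git commit -q -m "Add greeting"

head_id=$(git rev-parse HEAD)

# Hashing the raw commit text reproduces the commit id exactly.
recomputed_id=$(git cat-file commit HEAD | git hash-object -t commit --stdin)

# Change only the last line (the description) and hash again.
reworded_id=$(git cat-file commit HEAD \
  | sed '$s/.*/Add a greeting file/' \
  | git hash-object -t commit --stdin)

echo "HEAD:       $head_id"
echo "recomputed: $recomputed_id"
echo "reworded:   $reworded_id"
```

The recomputed id matches HEAD, while the reworded text hashes to a different id, which is why git commit --amend always produces a new commit.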

But this is a trivial case. Let’s consider some more interesting ones.

Let’s say it took you several days to create a big feature. Each day you sent your work results to the server repository (4-5 commits), and then sent your changes for review. Two or three tireless reviewers showered you with all kinds of recommendations for edits or even found some bugs (4-5 more commits). Then QA detected several extreme cases that also required fixes (2-3 more commits). And finally, during the integration, some incompatibilities were detected or some autotests also needed to be fixed.

If you now click the Merge button, then a dozen and a half commits like “My feature, day 1”, “Day 2”, “Fix tests”, “Fix review”, etc. will be added to the master branch. Of course, the squash mode can be used as a remedy. Both GitHub and GitLab have it, but you need to be careful with it: firstly, it can replace the commit description with something unpredictable, and secondly, replace the author of the feature with someone who clicked the Merge button (we have a robot helping the release engineer to assemble today’s deploy).

Therefore, the easiest thing will be to use git rebase to collapse all the commits of the branch into one before the final integration into the release.

But it also happens that you have already approached the code review with a repo history resembling Olivier salad. This occurs if it took you several weeks to create a feature because it was poorly decomposed, or requirements have changed during the development, although most teams are frosty about that.

There is a way to make your life easier. Apart from preliminary work on the better decomposition of the task, after you’ve written the main code, you can bring its history into a more logical form by breaking it into atomic commits with green tests in each: “created a new service and a transport layer for it”, “built models and wrote invariant checking”, “added validation and exception handling”, “wrote tests”.

Each of these commits can be reviewed separately (both GitHub and GitLab can do this). You can do it at times when switching between tasks or during breaks.

To do this, run git rebase --interactive. As its parameter, pass the commit after which you want to rewrite history, that is, the parent of the first commit you are going to change. If we are talking about the last 50 commits, as in the example in the picture, you can write git rebase --interactive HEAD~50 (substitute your number for “50”).
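If you want to check which commits a given parameter selects before editing anything, one trick is to let cat play the role of the editor: it prints the todo list and leaves it unchanged, so the rebase finishes without modifying history. A sketch with four illustrative commits:

```shell
set -e
cd "$(mktemp -d)"
# Hypothetical identity so the sketch does not depend on global git config.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

git init -q .
for n in first second third fourth; do
  echo "$n" > "$n.txt"
  git add "$n.txt"
  git commit -q -m "$n commit"
done
head_before=$(git rev-parse HEAD)

# GIT_SEQUENCE_EDITOR is the program git runs on the todo list.
# With cat, the list is printed and then executed as is (all picks).
todo=$(GIT_SEQUENCE_EDITOR=cat git rebase --interactive HEAD~3 2>/dev/null)
printf '%s\n' "$todo"

first_pick=$(printf '%s\n' "$todo" | grep '^pick' | head -n 1)
pick_count=$(printf '%s\n' "$todo" | grep -c '^pick')
head_after=$(git rev-parse HEAD)
```

HEAD~3 selects the last three commits, and the list starts with the oldest of them.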

By the way, if you have merged the master branch into your branch while working on the task, then you will first need to rebase your branch onto master so that merge commits and commits from the master do not distract you.

Armed with the knowledge of the internals of a git repository, it should be easy to understand how rebasing onto master works. This command takes all the commits from our branch and changes the parent of the first one to the last commit in the master branch. See diagram:

Situation before rebase: C2 is the parent of commit C4

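After rebase: C3 becomes the parent of C4′

Because the parent pointer is part of the commit, the rebased commit is a new object with a new hash even if its changes are identical. If the changes in C4 and C3 conflict with one another, then after resolving the conflicts the commit will change its content as well, so in the second diagram it is renamed to C4′.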
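The same situation can be reproduced in a scratch repository. In this sketch the commit names follow the diagram captions, and the commits touch different files, so no conflict arises.

```shell
set -e
cd "$(mktemp -d)"
# Hypothetical identity so the sketch does not depend on global git config.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

git init -q .
git symbolic-ref HEAD refs/heads/master
echo c1 > c1.txt && git add c1.txt && git commit -q -m "C1"
echo c2 > c2.txt && git add c2.txt && git commit -q -m "C2"

# The feature branch grows from C2.
git checkout -q -b feature
echo c4 > c4.txt && git add c4.txt && git commit -q -m "C4"

# Meanwhile master moves on to C3.
git checkout -q master
echo c3 > c3.txt && git add c3.txt && git commit -q -m "C3"

c2=$(git rev-parse master~1)
c3=$(git rev-parse master)
c4_before=$(git rev-parse feature)
parent_before=$(git rev-parse feature~1)

git checkout -q feature
git rebase -q master

c4_after=$(git rev-parse feature)
parent_after=$(git rev-parse feature~1)
echo "parent of C4 before: $parent_before (C2 is $c2)"
echo "parent of C4 after:  $parent_after (C3 is $c3)"
echo "C4 hash: $c4_before -> $c4_after"
```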

Thus, you will get a branch consisting only of your changes and growing from the top of the master. Of course, the master must be up to date. You can simply use the version from the server: git pull --rebase origin master (as you know, git pull is equivalent to git fetch && git merge, and --rebase will force git to rebase instead of merge).

Let’s finally go back to git rebase --interactive. It was made by programmers for programmers. Having realised how much stress people will experience during the process, they tried to help the user save his nerves and relieve him of the need to work too hard.

The generated file opens in a text editor. At the bottom of it you will find detailed information on what can be done. Next, in simple edit mode, you decide what to do with the commits of your branch. Everything is as easy as pie: pick – leave it as it is, reword – change the commit description, squash – merge into the previous commit (the list runs from the oldest commit to the newest and is processed from the top down, i.e. the previous one is the line above), drop – delete, edit – and this is the most interesting thing – stop at this commit so that you can change it.
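For illustration, here is squash applied without opening an editor by hand. This is only a sketch: a sed one-liner stands in for your manual edits of the todo list, and GIT_EDITOR=true accepts the combined commit message as generated. The commit names are made up.

```shell
set -e
cd "$(mktemp -d)"
# Hypothetical identity so the sketch does not depend on global git config.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

git init -q .
git symbolic-ref HEAD refs/heads/master
echo base > base.txt && git add base.txt && git commit -q -m "Base"

git checkout -q -b feature
for day in 1 2 3; do
  echo "work" > "day$day.txt"
  git add "day$day.txt"
  git commit -q -m "My feature, day $day"
done

# Keep "pick" on the first (oldest) line and turn every later pick into
# "squash", so each commit is folded into the line above it.
GIT_SEQUENCE_EDITOR="sed -i.bak -e '2,\$s/^pick/squash/'" \
GIT_EDITOR=true \
  git rebase --interactive master

commits_on_feature=$(git rev-list --count master..feature)
git log --oneline master..feature
```

All three days of work end up in a single commit on top of master, with all three files still present.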

When git reaches an edit command, it applies that commit and stops right after it: the commit already exists and the working tree is clean. You can change whatever you want in this commit (git commit --amend), undo it with git reset HEAD~ to commit its changes again piece by piece, add a few more commits on top, and then run git rebase --continue to continue the rebase process.

Oh, and by the way, you can swap commits around. This may create conflicts, but in general, the rebase process is rarely completely conflict-free. As they say, no use crying over spilt milk.

If you get confused and it seems that everything is gone, use the bailout button, git rebase --abort, which will immediately return everything to the way it was.
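A sketch of the bailout in action: two branches edit the same line, the rebase stops on the conflict, and the abort puts the branch back exactly where it was. File and branch contents are illustrative.

```shell
set -e
cd "$(mktemp -d)"
# Hypothetical identity so the sketch does not depend on global git config.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

git init -q .
git symbolic-ref HEAD refs/heads/master
echo "base" > config.txt && git add config.txt && git commit -q -m "Base"

git checkout -q -b feature
echo "feature version" > config.txt && git commit -q -am "Feature edit"

git checkout -q master
echo "master version" > config.txt && git commit -q -am "Master edit"

git checkout -q feature
tip_before=$(git rev-parse HEAD)

# Both branches changed the same line, so the rebase stops midway.
if git rebase master >/dev/null 2>&1; then
  outcome=clean
else
  outcome=conflict
fi

git rebase --abort   # back to the state before the rebase started

tip_after=$(git rev-parse HEAD)
branch_after=$(git symbolic-ref --short HEAD)
echo "rebase outcome: $outcome; now on $branch_after at $tip_after"
```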

You can repeat rebase several times, affecting only parts of history and leaving the rest untouched with pick, thus giving your history a more finished look, as if you were a potter making his jug. As I wrote above, it is good practice to keep the tests green in every commit (to do this, use edit, and then squash during the next pass).

Another stunt, useful in case you need to decompose several changes in the same file into different commits, is git add --patch. It can be helpful on its own, but in combination with the edit directive, it will allow you to split one commit into several, and do it at the level of individual lines, which, if I’m not mistaken, neither GUI client nor IDE allows.

After making sure everything is ok, you can finally breathe a sigh of relief and do what this tutorial has started with: git push --force. Oh, of course, I mean --force-with-lease!

At first, you will most likely spend an hour on this process (including the initial rebase to master), or even two if the feature is really sprawling. But even this is much better than to spend two days waiting for the reviewer to finally take up your request, and another couple of days until he gets through it. In the future, you will probably fit in 30-40 minutes.

The last thing I would like to warn you against is rewriting the branch history during the code review. Remember that a conscientious reviewer may clone your code locally so that he can see it through the IDE and run tests.

Thanks to everyone who has read to the end! I hope that this article will be useful not only for you but also for your colleagues who are going to review your code.