Rewriting the History

Nishant Srivastava (@nisrulz)

Most of us are engineers and, at the end of the day, human, so we are all bound to make mistakes. As engineers, we make mistakes in code quite often, and even if we do not admit it, they exist. How many times have you published code to a public host such as GitHub or Bitbucket with credentials in it, then pulled the repository down, reset the whole history and re-published a new repository with the credentials cleaned out?

Pretty common, eh? In my experience, that is what most developers do. But what if I told you that you can clean the credentials out and still keep the git history? Sounds awesome, right? That’s because it is!

So how does one do it? I will be demonstrating a very simple example of a scenario I recently came across and how I solved it.

It can, however, be applied to various other use cases. You can even extend it to remove complete files 🙄

The Problem

We all have a license block in our code when pushing it to GitHub (if you don’t have one, make sure you add one from now on). Mine looks like this:

* Copyright (C) 2016 Nishant Srivastava
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
* http://www.apache.org/licenses/LICENSE-2.0
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* See the License for the specific language governing permissions and
* limitations under the License.

I use a single machine for my full-time job and my own open source work (like almost everyone), and I have set up Android Studio to include a license block everywhere. Here is the catch: it is set up to add the license block of the company I work for (because that is my full-time job). It looks like this:

* Copyright (c) 2015 - 2016 CompanyName, Inc.
* All Rights Reserved
* Unauthorized copying of this file, via any medium is strictly prohibited

I add my own license when working on my open source projects after work hours, and I had never messed it up until recently, when I pushed one of the .java files of one of my Android libraries to GitHub with the company license block in it 😱 😱 😱 😱

Now, that’s a bummer 😩. My git history was messed up in a way I never intended, and I did not want to reset my repository (which would mean losing all its stars and issues).

The Solution 🤓

One tool: BFG Repo-Cleaner. From the tool’s homepage itself:

Removes large or troublesome blobs like git-filter-branch does, but faster.

Yeah, that’s all you need to clean up your repo. The tool searches your git history for a piece of text and replaces it with another. Every commit that contained the old text is rewritten as a new commit (with a new hash) and the old commits are dropped, but the order of the history is still maintained. Pretty neat, eh? I know 😉

So here is how we clean our git history of unwanted data.

  1. Download the latest release of the BFG tool.
  2. Once it is downloaded, navigate to the folder you downloaded it to and create a text file named replace.txt (you can name it whatever you want; I like replace.txt).
  3. Edit the content of replace.txt as below:
Copyright (c) 2015 - 2016 CompanyName, Inc.==>Copyright (C) 2016 Nishant Srivastava

where the syntax is text to replace==>text to replace with.
Notice that there are no spaces right before or after the ==> operator.
More ways of defining the replacement text are covered in this Stack Overflow answer.
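
If you prefer to create the file from the terminal, here is a minimal sketch. The scratch folder is only an assumption for illustration (in practice the file sits next to the BFG jar), and the grep check is my own sanity test for stray spaces around ==>, not something the BFG requires.

```shell
set -eu

# Hypothetical scratch folder; in practice use the folder holding the BFG jar.
workdir="$(mktemp -d)"

# One rule per line: text to replace==>text to replace with
cat > "$workdir/replace.txt" <<'EOF'
Copyright (c) 2015 - 2016 CompanyName, Inc.==>Copyright (C) 2016 Nishant Srivastava
EOF

# Flag any rule that has a space directly before or after the ==> operator.
if grep -nE ' ==>|==> ' "$workdir/replace.txt"; then
    status="has stray spaces"
else
    status="ok"
fi
echo "replace.txt check: $status"
```

A stray space would presumably be treated as part of the text to match or insert, so it is worth catching before the history is rewritten.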

4. Now go ahead and clone a mirror of your project repo.

git clone --mirror https://github.com/username/your-project.git
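
If you are wondering what makes a mirror clone special, the sketch below shows it on a throwaway local repository (the paths and the "Demo" identity are made up for illustration). A mirror is a bare repository whose fetch refspec copies every ref from the remote, which matters later on when pull request refs enter the picture.

```shell
set -eu
work="$(mktemp -d)"

# Throwaway stand-in for the GitHub repository (names are illustrative).
git init -q "$work/origin"
git -C "$work/origin" config user.name "Demo"
git -C "$work/origin" config user.email "demo@example.com"
echo "hello" > "$work/origin/README"
git -C "$work/origin" add README
git -C "$work/origin" commit -q -m "Initial commit"

# Same command as in step 4, pointed at the local stand-in.
git clone -q --mirror "$work/origin" "$work/your-project.git"

# A mirror has no working tree and fetches every ref under refs/.
is_bare="$(git -C "$work/your-project.git" rev-parse --is-bare-repository)"
fetch_spec="$(git -C "$work/your-project.git" config --get remote.origin.fetch)"
echo "bare: $is_bare"
echo "fetch refspec: $fetch_spec"
```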

5. Once mirrored, now you need to run the command as below to clean your git history

java -jar bfg-<latest_version>.jar --replace-text replace.txt -fi '*.java' your-project.git/

6. Next, clean the reference logs and optimize the project again

cd your-project.git/ && git reflog expire --expire=now --all && git gc --prune=now --aggressive
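
Before pushing, it may be worth checking that the old text is really gone. The sketch below uses git's own pickaxe search (git log -S) to list every commit, on any ref, that adds or removes a given string. The helper name commits_touching_text and the throwaway repository are made up for illustration; here the demo repository has not been cleaned, so both of its commits are reported. Run against a cleaned mirror, the same search should print nothing, assuming the latest commit no longer contains the text (by default the BFG leaves the contents of your latest commit alone).

```shell
set -eu

# Prints one line per commit in repo $1 that adds or removes the string $2.
commits_touching_text() {
    git -C "$1" log --all --oneline -S"$2"
}

# Throwaway repository for illustration only.
demo="$(mktemp -d)"
git init -q "$demo"
git -C "$demo" config user.name "Demo"
git -C "$demo" config user.email "demo@example.com"

# First commit carries the unwanted company header.
printf '%s\n' '/* Copyright (c) 2015 - 2016 CompanyName, Inc. */' > "$demo/Lib.java"
git -C "$demo" add Lib.java
git -C "$demo" commit -q -m "Add library file"

# Second commit swaps in the intended header.
printf '%s\n' '/* Copyright (C) 2016 Nishant Srivastava */' > "$demo/Lib.java"
git -C "$demo" commit -q -am "Fix license header"

hits="$(commits_touching_text "$demo" 'CompanyName, Inc.' | wc -l | tr -d ' ')"
echo "commits touching the company header: $hits"
```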

7. When done, push your changes to GitHub

git push

DONE. That is all there is.

Ok, not really. Here is what you might encounter after you do a git push. Depending on your project size, you can get a lot of output.

Here is what a successful overwrite of a branch would look like:

+ 1289ad8...bee1ea4 master -> master (forced update)

Unfortunately, you might also see something like this:

! [remote rejected] refs/pull/53/head -> refs/pull/53/head (deny updating a hidden ref)

As mentioned in this issue,

The refs beginning 'refs/pull' are synthetic read-only refs created by GitHub - you can't update (and therefore 'clean') them, because they reflect branches that may well actually come from other repositories - ones that submitted pull-requests to you.
So, while you’ve pushed all your real refs, the pull requests don’t get updated.

In simple words, GitHub keeps the branches behind pull requests even after you have merged, closed or deleted them. You have no way to modify those refs, because they are owned by GitHub. The only way to really remove them is to delete the repository, and keep in mind that you also lose all stars, issues and so on when you delete a project. That would be a bummer; you don’t want to do that, do you?

So how do you solve this? You need to mirror your repo without the GitHub pull request refs in Step 4.

In short, replace the catch-all refspec in the mirror's config with more specific refspecs that include the heads and tags but not the pulls, so that the remote pull refs no longer make it into your bare mirror.
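
To see whether a repository exposes such refs at all, one option is to ask the remote for them with git ls-remote. The sketch below runs against a throwaway local repository with a hand-made refs/pull/1/head standing in for GitHub (all names are illustrative); on a real project you would pass your repository URL instead of the local path.

```shell
set -eu
work="$(mktemp -d)"

# Throwaway stand-in for the GitHub repository (names are illustrative).
git init -q "$work/origin"
git -C "$work/origin" config user.name "Demo"
git -C "$work/origin" config user.email "demo@example.com"
echo "hello" > "$work/origin/README"
git -C "$work/origin" add README
git -C "$work/origin" commit -q -m "Initial commit"

# Fake the kind of ref GitHub creates for pull request #1.
git -C "$work/origin" update-ref refs/pull/1/head HEAD

# List the pull request refs the remote advertises.
pull_refs="$(git ls-remote "$work/origin" 'refs/pull/*/head')"
echo "$pull_refs"
```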

Go to the your-project.git folder and run git config -e. Next, replace

fetch = +refs/*:refs/*

with

fetch = +refs/heads/*:refs/heads/*
fetch = +refs/tags/*:refs/tags/*
fetch = +refs/change/*:refs/change/*

This should solve your issue.
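
The same change can be made without opening an editor. This is a minimal sketch on a throwaway local repository with a fake pull ref (all paths and names are illustrative). The last step is my own assumption rather than part of the original recipe: if the mirror was cloned before the refspec was changed, it already holds copies of the pull refs, so they are deleted locally here.

```shell
set -eu
work="$(mktemp -d)"

# Throwaway stand-in for the GitHub repository, with one fake pull ref.
git init -q "$work/origin"
git -C "$work/origin" config user.name "Demo"
git -C "$work/origin" config user.email "demo@example.com"
echo "hello" > "$work/origin/README"
git -C "$work/origin" add README
git -C "$work/origin" commit -q -m "Initial commit"
git -C "$work/origin" update-ref refs/pull/1/head HEAD

git clone -q --mirror "$work/origin" "$work/your-project.git"
cd "$work/your-project.git"

# Swap the catch-all refspec for the specific ones from the article.
git config --unset-all remote.origin.fetch
git config --add remote.origin.fetch '+refs/heads/*:refs/heads/*'
git config --add remote.origin.fetch '+refs/tags/*:refs/tags/*'
git config --add remote.origin.fetch '+refs/change/*:refs/change/*'

# Assumption: drop pull refs that the earlier mirror clone already copied.
git for-each-ref --format='delete %(refname)' refs/pull | git update-ref --stdin

remaining_pull_refs="$(git for-each-ref refs/pull | wc -l | tr -d ' ')"
fetch_specs="$(git config --get-all remote.origin.fetch | wc -l | tr -d ' ')"
echo "pull refs left in mirror: $remaining_pull_refs"
echo "fetch refspecs configured: $fetch_specs"
```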

Also be careful because everyone who cloned or forked your project still has access to the original data. So that is that 🤔

If you want to find out more about the BFG, go ahead and read their awesome docs.

P.S.: This post was first published at my blog Crushing C.O.D.E

If you have suggestions, please let me know in the comment section.

Till then keep crushing code 🤓

Thanks for reading! Be sure to click below to recommend this article if you found it helpful.
You can connect with me on Github, Twitter, Linkedin, Facebook, Dribbble and Google+

