
Git got big files or keys? Break out BFG


by mike fettis (@mikefettis)

Everybody messes up. Today’s mistake was adding a big file to git before a .gitignore was in place to handle it. As a result, GitHub is rejecting the push, even after “removing” the file from git, because the file still exists in git’s history. Time to clean up the mess: break out BFG and nuke it from orbit. Sadly this means Java is involved, but it is a necessary demon. BFG is a single jar download, and a Java JDK needs to be installed to run it.

First things first: take a look at BFG Repo-Cleaner. Welcome back, hopefully there was some reading involved. BFG Repo-Cleaner will be used to clean up the big files; it can also clean up sensitive data that someone accidentally added to a repo (“cough cough”, AWS keys). It does this by rewriting the git history and removing all traces of the file. As with many things git, sometimes it is better not to explain the wizardry up front and to dive right in.
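For a rough idea of what those two jobs look like on the command line, here is a minimal sketch. The flags are the ones documented in the BFG README; the jar location, the `bfg` wrapper function, and the two helper names are assumptions made up for illustration.

```shell
# Assumption: the jar lives here; point BFG_JAR at wherever it was saved.
BFG_JAR="${BFG_JAR:-$HOME/bin/bfg/bfg.jar}"

# Hypothetical convenience wrapper so the helpers below read like `bfg ...`.
bfg() {
  java -jar "$BFG_JAR" "$@"
}

# Big files: drop every blob larger than a size such as 100M from history.
strip_big_blobs() {  # usage: strip_big_blobs SIZE REPO_DIR
  bfg --strip-blobs-bigger-than "$1" "$2"
}

# Sensitive data: each line of SECRETS_FILE is a string to scrub from
# history; BFG replaces matches with ***REMOVED***.
scrub_secrets() {  # usage: scrub_secrets SECRETS_FILE REPO_DIR
  bfg --replace-text "$1" "$2"
}
```

Something like `strip_big_blobs 100M my-repo` or `scrub_secrets secrets.txt my-repo` would then be run from outside the repo folder, with `my-repo` and `secrets.txt` standing in for real names.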

TLDR oh my git… just do this… black magic ensues.

Welcome back from blindly running commands found on the internet. Everything worked correctly, right?
Time to break down what just happened. The prework is setting up BFG and getting it loaded into the environment.
A folder structure is created in the home folder to store the jar.
The jar is then downloaded and a symlink is created, so that when a new version is added the old symlink can be deleted and pointed at the new jar.
This is not strictly needed, but it certainly helps.
Next, the folder is added to the PATH environment variable in the bash_profile file.
Then the bash_profile is sourced to pick up the new path and the new folder. None of this is required but, let’s be honest, this is going to happen more than once and it is better to have it in place for the future.
After that the repo is cloned (most likely it already exists, so don’t worry).
Then git garbage collection is run.
Next, move out of the directory, because BFG needs to be run from outside the repo folder.
Fire the BFG, passing in the file name or wildcard that should be nuked.
Drop back into the folder.
Expire the git reference log (reflog), which cleans up the old references BFG left behind.
Finally, run git garbage collection again to clean up the rest of the cruft.
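
Put together, the steps above might look something like this sketch. It is a reconstruction from the description, not the original listing: the folder `~/bin/bfg`, the version number, the Maven Central download URL, and the function names are all assumptions, and the repo URL and file pattern are placeholders to fill in.

```shell
# Assumptions: install folder and version are illustrative; check the BFG
# site for the current release before downloading.
BFG_HOME="${BFG_HOME:-$HOME/bin/bfg}"
BFG_VERSION="${BFG_VERSION:-1.14.0}"

# Build the download URL for a given version (Maven Central layout).
bfg_jar_url() {
  printf 'https://repo1.maven.org/maven2/com/madgag/bfg/%s/bfg-%s.jar\n' "$1" "$1"
}

# Prework: folder, jar, version-independent symlink, PATH entry.
install_bfg() {
  mkdir -p "$BFG_HOME" &&
  curl -fL -o "$BFG_HOME/bfg-$BFG_VERSION.jar" "$(bfg_jar_url "$BFG_VERSION")" &&
  ln -sf "$BFG_HOME/bfg-$BFG_VERSION.jar" "$BFG_HOME/bfg.jar" &&
  printf 'export PATH="$PATH:%s"\n' "$BFG_HOME" >> "$HOME/.bash_profile" &&
  . "$HOME/.bash_profile"
}

# Cleanup: clone (unless already there), gc, run BFG from outside the repo,
# then expire the reflog and gc again. The subshells do the cd in and out.
clean_repo() {  # usage: clean_repo REPO_URL REPO_DIR FILE_PATTERN
  { [ -d "$2/.git" ] || git clone "$1" "$2"; } &&
  ( cd "$2" && git gc ) &&
  # --delete-files matches on file name (globs allowed), not on path.
  java -jar "$BFG_HOME/bfg.jar" --delete-files "$3" "$2" &&
  ( cd "$2" &&
    git reflog expire --expire=now --all &&
    git gc --prune=now --aggressive )
}
```

Run `install_bfg` once, then something like `clean_repo <repo-url> my-repo '*.zip'`. By default BFG leaves the contents of the latest commit alone, so the file has to be deleted and committed first, and pushing the rewritten history afterwards needs a force push.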

That’s that: the files have been removed and all history of them existing has been wiped. This type of process can be especially useful when combined with a git hook and a regex for specific things in files, like keys and whatnot. It can also easily be tied into a Jenkins build pipeline to protect people from themselves. Good luck, and when in doubt, break out the Big “Friggin” Gun.
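
As a rough illustration of the git hook idea, here is a minimal pre-commit sketch. The regex only targets strings shaped like AWS access key IDs (`AKIA` followed by 16 uppercase letters or digits), and the function names are made up for this example; a real setup would want more patterns.

```shell
#!/bin/sh
# Print added lines from a diff (read on stdin) that look like they contain
# an AWS access key ID. Exit status is grep's: 0 if anything matched.
find_aws_key_ids() {
  grep -E '^\+.*AKIA[0-9A-Z]{16}'
}

# Scan what is staged for commit and fail if a key-shaped string shows up.
# Matching lines are echoed to stderr so the culprit is visible.
check_staged_changes() {
  if git diff --cached -U0 | find_aws_key_ids >&2; then
    echo "Possible AWS access key ID in staged changes; commit blocked." >&2
    return 1
  fi
}

# As the last command, its status becomes the hook's exit status.
check_staged_changes
```

Saved as `.git/hooks/pre-commit` and made executable, a non-zero exit stops the commit before the key ever reaches history, which is a lot cheaper than reaching for BFG afterwards.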

BONUS: there is a fantastic zine from Julia Evans that covers some other great git things.


