This is the second post in my Understanding Git series, so be sure to check out the first post, which deals with git’s data model, before you start with this one.
Let’s start where we left off last time — at git’s data model. Only this time we will simplify it a bit by only displaying the commit objects and giving them some symbolic names instead of checksums (just to make it easier to follow), so we get a graph like this:
Git data model simplified by displaying only commit objects
Those familiar with graph theory will notice that this is a Directed Acyclic Graph (DAG). That means the edges between graph nodes (in git’s case, commits) are directed, and if you start from one node and travel through the graph following the direction of the edges, you can never return to the node you started from (there are no “round-trips”).
It is fairly intuitive that we can distinguish three branches in our example graph. We’ll mark them as red (containing commits A, B, C, D, E), blue (containing commits A, B, F, G) and green (containing commits A, B, H, I, J).
Git data graph containing three branches
So that’s one way of defining a branch: associate it with the list of commits it contains. However, this is not the way git does it. Git uses a simpler and cheaper solution. Instead of keeping an up-to-date list of all the commits belonging to a branch, git only keeps track of the last commit on that branch. Knowing the last commit, it is trivial to reconstruct the full list of commits on the branch just by following the directed edges of the commit graph (each commit points to its parent). For example, to define our blue branch, we only need to know that its last commit is G; if we need a list of all the commits the blue branch contains, we can just follow the directed edges starting from G.
Knowing the last commit on the Blue branch we can easily reconstruct its whole commits list
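To make that walk concrete, here is a minimal sketch that builds a throwaway repository with three commits and asks git to list everything reachable from the tip. The scratch directory, the `commit` helper, the placeholder identity and the commit messages A, B and G are made up for illustration; `git log` and `git rev-list` are the standard commands for following parent edges.

```shell
set -e
repo=$(mktemp -d)        # hypothetical scratch directory
cd "$repo"
git init -q

# Helper (illustrative only): create an empty commit with a placeholder identity,
# so the example does not depend on any global git configuration.
commit() {
  git -c user.name="Example" -c user.email="example@example.com" \
      commit -q --allow-empty -m "$1"
}

commit "A"
commit "B"
commit "G"

# Starting from the tip (HEAD), git follows the parent edges back to the root.
subjects=$(git log --format=%s HEAD)
count=$(git rev-list --count HEAD)

echo "$subjects"
echo "commits reachable from the tip: $count"
```

The newest commit is listed first, because the walk starts at the tip and moves towards the root.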
And this is how git manages branches: by keeping pointers to commits. So let’s see it “in action”.
First, we will initialise an empty repository
git init
and take a look at the .git directory
$ tree .git/
.git/
├── HEAD
├── config
├── description
├── hooks
│   ├── applypatch-msg.sample
│   ├── commit-msg.sample
│   ├── post-update.sample
│   ├── pre-applypatch.sample
│   ├── pre-commit.sample
│   ├── pre-push.sample
│   ├── pre-rebase.sample
│   ├── pre-receive.sample
│   ├── prepare-commit-msg.sample
│   └── update.sample
├── info
│   └── exclude
├── objects
│   ├── info
│   └── pack
└── refs
    ├── heads
    └── tags
This time we will focus on the refs sub-directory. It stands for references and this is where git keeps the branch pointers.
Since we haven’t committed any changes yet, the refs directory holds no references (its heads and tags sub-directories are empty), so we will create and commit a few files.
echo "Hello World" > helloEarth.txt
git add .
git commit -m "Hello World Commit"
echo "Hello Mars" > helloMars.txt
git add .
git commit -m "Hello Mars Commit"
echo "Hello Saturn" > helloSaturn.txt
git add .
git commit -m "Hello Saturn Commit"
If we run git branch now, we see this output
* master
meaning we are now on the master branch (that git created automatically upon our first commit).
If we take another look at .git/refs
└── refs
    ├── heads
    │   └── master
    └── tags
we see there is a file in the refs/heads sub-directory and it is named master, just as our branch is. This is a text file, so we can use cat to take a look at it
cat .git/refs/heads/master
and we see it contains a checksum
c641e4f0d19df0570667977edff860fed8f6c05a
and if we do
git log
we see it is the checksum of our last commit:
commit c641e4f0d19df0570667977edff860fed8f6c05a (HEAD -> master)
Author: zspajich <[email protected]>
Date:   Mon Feb 12 16:28:44 2018 +0100

    Hello Saturn Commit
(Note: checksums will have different values on your computer)
So there we have it — a branch in git is just a text file containing a checksum of the last commit on that branch. In other words — a pointer to a commit.
A branch in git is just a pointer to a commit object
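As a quick check of that claim, the sketch below compares the content of the branch file with what git itself reports for the current commit. It assumes git's default file-based ref storage and a freshly created branch (refs can later be packed into a single packed-refs file, in which case the loose file may not exist). The scratch directory and the placeholder identity are made up for illustration, and the branch name is read from git rather than hard-coded, since it may be master or main depending on configuration.

```shell
set -e
repo=$(mktemp -d)        # hypothetical scratch directory
cd "$repo"
git init -q
git -c user.name="Example" -c user.email="example@example.com" \
    commit -q --allow-empty -m "first"

# Ask git for the current branch name instead of assuming "master".
branch=$(git symbolic-ref --short HEAD)

from_file=$(cat ".git/refs/heads/$branch")   # what the text file says
from_git=$(git rev-parse HEAD)               # what git resolves the current commit to

echo "$branch (file): $from_file"
echo "HEAD (git):     $from_git"
```

Both lines should show the same checksum.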
If we now create and check out a new feature branch
git checkout -b feature
and take another look at .git/refs
tree .git/refs
sure enough, we see another file called feature
└── refs
    ├── heads
    │   ├── feature
    │   └── master
and if we take a look at its checksum (pointer)
cat .git/refs/heads/feature
we see it’s the same as in the master file (branch)
c641e4f0d19df0570667977edff860fed8f6c05a
since we haven’t made any new commits on that branch.
Creating a new branch means creating a new pointer to the current commit
So that’s how fast and cheap creating a new branch in git is. Git just creates a text file and fills it with the checksum of the current commit.
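To underline how little is involved, here is a sketch that creates a branch by hand, simply by writing a checksum into a new file under refs/heads. This is for illustration only and assumes git's default file-based ref storage; in real work you would use git branch or the plumbing command git update-ref. The branch name handmade, the scratch directory and the placeholder identity are made up.

```shell
set -e
repo=$(mktemp -d)        # hypothetical scratch directory
cd "$repo"
git init -q
git -c user.name="Example" -c user.email="example@example.com" \
    commit -q --allow-empty -m "first"

# "Create a branch" by writing the current commit's checksum into a new ref file.
git rev-parse HEAD > .git/refs/heads/handmade

# git now lists handmade as a branch and resolves it to the same commit.
git branch --list handmade
handmade_tip=$(git rev-parse handmade)
head_tip=$(git rev-parse HEAD)
```

No objects are copied and no history is rewritten, which is why branching stays cheap no matter how large the repository is.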
But now that we have two branches, one question remains: how does git know which of them is currently checked out? Well, there is one more special pointer (whose name will probably sound familiar to you) called HEAD. It is special because it (usually) doesn’t point to a commit object, but to a ref (branch), and git uses it to track which branch is currently checked out.
If we look inside HEAD
cat .git/HEAD
we see it currently points to the feature ref file (branch).
ref: refs/heads/feature
Special HEAD pointer tracks current ref/branch
If we were to run
git checkout master
and take a look at HEAD
cat .git/HEAD
we would see
ref: refs/heads/master
that is, HEAD would now point to the master branch.
HEAD points to master ref after checkout on master branch
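The “usually” above hints at an exception, the so-called detached HEAD, where HEAD holds a commit checksum directly instead of a ref. The sketch below shows both states side by side. It assumes a throwaway repository with git's default file-based layout; the scratch directory and the placeholder identity are made up for illustration, while `git checkout --detach` is the standard way to leave the current branch and stay on the same commit.

```shell
set -e
repo=$(mktemp -d)        # hypothetical scratch directory
cd "$repo"
git init -q
git -c user.name="Example" -c user.email="example@example.com" \
    commit -q --allow-empty -m "first"

git checkout -q -b feature
on_branch=$(cat .git/HEAD)     # symbolic: HEAD names a ref

git checkout -q --detach       # stay on the same commit, but leave the branch
detached=$(cat .git/HEAD)      # now HEAD holds a raw commit checksum
tip=$(git rev-parse HEAD)

echo "on a branch: $on_branch"
echo "detached:    $detached"
```

While on a branch, HEAD reads ref: refs/heads/feature; after detaching, it contains the checksum of the commit itself.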
So that’s git’s branch model. It is very simple, but it is important to know in order to understand the many git operations that work on that graph (merge, rebase, checkout, revert …).
In the next part of this series we will look at something that we have skipped so far: git’s staging area. We all know we have to stage our changes before committing them, but what exactly is that staging area, or index as it is sometimes called? We’ll see in the next post.