Understanding Git — Index

Written by zspajich | Published 2018/03/01

This is the third post in my Understanding Git series, so be sure to check out the first two (Data Model and Branching) before proceeding with this one.

From git’s point of view, there are three areas where file changes can reside: the working directory, the staging area, and the repository.

Git areas

When you work on your project and make changes, you are dealing with your project’s working directory: the project directory on your computer’s filesystem. All the changes you make stay in the working directory until you add them to the staging area (via the git add command). The staging area is best described as a preview of your next commit: when you run git commit, git takes the changes that are in the staging area and makes the new commit out of them. One practical use of the staging area is that it lets you fine-tune your commits. You can add changes to the staging area and remove them from it until you are satisfied with what your next commit will look like, and then run git commit. After you commit, your changes are recorded in the repository, that is, in the .git/objects directory, as commit, blob and tree objects (as we saw when we looked at git’s data model).

Although it is often useful to think of the staging area as a real area (or directory) where git stores changes (like it does in .git/objects), this is not entirely true. Git doesn’t have a dedicated staging directory where it puts objects representing file changes (blobs). Instead, git has a file called the index (.git/index) that it uses to keep track of file changes across the three areas: working directory, staging area, and repository. When you add changes to the staging area, git updates the information about those changes in the index and creates new blob objects, but it puts them in the same .git/objects directory as all the other blobs that belong to previous commits. This may sound complicated, but it actually isn’t, so let’s walk through a typical git workflow to see how git uses the index.
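To make “blobs in .git/objects” concrete, here is a minimal Python sketch of how git names a blob, assuming the default SHA-1 object format. The object ID is the SHA-1 of a short header (blob, a space, the content size, a NUL byte) followed by the file content, and a loose object is stored zlib-compressed in a subdirectory named after the first two hex digits of that ID. The helper names are hypothetical; git itself computes the same ID with git hash-object.

```python
import hashlib
import zlib


def blob_id(content: bytes) -> str:
    """SHA-1 object ID that git assigns to a blob with this content."""
    header = f"blob {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()


def loose_object_path(object_id: str) -> str:
    """Where a loose object lives: the first two hex digits form the directory."""
    return f".git/objects/{object_id[:2]}/{object_id[2:]}"


def loose_object_payload(content: bytes) -> bytes:
    """zlib-compressed header + content, as stored in a loose blob file.

    Git may use a different compression level, so compare decompressed
    data rather than the compressed bytes.
    """
    header = f"blob {len(content)}\0".encode()
    return zlib.compress(header + content)


if __name__ == "__main__":
    oid = blob_id(b"hello\n")
    print(oid)                     # ce013625030ba8dba906f756967f9e9ca394464a
    print(loose_object_path(oid))  # .git/objects/ce/013625030ba8dba906f756967f9e9ca394464a
```

Because the ID depends only on the content, two files with identical content map to the same blob, which is why comparing checksums is enough to tell whether two versions of a file are the same.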

Let’s say we are on the master branch and there is also a feature branch in our repository. If we run

git checkout feature

three things are going to happen.

First, git will move the HEAD pointer to point to the feature ref (branch). To keep things simple, we will show only the last commit on the feature branch.

Git checkout first moves HEAD to feature ref
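HEAD itself is just a small file, .git/HEAD. While you are on a branch it holds a symbolic ref such as ref: refs/heads/feature, and in a detached HEAD state it holds a commit ID instead, so this first step of checkout amounts to rewriting that one line. Here is a minimal Python sketch of reading and writing it, assuming Python 3.9+ (for str.removeprefix); parse_head and head_after_checkout are hypothetical helpers, and real git validates refs far more carefully.

```python
def parse_head(head_text: str) -> tuple[str, str]:
    """Interpret the contents of .git/HEAD.

    Returns ("branch", name) for a symbolic ref like "ref: refs/heads/feature",
    or ("detached", text) when HEAD holds a commit ID directly.
    """
    text = head_text.strip()
    prefix = "ref: "
    if text.startswith(prefix):
        ref = text[len(prefix):]
        # Local branches live under refs/heads/; strip that namespace for display.
        return ("branch", ref.removeprefix("refs/heads/"))
    return ("detached", text)


def head_after_checkout(branch: str) -> str:
    """Contents .git/HEAD holds after switching to a local branch."""
    return f"ref: refs/heads/{branch}\n"
```

For example, parse_head(head_after_checkout("feature")) returns ("branch", "feature").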

Next, git will take the content of the commit that feature is pointing to and add it to the index.

Git checkout updates the index

As mentioned earlier, the index is not a directory but a file, so git does not actually store objects (blobs) in it. Instead, git stores information about each file in our repository:

  • mtime: time of the last update
  • file: name of the file
  • wdir: file version in the working directory
  • stage: file version in the index
  • repo: file version in the repository

File versions are identified by checksums (SHA-1 hashes): if two files have the same checksum, they have the same content, and therefore the same version.
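The five columns above are a conceptual model rather than the literal file layout. The real .git/index is a binary file: a 12-byte header followed by one entry per tracked path, where each entry records stat data such as ctime and mtime, the file mode, the SHA-1 of the staged blob, and the path. The working directory and repository versions are not stored there; git works them out by examining the files on disk and the tree that HEAD points to. As a minimal sketch, assuming Python 3.9+ and a repository in the current directory, this reads just the header; parse_index_header is a hypothetical helper.

```python
import struct

INDEX_SIGNATURE = b"DIRC"  # "dircache": the magic bytes at the start of .git/index


def parse_index_header(data: bytes) -> tuple[int, int]:
    """Return (version, entry_count) from the 12-byte index header.

    Layout, in network byte order: 4-byte signature, 4-byte version,
    4-byte number of entries.
    """
    if len(data) < 12:
        raise ValueError("index file too short")
    signature, version, entries = struct.unpack(">4sII", data[:12])
    if signature != INDEX_SIGNATURE:
        raise ValueError(f"not a git index file: {signature!r}")
    return version, entries


if __name__ == "__main__":
    from pathlib import Path

    index_path = Path(".git/index")
    if index_path.exists():
        version, count = parse_index_header(index_path.read_bytes())
        print(f"index version {version}, {count} entries")
```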

And finally, git will make your working directory match the content of the commit that HEAD is pointing to (it will recreate the content of your project’s directory using tree and blob objects).

Git checkout makes your working directory match HEAD

So, after checkout, every file will have the same version in the working directory, staging area/index, and the repository.
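Using the article’s conceptual table (not Git’s on-disk format), the end state of checkout can be sketched in Python: for every file in the target commit, the repository, staging-area and working-directory versions are set to the same checksum. IndexRow and checkout are hypothetical names, the mtime column is left out, and the version labels in the example are placeholders rather than real object IDs.

```python
from dataclasses import dataclass


@dataclass
class IndexRow:
    """One row of the conceptual index table."""
    file: str
    wdir: str   # version in the working directory
    stage: str  # version in the staging area (index)
    repo: str   # version in the commit HEAD points to


def checkout(commit_files: dict[str, str]) -> dict[str, IndexRow]:
    """Model the end state of git checkout: all three areas match the commit."""
    return {
        name: IndexRow(file=name, wdir=version, stage=version, repo=version)
        for name, version in commit_files.items()
    }
```

For example, checkout({"index.php": "v1"})["index.php"] has wdir, stage and repo all equal to "v1".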

If we now edit our index.php file

Updating index.php only affects the working directory

those changes will affect only our working directory. But if we now run

git status

git will first update the index with the new working directory version of index.php

Git status updates the index

and then it will see that index.php has different versions in the working directory and the staging area.

Git status sees that index.php has different versions in the working directory and the staging area
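In Git’s actual implementation, this “update the index” step is mostly a refresh of cached stat data: if a file’s stat data still matches its index entry, git skips reading it; otherwise it hashes the file and, if the content still matches the staged blob, just records the new stat data. A simplified Python sketch of that idea, assuming only mtime and size are compared (git also checks ctime, inode, mode and more, and has extra handling for timestamps too recent to trust); IndexEntry and refresh_entry are hypothetical names.

```python
import hashlib
from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class IndexEntry:
    path: str
    mtime_ns: int  # cached stat data from the last time git looked at the file
    size: int
    blob_id: str   # SHA-1 of the staged blob


def blob_id(content: bytes) -> str:
    """SHA-1 object ID git assigns to a blob with this content."""
    return hashlib.sha1(f"blob {len(content)}\0".encode() + content).hexdigest()


def refresh_entry(entry: IndexEntry, mtime_ns: int, size: int,
                  read: Callable[[], bytes]) -> tuple[IndexEntry, bool]:
    """Return (possibly refreshed entry, True if the working copy differs)."""
    if mtime_ns == entry.mtime_ns and size == entry.size:
        return entry, False  # fast path: stat data unchanged, file is not read
    if blob_id(read()) == entry.blob_id:
        # Touched but unchanged: only the cached stat data is updated.
        return replace(entry, mtime_ns=mtime_ns, size=size), False
    # Content really changed: the staged blob stays as it is until git add.
    return entry, True
```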

So, git will tell us

On branch feature
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

        modified:   index.php

no changes added to commit (use "git add" and/or "git commit -a")

that there are changes in our working directory which are not in our staging area (and therefore won’t be included in our next commit at this point).
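The rule git status applies here boils down to two comparisons: a file whose working directory version differs from its staging-area version is listed under “Changes not staged for commit”, and a file whose staging-area version differs from its repository version is listed under “Changes to be committed”. A minimal Python sketch over the conceptual table, with hypothetical names and placeholder version labels; untracked and deleted files are ignored for brevity.

```python
from dataclasses import dataclass


@dataclass
class IndexRow:
    file: str
    wdir: str   # version in the working directory
    stage: str  # version in the staging area (index)
    repo: str   # version in the commit HEAD points to


def status(rows: list[IndexRow]) -> dict[str, list[str]]:
    """Group files into the two main sections of git status output."""
    report: dict[str, list[str]] = {"to be committed": [], "not staged": []}
    for row in rows:
        if row.stage != row.repo:
            report["to be committed"].append(row.file)  # index differs from HEAD
        if row.wdir != row.stage:
            report["not staged"].append(row.file)       # working copy differs from index
    return report
```

A file can appear in both sections, for example when you stage one edit and then edit the file again before committing.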

So let’s add our index.php file to the staging area by doing

git add index.php

Two things are going to happen. First, git will create a blob object for our index.php file and store it in the .git/objects directory; second, it will update the index again.
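The two steps of git add can be sketched in Python over the conceptual table, assuming a plain dict named objects stands in for the .git/objects directory (real git writes each blob as a zlib-compressed file there). IndexRow and git_add are hypothetical names, and the blob ID uses the default SHA-1 object format.

```python
import hashlib
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class IndexRow:
    file: str
    wdir: str   # version in the working directory
    stage: str  # version in the staging area (index)
    repo: str   # version in the commit HEAD points to


def blob_id(content: bytes) -> str:
    """SHA-1 object ID git assigns to a blob with this content."""
    return hashlib.sha1(f"blob {len(content)}\0".encode() + content).hexdigest()


def git_add(row: IndexRow, content: bytes, objects: dict[str, bytes]) -> IndexRow:
    """Model git add for one file and return the updated index row."""
    oid = blob_id(content)
    objects[oid] = content  # step 1: store the blob (here in a dict, not on disk)
    return replace(row, wdir=oid, stage=oid)  # step 2: the index records the new blob
```

The repo field is left untouched, which is exactly why the next git status reports index.php under “Changes to be committed”.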

Git add creates a blob object for index.php and updates the index

If we now do

git status

git will see that the index.php version in the staging area matches the working directory version but doesn’t match the repository version

index.php has different versions in staging area and repository

so git will tell us:

On branch feature
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

        modified:   index.php

that index.php is now staged to be committed.

And now when we commit our changes

git commit -m "Adding some code magic to index.php"

git will:

  • create a new commit object and tree object, hooking them up with the blob object that git add already created
  • move the feature ref pointer to the new commit
  • update the index
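The object-building part of git commit can be sketched in Python. This is a minimal sketch under several assumptions: a flat project with no subdirectories, regular non-executable files (mode 100644), and the default SHA-1 object format. It only computes object IDs and does not write anything to disk. The staged mapping plays the role of the index (path to staged blob ID), refs is a hypothetical dict standing in for the files under .git/refs/heads, and the identity and timestamp (for example "1519862400 +0100") are passed in as plain strings. All function names are hypothetical.

```python
import hashlib
from typing import Optional


def hash_object(kind: str, body: bytes) -> str:
    """SHA-1 object ID for a git object of the given type (blob, tree, commit)."""
    return hashlib.sha1(f"{kind} {len(body)}\0".encode() + body).hexdigest()


def tree_body(staged: dict[str, str]) -> bytes:
    """Serialize a flat tree: "100644 <name>", a NUL byte, then the raw 20-byte blob ID."""
    out = b""
    for name in sorted(staged):  # tree entries are sorted by name
        out += f"100644 {name}\0".encode() + bytes.fromhex(staged[name])
    return out


def commit_body(tree_id: str, parent: Optional[str], who: str, when: str,
                message: str) -> bytes:
    """Serialize a commit: header lines, a blank line, then the message."""
    lines = [f"tree {tree_id}"]
    if parent is not None:
        lines.append(f"parent {parent}")
    lines += [f"author {who} {when}", f"committer {who} {when}", "", message]
    return ("\n".join(lines) + "\n").encode()


def make_commit(staged: dict[str, str], refs: dict[str, str], branch: str,
                who: str, when: str, message: str) -> str:
    """Compute tree and commit IDs for the staged files, then move the branch ref."""
    tree_id = hash_object("tree", tree_body(staged))
    parent = refs.get(branch)  # the commit the branch pointed to before
    commit_id = hash_object("commit", commit_body(tree_id, parent, who, when, message))
    refs[branch] = commit_id   # e.g. feature now points to the new commit
    return commit_id
```

Git’s real tree sorting treats a subdirectory name as if it ended in "/", which this flat-files-only sketch never needs.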

Git commit creates commit/tree objects, moves the feature ref, and updates the index

And now our index.php file again has the same version in all three areas.

And that is it: the mysterious git index. Now that we have looked at it, it is not so complicated, but it is vital to know in order to understand the commands that operate on the index (add, checkout, reset …).

In writing this post I relied heavily on Scott Chacon’s lecture, so be sure to check it out.


Published by HackerNoon on 2018/03/01