This article is part 5 of the Blockchain train journal; start reading here: Catching the Blockchain Train.

## The IPFS White Paper: IPFS Design

The IPFS stack is visualized as follows, first at a high level and then with more detail. I borrowed both images from presentations by Juan Benet (the BDFL of IPFS).

The IPFS design in the white paper goes more or less through these layers, bottom-up:

> The IPFS Protocol is divided into a stack of sub-protocols responsible for different functionality:
>
> 1. **Identities** — manage node identity generation and verification.
> 2. **Network** — manages connections to other peers, uses various underlying network protocols. Configurable.
> 3. **Routing** — maintains information to locate specific peers and objects. Responds to both local and remote queries. Defaults to a DHT, but is swappable.
> 4. **Exchange** — a novel block exchange protocol (BitSwap) that governs efficient block distribution. Modelled as a market, weakly incentivizes data replication. Trade Strategies swappable.
> 5. **Objects** — a Merkle DAG of content-addressed immutable objects with links. Used to represent arbitrary data structures, e.g. file hierarchies and communication systems.
> 6. **Files** — versioned file system hierarchy inspired by Git.
> 7. **Naming** — a self-certifying mutable name system.

Here's my alternative naming of these sub-protocols:

- Identities: name those nodes
- Network: talk to other clients
- Routing: announce and find stuff
- Exchange: give and take
- Objects: organize the data
- Files: uh?
- Naming: adding mutability

Let's go through them and see if we can increase our understanding of IPFS a bit!

## Identities: name those nodes

IPFS is a P2P network of clients; there is no central server. These clients are the nodes of the network and need a way to be identified by the other nodes. If you just numbered the nodes 1, 2, 3, …, anyone could add a node with an existing ID and claim to be that node. To prevent that, some cryptography is needed.
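The idea can be sketched ahead of the details below. This is a toy version in Python: it uses hex digests for readability, whereas real IPFS derives the NodeId from a base58-encoded multihash of the public key, as described next.

```python
import hashlib

def node_id(public_key: bytes) -> str:
    # Toy NodeId: the hash of the node's public key. Real IPFS uses a
    # base58-encoded multihash; hex keeps this sketch dependency-free.
    return hashlib.sha256(public_key).hexdigest()

def verify(claimed_id: str, public_key: bytes) -> bool:
    # A peer can't claim someone else's ID without the matching key pair:
    # anyone can recompute hash(public_key) and compare it to the claim.
    return node_id(public_key) == claimed_id

key = b"alice-public-key"  # stand-in for a real PKI public key
alice = node_id(key)
print(verify(alice, key))                    # True: identity checks out
print(verify(alice, b"mallory-public-key"))  # False: impostor detected
```

The ID is not assigned by anyone; it is derived from the key, so ownership of the private key is what makes an identity claim checkable.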
IPFS does it like this:

- generate a PKI key pair (public + private key)
- hash the public key
- the resulting hash is the NodeId

All this is done during the `init` phase of a node: `ipfs init` stores the resulting keys in `~/.ipfs/config` and returns the NodeId.

When two nodes start communicating, the following happens:

- they exchange public keys
- each checks: `hash(other.PublicKey) == other.NodeId`
- if so, we have identified the other node and can e.g. request data objects
- if not, we disconnect from the "fake" node

The actual hashing algorithm is not specified in the white paper; read the note about that here:

> Rather than locking the system to a particular set of function choices, IPFS favors self-describing values. Hash digest values are stored in multihash format, which includes a short header specifying the hash function used, and the digest length in bytes. Example: `<function code><digest length><digest bytes>`
>
> This allows the system to (a) choose the best function for the use case (e.g. stronger security vs faster performance), and (b) evolve as function choices change. Self-describing values allow using different parameter choices compatibly.

These multihashes are part of a whole family of self-describing formats, and it is brilliant; check it out: multiformats.

## Network: talk to other clients

The summary is this: IPFS works on top of any network (see the image above). Interesting here is the network addressing used to connect to a peer; IPFS uses the `multiaddr` format for that.
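A multiaddr is self-describing: every layer of the address names its protocol explicitly. A toy parser shows the shape (simplified: it assumes every protocol segment carries exactly one value, which is not true for the full multiaddr spec):

```python
def parse_multiaddr(addr: str) -> list:
    """Split a multiaddr such as /ip4/127.0.0.1/tcp/4001 into
    (protocol, value) pairs. Toy version: no protocol table and
    no validation, just alternating protocol/value segments."""
    parts = addr.strip("/").split("/")
    if len(parts) % 2 != 0:
        raise ValueError("expected alternating protocol/value segments")
    return list(zip(parts[0::2], parts[1::2]))

print(parse_multiaddr("/ip4/127.0.0.1/tcp/4001"))
# [('ip4', '127.0.0.1'), ('tcp', '4001')]
```

Because the transport is spelled out per hop, the same addressing scheme works for TCP, UDP, or any future protocol without changing the format.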
You can see `multiaddr` in action when starting a node:

```
Swarm listening on /ip4/127.0.0.1/tcp/4001
Swarm listening on /ip4/172.17.0.1/tcp/4001
Swarm listening on /ip4/185.24.123.123/tcp/4001
Swarm listening on /ip6/2a02:1234:9:0:21a:4aff:fed4:da32/tcp/4001
Swarm listening on /ip6/::1/tcp/4001
API server listening on /ip4/127.0.0.1/tcp/5001
Gateway (read-only) server listening on /ip4/0.0.0.0/tcp/8080
```

## Routing: announce and find stuff

The routing layer is based on a DHT, as discussed in the previous episode, and its purpose is to:

- announce that this node has some data (a block, as discussed in the next chapter), or
- find which nodes have some specific data (by referring to the multihash of a block), and
- if the data is small enough (<= 1 KB), the DHT stores the data as its value.

The command-line interface and API don't expose the complete routing interface as specified in the white paper. What does work:

```
# tell the DHT we have this specific content:
$ ipfs dht provide QmYwAPJzv5CZsnA625s3Xf2nemtYgPpHdWEz79ojWnPbdG

# ask for peers who have the content:
$ ipfs dht findprovs QmYwAPJzv5CZsnA625s3Xf2nemtYgPpHdWEz79ojWnPbdG
QmYebHWdWStasXWZQiXuFacckKC33HTbicXPkdSi5Yfpz6
QmczCvhy6unZEVC5ukR3BC3BPxYie4jBeRApTUKq97ZnEo
QmPM3WzZ3q1SR3pzXuGPHD7e6z3eWEWCnAvrNw7Wegyc8o
QmPKSqtkDmYsSxQbqNkZG1AmVnZSFyv5WF7fFE2YebNBFG
QmPMJ3HLLLBtW53R2YixhVHmVb3pkR2bHni3qbqq23vBSv
QmPNHJJphV1TB6Z99L6r9Y7bKaCBUQ67X17hicnEmsgWDJ
QmPNhiqGg81o2Perk2i7VNvvVuuLLUMKDxMNwVauP8r5Yv
QmPQJRgP3Vxi52Ho7HfnYdiCRJTRM1TXwgEnyjcwcLuKfb
QmNNxr1ZoyPbwNe2CvYz1CVyvSNWsE8WNwDWQ9t9BDjnj5
QmNT744VjtRFpDYB25EVLx7ha1zAVDKsd3qFjxfQLjPEXq
QmNWwGRWTYeut6qvKDhJBuEJZnbqMPMfuF81MPvHvPBX89
QmNZM5NmzZNPkvH2kPXDYNAB1cAeBNfxLyM9B1crgt3VeJ
QmNZRDzSJybdf4rmt972SH4U9TF6sEK8q2NSEJpEt7SkTp
QmNZdBUV9QXytVcPjcYM8i9AG22G2qwjZmh4ZwpJs9KvXi
QmNbSJ9okrwMphfjudiXVeE7QWkJiEe4JHHiKT8L4Pv7z5
QmNdqMkVqLTsJWj7Ja3oKwLNWcAYUkRjSZPg22B7rvKFMr
QmNfyHTzAetJGBFTRkXXHe5om13Qj4LLjd9SDwJ87T6vCK
QmNmrRTP5sJMUkobujpVXzzjpLACBTzf9weND6prUjdstW
QmNkGG9EZrq699KnjbENARLUg3HwRBC7nkojnmYY8joBXL
QmP6CHbxjvu5dxdJLGNmDZATdu3TizkRZ6cD9TUQsn4oxY

# Get all multiaddrs for a peer
$ ipfs dht findpeer QmYebHWdWStasXWZQiXuFacckKC33HTbicXPkdSi5Yfpz6
/ip4/192.168.1.14/tcp/4001
/ip6/::1/tcp/4001
/ip4/127.0.0.1/tcp/4001
/ip4/1.2.3.4/tcp/37665
```

`ipfs dht put` and `ipfs dht get` only work for IPNS records in the API. Maybe storing small data on the DHT itself was not implemented (yet)?

## Exchange: give and take

Data is broken up into blocks, and the exchange layer is responsible for distributing these blocks. It looks like BitTorrent, but it is different, so the protocol warrants its own name: BitSwap.

The main difference is that where BitTorrent blocks are traded with peers looking for blocks of the same file (the torrent swarm), BitSwap blocks are traded cross-file. So there is one big swarm for all IPFS data.

BitSwap is modeled as a marketplace that incentivizes data replication. The way this is implemented is called the BitSwap Strategy; the white paper describes a feasible strategy and also states that the strategy can be replaced by another one.
One such bartering system can be based on a virtual currency, which is where FileCoin comes in.

Of course, each node can decide on its own strategy, so the generally used strategy must be resilient against abuse. When most nodes are set up with some fair way of bartering, it works something like this:

- when peers connect, they exchange which blocks they have (`have_list`) and which blocks they are looking for (`want_list`)
- to decide if it will actually share data, a node applies its BitSwap Strategy
- this strategy is based on previous data exchanges between the two peers
- when peers exchange blocks, they keep track of the amount of data they share (builds credit) and the amount of data they receive (builds debt)
- this accounting between two peers is kept in the BitSwap Ledger
- if a peer has credit (shared more than received), our node sends the requested block
- if a peer has debt, our node shares or doesn't share, depending on a deterministic function where the chance of sharing becomes smaller as the debt grows
- a data exchange always starts with the exchange of the ledger; if it is not identical, our node disconnects

So this is set up kind of cool, I think: game theory in action! The white paper further describes some edge cases, like: what to do if I have no blocks to barter with? The answer is simply to collect blocks that your peers are looking for, so you have something to trade.

Now let's have a look at how we can poke around in the innards of the BitSwap protocol.
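Before firing up the CLI, the "deterministic function" above can be made concrete. The white paper's example strategy uses a sigmoid over the debt ratio; this is my Python transcription of that example (function and variable names are mine):

```python
import math

def debt_ratio(bytes_sent: int, bytes_received: int) -> float:
    # r from the white paper: what we already sent this peer relative
    # to what it sent us. A high r means the peer owes us data.
    return bytes_sent / (bytes_received + 1)

def p_send(r: float) -> float:
    # P(send | r) = 1 - 1 / (1 + exp(6 - 3r)): a falling sigmoid,
    # so the chance of sharing shrinks as the peer's debt grows.
    return 1 - 1 / (1 + math.exp(6 - 3 * r))

for sent, received in [(0, 1000), (2000, 1000), (9000, 1000)]:
    r = debt_ratio(sent, received)
    print(f"r={r:.2f} -> send with probability {p_send(r):.3f}")
```

A peer in good standing gets blocks almost unconditionally, a peer around twice over its credit gets a coin flip, and a freeloader is cut off; because the function is deterministic, both sides can predict the outcome from the shared ledger.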
The command-line interface has a `block` section and a `bitswap` section; those sound relevant :)

To see bitswap in action, I'm going to request a large file, which is a video (download it to see what video!): Qmdsrpg2oXZTWGjat98VgpFQb5u1Vdw5Gun2rgQ2Xhxa2t

```
# ask for the file
$ ipfs get Qmdsrpg2oXZTWGjat98VgpFQb5u1Vdw5Gun2rgQ2Xhxa2t

# in a separate terminal, after requesting the file, I inspect the "bitswap wantlist"
$ ipfs bitswap wantlist
QmYEqofNsPNQEa7yNx93KgDycmrzbFkr5oc3NMKXMxx5ff
QmUmDEBm9a8MYyqRdb3YQnoqPmqAo4cEWdKQErirFJdSWD
QmY5VJPbsRZzFCTMrFBx2qtZiyyeLhsjBysyfC1fx2gE9S
QmdbzYgyhqUNCNL8xU2HTSKwao1ck2Gmi5U1ygjQuJd92b
QmbZDe5Dcv9mJr8fiqp5aJL2cbyu64tgzwCS2Vy4P3krCL
QmRjzMzVeYRE5b6tDF3sTXMV1sTffno92uL3WwuFavBrWQ
QmPavzEJQw8atvErXQis6C6GF7DRFbb95doAaFkHe9M38u
QmY9fs1Pkr3nV7RkbGdfGh3q8HuKtMMCCUp22AAbwPYnrS
QmUtxZkuJuyydd124Z2cfx6jXMAMpcXZRF96QMAsXc2y6c
QmbYDTJkmLqMm6ojdL6pLP7C8mMVfVPnUxn3yp8HzXDcXf
QmbW9MZ7cwn8svpixosAuC7GQmUXDTZRuxJ8dJp6HyJzCS
QmdCLGWsYQFhi9y3BmkhUreX2S799iWGyJqvnbK9dzB55c
Qmc7EvnBPf2mPCUCfvjcsaQGLEakBbUN9iycnyrLF3b2or
Qmd1mNnDQPf1BAjFqDHjiLe4g4ZFPAheQCniYkbQPosjDE
QmPip8XzQhJFd487WWw7D8aBuGLwXtohciPtUDSnxpvMFR
QmZn5NAPEDtptMb3ybaMEdcVaoxWHs7rKQ4H5UBcyHiqTZ
...

# find a node where we have debt
$ ipfs dht findprovs Qmdsrpg2oXZTWGjat98VgpFQb5u1Vdw5Gun2rgQ2Xhxa2t
QmSoLMeWqB7YGVLJN3pNLQpmmEk35v6wYtsMGLzSr5QBU3
QmSoLnSGccFuZQJzRadHn95W2CrSFmZuTdDWP8HXaHca9z
QmUh2KnjAvgEbJFSd5JZws4CNvt6LbC4C1sRpBgCbZQiqD
Qmc9pBLfKSwWboKHMvmKx1P7Z738CojuUXkPA1dsPrvSw2
QmZFhGyS2W833nKKkbqZAU2uSvBbWUytDJkKBHimwRmhd6
QmZMxNdpMkewiVZLMRxaNxUeZpDUb34pWjZ1kZvsd16Zic
Qmbut9Ywz9YEDrz8ySBSgWyJk41Uvm2QJPhwDJzJyGFsD6

# try one to see if we have downloaded from that node
$ ipfs bitswap ledger QmSoLMeWqB7YGVLJN3pNLQpmmEk35v6wYtsMGLzSr5QBU3
Ledger for <peer.ID SoLMeW>
Debt ratio: 0.000000
Exchanges: 11
Bytes sent: 0
Bytes received: 2883738
```

Thank you, peer `SoLMeW`; what a generous peer you are!
Now, have a look at the `block` commands:

```
# Let's pick a block from the wantlist above
$ ipfs block stat QmYEqofNsPNQEa7yNx93KgDycmrzbFkr5oc3NMKXMxx5ff
Key: QmYEqofNsPNQEa7yNx93KgDycmrzbFkr5oc3NMKXMxx5ff
Size: 262158

$ ipfs block get QmYEqofNsPNQEa7yNx93KgDycmrzbFkr5oc3NMKXMxx5ff > slice_of_a_movie
# results in a binary file of 262 KB
```

We'll have another look at how blocks fit in in the next chapter.

The three layers of the stack we described so far (network, routing, exchange) are implemented in libp2p.

Let's climb up the stack to the core of IPFS…

## Objects: organize the data

Now it gets fascinating. You could summarize IPFS as: distributed, authenticated, hash-linked data structures. These hash-linked data structures are where the Merkle DAG comes in (remember our previous episode?).

To create any data structure, IPFS offers a flexible and powerful solution:

- organize the data in a graph, where we call the nodes of the graph objects
- these objects can contain data (any sort of data, transparent to IPFS) and/or links to other objects
- these links (Merkle Links) are simply the cryptographic hash of the target object

This way of organizing data has a couple of useful properties (quoting from the white paper):

> 1. Content Addressing: all content is uniquely identified by its multihash checksum, including links.
> 2. Tamper resistance: all content is verified with its checksum. If data is tampered with or corrupted, IPFS detects it.
> 3. Deduplication: all objects that hold the exact same content are equal, and only stored once. This is particularly useful with index objects, such as git trees and commits, or common portions of data.

To get a feel for IPFS objects, check out this objects visualization example.

Another nifty feature is the use of unix-style paths, where a Merkle DAG has the structure: `/ipfs/<hash-of-object>/<named-path-to-object>`. We'll see an example below.

This is really all there is to it.
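A toy version of such a graph makes the three properties above tangible. This is hypothetical code, not IPFS's actual object format (JSON and hex SHA-256 stand in for protobuf and multihash):

```python
import hashlib
import json

store = {}  # toy block store: address -> serialized object

def put(data, links=None):
    """Store an object; its address IS the hash of its content
    (content addressing). Tampering with the stored bytes would
    make them no longer match their address."""
    obj = json.dumps({"Data": data, "Links": links or {}}, sort_keys=True)
    addr = hashlib.sha256(obj.encode()).hexdigest()
    store[addr] = obj  # same content -> same address: deduplication
    return addr

def resolve(path):
    """Follow a unix-style path: <hash-of-root>/<named-path-to-object>."""
    root, *names = path.strip("/").split("/")
    obj = json.loads(store[root])
    for name in names:
        obj = json.loads(store[obj["Links"][name]])  # Merkle link = hash
    return obj

baz = put("baz")
bar = put("", {"baz": baz})            # a "directory" linking to baz
root = put("", {"bar": bar, "baz": baz})
print(resolve(f"{root}/bar/baz")["Data"])  # baz
print(put("baz") == baz)                   # True: stored only once
```

Note that `root` links to `baz` twice (directly and via `bar`), yet the content exists in the store exactly once.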
Let's see it in action by replaying some examples from the quick-start:

```
$ mkdir foo
$ mkdir foo/bar
$ echo "baz" > foo/baz
$ echo "baz" > foo/bar/baz
$ tree foo/
foo/
├── bar
│   └── baz
└── baz
$ ipfs add -r foo
added QmWLdkp93sNxGRjnFHPaYg8tCQ35NBY3XPn6KiETd3Z4WR foo/bar/baz
added QmWLdkp93sNxGRjnFHPaYg8tCQ35NBY3XPn6KiETd3Z4WR foo/baz
added QmeBpzHngbHes9hoPjfDCmpNHGztkmZFRX4Yp9ftKcXZDN foo/bar
added QmdcYvbv8FSBfbq1VVSfbjLokVaBYRLKHShpnXu3crd3Gm foo

# the last hash is the root node; we can access objects through their path starting at the root, like:
$ ipfs cat /ipfs/QmdcYvbv8FSBfbq1VVSfbjLokVaBYRLKHShpnXu3crd3Gm/bar/baz
baz

# To inspect an object identified by a hash, we do
$ ipfs object get /ipfs/QmdcYvbv8FSBfbq1VVSfbjLokVaBYRLKHShpnXu3crd3Gm
{"Links":[{"Name":"bar","Hash":"QmeBpzHngbHes9hoPjfDCmpNHGztkmZFRX4Yp9ftKcXZDN","Size":61},{"Name":"baz","Hash":"QmWLdkp93sNxGRjnFHPaYg8tCQ35NBY3XPn6KiETd3Z4WR","Size":12}],"Data":"\u0008\u0001"}

# The above object has no data (except the mysterious \u0008\u0001) and two links

# If you're just interested in the links, use "refs":
$ ipfs refs QmdcYvbv8FSBfbq1VVSfbjLokVaBYRLKHShpnXu3crd3Gm
QmeBpzHngbHes9hoPjfDCmpNHGztkmZFRX4Yp9ftKcXZDN
QmWLdkp93sNxGRjnFHPaYg8tCQ35NBY3XPn6KiETd3Z4WR

# Now a leaf object without links
$ ipfs object get /ipfs/QmdcYvbv8FSBfbq1VVSfbjLokVaBYRLKHShpnXu3crd3Gm/bar/baz
{"Links":[],"Data":"\u0008\u0002\u0012\u0004baz\n\u0018\u0004"}

# The string 'baz' is somewhere in there :)
```

The Unicode characters that show up in the data field are the result of the serialization of the data; IPFS uses protobuf for that, I think. Correct me if I'm wrong :)
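We can actually check the protobuf guess by hand. The annotations below assume the unixfs schema (field 1 = node type, field 2 = the file bytes, field 3 = file size); the byte layout itself is just the generic protobuf wire format:

```python
# The leaf's Data field from above, with the \uXXXX escapes as raw bytes:
raw = b"\x08\x02\x12\x04baz\n\x18\x04"

# Protobuf wire format: each field starts with a tag byte,
# (field_number << 3) | wire_type.
tag1, value1 = raw[0], raw[1]   # 0x08: field 1, varint
length = raw[3]                 # 0x12: field 2, length-delimited; then length
payload = raw[4:4 + length]
tag3, value3 = raw[8], raw[9]   # 0x18: field 3, varint

print(tag1 >> 3, value1)  # 1 2      -> type 2, a unixfs "File" I believe
print(payload)            # b'baz\n' -> our file content, newline included
print(tag3 >> 3, value3)  # 3 4      -> file size: 4 bytes
```

So the "mysterious" bytes are a tiny protobuf message: a file node carrying the four bytes `baz\n`.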
At the time I'm writing this, there is an experimental alternative for the `ipfs object` commands: `ipfs dag`:

```
$ ipfs dag get QmdcYvbv8FSBfbq1VVSfbjLokVaBYRLKHShpnXu3crd3Gm
{"data":"CAE=","links":[{"Cid":{"/":"QmeBpzHngbHes9hoPjfDCmpNHGztkmZFRX4Yp9ftKcXZDN"},"Name":"bar","Size":61},{"Cid":{"/":"QmWLdkp93sNxGRjnFHPaYg8tCQ35NBY3XPn6KiETd3Z4WR"},"Name":"baz","Size":12}]}

$ ipfs dag get /ipfs/QmdcYvbv8FSBfbq1VVSfbjLokVaBYRLKHShpnXu3crd3Gm/bar/baz
{"data":"CAISBGJhegoYBA==","links":[]}
```

We see a couple of differences there, but let's not get into that. Both outputs follow the IPFS object format from the white paper. One interesting bit is the "Cid" that shows up; this refers to the newer Content IDentifier.

Another feature that is mentioned is the possibility to pin objects, which results in the storage of these objects in the file system of the local node. The current Go implementation of IPFS stores them in a leveldb database under the `~/.ipfs/datastore` directory. We have seen pinning in action in a previous post.

The last part of this chapter mentions the availability of object-level encryption. This is not implemented yet: status "wip" (Work in Progress; I had to look it up as well). The project page is here: ipfs keystore proposal.

The `ipfs dag` command hints at something new…

## Intermission: IPLD

If you studied the images at the start of this post carefully, you are probably wondering: what is IPLD, and how does it fit in? According to the white paper it doesn't fit in, as it isn't mentioned at all! My guess is that IPLD is not mentioned because it was introduced later, but it more or less maps to the Objects chapter in the paper. IPLD is broader and more general than what the white paper specifies. Hey Juan, update the white paper will ya! :-)

If you don't want to wait for the updated white paper, have a look here: the IPLD website (Inter Planetary Linked Data), the IPLD specs and the IPLD implementations.
And this video is an excellent introduction: Juan Benet: Enter the Merkle Forest. But if you don't feel like reading/watching more: IPLD is more or less the same as what is described in the "Objects" and "Files" chapters here.

Moving on to the next chapter in the white paper…

## Files: uh?

On top of the Merkle DAG objects, IPFS defines a Git-like file system with versioning, with the following elements:

- blob: there is just data in blobs, and it represents the concept of a file in IPFS. No links in blobs.
- list: lists are also a representation of an IPFS file, but consist of multiple blobs and/or lists.
- tree: a collection of blobs, lists and/or trees; acts as a directory.
- commit: a snapshot of the history in a tree (just like a git commit).

Now I hear you thinking: aren't these blobs, lists, and trees the same things as what we saw in the Merkle DAG? We had objects there with data, with or without links, and nice Unix-like file paths. I heard you thinking that because I thought the same thing when I arrived at this chapter.

After searching around a bit, I started to get the feeling that this layer was discarded: IPLD stops at the "objects" layer, and everything on top of that is open to whatever implementation. If an expert is reading this and thinks I have it all wrong: please let me know, and I'll correct it with the new insight.

Now, what about the commit file type? The title of the white paper is "IPFS - Content Addressed, Versioned, P2P File System", but the versioning hasn't been implemented yet, it seems. There is some brainstorming going on about versioning here and here.

That leaves one more layer to go…

## Naming: adding mutability

Since links in IPFS are content-addressable (a cryptographic hash over the content represents the block or object of content), data is immutable by definition. It can only be replaced by another version of the content, which therefore gets a new "address".
The solution is to create "labels" or "pointers" (just like git branches and tags) to immutable content. These labels can be used to represent the latest version of an object (or graph of objects).

In IPFS this pointer can be created using the Self-Certified Filesystem I described in the previous post. It is named IPNS and works like this:

- The root address of a node is `/ipns/<NodeId>`.
- The content it points to can be changed by publishing an IPFS object to this address.
- By publishing, the owner of the node (the person who knows the secret key that was generated with `ipfs init`) cryptographically signs this "pointer".
- This enables other users to verify the authenticity of the object published by the owner.
- Just like IPFS paths, IPNS paths also start with a hash, followed by a Unix-like path.
- IPNS records are announced and resolved via the DHT.

I already showed the actual execution of the `ipfs publish` command in the post Getting to know IPFS.

This chapter in the white paper also describes some methods to make addresses more human-friendly, but I'll leave that in store for the next episode, which will be hands-on again. We have got to get rid of these hashes in the addresses and make it all work nicely in our good old browsers: Ten terrible attempts to make IPFS human-friendly.

Let me know what you think of this post by tweeting to me @pors or leave a comment below!