Are Ethereum Contracts Vulnerable to Hash Table Poisoning Attacks?

Written by keredson | Published 2017/06/05
Tech Story Tags: ethereum | solidity | blockchain | security | cryptocurrency

TLDR<strong>TL;DR</strong>: No. But it’s good to understand why, and convincing myself of it taught me more about Ethereum.via the TL;DR App

TL;DR: No. But it’s good to understand why, and convincing myself of it taught me more about Ethereum.

At my former employer there was a long running gag. At some point, some new hire would complain about Java’s HashMap class being non-deterministic, and my co-worker would chime in saying “yeah, sorry about that, that’s my fault” with a little grin, and we’d all have a good chuckle. The joke is my co-worker had pioneered DoS attacks against hash tables and the like during his PhD, which led Sun to make changes to Java’s implementation (which as a side effect made iteration over a HashMap non-deterministic).

But despite that discovery being ~15 years old now, it’s still somewhat esoteric. (As evidenced by frequency of the gag working, and few companies can claim to be as picky as TS about new hires.) So when I saw that Solidity (the scripting language most used for writing Ethereum contracts) had a built in mapping type, with nary a mention of implementation details, I wondered “are Solidity contracts that use maps vulnerable to these kind of poisoning attacks?” Google returned bupkis on the topic, so I dug into it while my son was watching Nature Cat.

What is a Hash Table Poisoning Attack?

As everyone knows, hash tables have an average lookup time of O(1+ε), but hash table collisions can degrade performance to O(n). If you know the hashing algorithm, you can feed it specific inputs that would cause its performance to degrade. This could be very dangerous in an Ethereum contract. From a naive observer’s point of view, either you could cause a contract to prematurely hit the gas limit, possibly permanently DoSing any action in the contract that uses a map, or if the gas limit doesn’t take into the true cost of the map lookup, DoSing the mining itself.

Typically languages use very weak hashing algorithms in their mapping implementations, usually revolving around the Mersenne prime 2³¹–1. (Prime numbers are always multiplicatively generative inside circular sets like fixed precision integer numbers.) This is necessary for speed. Language implementations commonly protect themselves from this by randomizing a salt for the hashing algorithm (hence the non-deterministic nature of Java’s HashMap class). But how could this work in Ethereum? Contract execution has to be purely deterministic.

One could imagine Ethereum attempting the solve this problem simply by using a better hashing algorithm, such as SHA3, at the expense of execution speed. But this would still be exploitable. Virtually all hash table implementations take the hash value modulus the size of the internal storage array to get the internal position. Even with SHA3, if there was a default capacity or say 16, you’d have a 1 in 16 chance of generating a SHA3 hash that would put an entity in a specific bucket. While the difficulty goes up with capacity, there is almost certainly a sweet spot of capacity size vs. hashing cost to DoS a contract that’s still economically feasible. Let’s say conservatively it takes 200 CPU cycles per byte to calculate SHA3 over a 8 byte nonce (src). On my modest desktop that’s still ~2M SHA3/s. This would allow me to create a 100,000 entry hash table where all the entries were in the same slot in approx. 1 hour. Surely making an Ethereum contract do an O(100,000) operation where an O(1) was expected would be exploitable! Time to find Solidity’s hash table implementation.

How Does “mapping” Work in Solidity?

The documentation gives you nothing. Let’s look at the code:

$ git clone https://github.com/ethereum/solidity$ grep -Ri mapping solidity

Damn, not that easy. ;) I’ll spare you the details, but it’s not clear at all where (or if) Solidity is implementing a hash table inside the Solidity code base. Instead let’s try looking at the assembly generated by a small contract:

pragma solidity ^0.4.0;contract TestMapping {  mapping(uint => uint) aMap;  function doit() { aMap[10] = 20; }}

Gives you:

JUMPDEST 		function doit() { aMap[10] = 2...PUSH 14			20PUSH 0			aMapPUSH 0			aMap[10]PUSH A			10DUP2 			aMap[10]MSTORE 			aMap[10]PUSH 20			aMap[10]ADD 			aMap[10]SWAP1 			aMap[10]DUP2 			aMap[10]MSTORE 			aMap[10]PUSH 20			aMap[10]ADD 			aMap[10]PUSH 0			aMap[10]SHA3 			aMap[10]DUP2 			aMap[10] = 20SWAP1 			aMap[10] = 20SSTORE 			aMap[10] = 20POP

Well that doesn’t look anything like a hash table store operation! What’s going on here? It looks like we’re hashing the key with SHA3, but not doing any sort of lookup, just relying on SSTORE. What is the EVM doing?

I had not considered EVM memory management until this point. At some level I guess I had just assumed it worked like any other class in any other VM. A struct + function table in some tiny but grow-able address space. And that variable sized constructs like lists or maps had a default fixed size and grew automatically by some power function with some ε cost that was averaged over or eaten by the mining code. But that’s not true at all. Your contract has access to a huge 2²⁵⁶ bit virtual address space. SSTORE is taking the SHA3 of your key, appending it to the SHA3 of your contract, and taking SHA3 of that, which then becomes the memory storage location of your value. This is amazing, as it’s the first mapping implementation I’ve ever seen that gives a true O(1) lookup in all cases (viewed by the EVM bytecode), by trading off of a near infinite memory address space. The entire EVM memory space becomes an incredibly sparse (and globally shared) hash table array!

This also explains the other oddities about Solidity mappings. For instance, you can’t iterate over a mapping in Solidity, which judging by Google/StackExchange is a common point of confusion. Iterating over a hash table typically involves iterating over it’s internal array, skipping empty cells. This works fine when your table is ~25–75% full; you’re at most iterating over 4x the number of memory locations as you have objects. But iterating over 2²⁵⁶ addresses is obviously infeasible. Likewise, the fact that mappings of structs are assumed to always have the struct initiated makes much more sense in this context. The EVM isn’t keeping pointers to some heap in some contiguous address space — the struct is just overlaid at an address initiated to all zeros, and its size doesn’t matter.

So How Does the EVM Give You a 2²⁵⁶ Memory Space?

Well it obviously isn’t mallocing it. ;) I didn’t investigate too deeply here, just enough to know that the EVM tracks memory using its own internal associative array. This makes sense and is a very common technique to store large/sparse matrices, and the EVM’s memory space can be thought of as a giant 256 x 2²⁵⁶ bit matrix.

All this convinced me that individual Ethereum contracts are not vulnerable to hash table poisoning DoS attacks. By Using a secure hashing function and not reducing the hash space down to a fixed array size, finding a collision in the hash table would be of equal difficulty as finding a collision in SHA3 itself.

Can the EVM’s Internal Associative Array be Exploited?

This I don’t know for sure. I don’t know if the EVM is using it’s own internal hash table to store data in that giant address space in its own internal mapping. But even if it was, it would be very difficult in practice. Obviously the 2²⁵⁶ bit memory space is being shrunken down to internally on your PC. I couldn’t find any stats on the global Ethereum state size, but it’s clearly large enough that spending enough gas to write enough data to poison it would be cost-prohibitive. Possibly it’s not — I could see a strong case for a radix tree here. But that’s outside the scope of this inquiry.

Anyway, that’s it!


Written by keredson | Chief Technology Officer
Published by HackerNoon on 2017/06/05