Note: This is an article from ancient times. For an up-to-date introduction to Mythril, read this article instead.

Unless you’ve been living under a rock for the past three years, you have surely taken notice of an industry buzzword that has been giving “machine learning” a run for its money: Blockchain.

Ethereum is one of the most successful implementations of the concept. In contrast to Bitcoin, which offers limited scripting capabilities, Ethereum provides a Turing-complete virtual machine. State transitions in the network (such as a change in the account balance of a particular token) are regulated by code running in the virtual machine, a.k.a. “smart contracts”.

An ancient security saying goes: “With great flexibility comes great potential for vulnerabilities”. It doesn’t help that the semantics of Solidity, Ethereum’s most popular high-level programming language, are often counter-intuitive, creating many possibilities for developers to mess up. A great example of this is the Parity multisig wallet bug, which allowed an unknown attacker to withdraw 153,037 Ether (worth more than USD 30 million).

The Parity debacle shows that implementation errors can remain undetected for months, even when the contract is deployed on the mainnet and its source code is openly available. One can only speculate what kind of vulnerabilities might be hidden in the thousands of contracts deployed on the chain, many of which are black boxes (in the sense that the source code isn’t published on Etherscan).

Not surprisingly, such a rich source of potential vulnerabilities with a monetary payout doesn’t escape the attention of security folks of the “white-hat” and “black-hat” varieties. It’s smashing the stack* for fun and profit all over again, only this time there is real profit (*note that not only does the EVM have a stack, it also doesn’t have registers, so almost every instruction uses the stack).

When I started looking into Ethereum a few weeks ago, I found quite a few useful tools for analyzing contracts on the mainnet. Etherscan and remix allow researchers to conveniently browse, disassemble and debug contracts in the web browser. The Porosity decompiler can (to a certain extent) restore source code from a given bytecode. Truffle and testrpc make it easy to compile and debug Solidity code.

However, I also found that there were many things I couldn’t do efficiently: most notably, searching the blockchain for interesting contracts and scripting static/dynamic analysis in Python (I haven’t quite jumped on the doing-everything-in-JavaScript train yet). Therefore, I started writing up a collection of Python modules and a command line tool. And crucially, I polished the thing to a point where others might be able to use it. The outcome is Mythril, an Ethereum disassembler / blockchain exploration & analysis tool. A brief tutorial follows.

Setup

Originally, I was hoping to run PyEthApp and directly access the state in its LevelDB. Unfortunately, PyEthApp seems to have suffered a lack of maintenance and development for quite some time and doesn’t sync with the Ethereum mainnet. Mythril therefore needs RPC access to a fully connected go-ethereum node. Install go-ethereum and start your node as follows:

$ geth --rpc --rpcapi eth,debug --syncmode fast console 2>/dev/null

Note that Mythril uses non-standard go-ethereum debug APIs, so while it should work with other Ethereum clients, some functions won’t be available.

Mythril itself can be installed via PyPI:

$ pip install mythril

This will install both the Python modules and the myth command line tool.

Database Initialization

Mythril enables search operations like those described in the legendary “Mitch Brenner” blog post in minutes instead of days. To achieve this, it creates a snapshot of the contracts deployed on the mainnet.
Run the following command to initialize the database:

$ myth --init-db

The whole process takes some time (to be honest, it’s not very efficient; I hope to provide a better implementation at some point). If you don’t want to sync the whole chain right away, you can hit ctrl+c at any point, and syncing will auto-resume the next time you run mythril with the --init-db flag.

Command line Usage

Once you have some contracts in your database, you can run search commands to look for function signatures and opcode sequences. The expression syntax is as follows:

func#[function signature]#
code#[opcodes]#

For example, the command below will output all contracts that have a function named changeMultisig(address):

$ myth --search "func#changeMultisig(address)#"
Matched contract with code hash 2bfa6e34330ac57501bd0f6c84d50fcd
Address: 0x3665f2bf19ee5e207645f3e635bf0f4961d661c0, balance: 4999600000000000000
Matched contract with code hash 98623854d849f0d97c55b98e0238eb7b
Address: 0x2d36cb89a977209703c1d6304f23198c22b7a498, balance: 63686800960937000000

The search feature supports simple boolean expressions. The following command prints all contracts that contain both a function named changeMultisig(address) and the opcode sequence PUSH1 0x50, POP:

$ myth --search "func#changeMultisig(address)# and code#PUSH1 0x50,POP#"

Disassembler

The disassembler is invoked with the -d flag. It accepts either a bytecode string via the -c argument or a contract address via -a ADDRESS (this will download the contract code from your Ethereum node).

Mythril tries to resolve function names using a built-in signatures file originally obtained from the Ethereum Signature Database. If you end up using Mythril, you are very welcome to commit updates to that file.
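
For readers new to EVM bytecode, here is a rough sketch of what a linear disassembler does with a bytecode string. This is not Mythril's implementation: the function name disassemble is made up for illustration, and the opcode table is a tiny hand-picked subset of the real one. The only non-trivial step is that PUSH1 to PUSH32 (0x60 to 0x7f) carry 1 to 32 bytes of immediate data that must be skipped over.

```python
# Minimal linear-sweep EVM disassembler (illustrative sketch, not Mythril's code).

# Tiny hand-picked subset of the EVM opcode table; a real tool needs all opcodes.
OPCODES = {
    0x00: "STOP",
    0x15: "ISZERO",
    0x34: "CALLVALUE",
    0x50: "POP",
    0x52: "MSTORE",
    0x55: "SSTORE",
    0xF4: "DELEGATECALL",
}


def disassemble(bytecode_hex):
    """Return a list of (offset, mnemonic, operand_or_None) tuples."""
    if bytecode_hex.startswith("0x"):
        bytecode_hex = bytecode_hex[2:]
    code = bytes.fromhex(bytecode_hex)

    instructions = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        if 0x60 <= op <= 0x7F:
            # PUSH1..PUSH32 are followed by 1..32 bytes of immediate data.
            size = op - 0x5F
            operand = code[pc + 1 : pc + 1 + size]
            instructions.append((pc, "PUSH%d" % size, "0x" + operand.hex()))
            pc += 1 + size
        else:
            # Opcodes missing from the subset table are shown as raw bytes.
            instructions.append((pc, OPCODES.get(op, "UNKNOWN_0x%02x" % op), None))
            pc += 1
    return instructions


listing = disassemble("0x60606040523415")
for offset, mnemonic, operand in listing:
    print(offset, mnemonic, operand or "")
```

The number in front of each instruction is its byte offset in the code, which is why offsets jump by two after a PUSH1.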

$ myth -d -a 0x2d36cb89a977209703c1d6304f23198c22b7a498
0 PUSH1 0x60
2 PUSH1 0x40
4 MSTORE
(…)
212 -- FUNCTION changeMultisig(address) --
213 CALLVALUE
214 ISZERO

Call graph

One of Mythril’s “killer features” is the call graph generator. Adding the -g OUTPUT_FILE argument will cause Mythril to save a graph in HTML format:

$ myth -g ~/Desktop/graph.html -a 0x2d36cb89a977209703c1d6304f23198c22b7a498

Open the resulting file in the web browser to view the graph. Usually, you can get a pretty good overview of available execution paths (fortunately, smart contracts aren’t all that complex). Using the call graph together with execution tracing to gradually reverse engineer a contract has been working well for me, although it would be nice to have a GUI-based SVG editor to annotate the graph (if you know one, please let me know in the comments).

Finding cross-references

It is often useful to identify other contracts referenced by a particular contract. Let’s assume you want to search for contracts that use the DELEGATECALL instruction in their fallback function, as was the case in the Parity bug. You can do this using dynamic analysis: simply run every contract in the PyEthereum VM without any inputs, and check if the DELEGATECALL instruction is executed. The Mythril repo contains an example script showing how to do this. It should output something like the following:

$ python examples/find-fallback-dcl.py
DELEGATECALL in fallback function: Contract 0x07459966443977122e639cbf7804c446
DELEGATECALL in fallback function: Contract 0x17c9e5b7f2bfd8307d628f2d9fcc9352
DELEGATECALL in fallback function: Contract 0x17f9db8b6ffa854335b319d01f09ba39
(…)

As the name implies, the DELEGATECALL instruction delegates execution to a different contract, so naturally you’ll be interested in which contract is called.
You can print the addresses of referenced contracts with the --xrefs option:

$ myth --xrefs 0x07459966443977122e639cbf7804c446
0x5b9e8728e316bbeb692d22daaab74f6cbf2c4691

Instead of using the command line tool, you can also follow the cross-references programmatically and run further analysis on the referenced contracts (find-fallback-dcl.py contains an example for this as well).

Advanced Usage

While the command line tool is neat, you only unlock the full power of Mythril with custom code. In addition to the contract database, disassembler and EVM tracing modules, Mythril also includes a modified version of ethjsonrpc, allowing you to deploy and trace code on a testrpc node. By combining all this you can piece together some decent static and dynamic analysis.

Search

To open the contract database from a Python program, use the get_persistent_storage function. This will return a ContractStorage object (by default, the database lives in [your-home]/.mythril, but you can override this in the constructor). Call the search(expression, callback) method to start a search:

    from mythril.ether.contractstorage import get_persistent_storage

    contract_storage = get_persistent_storage()
    contract_storage.search("func#getOwner()#", myCallback)

The callback function passed in the second argument will be called for every search result. It receives the following arguments:

- The hash key identifying the contract in Mythril’s database
- An ETHContract object containing the current contract
- A list of addresses at which the contract lives in the blockchain
- A list of the balances of each of the deployed contracts

    def myCallback(contract_hash, contract, addresses, balances):
        # Do something…
        pass

A useful pattern is searching for some particular type of contract, and then performing a set of analysis tasks on each result. The next section looks at a second example doing just that.
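
Before moving on, here is a minimal sketch of what such a callback might look like in practice: it keeps only deployed instances that hold at least a threshold balance. The names collect_funded, interesting and MIN_BALANCE_WEI are hypothetical, and the sketch assumes that balances arrive as integers denominated in wei, as the command line output suggests. To keep the snippet runnable without a synced database, the callback is invoked by hand with values from the CLI example above.

```python
# Sketch of a search callback that collects funded contracts.
# Assumption: balances are integers denominated in wei.

MIN_BALANCE_WEI = 10**18  # hypothetical threshold: 1 Ether

interesting = []  # (contract_hash, address, balance) tuples


def collect_funded(contract_hash, contract, addresses, balances):
    """Callback taking the four arguments described in the Search section."""
    for address, balance in zip(addresses, balances):
        if balance >= MIN_BALANCE_WEI:
            interesting.append((contract_hash, address, balance))


# Stand-alone demonstration with values from the CLI example. With Mythril
# installed, collect_funded would be passed to contract_storage.search().
collect_funded(
    "2bfa6e34330ac57501bd0f6c84d50fcd",
    None,  # the ETHContract object is not used by this callback
    ["0x3665f2bf19ee5e207645f3e635bf0f4961d661c0"],
    [4999600000000000000],
)
print(interesting)
```

Collecting results into a list first, rather than analyzing inside the callback, keeps the search pass fast and lets you rerun the slower analysis steps separately.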

Tracing Execution

Let’s assume you want to scan the contract database for conditions akin to the Parity bug, but in a generic way. One idea is to look for any function that, when passed either no argument, an address, or a list of addresses, ends up writing your address to storage with the SSTORE instruction. Of course this doesn’t necessarily mean that you’re overwriting an important state variable such as owner or owners, but it’s definitely the kind of behavior you want to investigate further.

In the prior example, we saw how code can be traced in the PyEthereum VM. For a more advanced analysis that also incorporates state (such as available accounts, contract storage, calling the constructor, etc.) it is better to deploy the contract on testrpc. In my test environment, I have geth running on port 8545 and a testrpc instance on port 8546, which allows me to move contracts from the real network to testrpc instantly. To run the example code, start testrpc as follows:

$ testrpc --port 8546 --gasLimit 0xFFFFFFF --account \
0x0b6f3fd29ca0e570faf9d0bb8945858b9c337cd2a2ff89d65013eec412a4a811,500000000000000000000 --account \
0x2194ac1cd3b9ca6cccc1a90aa2c6f944994b80bb50c82b973adce7f288734d5c,500000000000000000000

We want to look at all contracts in the database, so we can either use a search term that matches every contract, or simply iterate over the contracts:

    for k in contract_keys:
        contract = contract_storage.contracts[k]

This will return ETHContract objects that store both the code of the contract (contract.code) and the code of the transaction that created the contract (contract.creation_code).

To re-create the contract in your own private chain or on testrpc, replay the contract creation transaction using Mythril’s JSON RPC client:

    from mythril.rpc.client import EthJsonRpc

    testrpc = EthJsonRpc("localhost", 8546)

    # Deploy on testrpc
    creator_addr = "0xadc2f8617191ff60a36c3c136170cc69c03e64cd"
    ret = testrpc.eth_sendTransaction(from_address=creator_addr, gas=5000000, value=0, data=contract.creation_code)
    receipt = testrpc.eth_getTransactionReceipt(ret)
    contract_addr = receipt['contractAddress']

This should return a transaction receipt containing the contract address. Note that testrpc “mines” a new block whenever it receives a transaction, so your contract is deployed instantaneously.

The Disassembly class lets you access the list of instructions, formatted easm code, cross-references and functions of a contract. It takes a single constructor argument, the contract bytecode:

    disas = Disassembly(contract.code)

The Disassembly object has two mappings, func_to_addr and addr_to_func, that translate between function names and addresses. You can iterate over func_to_addr to get the signature of each function (note that unidentified functions are labeled as “UNK_[address]”).

    for function_selector in disas.func_to_addr:
        # Do something with the function signature, e.g.:
        # "changeOwner(address)"
        # "deposit()"
        # "UNK_0x5b980628"
        print(function_selector)

In the example script, every available function is called multiple times with various arguments (e.g. no argument, an address, a list of addresses). I won’t explain all of that in detail here; please have a look at the code to see how to encode the call data and send the transaction.
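
As a rough illustration of the encoding step (a sketch, not the code from the example script), call data for a function that takes a single address consists of the 4-byte function selector followed by the address left-padded with zeros to a 32-byte word, per the standard contract ABI. The helper name encode_call_with_address is hypothetical, and the selector is passed in rather than computed, since computing it requires a Keccak-256 implementation that the Python standard library does not provide.

```python
# Sketch of ABI call-data encoding for a function taking one address argument.
# Not taken from Mythril's example script; the helper name is hypothetical.


def encode_call_with_address(selector_hex, address_hex):
    """Build call data: 4-byte selector plus the address padded to 32 bytes."""
    selector = selector_hex[2:] if selector_hex.startswith("0x") else selector_hex
    address = address_hex[2:] if address_hex.startswith("0x") else address_hex
    if len(selector) != 8:
        raise ValueError("selector must be 4 bytes (8 hex characters)")
    if len(address) != 40:
        raise ValueError("address must be 20 bytes (40 hex characters)")
    # Static ABI types are left-padded with zeros to a full 32-byte word.
    return "0x" + selector.lower() + address.lower().rjust(64, "0")


# The selector is the first 4 bytes of the Keccak-256 hash of the function
# signature. Here we reuse the unidentified selector from the listing above.
data = encode_call_with_address(
    "0x5b980628",
    "0xadc2f8617191ff60a36c3c136170cc69c03e64cd",
)
print(data)
```

For the "no argument" case the call data is just the selector on its own; a list of addresses is a dynamic type and needs the offset and length words described in the ABI specification.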

Finally, to trace execution of a function call, use the traceTransaction RPC method:

    tx = testrpc.eth_sendTransaction(to_address=contract_addr, from_address=addr_schnupper, gas=5000000, value=0, data=data)
    trace = testrpc.traceTransaction(tx)

This will return a dictionary containing every instruction executed, along with the stack at each point of execution. We are only interested in SSTORE instructions that have our target address in the second-to-top position on the stack (i.e. the “attacker’s” address is written to storage). We can search the instruction list as follows:

    # Fragment from inside a function: report whether our address was stored
    for t in trace['structLogs']:
        if t['op'] == 'SSTORE':
            if addr_schnupper[2:] in t['stack'][-2]:
                return True

Possible next steps could include running further static and dynamic analysis to determine the effects of the overwritten address, or dumping a call graph for manual analysis.

The usage scenarios detailed here are only the tip of the iceberg: you can build almost arbitrarily complex blockchain scanners on top of Mythril’s APIs. However, note that many of Mythril’s components, such as contract storage, search expressions, and others, still have a lot of room for improvement. You are welcome to contribute better implementations and additional analysis scripts on the GitHub repository.

About Mythril and MythX

Mythril is a free and open-source smart contract security analyzer. It uses symbolic execution to detect a variety of security vulnerabilities.

MythX is a cloud-based smart contract security service that seamlessly integrates into smart contract development environments and build pipelines. It bundles multiple bleeding-edge security analysis processes into an easy-to-use API that allows anyone to create purpose-built smart contract security tools. MythX is compatible with Ethereum, Tron, Vechain, Quorum, Rootstock and other EVM-based platforms.