Introduction to ARM

Introduction

ARM is a family of RISC processor architectures used in smartphones, IoT devices, smart gadgets, routers, and more. It is developed by Arm Ltd. The name ARM originally stood for Acorn RISC Machine and was later changed to Advanced RISC Machines. A RISC (Reduced Instruction Set Computer) processor uses a small set of simple, highly optimized instructions. Because each instruction does less, a RISC program may need more instructions than the same program on a CISC (Complex Instruction Set Computer) processor, whose individual instructions can perform more complex operations. CISC processors such as x86 are common in PCs and laptops.

ARM processors are small and energy efficient, which is why they are used in devices like smartphones. ARM is a load-store architecture: it cannot operate directly on data in memory. The data must first be loaded into registers before any operation is performed on it.

RISC machines use a technique called pipelining to speed up execution: the processor fetches the next instruction while the current instruction is still being executed. Pipelining is easier to implement when instructions are simple and uniform, as they are in RISC designs. RISC machines also tend to have more general purpose registers than CISC machines, and their overall design is much simpler.

As smart gadgets, IoT devices, and smartphones become more widespread, we also need to understand their security. ARM has both a 32-bit architecture (ARM32, also called AArch32) and a 64-bit architecture (ARM64, also called AArch64). Most smartphones manufactured today use ARM64, so it's important to learn both.

In this chapter, we will introduce ARM32: its registers, instructions, modes, and so on. About 70% of the exploitation material in this book is based on ARM32, but the same concepts can be applied to ARM64.

Registers

Let's start off with registers.

Registers are high-speed storage locations inside the CPU. They can accept, store, and transfer data. A register can hold an instruction, an address, or any other data. The number of registers depends on the ARM version.

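There are around 37 registers in ARM32, but many of them are banked copies that are only visible in particular processor modes. A user-mode program sees 16 registers, R0 to R15, plus the CPSR. R0 to R10 are the general purpose registers. R11 to R15 are treated as special purpose registers because of their conventional roles:

R11: frame pointer (FP)
R12: intra-procedure-call scratch register (IP)
R13: stack pointer (SP)
R14: link register (LR)
R15: program counter (PC)

We should be aware of both groups.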

We will better understand these registers when we go through the examples.

Instruction Format

The instruction format describes the layout of an instruction. An instruction is divided into two parts:

Opcode: the part of the instruction that tells the CPU which operation to perform

Operand: the data, register, or memory location that the operation works on

Instructions perform a specified operation, such as addition or subtraction, on the given data. In ARM state, every instruction is encoded in 32 bits, which hold both the opcode and the operands.

ARM states

ARM processors can execute instructions in two states: ARM state, where instructions are 32 bits wide, and Thumb state, where instructions are 16 bits wide. (Later versions of Thumb, known as Thumb-2, also include some 32-bit instructions.) The processor switches between the two states using branch-and-exchange instructions such as BX (Branch and Exchange). The Thumb bit (T) in the CPSR indicates the processor's current state: 0 for ARM state (the default) and 1 for Thumb state. Because Thumb instructions are smaller, Thumb code usually needs more instructions to perform the same operation: a task that takes a single instruction in ARM state may take several in Thumb state. Also, not every ARM-state instruction is available in Thumb state. Instructions are 2-byte aligned in Thumb state and 4-byte aligned in ARM state.

CPSR

The Current Program Status Register (CPSR) contains various flags, including Thumb (T), fast interrupt disable (F), interrupt disable (I), overflow (V), carry (C), zero (Z), and negative (N). Each flag is a bit in the CPSR. The condition flags N, Z, C, and V are set by comparisons and by arithmetic instructions that request it (for example adds instead of add). The flags are used for conditional execution, loops, switching states, and so on.

Endianness

Endianness refers to the order in which the bytes of a value are stored in memory. ARM can operate in either little-endian or big-endian mode, but little-endian is the default memory format for ARM processors and is used by most implementations. A little-endian machine stores the least significant byte of a value at the lowest address, so the bytes appear in reverse order. A big-endian machine stores the most significant byte at the lowest address, so the bytes appear in the same order as in the written number.

Let's see an example.

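Consider the hex value 0x01234567 stored starting at an example address 0x100 on a big-endian and on a little-endian machine:

Address          0x100  0x101  0x102  0x103
Big-endian       01     23     45     67
Little-endian    67     45     23     01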

In the buffer overflow sections, we will be working with little-endian byte order. If you want to read more about this, I recommend going through this link.

Instructions

Instructions perform a specified operation on the given data. There are different types of instructions, such as arithmetic and logical instructions. We won't cover all of them, because that is beyond the scope of this book.

In an instruction such as add r3, r1, r2, the first register (rd) is the destination register and the next one (rn) is the source register.

Let's debug a simple program to get a better understanding.

Boot up your Azeria Labs VM and log in to the emulated Raspberry Pi using the SSH console. You don't need to write any assembly code of your own here; we just need to learn what these instructions do.

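.data                   @ data section: where you define data (unused here)

.text                   @ text section: where the code is written
.global _start

_start:                 @ execution starts here
        mov r0, #1      @ r0 = 1
        mov r1, #2      @ r1 = 2
        add r3, r1, r2  @ r3 = r1 + r2 (r2 was never set, so it is 0)
        sub r1, r3, r0  @ r1 = r3 - r0
        mov r7, #1      @ r7 = 1, the exit syscall number
        swi 0           @ make the system call (swi is the old name for svc)
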
Assemble and link this program using the 'as' and 'ld' commands.

as asm.s -o asm.o && ld asm.o -o asm

Load the program inside gdb.

gdb ./asm

Let's disassemble the program to see our instructions.

disass _start

Set a breakpoint at the first instruction to pause execution there.

b _start

We put the breakpoint at _start because execution begins at the _start label.

Now run the program using the "r" command in gdb.

The program hit the breakpoint and the execution paused.

If we look at the registers, we can see that all of them except pc and sp are initialized to zero.

The pc points to the first instruction because it is the instruction to be executed. The stack pointer (sp) points to the top of the stack.

Let's now execute the instructions one by one. We can use the "si" (step instruction) or "ni" (next instruction) command for this.

0x10054 <_start+0> mov r0, #1

This is the first instruction to be executed. At this point r0 is zero. The instruction copies the value 1 into r0, so after it executes r0 holds 1.

The pc is also updated to point to the next instruction to be executed. In ARM state each instruction is 4 bytes long, so the pc advances by 4 bytes per instruction.

Now let's step over the second instruction

0x10058 <_start+4> mov r1, #2

This will copy the value 2 into r1.

0x1005c <_start+8> add r3, r1, r2

This will add r1 and r2 and store the result in r3. We never set r2, so it still holds 0.

So, r3 = r1 + r2 = 2 + 0 = 2.

0x10060 <_start+12> sub r1, r3, r0

The "sub" instruction will subtract r0 from r3 and store the result in r1.

r1 = r3 - r0 = 2 - 1 = 1

0x10064 <_start+16>      mov    r7,  #1
0x10068 <_start+20>      svc    0x00000000

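These two instructions exit the program, similar to calling exit() in C. On ARM Linux, r7 holds the system call number, and 1 is the number of the exit system call. Its argument, the exit status, is taken from r0. svc (Supervisor Call, written swi in older syntax as in our source) triggers an exception that switches the processor into supervisor mode, so the kernel can handle the system call.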

If we step over these two instructions, the program exits gracefully, with exit status 1 (the value in r0).

Load and store instructions

Because ARM is a load-store architecture, we can't operate directly on data in memory. We need to load the data from memory into registers, operate on it, and store the result back in memory. This is done with load and store instructions.

Load instructions load data from memory into a register.

LDR R1, [R2] 

r2: Contains the memory location from which the data is loaded.

r1: is the destination register.

[]: The brackets are used to represent the register which contains the memory address to load the value.

This will load the value from the memory location pointed to by r2 into the r1 register.

Similarly, store instructions are used to store the data in the memory location from a register.

STR R1, [R0]

r1: Contains the value that should be stored in the memory location.

r0: Contains the memory address where the data should be stored. This is the destination.

[]: Represents the register holding the destination memory location.

This stores the value in the r1 register to the memory pointed to by the r0 register.

These instructions support different addressing modes. If you want a detailed explanation, I highly recommend going through the Azeria Labs post on the topic. If you have already read that article, you can skip the rest of this section and move on to the next one.

Now let's go through a simple example program from Azeria Labs.

.data          /* the .data section is dynamically created and its addresses cannot be easily predicted */
var1: .word 3  /* variable 1 in memory */
var2: .word 4  /* variable 2 in memory */

.text          /* start of the text (code) section */ 
.global _start

_start:
    ldr r0, adr_var1  @ load the memory address of var1 via label adr_var1 into R0 
    ldr r1, adr_var2  @ load the memory address of var2 via label adr_var2 into R1 
    ldr r2, [r0]      @ load the value (0x03) at memory address found in R0 to register R2  
    str r2, [r1]      @ store the value found in R2 (0x03) to the memory address found in R1 
    bkpt             

adr_var1: .word var1  /* address to var1 stored here */
adr_var2: .word var2  /* address to var2 stored here */

Let's assemble and link this using the "as" and "ld" command.

pi@raspberrypi:~/practice $ as asm0x2.s -o asm0x2.o
pi@raspberrypi:~/practice $ ld asm0x2.o -o asm0x2

Load the program into gdb and set a breakpoint (bp) at _start.

gef> b _start
Breakpoint 1 at 0x10074
gef> 

Start the execution using the "r" command.

The program has hit the bp. Let's examine the first instruction.

ldr r0, [pc, #12]

This will copy the value at the memory location [pc + 12] into the r0 register. This is called PC-relative addressing: the immediate value is added to the pc to get the memory address. Note that in ARM state the pc reads as the address of the current instruction plus 8. So the address is 0x10074 + 8 + 12 = 0x10088. Gdb shows this address as a comment at the end of the instruction. Let's examine that memory location using the examine (x) command.

gef> x/x 0x10088
0x10088 <adr_var1>:	0x00020090
gef> 

The data in the memory location is 0x00020090. It is actually an address: the address of var1.

Let's execute that instruction using the "ni" command.

As expected, r0 is loaded with the address 0x00020090, which points to the value 0x3. gef's highlighting is very useful in these situations: it automatically shows the value stored at an address held in a register, so you don't need to examine the location each time.

0x10078 <_start+4>       ldr    r1,  [pc,  #12]	; 0x1008c <adr_var2>

The second instruction will load a value from [pc + 12] (0x10078 + 8 + 12 = 0x1008c) into the r1 register. We can use gef's highlighting here to determine the location.

After stepping over the instruction, r1 holds the memory address 0x00020094, which points to the value 4.

Now, look at the next two instructions.

0x1007c <_start+8>       ldr    r2,  [r0]
0x10080 <_start+12>      str    r2,  [r1]

The first instruction will load the value in [r0] into the r2 register. The memory location in r0 points to the value 3 so 3 will be loaded into the r2 register.

The second instruction stores the value in r2 to the memory location pointed to by r1. r2 now contains 0x3, so 0x3 is written to the memory location 0x00020094, overwriting the value 4 with 3.

The final "bkpt" instruction is a breakpoint instruction: it stops the program and hands control to the debugger.

Stack

The stack is one of the most important data structures used in computers. It is a linear data structure that follows the LIFO principle. LIFO stands for "Last In First Out": the element added first is removed last, and the element added last is removed first.

A pile of books arranged on top of each other is an example of a stack.

The stack is used to store local variables, return addresses, arguments, etc. The stack pointer (sp) always points to the top of the stack, which holds the most recently added element. Elements are inserted and removed only at the top. Inserting an element is called a push operation, and removing one is called a pop operation. The stack grows from high memory addresses to low memory addresses.

ARM provides push and pop instructions to insert elements into the stack and remove them. Let's look at an example:

push {r1}
pop {r3}

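In the above example, the push instruction pushes the value of the r1 register onto the stack. Similarly, the pop instruction removes the element at the top of the stack and copies it into the r3 register. When several registers are listed, as in push {r1, r2}, the lowest-numbered register is stored at the lowest address, which becomes the new top of the stack.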

Let's debug a simple program to understand it better.

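.data
.text
.global _start
_start:
	mov r1,#1	@ r1 = 1
	mov r2,#2	@ r2 = 2
	mov r3,#3	@ r3 = 3
	mov r4,#4	@ r4 = 4
	push {r1}	@ push one register
	push {r1,r2}	@ push two registers at once
	push {r3}
	pop {r3}	@ r3 = value at the top of the stack
	pop {r3,r4}	@ pop two values into r3 and r4
	mov r7,#1	@ exit syscall number
	svc 0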

Let's assemble and link it.

as push-pop.s -o push-pop.o && ld push-pop.o -o push-pop

Load the program into gdb and put a breakpoint at "_start" and start the execution.

The first four mov instructions copy the values 1, 2, 3, and 4 into the registers r1, r2, r3, and r4.

Let's also see the stack before the push instructions.

The "push {r1}" instruction pushes the value in r1 onto the top of the stack.

Let's step over the instruction using the "si" command.

Now the value 0x1 from the register r1 has been added to the top of the stack. The stack pointer has been decremented by 4 bytes, because the stack grows from high addresses to low addresses.

The "push {r1, r2}" will insert two values into the top of the stack.

r1 is 0x00000001 and r2 is 0x00000002. Let's see this.

The final "push {r3}" will insert the value in r3 which is 3 into the top of the stack.

The "pop {r3}" instruction removes one value from the top of the stack and copies it into r3.

Look at the stack state before execution.

Let's step over the instruction.

Now r3 contains 0x3, and the stack pointer has been incremented by 4 bytes.

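The final "pop {r3, r4}" pops the values 0x1 and 0x2 into the r3 and r4 registers. The value at the top of the stack (0x1, pushed from r1) goes into r3, and the next one (0x2) goes into r4.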

Now we can use the "c" command to continue execution and exit the program.

Functions

A function groups statements that perform a particular task. Under the ARM calling convention (the Procedure Call Standard for the ARM Architecture, AAPCS), the first four arguments are passed in registers r0 to r3. If there are more arguments, they are passed on the stack. The return value is usually passed back in the r0 register.

After executing all the instructions in a function (subroutine), execution must return to the caller so that the caller can continue. To make this possible, the address at which the caller should resume is saved. This address is called the return address. In ARM, the "lr" (link register, r14) holds the return address.

Let's see a simple example of a function.

#include <stdio.h>

int add(int a, int b){
    return a + b;
}

void main(){
    int a = 3;
    int b = 3;
    add(a, b);
}

Compile this using gcc.

gcc functions.c -o function

Let's load the program into gdb.

Set a breakpoint (bp) at main and disassemble the main function.

b main

The first instruction, "push {r11, lr}", pushes the values of r11 (the frame pointer) and lr onto the stack. We will see a "push" instruction in almost every function. It saves the registers' values on the stack so that the registers can be reused inside the function. The saved values are restored at the end of the function.

The "add r11, sp, #4" instruction sets up the frame pointer for the main function. r11 is set to sp + 4, so it points at the saved lr value.

The "sub sp, sp,#8" is used to allocate 8 bytes of space for the two local variables.

These first three instructions are called the function prologue.

   0x00010418 <+0>:	push	{r11, lr}
   0x0001041c <+4>:	add	r11, sp, #4
   0x00010420 <+8>:	sub	sp, sp, #8

The program has two local variables:

int a = 3; 
int b = 3;

They are stored in memory by the next few instructions in main.

The first value 3 is copied into the r3 register and stored at the memory location [r11 - 8]. Similarly, the second value 3 is also copied into r3 and stored at the adjacent memory location [r11 - 12].

We can step over these instructions and confirm these using the examine command.

Next, we see two "ldr" instructions followed by a branch instruction. These "ldr" instructions load the arguments for the "add" function: they load our two values from memory into the r0 and r1 registers. Under the ARM calling convention, the first four arguments are passed in registers r0 to r3, so our two values (both 3) are passed via r0 and r1.

Also remember that the arguments are assigned to registers in the same order as in the source program, from left to right: the first argument goes in r0, the second in r1, and so on.

For example, let's consider a function "abc" with three parameters.

#include <stdio.h>
int abc(int a,int b, int c){
return a;
}
void main(){
abc(1,2,3);
}

So the arguments to the function are 1,2 and 3. This will be passed from left to right meaning that 1 will be passed via r0, 2 via r1, and 3 via r2.

Let's focus back on our branch instruction seen in the program.

0x1043c <main+36> bl 0x103e8

So what are branch instructions?

Branch instructions change the flow of execution: they jump from one part of the code to another. The target can be a label, a function, etc.

There are different types of branch instructions, such as b, bl, bx, and blx. You can go through the ARM documentation to learn more about them.

The "b" instruction will jump to a specified label or location.

For example:

b label

The "bl" (branch with link) instruction is used to call a subroutine. Besides jumping to the target, it copies the address of the next instruction into r14 (LR), so "lr" holds the return address.

In our program, "bl 0x103e8" therefore stores 0x10440, the address of the instruction right after the bl, in lr before jumping to the add function. We can step into the add function using the "si" command.

Let's do that.

Now we are inside the "add" function.

If we look at the "lr" register, we can see it has been updated with the address of the next instruction after the branch instruction.

The first instruction, "push {r11}", pushes the value of r11 onto the top of the stack so that r11 can be reused without losing its value. Unlike main, add does not save lr, because it calls no other function and lr keeps the return address throughout. The next two instructions work like the ones we saw in the main function: they set up the frame pointer and allocate stack space to store the passed arguments.

      0x103f4 <add+12>         str    r0,  [r11,  #-8]
      0x103f8 <add+16>         str    r1,  [r11,  #-12]
      0x103fc <add+20>         ldr    r2,  [r11,  #-8]
      0x10400 <add+24>         ldr    r3,  [r11,  #-12]

These four instructions store the arguments at their stack locations and then load them again into different registers. After they execute, r2 and r3 both contain the value 3.

The next instruction, "add r3, r2, r3", adds r2 and r3 and stores the result in r3.

So r3 = 3 + 3 = 6

The "mov r0, r3" instruction copies the result into the r0 register, which holds the return value.

      0x1040c <add+36>         sub    sp,  r11,  #0
      0x10410 <add+40>         pop    {r11}		; (ldr r11,  [sp],  #4)
      0x10414 <add+44>         bx     lr

These three instructions form the function epilogue.

The "sub sp, r11, #0" instruction cleans up the stack: it sets sp back to r11, releasing the space allocated for the function.

The "pop {r11}" instruction pops the value that was saved at the beginning of the function off the top of the stack, restoring the caller's frame pointer.

Lastly, the "bx lr" instruction branches to the return address in the link register, returning to the main function. The pc now points to the instruction right after the bl instruction in main.

Let's step over to return back to the main function.

Now we are back at the main function. The pc points to the next instruction to be executed.

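The "sub sp, r11, #4" instruction cleans up main's stack frame. It sets sp to r11 - 4, where the saved r11 and lr values are stored, releasing the space used by the local variables.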

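The last instruction, "pop {r11, pc}", pops two values from the stack: the saved r11 is restored, and the saved lr value is loaded directly into pc. This returns from main to the C library startup code that called it, which then exits the program.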

Let's enter the "c" command to continue the execution to exit the program.

Conclusion

This chapter was an introduction to the ARM32 architecture. We looked at registers, instructions, the stack, functions, and more. From the next chapter onwards we will start on the exploitation side, and the concepts learned in this chapter will be essential there.
