How variable sized arrays work: non-fixed size data on the stack.

Table of Contents

Let’s first give some background on why I wanted to create this post. Local variables are stored on the stack, these variables are usually fixed size, which makes addressing them very easy. The rbp register stores a pointer to an address on the stack, called the stack frame. Local variables are offset by some constant value relative to the stack frame.

address
10		<– rsp, growing stack pointer
11	local_var3
12	local_var2
13	local_var1
14		<– rbp, the stack frame

💡 Note that the stack grows downwards, meaning the address of the top of the stack decreases as variables are stored on the stack.

Addressing these local variables can be done using a single instruction, and thus is very fast. For example moving local_var1 and local_var3 into registers and adding them can be done like this:

mov    r1, [rbp-1]
mov    r2, [rbp-3]
add    r1, r2

For addressing like this to work the variables on the stack need to be fixed size, because the offsets relative to the stack frame are hard coded at compile time.

Variable sized arrays in C

Having heard in some places that variables on the stack are actually exclusively fixed size, I was left very confused after discovering variable sized arrays in C. Variable sized arrays can be created like this:

int arr[n];

This creates a local variable containing an array of size n, stored on the stack.¹

Well how does this work? Wouldn’t this mess with the fixed offset of the local variables? Indeed, treating it like any other fixed size local variable would not work.

address
10		<– rsp, growing stack pointer
11	local_var2
12	array_item1
13	array_item2
14	array_item3
15	array_item4
16	local_var1
17		<– rbp, the stack frame

💡 Note that arrays are addressed upward, so as the index increases the memory address increases, so in the opposite direction to which the stack grows.

How would we address local_var2 in this case? Here we could simply do rbp-6. However, this would stop working once we create a different sized array.

To solve this we can, for example, store the offset of the array in another local variable, and calculate the address using that offset.

address
10		<– rsp, growing stack pointer
11	local_var2
12	array_item1
13	array_item2
14	array_item3
15	array_item4
16	local_var1
17	array_offset	(-6)
18		<– rbp, the stack frame

If we now wanted to, for example, add array_item1 and local_var2 we could do it like this.

; move address of array_item1 into r1
mov    r1, [rbp-1]
add    r1, rbp
; set r2 to array_item1
mov    r2, [r1]
; subtract 1 from address of array_item 1, so address of local_var2
sub    r1, 1
; set r3 to local_var2
mov    r3, [r1]
; add array_item1 and local_var2
add    r2, r3

In practice

The above example is somewhat oversimplified, how it is actually done in practice is quite different. So let’s have a look at how an actual compiler compiles this simple c program:

#include <stdio.h>

void function(unsigned size){
    unsigned arr[size];
    for(unsigned i = 0; i<size; i++){
        arr[i]=i;
    }
    printf("arr: First: %d, Last: %d\n", arr[0], arr[size-1]);
    unsigned arr2[size];
    for(unsigned i = 0; i<size; i++){
        arr2[i]=i*2;
    }
    printf("arr2: First: %d, Last: %d\n", arr2[0], arr2[size-1]);
}

int main() {
    function(3);
    return 0;
}

It was compiled using gcc on an x86_64 machine:

> gcc var.c -O0

The general strategy goes like this: the fixed size variables are located at the start of the stack frame, along with an additional variable for each variable sized array storing the starting position of the array. These variables can still be addressed using a single instruction. The variable sized arrays are then just put on top of that on the stack, they can be addressed by using the additional variable.

Let’s have a look at the what the array allocation compiles to.

unsigned arr[size];

We simply grow the stack by size * 4 as a single unsigned int is 4 bytes. There is just some extra fancy arithmetic to assure the stack pointer stays 16-byte aligned.²

401157: lea    rdx,[rax*4+0x0] ; rdx = rax * 4 = size * 4
40115f: mov    eax,0x10 ; eax = 16
401164: sub    rax,0x1 ;  rax = 16 - 1
401168: add    rax,rdx ;  rax = size * 4 + 16 - 1
40116b: mov    edi,0x10 ; edi = 16
401170: mov    edx,0x0  ; edx = 0
401175: div    rdi ; rax = rax / rdi = (size * 4 + 16 - 1) / 16
401178: imul   rax,rax,0x10 ; rax = rax * 16
40117c: sub    rsp,rax ; grow stack by previously calculated size

Next we save the starting address of the array, which is on the newly allocated stack space. Once again we have some fancy arithmetic here, this time it is simply rounding up to the nearest 4 bytes, in order to make it 4 byte aligned.

40117f: mov    rax,rsp
401182: add    rax,0x3 ; rax = rsp + 3
401186: shr    rax,0x2 ; shift right by two, equivalent to dividing by 4
40118a: shl    rax,0x2 ; shift left by two, equivalent to dividing by 4 
40118e: mov    QWORD PTR [rbp-0x28],rax ; save address on the stack

Our stack now looks like this:

Address	Content	Value
rbp - 0x58	arr begin		<– rsp
rbp - 0x44	size	3
rbp - 0x28	address of arr	rbp - 0x58
rbp			<– rbp

With the memory for the list allocated now, assigning values to the array is quite trivial. At the address arr_begin + i * (size of int) the value is written.

for(unsigned i = 0; i<size; i++){
    arr[i]=i;
}

; Loop initialization 
401192:	mov    DWORD PTR [rbp-0x14],0x0   ; i = 0
401199:	jmp    4011ac <function+0x76>     ; jump to loop condition check (4011ac)
; Loop body
40119b:	mov    rax,QWORD PTR [rbp-0x28]
40119f:	mov    edx,DWORD PTR [rbp-0x14]
4011a2:	mov    ecx,DWORD PTR [rbp-0x14]
4011a5:	mov    DWORD PTR [rax+rdx*4],ecx ; [arr_begin + i * (size of int)] = arr[i] = i
4011a8:	add    DWORD PTR [rbp-0x14],0x1  ; i++
; Loop condition
4011ac:	mov    eax,DWORD PTR [rbp-0x14]
4011af:	cmp    eax,DWORD PTR [rbp-0x44]  ; i < size
4011b2:	jb     40119b <function+0x65>    ; jump to loop body if i < size

Our stack now looks like this:

Address	Content	Value
rbp - 0x58	arr[0]	0	<– rsp
rbp - 0x54	arr[1]	1
rbp - 0x50	arr[2]	2
rbp - 0x44	size	3
rbp - 0x28	address of arr	rbp - 0x58
rbp - 0x14	i	2
rbp			<– rbp

The second allocation of the array works identically to the first one:

unsigned arr2[size];

4011e9:	lea    rdx,[rax*4+0x0] ; rdx = rax * 4 = size * 4
4011f1:	mov    eax,0x10 ; eax = 16
4011f6:	sub    rax,0x1 ;  rax = 16 - 1
4011fa:	add    rax,rdx ;  rax = size * 4 + 16 - 1
4011fd:	mov    ecx,0x10 ; ecx = 16
401202:	mov    edx,0x0  ; edx = 0
401207:	div    rcx ; rax = rax / rcx = (size * 4 + 16 - 1) / 16
40120a:	imul   rax,rax,0x10  ; rax = rax * 16
40120e:	sub    rsp,rax ; grow stack by previously calculated size
401211:	mov    rax,rsp
401214:	add    rax,0x3 ; rax = rsp + 3
401218:	shr    rax,0x2 ; shift right by two, equivalent to dividing by 4
40121c:	shl    rax,0x2 ; shift left by two, equivalent to dividing by 4 
401220:	mov    QWORD PTR [rbp-0x38],rax ; save address on the stack

And the second for loop is also pretty much identical to the first one:

for(unsigned i = 0; i<size; i++){
    arr2[i]=i*2;
}

; Loop initialization 
401224:	mov    DWORD PTR [rbp-0x18],0x0
40122b:	jmp    401241 <function+0x10b>
; Loop body
40122d:	mov    eax,DWORD PTR [rbp-0x18]
401230:	lea    ecx,[rax+rax*1] ; ecx = i * 2
401233:	mov    rax,QWORD PTR [rbp-0x38]
401237:	mov    edx,DWORD PTR [rbp-0x18]
40123a:	mov    DWORD PTR [rax+rdx*4],ecx ; [arr_begin + i * (size of int)] = arr[i] = i*2
40123d:	add    DWORD PTR [rbp-0x18],0x1
; Loop condition
401241:	mov    eax,DWORD PTR [rbp-0x18]
401244:	cmp    eax,DWORD PTR [rbp-0x44]
401247:	jb     40122d <function+0xf7>

In the end the final stack looks like this:

Address	Content	Value
rbp - 0x68	arr2[0]	0	<– rsp
rbp - 0x64	arr2[1]	2
rbp - 0x60	arr2[2]	4
rbp - 0x58	arr[0]	0
rbp - 0x54	arr[1]	1
rbp - 0x50	arr[2]	2
rbp - 0x44	size	3
rbp - 0x38	address of arr2	rbp - 0x68
rbp - 0x28	address of arr	rbp - 0x58
rbp - 0x18	i (second loop)	2
rbp - 0x14	i (first loop)	2
rbp			<– rbp

Wrapping up

I hope this clarifies the concept of variable sized data on the stack. I certainly really enjoyed looking at the disassembled code and figuring out how it works, I learned a lot about assembly and compilers in the process. If you have any questions feel free to reach out to me! ³

Local variables being stored on the stack is technically an implementation detail, the c standard only calls of local variables (auto variables) to be freed once there out of scope. In practice this is always done using the stack. ↩︎
For more details on why it should be 16 byte aligned see the following stack overflow post: https://stackoverflow.com/questions/49391001/why-does-the-x86-64-amd64-system-v-abi-mandate-a-16-byte-stack-alignment. ↩︎
Reaching out through mastodon is probably your best bet, I am also planning on adding a comment section eventually, but haven’t gotten around to it yet. ↩︎