deception
In the previous episode of alloca shenanigans (which is not required reading for this post), I said this.
The stack is like a stack of plates. You can't change things in the middle or the whole thing falls down
I lied! Well... sort of. You see... the stack is just a bunch of bits, which means we don't have to treat the stack like a stack at all! In this second episode of alloca shenanigans we'll be implementing a custom data-structure in stack memory, which will allow us to resize things in the middle of the stack. I'll end it in a grand crescendo where we implement a resizing hashmap in stack memory with a custom programming language!
You can see the project on GitHub.
the structure
We will be using a list-like data-structure for our custom stack: each element contains its data followed by the size of that data. For example, a list with two elements will look like this:
<DATA> <DATA> <DATA> <3> <DATA> <1>
                                 |
                                rsp
The stack pointer will point to the last element in the list. The size section of each element is named "the size tag" and it is stored as an 8-byte integer.
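To make the layout concrete, here's a quick Python simulation (my own sketch, not the post's code). It works in 8-byte words instead of bytes, and the tag counts itself so a whole element can be skipped in one hop:

```python
# Model memory as an array of 8-byte words. The stack grows toward lower
# indices, and rsp indexes the newest element's size tag.
mem = [None] * 16
rsp = 16  # empty stack: rsp sits one past the end

def push(data):
    """Write the data words plus a size tag counting data + the tag itself."""
    global rsp
    size = len(data) + 1
    rsp -= size
    mem[rsp] = size                  # the size tag lives at rsp
    mem[rsp + 1:rsp + size] = data

push(["A", "B", "C"])                # first element
push(["D"])                          # second element
print(mem[rsp:])                     # [2, 'D', 4, 'A', 'B', 'C']
```

Reading from `rsp` upward, each element appears as its size tag followed by its data, newest element first.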
implementation
We'll start off implementing our stack in assembly.
We need to write some primitives for our stack: `alloc`, `pop`, and `resize`.
Let's start off implementing `alloc` because it's the simplest.
Because of our unique constraints, we aren't allowed to use `call`, `ret`, or any assembly instruction that implicitly modifies the stack.
Instead we can use `jmp` instructions and store the return address in a register.
Here is the assembly code:
alloc:
    ; allocate size bytes
    ; set the last 8 bytes to size
    sub rsp, rbx
    mov qword [rsp], rbx
    ; jump back to the caller
    jmp rax
The return address is given in `rax` and the amount (in bytes) to allocate is given in `rbx`. Note the `mov qword [rsp], rbx` and how the last 8 bytes of the allocation are used for the size. The amount to allocate in `rbx` has to include an extra 8 bytes for the size tag.
And here is the implementation for `pop`.
fn pop() {
    rsp += *rsp;
}

pop:
    add rsp, [rsp]
    jmp rax
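Here are the two primitives again as a tiny Python simulation (a sketch of the semantics on a word-granular stack, sizes counted in words rather than bytes):

```python
# rsp indexes into mem; the stack grows toward index 0.
mem = [0] * 32
rsp = 32

def alloc(size):
    """sub rsp, rbx ; mov qword [rsp], rbx"""
    global rsp
    rsp -= size
    mem[rsp] = size

def pop():
    """add rsp, [rsp] -- the size tag tells us how far to move back up."""
    global rsp
    rsp += mem[rsp]

alloc(3)       # 2 data words + 1 word for the size tag
alloc(2)       # 1 data word + 1 word for the size tag
pop()          # discards the second element
print(rsp)     # 29: rsp is back at the first element's tag
```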
Now, implementing `resize` is going to be a little more complicated. For the sake of brevity I'll just show the pseudo-code, but you can imagine how this would translate to assembly.
fn resize(ptr: *const void, amount: i64) {
    // the stack moves downwards
    // we start at i = rsp and move up to ptr
    for (let i = rsp; i < ptr; i += 8) {
        *(i - amount) = *i;
    }
    // update the size and rsp
    *(ptr - amount) = *ptr + amount;
    rsp -= amount;
}
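To check the copy loop, here's the same routine in a Python word model (my sketch; the real version moves bytes and uses registers):

```python
# rsp indexes into mem; the stack grows toward index 0.
mem = [0] * 16
rsp = 16

def alloc(size):
    global rsp
    rsp -= size
    mem[rsp] = size

def resize(ptr, amount):
    """Grow the element whose size tag is at index ptr by amount words."""
    global rsp
    # shift every word below ptr (the more recent elements) down by amount
    for i in range(rsp, ptr):
        mem[i - amount] = mem[i]
    # write the enlarged size tag at its new, lower position
    mem[ptr - amount] = mem[ptr] + amount
    rsp -= amount

alloc(3)              # element A: tag lands at index 13
alloc(2)              # element B: tag lands at index 11
resize(13, 1)         # grow A by one word
print(rsp, mem[12], mem[10])   # 10 4 2: A's tag is now 4, B's tag still 2
```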
We'll start at `rsp` and move backwards until we reach `ptr`, moving all the data we meet along the way down by the given `amount`.
<DATA> <DATA> <SIZE=2> <DATA> <SIZE=1>
                 |               |
                ptr             rsp
Then we set the new size and update rsp.
<DATA> <DATA> <DATA> <SIZE=3> <DATA> <SIZE=1>
                        |               |
                 (ptr - amount)        rsp
Nice! Dynamic stack memory! But there is still one more issue, and that is pointers. If we resize memory before a pointer, then everything gets pushed around and our pointer no longer points to the right object. Oh No!
<DATA> <SIZE=1> <DATA> <SIZE=1>
                  |
                 ptr (2nd item)

<DATA> <DATA> <DATA> <SIZE=3> <DATA> <SIZE=1>
                  |
                 ptr (invalidated)
The solution? Instead of storing the address of our data, we can store its index, then convert that index into an address whenever we need to use it. For example, we store the index `0` for the first item in the stack instead of its address.
We can convert an index into an address by calculating how many steps backwards our data is from the end of the stack with `length - idx - 1`. For example, if our stack has 2 elements and we want to find the data at index 0, we would need to jump 1 step backwards from the end of the stack. Here is the pseudo-code for converting an index into an address.
fn idx2addr(idx: u64) -> u64 {
    let ptr = rsp;
    for (let i = 0; i < length - idx - 1; i++) {
        ptr += *ptr;
    }
    return ptr;
}
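And here's the index scheme end-to-end in the same Python word model, showing that an index keeps resolving to the right element even after a resize moves everything (again my sketch, with the length kept in a plain variable instead of a register):

```python
mem = [0] * 32
rsp = 32
length = 0           # element count; a register in the real implementation

def alloc(size):
    global rsp, length
    rsp -= size
    mem[rsp] = size
    length += 1

def resize(ptr, amount):
    global rsp
    for i in range(rsp, ptr):
        mem[i - amount] = mem[i]
    mem[ptr - amount] = mem[ptr] + amount
    rsp -= amount

def idx2addr(idx):
    """Hop back over length - idx - 1 size tags, starting from rsp."""
    ptr = rsp
    for _ in range(length - idx - 1):
        ptr += mem[ptr]
    return ptr

alloc(3)                   # item 0
alloc(2)                   # item 1
resize(idx2addr(0), 1)     # grow item 0, shifting item 1 down
print(idx2addr(0), idx2addr(1))   # 28 26: index 0 still finds item 0
```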
Note that this means we have to keep track of the length of our list in a register somewhere. Since we aren't using it for anything else, I've designated one of the otherwise-unused registers as our length register, and updated `alloc` and `pop` to increment/decrement it.
For deeply nested data-structures, we'll have to store a reference as a list of indices. This means that references to deeply nested data-structures can get quite long, but that's a sacrifice I'm willing to make.
You can download an ASM code example here, but you may need an exotic debugger to be able to visualise how stack memory changes.
a language
We have an issue. Writing assembly is painful, but no language supports our funky stack, they all expect the stack to behave in a certain way. This is where the custom programming language comes in. It doesn't need to be fancy, just a basic wrapper around assembly. Like C but for stack shenanigans.
I'll be glossing over the whole "making a language" part in this blog post. I highly suggest you try following Make a Lisp and LLVM Kaleidoscope if you're feeling up for a challenge. They'll teach you a lot about writing your own languages. You can see my own implementation of "Make a Lisp", here.
For simplicity, the language will compile to NASM assembly. First we have a tokenizer. It'll convert a stream of text like:
u32 inc(u32 a) {
    return a + 1;
}
Into a list of tokens, like:
[
    Symbol("u32"), Symbol("inc"),
    Operand("("), Symbol("u32"), Symbol("a"), Operand(")"),
    Operand("{"),
    Return, Symbol("a"), Operand("+"), Number(1), Operand(";"),
    Operand("}"),
]
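A tokenizer for a grammar this small fits in a dozen lines. This Python sketch is my own stand-in for the post's implementation, but it produces the token list above:

```python
import re

# numbers, identifiers, and single-character operands
TOKEN_RE = re.compile(r"\d+|[A-Za-z_]\w*|[(){}+;]")

def tokenize(src):
    tokens = []
    for tok in TOKEN_RE.findall(src):
        if tok.isdigit():
            tokens.append(("Number", int(tok)))
        elif tok == "return":
            tokens.append(("Return",))
        elif tok[0].isalpha() or tok[0] == "_":
            tokens.append(("Symbol", tok))
        else:
            tokens.append(("Operand", tok))
    return tokens

print(tokenize("u32 inc(u32 a) { return a + 1; }"))
```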
I decided to give the language C-like syntax because C is familiar and easy to parse. Now that we have a stream of tokens, the next step will be a little bit easier.
We need to convert our stream of tokens into an abstract syntax tree (AST). For example, the abstract syntax tree of `2 + 3 * 4` would look something like:
      Sym(+)
     /      \
 Num(2)    Sym(*)
           /    \
       Num(3)  Num(4)
Parsing simple expressions like `2 + 3` into an AST is pretty easy, but for more complicated expressions like `2 + 3 * 4` you have to start worrying about operator precedence. There's a wonderful section in LLVM Kaleidoscope and the first part of this youtube video that both show examples of a binary expression parser.
If you want to try implementing a parser yourself and you're a beginner, I highly suggest using lisp-like syntax, since it is much easier to parse. There is a section in Make a Lisp that might be helpful for you.
In summary, a binary expression parser that uses operator-precedence parsing takes the following steps. First, it parses a "primary" expression, which we'll call `lhs` for "left-hand-side". A primary expression is any expression without an operand: numbers, booleans, or strings, but also any expression wrapped in parentheses.
Then we can write the function signature for our binary operator parser. The `minimum` parameter will be useful later.
fn parse_binop(minimum: usize, lhs: Expression);
And we can pass in the `lhs` we parsed earlier.
parse_binop(0, lhs);
Inside the body of `parse_binop`, it peeks at the right-hand-side for an operand (like `+`, `*`, etc.)

lhs   operand
 |       |
 2       +
If the operand has lower precedence than the `minimum`, we just return the `lhs` and allow some other function higher in the call stack to parse the rest of it.
fn parse_binop(minimum, lhs) {
    if peek_operand().precedence < minimum {
        return lhs;
    }
    ...
If the operand has higher precedence than the `minimum`, then we parse it along with the next primary, and peek at the operand after that.
let prev = parse_operand();
let rhs = parse_primary();
let next = peek_operand();
Our example now looks like this:
  prev    next
    |       |
2  [+]  3  [*]
|       |
lhs     rhs
If the next operand has higher precedence than the previous then recursively call the parser with the "minimum" set to the previous operand's precedence.
if prev.precedence < next.precedence {
    rhs = parse_binop(prev.precedence, rhs);
}
Then we join the left-hand-side and right-hand-side.
lhs = Ast(lhs, prev, rhs);
And recursively call again with the same "minimum" value.
return parse_binop(minimum, lhs);
Here is the full pseudo-code:
fn parse_expression() {
    return parse_binop(0, parse_primary());
}

fn parse_binop(minimum, lhs) {
    if peek_operand().precedence < minimum {
        return lhs;
    }
    let prev = parse_operand();
    let rhs = parse_primary();
    let next = peek_operand();
    if prev.precedence < next.precedence {
        rhs = parse_binop(prev.precedence, rhs);
    }
    lhs = Ast(lhs, prev, rhs);
    return parse_binop(minimum, lhs);
}
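The pseudo-code above translates almost line-for-line into a runnable language. Here's a Python version (tokens simplified to a flat list of numbers and operator strings; the precedence table is my own choice):

```python
PRECEDENCE = {"+": 10, "-": 10, "*": 20, "/": 20}

def parse_expression(tokens):
    # tokens like [2, "+", 3, "*", 4], consumed left to right
    return parse_binop(tokens, 0, parse_primary(tokens))

def parse_primary(tokens):
    return tokens.pop(0)           # just a number in this toy grammar

def peek_precedence(tokens):
    """Precedence of the next operand, or -1 when there is none."""
    if tokens and tokens[0] in PRECEDENCE:
        return PRECEDENCE[tokens[0]]
    return -1

def parse_binop(tokens, minimum, lhs):
    if peek_precedence(tokens) < minimum:
        return lhs
    prev = tokens.pop(0)                       # parse_operand()
    rhs = parse_primary(tokens)
    if PRECEDENCE[prev] < peek_precedence(tokens):
        rhs = parse_binop(tokens, PRECEDENCE[prev], rhs)
    lhs = (prev, lhs, rhs)                     # Ast(lhs, prev, rhs)
    return parse_binop(tokens, minimum, lhs)

print(parse_expression([2, "+", 3, "*", 4]))   # ('+', 2, ('*', 3, 4))
```

Note how `*` binds tighter than `+`, so `3 * 4` ends up nested under the `+` node, matching the tree drawn earlier.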
Now we have an AST that can represent function definitions, variable assignments, expressions, and control flow operations (if, while, for). We need to write a code-generator that takes in our AST and outputs NASM assembly. Then we can call the `nasm` command-line tool directly to compile it into a binary.
The code-generator was the hardest part to do. It is so hard to debug thousand-line assembly files where no debugger works because your stack is too screwed up, which is why I wrote my own (wink wink, nudge nudge).
Let's rewrite some of our assembly functions in our code-generator. For our `alloc` function, we'll make sure to keep track of the stack length and return the index of the allocated data in the stack.
struct Codegen {
    stack_len: usize,
}

fn alloc(codegen: &Codegen, size: u64) -> usize {
    let idx = codegen.stack_len;
    // allocate size bytes
    // set the last 8 bytes to size
    write_asm!("sub rsp, {size}");
    write_asm!("mov qword [rsp], {size}");
    codegen.stack_len += 1;
    return idx;
}
And the `pop` function follows in the same vein.

fn pop(codegen: &Codegen) {
    write_asm!("add rsp, [rsp]");
    codegen.stack_len -= 1;
}
Then we can implement our `idx2addr` function. It will take in an index and return the address in a register.

fn idx2addr(codegen: &Codegen, idx: usize, reg: &str) {
    write_asm!("mov {reg}, rsp");
    for (let i = 0; i < codegen.stack_len - idx - 1; i++) {
        write_asm!("add {reg}, [{reg}]");
    }
}
Now we can start with code-generating a simple number. We'll need to allocate some space on the stack and then put that number in it.
fn codegen_number(codegen: &Codegen, number: Number) -> usize {
    let idx = codegen.alloc(16);
    codegen.idx2addr(idx, "rbx");
    // the data lives 8 bytes past the size tag
    write_asm!("mov qword [rbx+8], {number}");
    return idx;
}
Now let's codegen an add function.
fn codegen_add(codegen: &Codegen, a: Expression, b: Expression) -> usize {
    // codegen the expressions
    let a = codegen.codegen_expression(a);
    let b = codegen.codegen_expression(b);
    // dereference the indices
    codegen.idx2addr(a, "rbx");
    codegen.idx2addr(b, "rcx");
    write_asm!("mov rbx, [rbx+8]");
    write_asm!("mov rcx, [rcx+8]");
    // add the numbers
    write_asm!("add rbx, rcx");
    // allocate and store the result (right after alloc, rsp points at it)
    let idx = codegen.alloc(16);
    write_asm!("mov qword [rsp+8], rbx");
    return idx;
}
Now we'll write the expression generator.
fn codegen_expression(codegen: &Codegen, expression: Expression) -> usize {
    match expression {
        Number(num) => codegen.codegen_number(num),
        Add(a, b) => codegen.codegen_add(a, b),
    }
}
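Putting the pieces together, here's a Python mock of the code-generator that just collects the emitted instruction strings into a list (my sketch; the register choices and offsets follow the examples above):

```python
class Codegen:
    def __init__(self):
        self.stack_len = 0
        self.asm = []               # emitted NASM lines

    def alloc(self, size):
        idx = self.stack_len
        self.asm += [f"sub rsp, {size}", f"mov qword [rsp], {size}"]
        self.stack_len += 1
        return idx

    def idx2addr(self, idx, reg):
        self.asm.append(f"mov {reg}, rsp")
        for _ in range(self.stack_len - idx - 1):
            self.asm.append(f"add {reg}, [{reg}]")

    def number(self, num):
        idx = self.alloc(16)
        self.asm.append(f"mov qword [rsp+8], {num}")
        return idx

    def add(self, a, b):
        self.idx2addr(a, "rbx")
        self.idx2addr(b, "rcx")
        self.asm += ["mov rbx, [rbx+8]", "mov rcx, [rcx+8]", "add rbx, rcx"]
        idx = self.alloc(16)
        self.asm.append("mov qword [rsp+8], rbx")
        return idx

cg = Codegen()
cg.add(cg.number(1), cg.number(2))
print("\n".join(cg.asm))
```

Running it prints an instruction sequence very close to the hand-written example below.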
And from here it's easy to imagine how the rest of the language was implemented.
Of course, our code-generator will be generating assembly that utilises our funky stack.
void main() {
    1 + 2;
}
Gets converted roughly into:
_start:
    ; allocate 'a' on stack
    sub rsp, 16
    mov qword [rsp], 16
    mov qword [rsp+8], 1
    ; allocate 'b' on stack
    sub rsp, 16
    mov qword [rsp], 16
    mov qword [rsp+8], 2
    ; dereference indices
    mov rbx, rsp
    mov rcx, rsp
    add rcx, [rcx]
    mov rbx, [rbx+8]
    mov rcx, [rcx+8]
    ; add numbers
    add rbx, rcx
    ; allocate result on the stack
    sub rsp, 16
    mov qword [rsp], 16
    mov qword [rsp+8], rbx
    ; exit(1)
    mov eax, 1
    mov ebx, 1
    int 0x80
This part (making the language) was the hardest part by far. Writing the hash-map is a cake walk in comparison.
a cake walk (in comparison)
hash-maps are simple actually, they're just arrays with clever indexing. We take a key, hash it, modulo it by the length of the array, and that's our index. If there is a hash collision (two different keys land on the same index), then we resize the array and try again.
struct Bucket {
    i32 key,
    i32 value,
}

struct HashMap = []Bucket;

void hashmap_insert(&HashMap map, i32 key, i32 value) {
    // get a random index
    u64 i = hashfunction(key) % map.size;
    if (map[i].key == key or map[i].is_empty()) {
        // if the key matches or the entry is empty,
        // then insert!
        map[i].key = key;
        map[i].value = value;
    } else {
        // if the key doesn't match and the entry isn't empty,
        // then we have a hash collision! let's resize the hashmap and try again.
        hashmap_resize(map, map.size * 2);
        hashmap_insert(map, key, value);
    }
}
The `hashmap_resize` function is just going to create a new hashmap and reinsert all the items:
void hashmap_resize(&HashMap map, u64 new_size) {
    HashMap new_map = new HashMap(new_size);
    for (i32 i = 0; i < map.size; i++) {
        if (!map[i].is_empty()) {
            hashmap_insert(&new_map, map[i].key, map[i].value);
        }
    }
    *map = new_map;
}
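For reference, the same grow-on-collision strategy in plain Python (a sketch; `hash()` stands in for `hashfunction`, and real hash-maps would probe instead of growing on every collision):

```python
EMPTY = object()   # sentinel for an empty bucket

class GrowOnCollisionMap:
    def __init__(self, size=4):
        self.buckets = [EMPTY] * size      # each slot holds (key, value)

    def insert(self, key, value):
        i = hash(key) % len(self.buckets)
        slot = self.buckets[i]
        if slot is EMPTY or slot[0] == key:
            self.buckets[i] = (key, value)
        else:
            # hash collision: double the array and try again
            self.resize(len(self.buckets) * 2)
            self.insert(key, value)

    def resize(self, new_size):
        old = self.buckets
        self.buckets = [EMPTY] * new_size
        for slot in old:                   # reinsert every item
            if slot is not EMPTY:
                self.insert(slot[0], slot[1])

    def get(self, key):
        slot = self.buckets[hash(key) % len(self.buckets)]
        if slot is not EMPTY and slot[0] == key:
            return slot[1]
        return None

m = GrowOnCollisionMap()
for k in range(10):
    m.insert(k, k * k)
print(m.get(7))   # 49
```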
There are many different techniques that can avoid hash collisions and excessive reallocation by using adjacent slots or more "spread-out" hash functions. Honorable mention: Fibonacci Hashing
Conclusion
That's it! This is the end of the post. Obviously, this is completely useless and no one would want to use a stack like this because of the *awful* performance it gives. I mean look how many instructions a simple stack allocation takes... and oh god, the time complexity! It was fun though!