1000 lines of code: making a processor architecture from scratch

Monday, May 6th 2024

I implemented a basic processor, including an emulator, assembler and disassembler, in under 1k lines of C++ (not counting tests). It can't do much.

Check it out on GitHub: pascalpuffke/processor

It is assumed that you have at least some understanding of common technical terms -- I'm not going to explain what a register is, or how a stack functions. Dozens of other people have done that before way better than I ever could.

My history with system emulation

Back in 2020 I made a terrible mistake: I decided to learn the C++ programming language. As a first exercise I followed along to this excellent NES emulation series and in the process learned about how the legendary MOS 6502 microprocessor worked with its few registers, instructions and flags. I found that to be quite interesting, and it was super exciting to finally have the first program run on it, together with some memory, cartridge mappers and graphics!

In addition, videos like this one by jdh on YouTube are a fascinating watch. I can't wait to repeat all the same mistakes again!

Of course we can't forget the brilliant "Build a 65c02-based computer from scratch" series. While it's not about coming up with a novel architecture, it's a valuable resource to learn 6502 assembly programming, device I/O and how simple protocols can be implemented.

After all that, one might think that my project is basically a 6502 clone. Not quite. My last hands-on experience with it was three to four years ago, after all.

And how does it work?

The theory is super simple, making old designs like the 6502 or my custom abomination easy to implement in software! It boils down to these three operations:

Fetch an instruction from memory, pointed to by the program counter
Decode said instruction
Execute the instruction and modify flag registers if needed

If you're thinking this is just a big switch statement in an endless loop, you're correct!

What's actually difficult is coming up with an instruction set and being able to encode all instructions in whatever many bits you decide to limit your design to. I went with 16-bit instructions for mine, similar to other old microprocessors. More modern architectures like RISC-V or ARM use 32-bit instructions, making them much more flexible. In fact, x86-64 instructions can be up to 15 bytes long!

With that in mind, here's the current instruction encoding for my fictional processor arch, split into 4-bit segments:


	TYPE: 4-bit instruction type
	REG{1,2,3}: 4-bit register index
	IMM{1,2}: 8-bit immediate

	0bTYPE'0000'0000'0000 (no arguments)

	0bTYPE'REG1'0000'0000 (single reg argument, like 'push r0')

	0bTYPE'REG1'REG2'0000 (double reg argument, like 'ldr r0, r1')

	0bTYPE'REG1'REG2'REG3 (triple reg argument, like 'add r0, r1, r2')

	0bTYPE'REG1'IMM1'IMM2 (single reg + 8-bit immediate argument, like 'ldi r0, #255')

It's far from the most efficient, and I have to find a way to make it smarter. The 4-bit instruction type limitation especially causes issues. If there's no other way, maybe implementing macros in the assembler could help.

And here's all the different instructions it supports:

ldr [dst],[src] : load value at reg src into reg dst
ldi [dst],#imm : load immediate into reg dst
ldm [dst],[lb],[hb] : load value from memory at lb<<8|hb into reg dst
st [lb],[hb],[src] : store value from reg src into memory at lb<<8|hb
push [src] : push value of reg to top of stack
pop [dst] : pop value from bottom of stack to reg
add [dst],[lhs],[rhs] : put result of lhs+rhs into reg dst
sub [dst],[lhs],[rhs] : put result of lhs-rhs into reg dst
mul [dst],[lhs],[rhs] : put result of lhs*rhs into reg dst
div [dst],[lhs],[rhs] : put result of lhs/rhs into reg dst
jp [lb],[hb] : jump to addr at lb<<8|hb unconditionally
jz [lb],[hb] : jump to addr at lb<<8|hb if zero flag set
and [dst],[lhs],[rhs]
or [dst],[lhs],[rhs]
xor [dst],[lhs],[rhs]
done set 'kill' bit and stop execution

Is it turing complete? I have no idea! Probably not!

Another common problem of ancient computers is the hugely limited address space. My design and many others from the early 80s are limited to 16-bit addresses, meaning there is at most 64KB of memory to work with. That's not just RAM; it also has to somehow address read-only memory (programs), the stack has to go somewhere, memory-mapped hardware, ... Back in the day, banking was commonly implemented to work around this problem. I didn't feel like introducing such complexity, so 64K it is. The stack top is mapped starting at address 0xFF and is 256 bytes big, growing downwards. Everything else is just general-purpose memory, there's no RAM/ROM split - programs are poked into memory, and the processor starts executing whatever code is located beginning at address 0xFF00.

What can I do with it?

Not much! Because it is not emulating some other well-known CPU, there aren't any programs made for it. You can kind of think of it like a PlayStation 3 - It's neat, but there aren't any games. It's clunky and a pain to develop for. Anyway, here's a program that counts down from 10 to 0:


        ldi r0, #10
        ldi r1, #1
        ldi r7, #0xFF
        ldi r3, #0x10
        ldi r5, #0x0A
        sub r0, r0, r1
        jz r7, r3
        jp r7, r5
        done

Most of the code is just loading a bunch of registers with constant values used as memory locations for the 'jz' (jump if zero) and 'jp' (unconditional jump) instructions. The 'sub' instruction will set the zero flag once the value in the target register, r0, reaches 0. The following 'jz' does its thing and goes to the 'done' instruction previously skipped by the unconditional jump inbetween.

The Issues

I'm sure you can tell there are many. Some of the more obvious ones are listed in the README file on the Git repo.

The most annoying one might be specifying 16-bit memory addresses using only 8-bit registers, with no support for assembling 16-bit immediates. Just look at the example code from the previous section - it's unreadable. Labels in the assembly code would be a huge help.

Did you notice that I mentioned interrupts exactly zero times? There aren't any! Truly nothing can stop your code! Neither did I mention device support - memory mapping them seems like the obvious thing to do, but I haven't thought about it at all until just now.

... Why?

To which I ask you back: Why not? It's a fun learning experience, and I definitely recommend trying low-level projects like these out yourself! Immediately there's a million things I would now do differently if I were to start over again, which is a great sign of successful learning. Implementing simple processor designs in software is genuinely not at all difficult, though you should start testing your code as early as possible to catch bugs early on. Your ISA doesn't have to feature a thousand different instructions, programs can get by fine with just a couple tens - but these have to be perfect and bug-free.