When I was a student, I did an intro to computer architecture course that was instructed by Peter Strazdins. I was struck by his enthusiasm and his apparent thirst to pass along everything he knew, and while I would never have called us friends I certainly was on very good terms with him and had more than a few very geeky chats with him after lectures and tutorials. My time learning under him certainly shaped a lot of current software engineering interests, particularly around VMs and low level coding.
One of the components of this course was an introduction to assembler, which started with a learning machine known as PeANUt, a stripped back RISC machine with simple but very limited functionality. The original Tcl/Tk application and accompanying PDF specification seems to have been lost to history, but I did find another spin on the same idea by another of my lecturers, Eric McCreath, who took over the course at some point. I’ve been playing around at implementing my own Rust version as a home project, which is mostly working albeit without a UI or assembler, which I may talk about at a later date once I feel the code is ready. If you want to see a relatively simple and ready to go VM in action without having to deal with my code, have a look at that one.
All code I’m going to put below will be on Github
How to start writing a VM
To start writing a VM, we need to understand the basics:
- Memory - Where we store data and program code. Data is addressable by a plain unsigned integer,
in this day and age that’s usually a 64-bit number though our machine will use 32-bit numbers
instead as they’re a little easier for humans to read and write without mistakes. We will use
two types of memory in our machine:
- Registers: Single number, fast access memory which can be read and written to directly by the CPU
- RAM: Random access memory, slightly slower but much larger memory. Values in here must be loaded into registers before the CPU can read them
- The CPU - The heart of our computer. It will execute instructions loaded into a register
(conveniently called the Instruction Register), perform calculations, and read from and write to
memory. It contains two major parts:
- The Control Unit - This decides what we’re going to execute. It typically contains at least a program counter register and an instruction register, the former pointing to an address in memory and the latter holding the currently executing instruction.
- The Algorithmic/Logic unit - This part of the CPU will generally have a few registers of its own for holding data that’s being processed, and will do our calculations for us
- The Bus - How we interact with the world outside the CPU. When we start writing virtual devices, the bus will be responsible for making sure the address we pass it is correctly written to or read from
- The Instruction Set - How we tell the computer what to do. We will craft our instructions in such a way that both we and our virtual machine can understand how that is to be interpreted
This is the 10000ft overview, but I would very much suggest reading the Wiki article on Von Neumann Architecture before going much further.
What Our VM Is Going to Look Like
Initially, our computer is going to be very, very simple. We will have a control unit with a program counter (PC) and an instruction register (IR), and an ALU with 4 registers of its own to use for data. We will also give the machine a memory space of 16 bits, or 0xFFFF of readable/writable memory. Only 0x0000-0xFFE0 of this will be RAM, though, we’ll be saving the rest of the address space for later
The basic structure of our running program will be:
pc = 0x0100
loop:
if not halted:
ir = read_bus(pc)
pc = pc + 1
execute(pc)
Where halted
will be a flag that tells us if we’re going to continue executing or not.
We will start with five very basic instructions now, and add more in the next post.
Anatomy of an instruction
Our computer is a 32-bit computer, meaning our control unit executes instructions that are 32 bits in length. We will use hexadecimal here to simplify reading.
We break our instructions up into four parts:
- The Opcode - One byte for the instruction to execute, for example an add or multiply operation
- Input 1 - 4 bits for the first input register, we will elaborate on how to address these later
- Input 2/Destination - 4 bits for the second input register, which will double as a destination value.
- Immediate data - 2 bytes for any immediate data. More on this later
Our first supported opcodes will be as follows
Hex Value | Mnemonic | Description |
---|---|---|
0x00 | HALT | Halt the machine |
0x01 | READ | Read from the memory address in I1 |
0x02 | WRITE | Write to the memory address in D |
0x03 | COPY | Copy from register I1 to D |
0x04 | ADD | Add I1 to I2 and store in D |
Were this a real CPU, each instruction here would incur a different CPU cost. We won’t model that in our VM, but it’s important to be aware of should you want to write something like a NES or Gameboy emulator.
Addressing Registers
Each register is addressed with 4 bits of data, meaning we can potentially refer to 16 different registers. We will use the value 0xF to refer to the immediate data in an instruction
Hex Value | Mnemonic | Description | Initial Value |
---|---|---|---|
0x0 | R0 | First ALU register | 0x00 |
0x1 | R1 | Second ALU register | 0x00 |
0x2 | R2 | Third ALU register | 0x00 |
0x3 | R4 | Fourth ALU register | 0x00 |
0xD | PC | Program counter | 0x100 |
0xE | IR | Instruction register | 0x00 |
0xF | #{n} | Immediate data | N/A |
We will add some more registers later, but for now this will help us get the basics up and going
Example of each instruction
HALT
Halt will be simple, usually appearing as a 0. As long as the first byte is 0x00 our machine will interpret it as a halt command.
0x00000000
0x00111111
What we throw into the I1, I2 and immediate data values doesn’t matter
READ
Read will load the address in I1 and place it into D, for example
0x01010000
Will load the address in R0 and load it into R1. Similarly, we can use the immediate value to read from a memory address:
0x01F00300
This will load the immediate value of 0x0300 into R0
WRITE
Write is just the opposite of read, where we take the value in I1 and write it into the memory address in I2
0x02010000
This will write the value in R0 into the address in R1
COPY
Copy will set D to the value in I1
0x03F00400
This will load the value 0x0400 from the immediate data into R0
ADD
Add sums the values in I1 and I2 and places them into D
0x04010000
Will add the values in R0 and R1 and place the result in R1
The Fun Part: Implementation
We’ll start with the most straight forward part: The memory
Memory implementation
We’ll do a little bit of preparation for future changes by implementing a device which makes it easy to “plug in” new hardware to our VM
// BusDevice is an interface for devices we attach to the bus
type BusDevice interface {
// MemoryRange gives the memory range of the device
MemoryRange() *MemoryRange
// Read takes an address and returns the value at address
Read(address uint32) (uint32, error)
// Write writes value to address
Write(address, value uint32) error
}
Next we’ll do the memory. This will conform to the BusDevice
interface, and will
really just be a slice of uint32s under the hood. We don’t need anything more complex right
now
type Memory struct {
mem [0xFFE0]uint32
}
func (m *Memory) MemoryRange() *MemoryRange {
return &MemoryRange{
Start: 0x0000,
End: 0xFFE0,
}
}
func (m *Memory) Read(address uint32) (uint32, error) {
if address > 0xFFE0 {
return 0, fmt.Errorf("address %x out of range", address)
}
return m.mem[address], nil
}
func (m *Memory) Write(address, value uint32) error {
if address > 0xFFE0 {
return fmt.Errorf("address %x out of range", address)
}
m.mem[address] = value
return nil
}
func NewMemory() *Memory {
return &Memory{
mem: [0xFFE0]uint32{},
}
}
Bus Implementation
The next part we need to implement is the bus. It’s relatively simple, and will hold a number of devices which will be queried for their address range on reads and writes.
type Bus struct {
devices []BusDevice
}
func (b *Bus) Read(address uint32) (uint32, error) {
for _, d := range b.devices {
memRange := d.MemoryRange()
if memRange.Start <= address && address <= memRange.End {
return d.Read(address)
}
}
return 0, fmt.Errorf("bus read: unmapped address %x", address)
}
func (b *Bus) Write(address, value uint32) error {
for _, d := range b.devices {
memRange := d.MemoryRange()
if memRange.Start <= address && address <= memRange.End {
return d.Write(address, value)
}
}
return fmt.Errorf("bus write: unmapped address %x", address)
}
func NewBus(devices ...BusDevice) *Bus {
return &Bus{
devices: devices,
}
}
Register and Register Bank
Our registers are similar to memory, and will require Get
and Set
methods.
We’ll also create a RegisterBank
type to make it easier for us to look them up
package internal
import "fmt"
const (
R0 uint8 = iota
R1
R2
R3
__reserved1
__reserved2
__reserved3
__reserved4
__reserved5
__reserved6
__reserved7
__reserved8
__reserved9
PC
IR
IMMEDIATE
)
type Register struct {
value uint32
}
type RegisterBank struct {
registerMap map[uint8]*Register
}
func (r *RegisterBank) GetRegister(name uint8) (*Register, error) {
if reg, ok := r.registerMap[name]; ok {
return reg, nil
}
return nil, fmt.Errorf("no such register")
}
func NewRegisterBank() *RegisterBank {
return &RegisterBank{
registerMap: map[uint8]*Register{
R0: &Register{0x00},
R1: &Register{0x00},
R2: &Register{0x00},
R3: &Register{0x00},
PC: &Register{0x100},
IR: &Register{0x00},
},
}
}
Those __reserved
constants will come in handy later, when we add some more
registers.
The CPU
We’ll keep this simple for now, I have a couple of ideas to make it maybe a little easier to extend but they’ll require a bit of experimentation to get right.
The first thing we’ll need is a CPU
type, which will hold a bus pointer and a
register bank. We’ll also add a halt
flag, but this will be temporary until
the next part.
type CPU struct {
registers *RegisterBank
bus *Bus
halted bool
}
func NewCPU(registers *RegisterBank, bus *Bus) *CPU {
return &CPU{
registers: registers,
bus: bus,
halted: false,
}
}
Next, we’ll implement the individual instructions. You’ll notice on an error we’ll just halt the machine for now. We’ll also add some constants to make it easier to see where we’re looking for those opcodes
const (
HALT = iota
READ
WRITE
COPY
ADD
)
func (c *CPU) halt(_, _ *Register) {
c.halted = true
}
func (c *CPU) read(i1, i2 *Register) {
val, err := c.bus.Read(i1.value)
if err != nil {
c.halted = true
return
}
i2.value = val
}
func (c *CPU) write(i1, i2 *Register) {
err := c.bus.Write(i2.value, i1.value)
if err != nil {
c.halted = true
return
}
}
func (c *CPU) copy(i1, i2 *Register) {
i2.value = i1.value
}
func (c *CPU) add(i1, i2 *Register) {
i2.value = i1.value + i2.value
}
Finally, we’ll add Tick
and executeInstruction
methods. These will load an
instruction into the IR and execute it if the machine is not halted. We will
decode our instruction into the opcode, the register indicies and the immediate
data and pass this along to the instruction.
func (c *CPU) executeInstruction(instruction uint32) error {
opcode := uint8(instruction >> 24)
regIndex1 := uint8((instruction & 0x00F00000) >> 20)
regIndex2 := uint8((instruction & 0x000F0000) >> 16)
imm := instruction & 0x0000FFFF
var i1, i2 *Register
var err error
if regIndex1 == 0xF {
i1 = &Register{value: imm}
} else {
i1, err = c.registers.GetRegister(regIndex1)
if err != nil {
return err
}
}
if regIndex2 == 0xF {
i2 = &Register{value: imm}
} else {
i2, err = c.registers.GetRegister(regIndex2)
if err != nil {
return err
}
}
switch opcode {
case HALT:
c.halt(i1, i2)
case READ:
c.read(i1, i2)
case WRITE:
c.write(i1, i2)
case COPY:
c.copy(i1, i2)
case ADD:
c.add(i1, i2)
}
return nil
}
func (c *CPU) Tick() error {
if c.Halted {
return fmt.Errorf("cannot tick on a Halted machine")
}
ir, err := c.registers.GetRegister(IR)
if err != nil {
return err
}
pc, err := c.registers.GetRegister(PC)
if err != nil {
return err
}
ir.value, err = c.bus.Read(pc.value)
if err != nil {
return err
}
pc.value++
err = c.executeInstruction(ir.value)
if err != nil {
c.Halted = true
}
return err
}
Just to wrap this up nicely, we’ll add in some tests to make sure each type
of instruction can decode properly. Notice that we load the instruction at
0x0100
, which is where the program counter will reset to on initialisation
func TestCPU(t *testing.T) {
t.Run("test halt", func(t *testing.T) {
registers := NewRegisterBank()
bus := NewBus(NewMemory())
cpu := NewCPU(registers, bus)
bus.Write(0x100, 0x00)
err := cpu.Tick()
assert.NoError(t, err)
assert.True(t, cpu.Halted)
})
t.Run("test read", func(t *testing.T) {
registers := NewRegisterBank()
bus := NewBus(NewMemory())
cpu := NewCPU(registers, bus)
registers.registerMap[R0].value = 0x1000
bus.Write(0x100, 0x01010000)
bus.Write(0x1000, 0xFFFF)
err := cpu.Tick()
assert.NoError(t, err)
assert.False(t, cpu.Halted)
assert.Equal(t, uint32(0xFFFF), registers.registerMap[R1].value)
})
t.Run("test write", func(t *testing.T) {
registers := NewRegisterBank()
bus := NewBus(NewMemory())
cpu := NewCPU(registers, bus)
registers.registerMap[R0].value = 0xFFFF
registers.registerMap[R1].value = 0x1000
bus.Write(0x100, 0x02010000)
err := cpu.Tick()
assert.NoError(t, err)
assert.False(t, cpu.Halted)
val, err := bus.Read(0x1000)
assert.NoError(t, err)
assert.Equal(t, uint32(0xFFFF), val)
})
t.Run("test copy", func(t *testing.T) {
registers := NewRegisterBank()
bus := NewBus(NewMemory())
cpu := NewCPU(registers, bus)
registers.registerMap[R0].value = 0xFFFF
bus.Write(0x100, 0x03010000)
err := cpu.Tick()
assert.NoError(t, err)
assert.False(t, cpu.Halted)
assert.Equal(t, uint32(0xFFFF), registers.registerMap[R1].value)
})
t.Run("test add", func(t *testing.T) {
registers := NewRegisterBank()
bus := NewBus(NewMemory())
cpu := NewCPU(registers, bus)
registers.registerMap[R1].value = 0x10
bus.Write(0x100, 0x04F10003)
err := cpu.Tick()
assert.NoError(t, err)
assert.False(t, cpu.Halted)
assert.Equal(t, uint32(0x13), registers.registerMap[R1].value)
})
}
The Main Function
Finally, let’s write a little program to check to make sure our machine is working correctly. We’ll add 2 numbers together and store them at a memory address
func main() {
// Load this in at mem[0x100]
// This program will copy the value 0x05 into R0, 0x10 into
// R1, and 0x1000 into R2, before executing an add on R0 & R1
// and writing the contents into the address in R2. If all is
// working, at the end of the program we should see the value
// 0x15 at memory address 0x1000
program := []uint32{
0x03F00005,
0x03F10010,
0x03F21000,
0x04010000,
0x02120000,
0x00000000,
}
registers := internal.NewRegisterBank()
mem := internal.NewMemory()
for i, p := range program {
err := mem.Write(uint32(i+0x100), p)
if err != nil {
fmt.Printf("error writing to memory location 0x%x: %v\n",
i, err)
return
}
}
bus := internal.NewBus(mem)
cpu := internal.NewCPU(registers, bus)
for !cpu.Halted {
cpu.Tick()
}
val, err := mem.Read(0x1000)
if err != nil {
fmt.Printf("could not read memory at 0x1000: %v\n", err)
return
}
fmt.Printf("the value at address 0x1000 is 0x%x\n", val)
}
If we’re good (and I know we are), we should see the following output:
The value at address 0x1000 is 0x15
Conclusion
We have a very limited VM now. Next time we’ll add some more arithmetic, and a status register to track status changes in the machine. Hope someone out there found this useful!