Writing a VM Part One

Filed under vms on November 15, 2022

When I was a student, I did an intro to computer architecture course that was instructed by Peter Strazdins. I was struck by his enthusiasm and his apparent thirst to pass along everything he knew, and while I would never have called us friends I certainly was on very good terms with him and had more than a few very geeky chats with him after lectures and tutorials. My time learning under him certainly shaped a lot of current software engineering interests, particularly around VMs and low level coding.

One of the components of this course was an introduction to assembler, which started with a learning machine known as PeANUt, a stripped back RISC machine with simple but very limited functionality. The original Tcl/Tk application and accompanying PDF specification seems to have been lost to history, but I did find another spin on the same idea by another of my lecturers, Eric McCreath, who took over the course at some point. I’ve been playing around at implementing my own Rust version as a home project, which is mostly working albeit without a UI or assembler, which I may talk about at a later date once I feel the code is ready. If you want to see a relatively simple and ready to go VM in action without having to deal with my code, have a look at that one.

All code I’m going to put below will be on Github

How to start writing a VM

To start writing a VM, we need to understand the basics:

Memory - Where we store data and program code. Data is addressable by a plain unsigned integer, in this day and age that’s usually a 64-bit number though our machine will use 32-bit numbers instead as they’re a little easier for humans to read and write without mistakes. We will use two types of memory in our machine:
- Registers: Single number, fast access memory which can be read and written to directly by the CPU
- RAM: Random access memory, slightly slower but much larger memory. Values in here must be loaded into registers before the CPU can read them
The CPU - The heart of our computer. It will execute instructions loaded into a register (conveniently called the Instruction Register), perform calculations, and read from and write to memory. It contains two major parts:
- The Control Unit - This decides what we’re going to execute. It typically contains at least a program counter register and an instruction register, the former pointing to an address in memory and the latter holding the currently executing instruction.
- The Algorithmic/Logic unit - This part of the CPU will generally have a few registers of its own for holding data that’s being processed, and will do our calculations for us
The Bus - How we interact with the world outside the CPU. When we start writing virtual devices, the bus will be responsible for making sure the address we pass it is correctly written to or read from
The Instruction Set - How we tell the computer what to do. We will craft our instructions in such a way that both we and our virtual machine can understand how that is to be interpreted

This is the 10000ft overview, but I would very much suggest reading the Wiki article on Von Neumann Architecture before going much further.

What Our VM Is Going to Look Like

Initially, our computer is going to be very, very simple. We will have a control unit with a program counter (PC) and an instruction register (IR), and an ALU with 4 registers of its own to use for data. We will also give the machine a memory space of 16 bits, or 0xFFFF of readable/writable memory. Only 0x0000-0xFFE0 of this will be RAM, though, we’ll be saving the rest of the address space for later

The basic structure of our running program will be:

pc = 0x0100
loop:
  if not halted:
    ir = read_bus(pc)
    pc = pc + 1
    execute(pc)

Where halted will be a flag that tells us if we’re going to continue executing or not.

We will start with five very basic instructions now, and add more in the next post.

Anatomy of an instruction

Our computer is a 32-bit computer, meaning our control unit executes instructions that are 32 bits in length. We will use hexadecimal here to simplify reading.

We break our instructions up into four parts:

The Opcode - One byte for the instruction to execute, for example an add or multiply operation
Input 1 - 4 bits for the first input register, we will elaborate on how to address these later
Input 2/Destination - 4 bits for the second input register, which will double as a destination value.
Immediate data - 2 bytes for any immediate data. More on this later

Our first supported opcodes will be as follows

Hex Value	Mnemonic	Description
0x00	HALT	Halt the machine
0x01	READ	Read from the memory address in I1
0x02	WRITE	Write to the memory address in D
0x03	COPY	Copy from register I1 to D
0x04	ADD	Add I1 to I2 and store in D

Were this a real CPU, each instruction here would incur a different CPU cost. We won’t model that in our VM, but it’s important to be aware of should you want to write something like a NES or Gameboy emulator.

Addressing Registers

Each register is addressed with 4 bits of data, meaning we can potentially refer to 16 different registers. We will use the value 0xF to refer to the immediate data in an instruction

Hex Value	Mnemonic	Description	Initial Value
0x0	R0	First ALU register	0x00
0x1	R1	Second ALU register	0x00
0x2	R2	Third ALU register	0x00
0x3	R4	Fourth ALU register	0x00
0xD	PC	Program counter	0x100
0xE	IR	Instruction register	0x00
0xF	#{n}	Immediate data	N/A

We will add some more registers later, but for now this will help us get the basics up and going

Example of each instruction

HALT

Halt will be simple, usually appearing as a 0. As long as the first byte is 0x00 our machine will interpret it as a halt command.

0x00000000
0x00111111

What we throw into the I1, I2 and immediate data values doesn’t matter

READ

Read will load the address in I1 and place it into D, for example

0x01010000

Will load the address in R0 and load it into R1. Similarly, we can use the immediate value to read from a memory address:

0x01F00300

This will load the immediate value of 0x0300 into R0

WRITE

Write is just the opposite of read, where we take the value in I1 and write it into the memory address in I2

0x02010000

This will write the value in R0 into the address in R1

COPY

Copy will set D to the value in I1

0x03F00400

This will load the value 0x0400 from the immediate data into R0

ADD

Add sums the values in I1 and I2 and places them into D

0x04010000

Will add the values in R0 and R1 and place the result in R1

The Fun Part: Implementation

We’ll start with the most straight forward part: The memory

Memory implementation

We’ll do a little bit of preparation for future changes by implementing a device which makes it easy to “plug in” new hardware to our VM

// BusDevice is an interface for devices we attach to the bus
type BusDevice interface {
	// MemoryRange gives the memory range of the device
	MemoryRange() *MemoryRange
	// Read takes an address and returns the value at address
	Read(address uint32) (uint32, error)
	// Write writes value to address
	Write(address, value uint32) error
}

Next we’ll do the memory. This will conform to the BusDevice interface, and will really just be a slice of uint32s under the hood. We don’t need anything more complex right now

type Memory struct {
	mem [0xFFE0]uint32
}

func (m *Memory) MemoryRange() *MemoryRange {
	return &MemoryRange{
		Start: 0x0000,
		End:   0xFFE0,
	}
}

func (m *Memory) Read(address uint32) (uint32, error) {
	if address > 0xFFE0 {
		return 0, fmt.Errorf("address %x out of range", address)
	}
	return m.mem[address], nil
}

func (m *Memory) Write(address, value uint32) error {
	if address > 0xFFE0 {
		return fmt.Errorf("address %x out of range", address)
	}
	m.mem[address] = value
	return nil
}

func NewMemory() *Memory {
	return &Memory{
		mem: [0xFFE0]uint32{},
	}
}

Bus Implementation

The next part we need to implement is the bus. It’s relatively simple, and will hold a number of devices which will be queried for their address range on reads and writes.

type Bus struct {
	devices []BusDevice
}

func (b *Bus) Read(address uint32) (uint32, error) {
	for _, d := range b.devices {
		memRange := d.MemoryRange()
		if memRange.Start <= address && address <= memRange.End {
			return d.Read(address)
		}
	}
	return 0, fmt.Errorf("bus read: unmapped address %x", address)
}

func (b *Bus) Write(address, value uint32) error {
	for _, d := range b.devices {
		memRange := d.MemoryRange()
		if memRange.Start <= address && address <= memRange.End {
			return d.Write(address, value)
		}
	}
	return fmt.Errorf("bus write: unmapped address %x", address)
}

func NewBus(devices ...BusDevice) *Bus {
	return &Bus{
		devices: devices,
	}
}

Register and Register Bank

Our registers are similar to memory, and will require Get and Set methods. We’ll also create a RegisterBank type to make it easier for us to look them up

package internal

import "fmt"

const (
	R0 uint8 = iota
	R1
	R2
	R3
	__reserved1
	__reserved2
	__reserved3
	__reserved4
	__reserved5
	__reserved6
	__reserved7
	__reserved8
	__reserved9
	PC
	IR
	IMMEDIATE
)

type Register struct {
	value uint32
}

type RegisterBank struct {
	registerMap map[uint8]*Register
}

func (r *RegisterBank) GetRegister(name uint8) (*Register, error) {
	if reg, ok := r.registerMap[name]; ok {
		return reg, nil
	}
	return nil, fmt.Errorf("no such register")
}

func NewRegisterBank() *RegisterBank {
	return &RegisterBank{
		registerMap: map[uint8]*Register{
			R0: &Register{0x00},
			R1: &Register{0x00},
			R2: &Register{0x00},
			R3: &Register{0x00},
			PC: &Register{0x100},
			IR: &Register{0x00},
		},
	}
}

Those __reserved constants will come in handy later, when we add some more registers.

The CPU

We’ll keep this simple for now, I have a couple of ideas to make it maybe a little easier to extend but they’ll require a bit of experimentation to get right.

The first thing we’ll need is a CPU type, which will hold a bus pointer and a register bank. We’ll also add a halt flag, but this will be temporary until the next part.

type CPU struct {
	registers *RegisterBank
	bus       *Bus
	halted    bool
}

func NewCPU(registers *RegisterBank, bus *Bus) *CPU {
	return &CPU{
		registers: registers,
		bus:       bus,
		halted:    false,
	}
}

Next, we’ll implement the individual instructions. You’ll notice on an error we’ll just halt the machine for now. We’ll also add some constants to make it easier to see where we’re looking for those opcodes

const (
	HALT = iota
	READ
	WRITE
	COPY
	ADD
)

func (c *CPU) halt(_, _ *Register) {
	c.halted = true
}

func (c *CPU) read(i1, i2 *Register) {
	val, err := c.bus.Read(i1.value)
	if err != nil {
		c.halted = true
		return
	}
	i2.value = val
}

func (c *CPU) write(i1, i2 *Register) {
	err := c.bus.Write(i2.value, i1.value)
	if err != nil {
		c.halted = true
		return
	}
}

func (c *CPU) copy(i1, i2 *Register) {
	i2.value = i1.value
}

func (c *CPU) add(i1, i2 *Register) {
	i2.value = i1.value + i2.value
}

Finally, we’ll add Tick and executeInstruction methods. These will load an instruction into the IR and execute it if the machine is not halted. We will decode our instruction into the opcode, the register indicies and the immediate data and pass this along to the instruction.

func (c *CPU) executeInstruction(instruction uint32) error {
	opcode := uint8(instruction >> 24)
	regIndex1 := uint8((instruction & 0x00F00000) >> 20)
	regIndex2 := uint8((instruction & 0x000F0000) >> 16)
	imm := instruction & 0x0000FFFF

	var i1, i2 *Register
	var err error
	if regIndex1 == 0xF {
		i1 = &Register{value: imm}
	} else {
		i1, err = c.registers.GetRegister(regIndex1)
		if err != nil {
			return err
		}
	}
	if regIndex2 == 0xF {
		i2 = &Register{value: imm}
	} else {
		i2, err = c.registers.GetRegister(regIndex2)
		if err != nil {
			return err
		}
	}

	switch opcode {
	case HALT:
		c.halt(i1, i2)
	case READ:
		c.read(i1, i2)
	case WRITE:
		c.write(i1, i2)
	case COPY:
		c.copy(i1, i2)
	case ADD:
		c.add(i1, i2)
	}

	return nil
}

func (c *CPU) Tick() error {
	if c.Halted {
		return fmt.Errorf("cannot tick on a Halted machine")
	}
	ir, err := c.registers.GetRegister(IR)
	if err != nil {
		return err
	}
	pc, err := c.registers.GetRegister(PC)
	if err != nil {
		return err
	}
	ir.value, err = c.bus.Read(pc.value)
	if err != nil {
		return err
	}
	pc.value++
	err = c.executeInstruction(ir.value)
	if err != nil {
		c.Halted = true
	}
	return err
}

Just to wrap this up nicely, we’ll add in some tests to make sure each type of instruction can decode properly. Notice that we load the instruction at 0x0100, which is where the program counter will reset to on initialisation

func TestCPU(t *testing.T) {
	t.Run("test halt", func(t *testing.T) {
		registers := NewRegisterBank()
		bus := NewBus(NewMemory())
		cpu := NewCPU(registers, bus)

		bus.Write(0x100, 0x00)
		err := cpu.Tick()
		assert.NoError(t, err)
		assert.True(t, cpu.Halted)
	})
	t.Run("test read", func(t *testing.T) {
		registers := NewRegisterBank()
		bus := NewBus(NewMemory())
		cpu := NewCPU(registers, bus)

		registers.registerMap[R0].value = 0x1000
		bus.Write(0x100, 0x01010000)
		bus.Write(0x1000, 0xFFFF)

		err := cpu.Tick()
		assert.NoError(t, err)
		assert.False(t, cpu.Halted)
		assert.Equal(t, uint32(0xFFFF), registers.registerMap[R1].value)
	})
	t.Run("test write", func(t *testing.T) {
		registers := NewRegisterBank()
		bus := NewBus(NewMemory())
		cpu := NewCPU(registers, bus)

		registers.registerMap[R0].value = 0xFFFF
		registers.registerMap[R1].value = 0x1000
		bus.Write(0x100, 0x02010000)

		err := cpu.Tick()
		assert.NoError(t, err)
		assert.False(t, cpu.Halted)
		val, err := bus.Read(0x1000)
		assert.NoError(t, err)
		assert.Equal(t, uint32(0xFFFF), val)
	})
	t.Run("test copy", func(t *testing.T) {
		registers := NewRegisterBank()
		bus := NewBus(NewMemory())
		cpu := NewCPU(registers, bus)

		registers.registerMap[R0].value = 0xFFFF
		bus.Write(0x100, 0x03010000)

		err := cpu.Tick()
		assert.NoError(t, err)
		assert.False(t, cpu.Halted)
		assert.Equal(t, uint32(0xFFFF), registers.registerMap[R1].value)
	})
	t.Run("test add", func(t *testing.T) {
		registers := NewRegisterBank()
		bus := NewBus(NewMemory())
		cpu := NewCPU(registers, bus)

		registers.registerMap[R1].value = 0x10
		bus.Write(0x100, 0x04F10003)

		err := cpu.Tick()
		assert.NoError(t, err)
		assert.False(t, cpu.Halted)
		assert.Equal(t, uint32(0x13), registers.registerMap[R1].value)
	})
}

The Main Function

Finally, let’s write a little program to check to make sure our machine is working correctly. We’ll add 2 numbers together and store them at a memory address

func main() {
	// Load this in at mem[0x100]
	// This program will copy the value 0x05 into R0, 0x10 into
	// R1, and 0x1000 into R2, before executing an add on R0 & R1
	// and writing the contents into the address in R2. If all is
	// working, at the end of the program we should see the value
	// 0x15 at memory address 0x1000
	program := []uint32{
		0x03F00005,
		0x03F10010,
		0x03F21000,
		0x04010000,
		0x02120000,
		0x00000000,
	}

	registers := internal.NewRegisterBank()
	mem := internal.NewMemory()
	for i, p := range program {
		err := mem.Write(uint32(i+0x100), p)
		if err != nil {
			fmt.Printf("error writing to memory location 0x%x: %v\n",
				i, err)
			return
		}
	}
	bus := internal.NewBus(mem)
	cpu := internal.NewCPU(registers, bus)

	for !cpu.Halted {
		cpu.Tick()
	}
	val, err := mem.Read(0x1000)
	if err != nil {
		fmt.Printf("could not read memory at 0x1000: %v\n", err)
		return
	}
	fmt.Printf("the value at address 0x1000 is 0x%x\n", val)
}

If we’re good (and I know we are), we should see the following output:

The value at address 0x1000 is 0x15

Conclusion

We have a very limited VM now. Next time we’ll add some more arithmetic, and a status register to track status changes in the machine. Hope someone out there found this useful!