I’m not 100% proud of this assembler, but here we go. I’ve hacked one together that works well enough, but I don’t think I’ll show much code here. Code is here for the curious.
Source Material
Originally I was completely lost and thought I might need to write a full tokeniser and lexer, but I found an easier way after skimming through Assemblers and Loaders by David Salomon.
My Assembly Language
My assembly language is as simple as I could make it. I’d like to add macros at some point, but it’s not really at the top of my priority list right now, so it can wait.
Each line could be one of three things:
- An instruction, which will compile directly to machine code, e.g.
ADD 0x123 R0
- A directive, which the assembler will dynamically figure out and generate binary for
- A comment, which is ignored
When a line is looked at by the assembler, the line is split on spaces. The assembler will then look at the first ‘word’ and do a lookup against a pregenerated instruction and directive table to see if there’s a match.
Most instructions will have the form
{LABEL} {MNEMONIC} {I1} {I2}
As I mentioned in my last post, I wish I had made the instructions a little more flexible. As it stands, the destination can only really be another register, and only one of the input values can be an immediate value. That makes things a little clunky: everything needs to be loaded into a register before it can be used, and the assembly that falls out of this design is verbose and unwieldy.
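To make the lookup a bit more concrete, here is a minimal sketch of how a line could be split into the {LABEL} {MNEMONIC} {I1} {I2} fields. The names `parsedLine`, `splitLine` and the `mnemonics` set are hypothetical stand-ins rather than what my assembler actually uses, and the `;` comment prefix is assumed from the example files later in this post.

```go
package main

import (
	"fmt"
	"strings"
)

// mnemonics is a stand-in for the real opcode and directive tables.
var mnemonics = map[string]bool{"ADD": true, "HALT": true, "WORD": true}

// parsedLine holds the fields of one source line.
type parsedLine struct {
	label    string
	mnemonic string
	operands []string
}

// splitLine splits a source line on whitespace. If the first word is not a
// known mnemonic it is treated as a label and the second word is looked up.
func splitLine(line string) (parsedLine, error) {
	words := strings.Fields(line)
	if len(words) == 0 || strings.HasPrefix(words[0], ";") {
		return parsedLine{}, nil // blank line or comment
	}
	var p parsedLine
	if !mnemonics[words[0]] {
		p.label = words[0]
		words = words[1:]
	}
	if len(words) == 0 || !mnemonics[words[0]] {
		return parsedLine{}, fmt.Errorf("no known mnemonic in %q", line)
	}
	p.mnemonic = words[0]
	p.operands = words[1:]
	return p, nil
}

func main() {
	p, err := splitLine("LOOP ADD 0x123 R0")
	fmt.Println(p.label, p.mnemonic, p.operands, err)
}
```

Treating an unknown first word as a label keeps the lookup to a single map check per word, at the cost of not being able to tell a typo in a mnemonic from a label until the second word is checked.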
The Basics
The assembler I’ve written is a two-pass assembler. The first pass goes over the provided file and any imports to build a symbol table. The second pass then translates these symbols into a single loadable file.
Loadable files
The first thing I attacked was my ideal goal state. I came up with a rudimentary file format that was easy to convert to and from binary:
// LoadableFile represents a file we can load into memory
type LoadableFile struct {
	BlockCount uint32
	Flags      uint32
	Blocks     []*MemoryBlock
}

// MemoryBlock is a run of words to be loaded at a start address
type MemoryBlock struct {
	Address   uint32
	BlockSize uint32
	Words     []uint32
}

// Save writes the loadable file to a binary stream
func (l *LoadableFile) Save(w io.ByteWriter) error {
	err := writeWords(w, l.BlockCount, l.Flags)
	if err != nil {
		return err
	}
	for bi, b := range l.Blocks {
		err = writeWords(w, b.Address, b.BlockSize)
		if err != nil {
			return fmt.Errorf("error writing block %d: %v", bi, err)
		}
		err = writeWords(w, b.Words...)
		if err != nil {
			return fmt.Errorf("error writing block %d: %v", bi, err)
		}
	}
	return nil
}

// Load loads a loadable file from a binary stream
func Load(bs io.ByteReader) (*LoadableFile, error) {
	blockCount, err := nextWord(bs)
	if err != nil {
		return nil, fmt.Errorf("error reading block count: %v", err)
	}
	flags, err := nextWord(bs)
	if err != nil {
		return nil, fmt.Errorf("error reading flags: %v", err)
	}
	blocks, err := loadBlocks(blockCount, bs)
	if err != nil {
		return nil, fmt.Errorf("error loading blocks: %v", err)
	}
	return &LoadableFile{
		BlockCount: blockCount,
		Flags:      flags,
		Blocks:     blocks,
	}, nil
}

// loadBlocks from a stream
func loadBlocks(blockCount uint32, bs io.ByteReader) ([]*MemoryBlock, error) {
	if blockCount == 0 {
		return nil, nil
	}
	blocks := make([]*MemoryBlock, blockCount)
	var err error
	for i := 0; i < int(blockCount); i++ {
		blocks[i], err = loadBlock(bs)
		if err != nil {
			return nil, fmt.Errorf("error loading block %d: %v", i, err)
		}
	}
	return blocks, nil
}

// loadBlock from a stream
func loadBlock(bs io.ByteReader) (*MemoryBlock, error) {
	address, err := nextWord(bs)
	if err != nil {
		return nil, fmt.Errorf("error reading address: %v", err)
	}
	blockSize, err := nextWord(bs)
	if err != nil {
		return nil, fmt.Errorf("error reading block size: %v", err)
	}
	words := make([]uint32, blockSize)
	for i := 0; i < int(blockSize); i++ {
		words[i], err = nextWord(bs)
		if err != nil {
			return nil, fmt.Errorf("error loading word %d: %v", i, err)
		}
	}
	return &MemoryBlock{
		Address:   address,
		BlockSize: blockSize,
		Words:     words,
	}, nil
}

// nextWord gets the next word in a binary stream
func nextWord(bs io.ByteReader) (uint32, error) {
	b0, err := bs.ReadByte()
	if err != nil {
		return 0, fmt.Errorf("error reading first byte: %v", err)
	}
	b1, err := bs.ReadByte()
	if err != nil {
		return 0, fmt.Errorf("error reading second byte: %v", err)
	}
	b2, err := bs.ReadByte()
	if err != nil {
		return 0, fmt.Errorf("error reading third byte: %v", err)
	}
	b3, err := bs.ReadByte()
	if err != nil {
		return 0, fmt.Errorf("error reading fourth byte: %v", err)
	}
	ret := uint32(b0) << 24
	ret |= uint32(b1) << 16
	ret |= uint32(b2) << 8
	ret |= uint32(b3)
	return ret, nil
}

// wordToBytes splits a word into four bytes, most significant byte first
func wordToBytes(w uint32) []byte {
	return []byte{byte((w >> 24) & 0xFF), byte((w >> 16) & 0xFF), byte((w >> 8) & 0xFF), byte(w & 0xFF)}
}

// writeWords writes each word to the stream, one byte at a time
func writeWords(w io.ByteWriter, ws ...uint32) error {
	for _, word := range ws {
		for _, b := range wordToBytes(word) {
			err := w.WriteByte(b)
			if err != nil {
				return fmt.Errorf("could not write byte: %v", err)
			}
		}
	}
	return nil
}
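The hand-rolled nextWord and wordToBytes helpers read and write each word most significant byte first (big-endian). As a sketch of an alternative, the standard library's `encoding/binary` package produces the same byte layout. The function names `encodeWord` and `decodeWord` are made up for this example and are not part of my code.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodeWord returns the four big-endian bytes of w, the same layout
// wordToBytes produces.
func encodeWord(w uint32) []byte {
	buf := make([]byte, 4)
	binary.BigEndian.PutUint32(buf, w)
	return buf
}

// decodeWord rebuilds a word from four big-endian bytes, the same way
// nextWord shifts and ORs the bytes together.
func decodeWord(b []byte) uint32 {
	return binary.BigEndian.Uint32(b)
}

func main() {
	b := encodeWord(0x12345678)
	fmt.Printf("% x\n", b)             // 12 34 56 78
	fmt.Printf("%#x\n", decodeWord(b)) // 0x12345678
}
```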
Each memory block has an address associated with it, so theoretically when I add interrupts I’ve already got this part covered.
When we load this into our memory, we simply loop through the memory blocks, see where they start, and set the appropriate memory addresses.
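As a rough sketch of that loading loop, assuming a flat, word-addressed memory held in a slice: the `loadIntoMemory` name and the bounds check are my own additions for the example, and `MemoryBlock` is repeated only so the snippet compiles on its own.

```go
package main

import "fmt"

// MemoryBlock mirrors the struct from the loadable file format.
type MemoryBlock struct {
	Address   uint32
	BlockSize uint32
	Words     []uint32
}

// loadIntoMemory copies every block into mem, starting at the block's address.
// mem is a hypothetical flat memory with one slice element per word.
func loadIntoMemory(mem []uint32, blocks []*MemoryBlock) error {
	for i, b := range blocks {
		end := uint64(b.Address) + uint64(len(b.Words))
		if end > uint64(len(mem)) {
			return fmt.Errorf("block %d does not fit in memory", i)
		}
		copy(mem[b.Address:], b.Words)
	}
	return nil
}

func main() {
	mem := make([]uint32, 0x200)
	blocks := []*MemoryBlock{
		{Address: 0x100, BlockSize: 2, Words: []uint32{0xAA, 0xBB}},
	}
	if err := loadIntoMemory(mem, blocks); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("%#x %#x\n", mem[0x100], mem[0x101])
}
```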
Opcode and Directive tables
There are two static tables, one describing opcodes and another describing assembler directives. This makes it easy to build our list of symbols and do rudimentary error checking after our first pass.
// e.g.:
var opcodeTable = opcodeTableType{
	"HALT": {
		mnemonic: "HALT",
		opcode:   0x00,
		hasI1:    false,
		hasI2:    false,
	},
	//...
}

var directiveTable = directiveTableType{
	"WORD": {
		mnemonic: "WORD",
		sizeCalc: func(_ string) uint32 {
			// Always one line long
			return 1
		},
		assembleFunc: func(sourceLine string, _ symbols) ([]uint32, error) {
			spl := strings.Split(sourceLine, " ")
			// The argument is in column 1, or column 2 if the line has a label
			col := 1
			if len(spl) > 2 {
				col = 2
			}
			if len(spl) <= col {
				return nil, fmt.Errorf("not enough arguments to WORD directive")
			}
			arg := spl[col]
			p, err := parseLiteral(arg)
			if err != nil {
				return nil, err
			}
			return []uint32{p}, nil
		},
	},
	//...
}
These hold objects that conform to the following interface:
type assemblable interface {
	calculateSize(sourceLine string) uint32
	assemble(sourceLine string, symbolTable symbols) ([]uint32, error)
}
I noticed pretty early on that while directives will always require some special assembly logic, operations don’t. This is reflected by the different assemble implementations between the two types.
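A sketch of what that difference could look like: the opcode entry uses one generic implementation, while the directive entry delegates to functions stored in the table. The type names `opcodeEntry` and `directiveEntry`, the `symbols` map type and especially the bit layout (opcode in the top byte, operands ignored) are all invented for illustration and are not my real encoding.

```go
package main

import "fmt"

// symbols is a stand-in for the real symbol table type.
type symbols map[string]uint32

type assemblable interface {
	calculateSize(sourceLine string) uint32
	assemble(sourceLine string, symbolTable symbols) ([]uint32, error)
}

// opcodeEntry needs no custom logic: every instruction is assumed to be one word.
type opcodeEntry struct {
	mnemonic string
	code     uint32
}

func (o opcodeEntry) calculateSize(string) uint32 { return 1 }

// assemble places the opcode in the top byte. This layout is made up for
// the example, and operand encoding is left out entirely.
func (o opcodeEntry) assemble(string, symbols) ([]uint32, error) {
	return []uint32{o.code << 24}, nil
}

// directiveEntry carries its own size and assembly functions.
type directiveEntry struct {
	mnemonic     string
	sizeCalc     func(string) uint32
	assembleFunc func(string, symbols) ([]uint32, error)
}

func (d directiveEntry) calculateSize(l string) uint32 { return d.sizeCalc(l) }

func (d directiveEntry) assemble(l string, s symbols) ([]uint32, error) {
	return d.assembleFunc(l, s)
}

func main() {
	items := []assemblable{
		opcodeEntry{mnemonic: "HALT", code: 0x00},
		directiveEntry{
			mnemonic: "WORD",
			sizeCalc: func(string) uint32 { return 1 },
			assembleFunc: func(string, symbols) ([]uint32, error) {
				return []uint32{42}, nil
			},
		},
	}
	for _, it := range items {
		words, err := it.assemble("", nil)
		fmt.Println(it.calculateSize(""), words, err)
	}
}
```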
Intermediate File Format
The intermediate file format contains a symbol table and a list of symbols:
type firstPassFile struct {
	symbolTable symbols
	records     []*symbol
}
Where a symbol is defined as such:
type symbol struct {
	symbolType         symbolType
	label              string
	relativeLineNumber uint32
	sourceLine         string
	assemblyLink       assemblable
}
I pulled this source pretty much straight out of the assembler book, where there’s a symbol type, label, relative line number, source field, and a pointer to the opcode or directive table.
Assembler First Pass
The first pass’s responsibility is to build up a coherent symbol table before the second pass puts everything together. This could be done in a more flexible way, but it’s not the main focus of this little project, so I won’t be going too far into it.
func firstPass(sourceFile io.Reader, lineNum uint32) (*firstPassFile, error) {
	ln := lineNum
	src := bufio.NewReader(sourceFile)
	reloc := newFirstPassFile()
	for {
		line, _, err := src.ReadLine()
		if err != nil {
			if err == io.EOF {
				break
			}
			return nil, err
		}
		rec, err := firstPassLine(ln, string(line))
		if err != nil {
			// Stop here rather than carrying on with a possibly nil record
			return nil, err
		}
		if rec.label != "" {
			_, ok := reloc.symbolTable[rec.label]
			if ok {
				// Label already seen: flag it as multiply defined
				reloc.symbolTable[rec.label] = &symbol{
					label:              rec.label,
					symbolType:         MTDF,
					relativeLineNumber: ln,
					sourceLine:         string(line),
				}
			} else {
				reloc.symbolTable[rec.label] = rec
			}
		}
		reloc.records = append(reloc.records, rec)
		if rec.assemblyLink != nil {
			// Advance by the number of words this line will assemble to
			ln += rec.assemblyLink.calculateSize(string(line))
		}
	}
	return reloc, nil
}
As you can see, we call the calculateSize function on each opcode/directive link to calculate the address of each symbol for insertion into the table.
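To show how those sizes turn into addresses, here is a stripped-down sketch of the same bookkeeping. The `line` type and `buildSymbolTable` function are hypothetical. The sizes in `main` are assumptions too: one word per instruction, and one word per character plus a terminator for the string.

```go
package main

import "fmt"

// line is a simplified record: an optional label plus the number of words
// the line will occupy once assembled.
type line struct {
	label string
	size  uint32
}

// buildSymbolTable walks the lines once, recording the address of every
// label and then advancing the address by that line's size.
func buildSymbolTable(start uint32, lines []line) map[string]uint32 {
	table := make(map[string]uint32)
	addr := start
	for _, l := range lines {
		if l.label != "" {
			table[l.label] = addr
		}
		addr += l.size
	}
	return table
}

func main() {
	// Sizes below are assumed for illustration only.
	lines := []line{
		{label: "", size: 1},       // ADDRESS HELLO R0
		{label: "", size: 1},       // CALL PRINTSTRING
		{label: "", size: 1},       // HALT
		{label: "HELLO", size: 14}, // HELLO STRING Hello, world!
	}
	fmt.Printf("%#x\n", buildSymbolTable(0x100, lines)["HELLO"])
}
```

Because HELLO is used on the first line but only defined on the last, its address is not known until the whole file has been walked, which is why the symbol table has to be complete before any words are emitted.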
Assembler Second Pass
The second pass simply builds up a list of 32-bit words and packs it into our executable format, and as such is pretty simple:
func secondPass(firstPass *firstPassFile) (*executable.LoadableFile, error) {
	ret := &executable.LoadableFile{
		BlockCount: 0x1,
		Flags:      0,
		Blocks:     nil,
	}
	b := &executable.MemoryBlock{
		Address:   0x100,
		BlockSize: 0,
		Words:     nil,
	}
	for _, rec := range firstPass.records {
		// Records without a table link (e.g. comments) produce no words
		if rec.assemblyLink == nil {
			continue
		}
		words, err := rec.assemble(firstPass.symbolTable)
		if err != nil {
			return nil, err
		}
		b.Words = append(b.Words, words...)
	}
	ret.Blocks = append(ret.Blocks, b)
	b.BlockSize = uint32(len(b.Words))
	return ret, nil
}
A Budget Terminal Device
I decided to create a small, memory-mapped output device. It attaches itself to a memory range and prints a character when a value is written to address 0xFFE1, or a number when a value is written to 0xFFE2.
import (
	"bufio"
	"fmt"
	"os"
)

// TerminalDevice is a bus device that backs directly onto a real terminal
type TerminalDevice struct {
	consoleReader *bufio.Reader
}

const (
	TERMINAL = uint32(0xFFE1) + iota
	TERMINAL_INT
	TERMINAL_X
	TERMINAL_Y
	__terminal_reserved1
)

func (t *TerminalDevice) MemoryRange() *MemoryRange {
	// Addresses:
	// * 0xFFE1 - Write a character to terminal or read a character
	// * 0xFFE2 - Write a number to the terminal
	// * 0xFFE3 - Cursor X position
	// * 0xFFE4 - Cursor Y position
	// * 0xFFE5 - reserved
	return &MemoryRange{
		Start: 0xFFE1,
		End:   0xFFE5,
	}
}

func (t *TerminalDevice) Read(address uint32) (uint32, error) {
	// By default, Go doesn't provide a way to get unbuffered input from the console.
	// Will leave this to the UI when I get to that
	return 0, nil
}

func (t *TerminalDevice) Write(address, value uint32) error {
	switch address {
	case TERMINAL:
		fmt.Printf("%c", rune(value))
	case TERMINAL_INT:
		fmt.Printf("%d", value)
	}
	return nil
}

func NewTerminal() *TerminalDevice {
	return &TerminalDevice{
		consoleReader: bufio.NewReader(os.Stdin),
	}
}
Go is a little annoying here, because ideally I would have wanted getch-style functionality, but all I have is buffered input. Without a platform-independent way of switching that off, I’ve just left it be for now and will build something more full-featured when I add a GUI.
Seeing the Whole thing in Action
I also altered the main file to take in a .bs assembly file to assemble and run. I have the following assembly file:
IMPORT term
ADDRESS HELLO R0
CALL PRINTSTRING
HALT
HELLO STRING Hello, world!
With an imported file that looks like the following on my search path:
; PRINTSTRING will print the string starting at the address in R0
PRINTSTRING READ R0 R1
EQ R1 0x00
RETURN
WRITE R1 0xFFE1
ADD 0x01 R0
JMP PRINTSTRING
Running it will produce the following:
$ go run ./cmd/main.go run -file ./examples/print_string.bs
Begin execution
-------
Hello, world!
-------
Machine has halted
Things to fix
My machine can assemble and run code, though I’d like to fix the following:
- Import search paths are a bit clunky
- When I add interrupts, I’m going to need a better way for interrupt directives to alter the memory addresses lower than 0x100
- The assembler itself is very simple. I would like to add some macros, but that’s probably going to be a bit more complex than I really want to go right now
- As mentioned before, Go’s buffered console I/O makes it a pain to do a VGA terminal-like interaction. This will need to be addressed when I add a GUI for the machine
Conclusion
The assembler was a fun little holiday project. It’s not something that I’ve done in the past, but it was good learning how assemblers work. I originally thought it would be similar to a compiler, but was pleasantly surprised to find it was much less unwieldy. Code is available here