An interpreter, a compiler
and a virtual machine
prerequisites: Basic experience with Coq or similar
In this post, we show how to implement an interpreter and a compiler for a small arithmetic language, in the Coq Proof Assistant, along with a virtual machine for running the output of the compiler. We implement the language in Coq such that we can later prove an equivalence relation between evaluation with an interpreter and a compiler, in a follow-up blog post.
We start by introducing the small arithmetic language in Section 2. Having defined our language, we introduce an interpreter in Section 3, which implements the operational semantics of the language. Before we introduce the corresponding compiler, we first have to build a virtual machine in Section 4 onto which we can execute the bytecode output of the compiler. Finally, we introduce the compiler in Section 5 and conclude in Section 6.
The first step towards implementing an interpreter is to define the language
which we want to interpret. Thus, we define a small arithmetic language with
which we can perform addition and multiplication of natural numbers. We define
the language as a formal grammar
by stating that a program consists of an expression,
e, which can either be:
- a literal,
Lit, that takes a natural number,
- an addition expression,
Plus, that takes two arithmetic expressions,
- a multiplication expression,
Mult, that takes two arithmetic expressions,
The above description yields the following grammar:
which we in turn can translate into an
Inductive type in Coq:
1 2 3 4 5 6 7 8 Inductive arithmetic_expression : Type := | Lit : nat -> arithmetic_expression | Plus : arithmetic_expression -> arithmetic_expression -> arithmetic_expression | Mult : arithmetic_expression -> arithmetic_expression -> arithmetic_expression.
Here, we state that any instance of the type
arithmetic_expression is either a
literal, an addition expression, or a multiplication expression, as described
above. Furthermore, we can construct instances of this type by applying the
three constructors of the definition. For example, if we want to create an
instance of the
arithmetic_expression type that corresponds to the expression
, we write the following:
1 Compute (Mult (Plus (Lit 2) (Lit 1)) (Lit 5)).
where the keyword
Compute is simply used to evaluate the expression. Note that
we have not yet computed the result of the expression, as we have not yet
implemented the semantics of the language.
Now that we have defined the grammar of our arithmetic expression language, we are ready to define an interpreter for the language in the next section.
Having defined our language as an inductive algebraic data type, we can now introduce a function that, given an element of this type, recursively traverses the structure of such an element and returns the result of evaluating the arithmetic expression corresponding to the element. The semantics of our arithmetic language are pretty straight forward:
- A literal expression,
Lit n, evaluates to the natural number,
- an addition expression,
Plus e e, evaluates to the sum of the two sub-expressions,
- a multiplication expression,
Mult e e, evaluates to the product of the two sub-expressions,
If we translate the description above into Coq, we get the following
1 2 3 4 5 6 Fixpoint interpret (e : arithmetic_expression) : nat := match e with | Lit n => n | Plus e1 e2 => (interpret e1) + (interpret e2) | Mult e1 e2 => (interpret e1) * (interpret e2) end.
interpret function pattern matches on the structure of the
e, and recursively evaluates its sub-expressions. If we
want to evaluate the expression , we first
translate it into the
arithmetic_expression language and then pass it to the
1 2 Compute (interpret (Mult (Mult (Lit 2) (Lit 5)) (Plus (Lit 1) (Lit 3)))).
When evaluated, the above expression yields the result
40 : nat as expected.
Thus, we have now implemented an interpreter – in six lines of code – which can
evaluate any expression of our arithmetic language. However, in order to
implement a corresponding compiler, we first have to take a look a virtual
4. Virtual machine
Before we can build a compiler for our arithmetic expression language, we first need a machine onto which we can execute the compiled source code. Thus, we construct a minimal stack machine with the following three bytecode instructions:
1 2 3 4 Inductive bytecode_instruction : Type := | PUSH : nat -> bytecode_instruction | ADD : bytecode_instruction | MUL : bytecode_instruction.
Furthermore, we define a bytecode program to be a list of byte code instructions:
1 Definition bytecode_program := list bytecode_instruction.
and the data stack of our virtual machine to be a list of natural numbers:
1 Definition data_stack := list nat.
With these three definitions taken care of, we move on to define the semantics
of the three byte code instructions and translate them into a corresponding
function. We define the instruction
PUSH to take a natural number which it
then pushes onto the stack, while the
MUL instructions each pop the
two topmost elements of the stack and adds or multiplies them, respectively. For
the sake of simplicity, we define the effects of executing
MUL on a
stack with less than two elements to be an unchanged stack. The above semantics
result in the function
execute_bytecode_instruction, which takes a
bytecode_instruction and a
data_stack and returns a new
capturing the effect of evaluating a
bytecode_instruction with respect to a
1 2 3 4 5 6 7 8 9 10 11 12 Fixpoint execute_bytecode_instruction (s : data_stack) (bc : bytecode_instruction) : data_stack := match bc, s with | PUSH n, s' => n :: s' | ADD, nil => nil | ADD, n :: nil => n :: nil | ADD, n1 :: n2 :: s' => n1 + n2 :: s' | MUL, nil => nil | MUL, n :: nil => n :: nil | MUL, n1 :: n2 :: s' => n1 * n2 :: s' end.
An example application of
execute_bytecode_instruction multiplies the two
elements on top of the stack
1 Compute (execute_bytecode_instruction (5 :: 3 :: 2 :: nil) MUL).
and returns the resulting stack
The last step we need, in order to finish our virtual machine, is to wrap the
execute_bytecode_instruction in a function that takes a whole bytecode program
and runs it on an initial data stack. This can be achieved by simply traversing
the list of bytecode instructions and executing them one-by-one, like so:
1 2 3 4 5 6 7 8 9 Fixpoint execute_bytecode_program (s : data_stack) (bcp : bytecode_program) : data_stack := match bcp, s with | nil, s' => s' | bci :: bcp', s' => execute_bytecode_program (execute_bytecode_instruction s' bci) bcp' end.
Now we can execute a whole bytecode program on our virtual stack machine by
execute_bytecode_program with a
bytecode_program and a
1 2 Compute (execute_bytecode_program nil (PUSH 2 :: PUSH 3 :: ADD :: PUSH 5 :: MUL :: PUSH 1 :: nil)).
If we step through the execution of the above program, the major steps are as follows:
- First we push
3onto the empty stack, giving us the stack
[3, 2], then
- we pop the two elements and push their result onto the stack, resulting in the
- we push a
5onto the stack and multiply the two elements such that we get the stack
- we push
1onto the stack, at which point we have run our whole bytecode program and return the final stack,
With our virtual machine implemented and tested, we can finally move on to construct our compiler.
Having introduced the needed set of bytecode instructions and seen how these can be executed on a virtual machine, we are ready to define a compiler that takes expressions of our arithmetic language as its input and generates a bytecode program as its output.
Returning to the constructors of our
arithmetic_expression language, we can
turn these into bytecode instructions:
- The literal constructor,
Lit n, can be directly translated into the act of returning the bytecode program consisting of the
PUSH ninstruction, while
- the addition expression,
Plus e1 e2, corresponds to the result of first compiling the second expression,
e2, followed by concatenating the result of compiling the first expression,
e1, and then concatenating the bytecode program consisting of the
- the multiplication expression,
Mult e1 e2, is identical to the compilation of the addition expression except for the use of the
MULinstruction rather than the
This brings us to the following definition of our compiler:
1 2 3 4 5 6 Fixpoint compile (e : arithmetic_expression) : bytecode_program := match e with | Lit n => PUSH n :: nil | Plus e1 e2 => (compile e2) ++ (compile e1) ++ (ADD :: nil) | Mult e1 e2 => (compile e2) ++ (compile e1) ++ (MUL :: nil) end.
which takes an
arithmetic_expression as its input and produces a
bytecode_program as its output. If we want to compile an
arithmetic_expression corresponding to , we pass it to
compile like so:
1 Compute (compile (Plus (Lit 5) (Mult (Lit 3) (Lit 2)))).
which results in the following
1 [PUSH 2, PUSH 3, MUL, PUSH 5, ADD].
Note that because we are working with a stack machine, the outputted program both flattens and reverses the compiled expression.
Now that we have finally defined our compiler, we can demonstrate the equivalence relation between interpretation of an arithmetic expression and compilation of an arithmetic expression followed by execution of the compiled bytecode program:
1 2 3 Compute let e := Plus (Lit 5) (Mult (Lit 3) (Lit 2)) in (interpret e :: nil, execute_bytecode_program nil (compile e)).
In this post, we have shown how to implement an interpreter, a compiler, and a virtual machine for a small arithmetic language. Furthermore, we have also shown how the two types of evaluation relate to each other.
Lastly, the language has been implemented in the Coq Proof Assistant such that we can prove an equivalence relation between interpretation of an arithmetic expression and compilation followed by execution of an arithmetic expression, which is exactly the topic of the next blog post.
We use the notation
(1 :: 2 :: 3 :: nil)interchangeably to denote the content of a
listin Coq. ↩