The ZINC not-so-abstract machine
The [ZINC] machine was specially designed to build less closures.
—Xavier Leroy
(Other names are ZINC environment machine and ZAM.)
We specify the execution model using an abstract machine. Examples of abstract machines for strict functional languages are [SECD], [FAM], and [CAM].
ZAM, the ZINC abstract machine, has a requirement that multiple application to \(k\) arguments should be efficient, almost as efficient as applying a \(k\) -ary function. This is not the case with the machines referenced above , since they build \(k\) - 1 intermediate closures to evaluate this multiple application.
Why care? Maybe this bad behaviour is unavoidable and these closures are the price to pay for the additional flexibility of curried functions.
Krivine Machine
The Krivine Machine 1 has three major components:
term,
stack,
environment.
The stack and the environment belong to the same recursive data structure. More precisely, the environment and the stack are lists of pairs <term, environment>, called closures.
The first element of the environment is the closure associated with the index 0, the second element corresponds to the closure associated with index 1 etc.
If the machine has to evaluate an index, it fetches there the pair <term, environment>, the closure that yields the term to be evaluated and the environment in which this term must be evaluated.
The rules explain how the machine transforms a state into another state, after identifying the patterns among the states.
The initial state aims to evaluate a term t, it is the state t, \(\square\), \(\square\) in which the term is t and the stack and the environment are empty.
The final state (in absence of error) is of the form \(\lambda t\), \(\square\), e, in other words, the resulting terms is an abstraction together with its environment and an empty stack.
This machine performs reduction to weak head normal form following the standard (leftmost-outermost) strategy. However, it represents \(\lambda\) -terms with closures, hence it does not perform substitutions on the fly, but delays them until it reduces variables.
It has but three instructions: \(\textbf{Access}\), \(\textbf{Push}\), and \(\textbf{Grab}\). A term in de Bruijn’s notation is compiled as follows:
The machine is equipped with
a code pointer,
a register holding the current environment (a list of closures, that is, pairs of code pointers and environments), and
a stack of closures.
The transition function is as follows:
The transition \(\text{App}\) removes the parameter of an application and put it on the stack for further evaluation
The transition \(\text{Abs}\) removes the \(\lambda\) of the term and pop up the closure from the top of the stack and put it on the top of the environment. This closure corresponds to the de Bruijn index 0 in the new environment
The transition \(\text{Zero}\) takes the first closure of the environment. The term of this closure becomes the current term and the environment of this closure becomes the current environment
The transition \(\text{Succ}\) removes the first closure of the environment list and decreases the value of the index
At all times the stack represents the spine of the term being reduced (that is, the them whose code is in the code pointer). The \(\textbf{Push}\) instruction performs one step of unrolling, and \(\textbf{Grab}\) corresponds to one step of \(\beta\) -reduction, that is it records the substitution in the environment.
Example
Evaluation of the term (\(\lambda\) 0 0) (\(\lambda\) 0),
Multiple applications
In the example above, note that each time a \(\textbf{Grab}\) instruction is encountered, that is each time an abstraction is reduced, one of the arguments is popped and added to the environment, then evaluation proceeds with the remaining code, that is the body of the abstraction. Those closures are built to “freeze” the arguments, and these are unavoidable with a lazy strategy.
Krivine’s machine with marks on the stack
A machine analogous to Krivine’s, but performing strict evaluation instead of lazy evaluation.
To perform strict evaluation with some variant of Krivine’s machine, we need first to be able to reduce some subterms of a given term to weak head normal form. The problem with Krivine’s machine is that it does not stop until the stack is empty.
What we need is a way to stop reduction even if there are arguments available on the stack. To this end, let’s put a mark on some of the closures awaiting in the stack; this mark says:
“Don’t put me in the environment, stop reducing, and resume another reduction”.
The modified Krivine’s machine has a fourth instruction, \(\textbf{Reduce}(c)\), to force reduction of \(c\) to weak head normal form, and a different semantics for \(\textbf{Grab}\).
In the following, marked closures are written \(\langle c, e \rangle\) instead of \((c, e)\).
The ZINC machine
The ZAM is
Krivine’s machine with marks specialised to call-by-value only, and
Extended to handle constants
Stack-based calling convention where functions may not consume all their arguments, but then their result must be applied to the remaining arguments.
|
code pointer |
|
stack pointer for the argument stack (grows downward) |
|
stack pointer for the return stack (grows downward) |
|
pointer to the current trap frame |
|
remaining part (heap-allocated) of the environment |
|
number of entries in the volatile part of the environment |
|
accumulator to hold intermediate results |
Krivine’s machine split into two stacks,
The argument stack: holds arguments to function calls, that is sequences of values, separated by marks
The return stack: holds (unallocated) closures, that is pairs of a code pointer and an environment
Two compilation schemes: one, written \(\mathcal{T} \textlbrackdbl E \textrbrackdbl\), is only valid for expressions \(E\) in tail-call position, that is expressions whose value is the value of the function body being evaluated; the other, written \(\mathcal{C} \textlbrackdbl E \textrbrackdbl\), is always valid, but usually less efficient.
The transitions of the ZAM corresponding to the generated instructions are given below. The first line is the state before the transition, the second one is the state after the transition.
Accessing local variables
The compilation scheme for the local variable of index \(n\) is:
The \(Access\) instruction has the following semantics:
Application
Tail applications are treated as in Krivine’s machine, since there is no need to allocate a new argument stack by pushing a mark. The \(Appterm\) instruction takes care of consing the first argument with the environment of the closure; this way, we do not have to put a \(Grab\) instruction at the beginning of each function. For other applications, we must push a mark on the argument stack to separate the “new” arguments and force reduction to weak normal form.
Abstractions
In tail-cal position, the \(\textbf{Grab}\) instruction simply pops one argument from the argument stack, and puts it in front of the environment. If all arguments have already been consumed, that is if there is a mark at the top of the stack, it builds the closure of the current code with the current environment and returns it to the called, while popping the mark.
Otherwise, we could push a mark, to allocate a new argument stack, and then do the same thing. Of course, \(\textbf{Grab}\) would always fail and return immediately the desired closure. To save pushing a mark, and then immediately test it, we use the cheaper \(\textbf{Cur}\) instruction, in this case.
The \(\textbf{Return}\) instruction that terminates the body of a function does not simply jump back to the caller. It is actually the symmetric of \(\textbf{Grab}\): it has to check if the argument stack is “empty” (i.e. if the top of stack is a mark). If this is the case, it destroys the mark and returns to the caller. But otherwise, it applies the result of the function (necessarily a closure, if the original program is well-typed) to the remaining arguments. This situation is the converse of partial application: a single function is given more argument than it can use. This is the case of the identity function in the following example:
Local declarations
The special case of \(\texttt{let}\), that is \(((\lambda x.M) N)\), is so common that it deserves a faster and simpler compilation scheme than actually applying an abstraction. It is enough to evaluate \(N\) and add its value to the environment, using the \(\textbf{Let}\) instruction, then to evaluate \(M\) in this modified environment; then, the \(\textbf{Endlet}\) instruction restores the original environment, if needed.
For recursive definitions, use the same trick suggested for the [CAM]: first, a dummy value is added to the environment (instruction \(\textbf{Dummy}\)), and \(N\) is evaluated in this modified environment; the dummy value is then physically updated with the actual value of \(N\) (instruction \(\textbf{Update}\)). This may fail to reach a fixpoint, since the physical update may be impossible (in case of an unboxed value, an integer for instance). However, it works fine for the most commonly used case: when \(M\) is an abstraction \(\lambda .P\).
Primitives
We write \(\textbf{Prim} (p)\) for the instruction associated with the primitive operation \(p\) (e.g. \(+\), \(=\), \(\texttt{car}\)). This instruction takes its first argument in the accumulator, the remaining arguments in the argument stack, and puts its result in the accumulator.
Environment representation
The ZINC machine was designed to build less closures. This opens the way for less costly (in terms of heap allocation) representations of environments.
When we don’t have to build any closures, the current environment does not have to survive the evaluation of the current function body. We can store it, or part of it, in some volatile location (stack or registers) that will be automatically reclaimed when the current function returns. We can go even further: assuming few closures are built, a sensible policy is to systematically put values being added to the environment in one of these volatile locations, and to copy them back to persistent storage (i.e. in the heap) when a closure is built.
In this approach, the environment \(0 \leftarrow a_0, \cdots , n \leftarrow a_n\) is represented by a persistent part \(a_k, \cdots , a_n\), which is the environment part of the closure most recently applied or built, and a volatile part \(a_0, \cdots , a_{k-1}\), which holds values added to the environment since then.
The linker and the runtime system
\(n\) |
a small integer (the size of an opcode) |
\(ofs\) |
an offset for a relative branch, relative to the address where it is stored; it uses two bytes |
\(tag\) |
the tag of a block (one byte) |
\(header\) |
a well-formed block header (four bytes) |
\(int_8\) |
a small integer constant (one byte) |
\(int_{16}\) |
a medium integer constant (two bytes) |
\(int_{32}\) |
a large integer constant (four bytes) |
\(float\) |
a floating-point number (four, eight or ten bytes, depending on the hardware) |
\(string\) |
a character string, stored as if it was in the heap |
Constants and literals
\(\textbf{Constbyte}(int_{ 8})\), \(\textbf{Constshort}(int_{ 16})\), \(\textbf{Constlong}(int_{ 32})\) |
Put an integer constant in the accumulator. \(\textbf{Constlong}\) allows loading any constant, as long as it is not a pointer in the heap. |
\(\textbf{Atom}(n)\), \(\textbf{Atom0}\), \(\cdots\), \(\textbf{Atom9}\) |
Put a pointer to a zero-sized block tagged \(n\) in the accumulator. |
\(\textbf{GetGlobal}(int_{ 16})\), \(\textbf{SetGlobal}(int_{ 16})\) |
Load (resp. store) the accumulator from the global variable number \(int_{16}\). |
Function handling
\(\textbf{Push}\), \(\textbf{Pushmark}\) |
Push the accumulator (resp. a mark) on the argument stack. |
\(\textbf{Apply}\), \(\textbf{Appterm}\) |
Call (resp. jump to) the closure contained in the accumulator. |
\(\textbf{Return}\) |
If there is a mark on top of the argument stack, pop it and return to the caller; otherwise, jump to the closure contained in the accumulator. |
Environment handling
\(\textbf{Access}(n)\), \(\textbf{Access0}\), \(\cdots\), \(\textbf{Access5}\) |
Fetch the \(n^{th}\) slot of the environment, and put it in the accumulator. |
\(\textbf{Let}\) |
Put the value of the accumulator in front of the environment. |
\(\textbf{Endlet}(n)\), \(\textbf{Endlet1}\) |
Throw away the first \(n\) local variables from the environment. |
\(\textbf{Dummies}(n)\) |
Put \(n\) dummy closures in front of the environment. |
\(\textbf{Update}(n)\) |
Physically update the \(n^{th}\) slot of the environment with the value of the accumulator. |
\(\textbf{Letrec1}(ofs)\) |
Same as \(\textbf{Dummies}(1); \space\textbf{Closure}(ofs); \space\textbf{Update}(0)\), a very frequent sequence, corresponding to \(\texttt{let rec f = function ... in ...}\) |
Building and deconstructing block
\(\textbf{Makeblock}(header)\), \(\textbf{Makeblock1}(tag)\), \(\cdots\), \(\textbf{Makeblock4}(tag)\) |
Allocate a block with a given header, initialise field 0 with the accumulator, and the remaining fields with values taken from the argument stack. |
\(\textbf{Getfield}(n)\), \(\textbf{Getfield0}\), \(\cdots\), \(\textbf{Getfield3}\) |
Access the \(n^{th}\) field of the block pointed to by the accumulator. |
\(\textbf{Setfield}(n)\), \(\textbf{Setfield0}\), \(\cdots\), \(\textbf{Setfield3}\) |
Physically replace the \(n^{th}\) field of the block pointed to by the accumulator with the value popped from the argument stack. |
Integers
\(\textbf{SuccInt}\), \(\textbf{PredInt}\), \(\textbf{NegInt}\), \(\textbf{AddInt}\), \(\textbf{SubInt}\), \(\textbf{MulInt}\), \(\textbf{DivInt}\), \(\textbf{ModInt}\), \(\textbf{AndInt}\), \(\textbf{OrInt}\), \(\textbf{XorInt}\), \(\textbf{ShiftLeftInt}\), \(\textbf{ShiftRightInt}\) |
Usual arithmetic operations on integers. |
Floating-point numbers
\(\textbf{Floatop}(n)\) |
Allocates room for one floating point result, and executes the sub-instruction \(n\), one of \(\textbf{AddFloat}\), \(\textbf{SubFloat}\), \(\textbf{MulFloat}\), \(\textbf{DivFloat}\), and the usual transcendental functions. |
\(\textbf{FloatOfInt}\), \(\textbf{IntOfFloat}\) |
Conversion from and integer, and truncation to an integer. |
Strings
\(\textbf{Makestring}\) |
Allocates a string of given length (in the accumulator). |
\(\textbf{StringLength}\) |
Length of the string contained in the accumulator. |
\(\textbf{GetChar}\), \(\textbf{SetChar}\) |
Read or modify one char in a string. |
\(\textbf{FillString}\), \(\textbf{BlitString}\) |
Fill a substring with a given character, or copy one substring into another. |
Predicates
\(\textbf{Boolnot}\) |
Negation: returns “true” (the zero-sized block tagged 1) if the block in the accumulator is tagged 0, and “false” (the zero-sized block tagged 0) otherwise. |
\(\textbf{Ed}\), \(\textbf{Equal}\) |
Pointer equality (resp. structural equality) between the accumulator and the top of the stack. |
\(\textbf{EqInt}\), \(\textbf{NeqInt}\), \(\textbf{LtInt}\), \(\textbf{GtInt}\), \(\textbf{LeInt}\), \(\textbf{GeInt}\) |
Usual comparison predicates on integers. |
\(\textbf{EqFloat}\), \(\textbf{NeqFloat}\), \(\textbf{LtFloat}\), \(\textbf{GtFloat}\), \(\textbf{LeFloat}\), \(\textbf{GeFloat}\) |
Usual comparison predicates on floating-point numbers. |
\(\textbf{EqString}\), \(\textbf{NeqString}\), \(\textbf{LtString}\), \(\textbf{GtString}\), \(\textbf{LeString}\), \(\textbf{GeString}\) |
Usual comparison predicates on strings. |
Branches and conditional branches
\(\textbf{Branch}(ofs)\) |
Unconditional relative jump. |
\(\textbf{BranchIf}(ofs)\), \(\textbf{Branchifnot}(ofs)\), \(\textbf{Branchifeqtag}(tag, \space ofs)\), \(\textbf{Branchifneqtag}(tag, \space ofs)\) |
Conditional branches on the tag \(t\) of the block pointed to by the accumulator: \(\textbf{Branchif}\) jumps if \(t \ne 0\), \(\textbf{Branchifnot}\) jumps if \(t = 0\), \(\textbf{Branchifeqtag}\) jumps if \(t = tag\), \(\textbf{Branchifneqtag}\) jumps if \(t \ne tag\). |
\(\textbf{Switch}(ofs_{ 0}, \space ..., \space ofs_{ k})\) |
Jumps to the offset \(ofs_t\), where \(t\) is the tag \(t\) of the block contained in the accumulator. |
\(\textbf{BranchifEq}(ofs)\), \(\textbf{BranchifNeq}(ofs)\), \(\textbf{BranchifEqual}(ofs)\), \(\textbf{BranchifNequal}(ofs)\), \(\textbf{BranchifLtInt}(ofs)\), \(\cdots\), \(\textbf{BranchifGetString}(ofs)\) |
Conditional branches corresponding to the binary predicates above. |
\(\textbf{BranchIfNeqImmInt}(int_{ 32}, \space ofs)\), \(\textbf{BranchifNeqImmFloat}(float, \space ofs)\), \(\textbf{BranchIfNeqImmString}(string, \space ofs)\) |
Compare the accumulator with the constant given as argument, and jumps if different (Useful for fast pattern matching). |
Miscellaneous
\(\textbf{CCall0}(n)\), \(\cdots\), \(\textbf{CCall5}(n)\) |
Call a C function, with 0 to 5 arguments. C functions are put in a special table; \(n\) is the number of the desired function. The firest argument is the value of the accumulator, the remaining arguments are popped from the argument stack. The result is put in the accumulator. |
\(\textbf{StartFun}\) |
Perform various checks such as stack overflow, pending break condition, and so on. Intended to be inserted at the beginning of each function and loop body. |
\(\textbf{Nop1}\), \(\textbf{Nop2}\), \(\textbf{Nop3}\) |
Do nothing, but skip respectively one, two, and three bytes. Used to align code on 16-bit or 32-bit boundaries. |
Footnotes