Implementation notes


1. Overview

Since the code should be self-explanatory to anyone knowledgeable
about Lisp implementation, these notes assume you know Lisp but not
interpreters. I haven't got around to writing up a complete
discussion of everything, though.

The code for an interpreter can be pretty low on redundancy -- this is
natural because the whole reason for implementing a new language is to
avoid having to code a particular class of programs in a redundant
style in the old language. We implement what that class of programs
has in common just once, then use it many times. Thus an interpreter
has a different style of code, perhaps denser, than a typical
application program.


2. Data representation

Conceptually, a Lisp datum is a tagged pointer, with the tag giving
the datatype and the pointer locating the data. We follow the common
practice of encoding the tag into the two lowest-order bits of the
pointer. This is especially easy in awk, since arrays with
non-consecutive indices are just as efficient as dense ones (so we can
use the tagged pointer directly as an index, without having to mask
out the tag bits). (But, by the way, mawk accesses negative indices
much more slowly than positive ones, as I found out when trying a
different encoding.)

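To make the direct-indexing point concrete, here is a minimal sketch
in awk. It is not taken from the interpreter's source: the pair tag
value (1), the next_pair counter, and the helper names are all
assumptions chosen for illustration.

```awk
# Sketch only: tag values and helper names are hypothetical.
BEGIN {
    next_pair = 1                   # first pair pointer: payload 0, assumed tag 1

    p = cons(make_number(7), make_number(8))
    q = cons(make_number(9), p)

    printf "p=%d q=%d\n", p, q      # p=1 q=5
    # The tagged pointers index car[] and cdr[] directly, tag bits and all.
    printf "tag(q)=%d car(q)=%d car(cdr(q))=%d\n",
           tag(q), numeric_value(car[q]), numeric_value(car[cdr[q]])
                                    # tag(q)=1 car(q)=9 car(cdr(q))=7
}

function tag(x) { return x % 4 }            # the two lowest-order bits
function make_number(n) { return n * 4 }    # integers carry tag 0
function numeric_value(x) { return x / 4 }

function cons(a, d,   ptr) {
    ptr = next_pair
    next_pair += 4                  # step over the tag bits to the next pointer
    car[ptr] = a
    cdr[ptr] = d
    return ptr
}
```

The indices used here are 1 and 5, so the arrays are sparse, which
awk's associative arrays handle without any masking or shifting.
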
This Lisp provides three datatypes: integers, lists, and symbols. (A
modern Lisp provides many more.)

For an integer, the tag bits are zero and the pointer bits are simply
the numeric value; thus, N is represented by N*4. This choice of the
tag value has two advantages. First, we can add and subtract without
fiddling with the tags. Second, negative numbers fit right in.
(Consider what would happen if N were represented by X = 1+N*4
instead, and we tried to extract the tag as X%4, where N may be either
positive or negative: awk's % gives a negative remainder for a
negative X, so the tag would not come out as 1. Because of this
problem and the above-mentioned inefficiency of negative indices, all
other datatypes are represented by positive numbers.)

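The arithmetic can be checked with a short awk sketch. The helper
names make_number, numeric_value, and is_number are made up for this
example and are not claimed to match the interpreter's own.

```awk
# Sketch only: helper names are hypothetical.
function make_number(n) { return n * 4 }      # tag bits 00
function numeric_value(x) { return x / 4 }
function is_number(x) { return x % 4 == 0 }

BEGIN {
    a = make_number(5)
    b = make_number(-7)

    sum = a + b                               # add the tagged values directly
    printf "%d\n", numeric_value(sum)         # -2
    printf "%d\n", is_number(b)               # 1: negative numbers fit right in

    # The rejected encoding 1+N*4, for N = -7:
    alt = 1 + (-7) * 4                        # -27
    printf "%d\n", alt % 4                    # -3, not the tag value 1
}
```

With tag zero, the sum of two tagged integers is already the tagged
sum, so no untagging or retagging step is needed.
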

3. The evaluation/saved-bindings stack

The following is from an email discussion; it doesn't develop
everything from first principles but is included here in the hope
it will be helpful.

Hi. I just took a look at awklisp, and remembered that there's more
to your question about why we need a stack -- it's a good question.
The real reason is that a stack is accessible to the garbage
collector.

We could have had apply() evaluate the arguments itself, and stash
the results into variables like arg0 and arg1 -- then the case for
ADD would look like

    if (proc == ADD) return is(a_number, arg0) + is(a_number, arg1)

The obvious problem with that approach is how to handle calls to
user-defined procedures, which could have any number of arguments.
Say we're evaluating ((lambda (x) (+ x 1)) 42). (lambda (x) (+ x 1))
is the procedure, and 42 is the argument.

A (wrong) solution could be to evaluate each argument in turn, and
bind the corresponding parameter name (like x in this case) to the
resulting value (while saving the old value to be restored after we
return from the procedure). This is wrong because we must not
change the variable bindings until we actually enter the procedure --
for example, with that algorithm ((lambda (x y) y) 1 x) would return
1, when it should return whatever the value of x is in the enclosing
environment. (The eval_rands()-type sequence would be: eval the 1,
bind x to 1, eval the x -- yielding 1 which is *wrong* -- and bind
y to that, then eval the body of the lambda.)

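The difference between the two orders can be reproduced with a toy
model in awk. Everything in it is an assumption made for the demo:
variables live in a value[] array keyed by name, operands are either
numerals or variable names, and saving and restoring the old bindings
is left out.

```awk
# Toy model of ((lambda (x y) y) 1 x) with x = 10 in the enclosing
# environment.  All names here are hypothetical.
function eval_operand(e) {
    return (e ~ /^-?[0-9]+$/) ? e + 0 : value[e]
}

# Wrong: bind each parameter as soon as its operand is evaluated.
function bind_as_you_go(params, operands, n,   i) {
    for (i = 1; i <= n; ++i)
        value[params[i]] = eval_operand(operands[i])
}

# Right: evaluate every operand first, then do all the bindings.
function evaluate_then_bind(params, operands, n,   i, temp) {
    for (i = 1; i <= n; ++i)
        temp[i] = eval_operand(operands[i])
    for (i = 1; i <= n; ++i)
        value[params[i]] = temp[i]
}

BEGIN {
    param[1] = "x";   param[2] = "y"
    operand[1] = "1"; operand[2] = "x"

    value["x"] = 10
    bind_as_you_go(param, operand, 2)
    print "binding while evaluating: y = " value["y"]        # y = 1

    value["x"] = 10
    evaluate_then_bind(param, operand, 2)
    print "evaluating first, then binding: y = " value["y"]  # y = 10
}
```

The lambda body just returns y, so the printed value of y is the
result of the call in each case.
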
Okay, that's easily fixed -- evaluate all the operands and stash them
away somewhere until you're done, and *then* do the bindings. So
the question is where to stash them. How about a global array?
Like

    for (i = 0; arglist != NIL; ++i) {
        global_temp[i] = eval(car[arglist])
        arglist = cdr[arglist]
    }

followed by the equivalent of extend_env(). This will not do, because
the global array will get clobbered in recursive calls to eval().
Consider (+ 2 (* 3 4)) -- first we evaluate the arguments to the +,
like this: global_temp[0] gets 2, and then global_temp[1] gets the
eval of (* 3 4). But in evaluating (* 3 4), global_temp[0] gets set
to 3 and global_temp[1] to 4 -- so the original assignment of 2 to
global_temp[0] is clobbered before we get a chance to use it. By
using a stack[] instead of a global_temp[], we finesse this problem.

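Here is the (+ 2 (* 3 4)) example as a runnable awk sketch. The tree
encoding (op[], lhs[], rhs[], num[]) and the function names are
invented for the demo and are much simpler than a real evaluator; the
point is only to show the clobbering and how a stack avoids it.

```awk
# Sketch only: the expression encoding and all names are hypothetical.
BEGIN {
    # (+ 2 (* 3 4)); nodes without an op[] entry are number leaves.
    op[1] = "+"; lhs[1] = 2; rhs[1] = 3
    num[2] = 2
    op[3] = "*"; lhs[3] = 4; rhs[3] = 5
    num[4] = 3;  num[5] = 4

    sp = 0
    print "global_temp: " eval_with_global_temp(1)    # 15 (wrong)
    print "stack:       " eval_with_stack(1)          # 14
}

function combine(operator, a, b) {
    return (operator == "+") ? a + b : a * b
}

function eval_with_global_temp(node) {
    if (!(node in op)) return num[node]
    global_temp[0] = eval_with_global_temp(lhs[node])
    # The recursive call below overwrites global_temp[0].
    global_temp[1] = eval_with_global_temp(rhs[node])
    return combine(op[node], global_temp[0], global_temp[1])
}

function push(x) { stack[sp++] = x }
function pop() { return stack[--sp] }

function eval_with_stack(node,   a, b) {
    if (!(node in op)) return num[node]
    push(eval_with_stack(lhs[node]))
    push(eval_with_stack(rhs[node]))   # inner pushes land above ours
    b = pop()
    a = pop()
    return combine(op[node], a, b)
}
```

In the global_temp version the outer call ends up adding 3 and 12; in
the stack version each call pops only what it pushed.
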
You may object that we can solve that by just making the global array
local, and that's true; lots of small local arrays may or may not be
more efficient than one big global stack, in awk -- we'd have to try
it out to see. But the real problem I alluded to at the start of this
message is this: the garbage collector has to be able to find all the
live references to the car[] and cdr[] arrays. If some of those
references are hidden away in local variables of recursive procedures,
we're stuck. With the global stack, they're all right there for the
gc().

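A minimal sketch of what "right there for the gc()" can mean, in awk.
This is not the interpreter's collector: the pair tag value (1), the
NIL value, and the names mark(), mark_from_stack(), and hidden_local
are assumptions for the example.

```awk
# Sketch only: tag values and all names are hypothetical.
function is_pair(x) { return x % 4 == 1 }

# Mark everything reachable from x, following car and cdr.
function mark(x) {
    while (is_pair(x) && !(x in marked)) {
        marked[x] = 1
        mark(car[x])
        x = cdr[x]
    }
}

# The roots are simply the entries of the global stack.
function mark_from_stack(   i) {
    for (i = 0; i < stack_ptr; ++i)
        mark(stack[i])
}

BEGIN {
    NIL = 2                     # assumed non-pair end-of-list marker
    stack_ptr = 0

    # Pairs 1 and 5 form a two-element list; pair 9 stands alone.
    car[1] = 4;  cdr[1] = 5
    car[5] = 8;  cdr[5] = NIL
    car[9] = 12; cdr[9] = NIL

    stack[stack_ptr++] = 1      # a reference the collector can see
    hidden_local = 9            # stands in for a reference held only in a local

    mark_from_stack()
    for (p = 1; p <= 9; p += 4)
        printf "pair %d %s\n", p, ((p in marked) ? "is marked" : "would be reclaimed")
}
```

Pairs 1 and 5 are reached from the stack; pair 9 is still in use as
far as the program is concerned, but the collector has no way to know
that.
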
(In C we could use the local-arrays approach by threading a chain of
pointers from each one to the next; but awk doesn't have pointers.)

(You may wonder how the code gets away with having a number of local
variables holding lisp values, then -- the answer is that in every
such case we can be sure the garbage collector can find the values
in question from some other source. That's what this comment is
about:

    # All the interpretation routines have the precondition that their
    # arguments are protected from garbage collection.

In some cases where the values would not otherwise be guaranteed to
be available to the gc, we call protect().)

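The protect() discipline can be illustrated with a toy collector in
awk. Nothing below is claimed to match the real code: protect() and
unprotect() are assumed to push and pop the global stack, the
collector is made to run on every cons() as a worst case, and list2()
with its use_protect flag exists only for this demo.

```awk
# Toy collector for illustration only; every definition is hypothetical.
function is_pair(x) { return x % 4 == 1 }      # assumed tag value for pairs
function protect(x) { stack[stack_ptr++] = x }
function unprotect() { --stack_ptr }

function mark(x) {
    while (is_pair(x) && !(x in marked)) {
        marked[x] = 1
        mark(car[x])
        x = cdr[x]
    }
}

# Mark from the stack, then free every pair that was not reached.
function gc(   i, p, n, dead) {
    split("", marked)                          # clear the mark table
    for (i = 0; i < stack_ptr; ++i) mark(stack[i])
    n = 0
    for (p in car) if (!(p in marked)) dead[n++] = p
    for (i = 0; i < n; ++i) { delete car[dead[i]]; delete cdr[dead[i]] }
}

# Collect on every allocation: the worst case for unprotected values.
function cons(a, d,   ptr) {
    gc()
    ptr = next_pair; next_pair += 4
    car[ptr] = a; cdr[ptr] = d
    return ptr
}

# Build (a b).  Between the two cons() calls, tail lives only in a local.
function list2(a, b, use_protect,   tail, result) {
    tail = cons(b, NIL)
    if (use_protect) protect(tail)
    result = cons(a, tail)
    if (use_protect) unprotect()
    return result
}

BEGIN {
    NIL = 2; next_pair = 1; stack_ptr = 0

    safe = list2(4, 8, 1)
    safe_ok = (cdr[safe] in car)
    unsafe = list2(4, 8, 0)
    unsafe_ok = (cdr[unsafe] in car)

    printf "with protect():    tail pair %s\n", (safe_ok ? "survived" : "was reclaimed")
    printf "without protect(): tail pair %s\n", (unsafe_ok ? "survived" : "was reclaimed")
}
```

The arguments a and b are plain numbers here, so they need no
protection themselves; only the freshly built tail is at risk.
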
Oh, there's another reason why apply() doesn't evaluate the arguments
itself: it's called by do_apply(), which handles lisp calls like
(apply car '((x))) -- where we *don't* want the x to get evaluated
by apply().


4. Um, what I was going to write about

more on data representation
  is_foo procedures slow it down by a few percent but increase clarity
  (try replacing them and other stuff with macros, time it.)

gc: overview; how to write gc-safe code using protect(); point out
  that relocating gcs introduce further complications

driver loop, macros

evaluation
  globals for temp values because of recursion, space efficiency
  environment -- explicit stack needed because of gc

error handling, or lack thereof
  strategies for cheaply adding error recovery

I/O