239 lines
10 KiB
Plaintext

awklisp: a Lisp interpreter in awk
version 1.2
Darius Bacon
darius@accesscom.com
http://www.accesscom.com/~darius/
1. Usage
mawk [-v profiling=1] -f awklisp <optional-Lisp-source-files>
The -v profiling=1 option turns call-count profiling on.
If you want to use it interactively, be sure to include '-' (for the standard
input) among the source files. For example:
mawk -f awklisp startup -
It should work with nawk and gawk, too, but even less quickly.
2. Overview
This program arose out of one-upmanship. At my previous job I had to
use MapBasic, an interpreter so astoundingly slow (around 100 times
slower than GWBASIC) that one must wonder if it itself is implemented
in an interpreted language. I still wonder, but it clearly could be:
a bare-bones Lisp in awk, hacked up in a few hours, ran substantially
faster. Since then I've added features and polish, in the hope of
taking over the burgeoning market for stately language
implementations.
This version tries to deal with as many of the essential issues in
interpreter implementation as is reasonable in awk (though most would
call this program utterly unreasonable from start to finish, perhaps...).
Awk's impoverished control structures put error recovery and tail-call
optimization out of reach, in that I can't see a non-painful way to code
them. The scope of variables is dynamic because that was easier to
implement efficiently. Subject to all those constraints, the language
is as Schemely as I could make it: it has a single namespace with
uniform evaluation of expressions in the function and argument positions,
and the Scheme names for primitives and special forms.
The rest of this file is a reference manual. My favorite tutorial would be
_The Little LISPer_ (see section 5, References); don't let the cute name
and the cartoons turn you off, because it's a really excellent book with
some mind-stretching material towards the end. All of its code will work
with awklisp, except for the last two chapters. (You'd be better off
learning with a serious Lisp implementation, of course.)
The file Impl-notes in this distribution gives an overview of the
implementation.
3. Expressions and their evaluation
Lisp evaluates expressions, which can be simple (atoms) or compound (lists).
An atom is a string of characters, which can be letters, digits, and most
punctuation; the characters may -not- include spaces, quotes, parentheses,
brackets, '.', '#', or ';' (the comment character). In this Lisp, case is
significant ( X is different from x ).
Atoms: atom 42 1/137 + ok? hey:names-with-dashes-are-easy-to-read
Not atoms: don't-include-quotes (or spaces or parentheses)
A list is a '(', followed by zero or more objects (each of which is an atom
or a list), followed by a ')'.
Lists: () (a list of atoms) ((a list) of atoms (and lists))
Not lists: ) ((()) (two) (lists)
The special object nil is both an atom and the empty list. That is,
nil = (). A non-nil list is called a -pair-, because it is represented by a
pair of pointers, one to the first element of the list (its -car-), and one to
the rest of the list (its -cdr-). For example, the car of ((a list) of stuff)
is (a list), and the cdr is (of stuff). It's also possible to have a pair
whose cdr is not a list; the pair with car A and cdr B is printed as (A . B).
That's the syntax of programs and data. Now let's consider their meaning. You
can use Lisp like a calculator: type in an expression, and Lisp prints its
value. If you type 25, it prints 25. If you type (+ 2 2), it prints 4. In
general, Lisp evaluates a particular expression in a particular environment
(set of variable bindings) by following this algorithm:
If the expression is a number, return that number.
If the expression is a non-numeric atom (a -symbol-), return the value of that
symbol in the current environment. If the symbol is currently unbound, that's
an error.
Otherwise the expression is a list. If its car is one of the symbols: quote,
lambda, if, begin, while, set!, or define, then the expression is a -special-
-form-, handled by special rules. Otherwise it's just a procedure call,
handled like this: evaluate each element of the list in the current environment,
and then apply the operator (the value of the car) to the operands (the values
of the rest of the list's elements). For example, to evaluate (+ 2 3), we
first evaluate each of its subexpressions: the value of + is (at least in the
initial environment) the primitive procedure that adds, the value of 2 is 2,
and the value of 3 is 3. Then we call the addition procedure with 2 and 3 as
arguments, yielding 5. For another example, take (- (+ 2 3) 1). Evaluating
each subexpression gives the subtraction procedure, 5, and 1. Applying the
procedure to the arguments gives 4.
We'll see all the primitive procedures in the next section. A user-defined
procedure is represented as a list of the form (lambda <parameters> <body>),
such as (lambda (x) (+ x 1)). To apply such a procedure, evaluate its body
in the environment obtained by extending the current environment so that the
parameters are bound to the corresponding arguments. Thus, to apply the above
procedure to the argument 41, evaluate (+ x 1) in the same environment as the
current one except that x is bound to 41.
If the procedure's body has more than one expression -- e.g.,
(lambda () (write 'Hello) (write 'world!)) -- evaluate them each in turn, and
return the value of the last one.
We still need the rules for special forms. They are:
The value of (quote <x>) is <x>. There's a shorthand for this form: '<x>.
E.g., the value of '(+ 2 2) is (+ 2 2), -not- 4.
(lambda <parameters> <body>) returns itself: e.g., the value of (lambda (x) x)
is (lambda (x) x).
To evaluate (if <test-expr> <then-exp> <else-exp>), first evaluate <test-expr>.
If the value is true (non-nil), then return the value of <then-exp>, otherwise
return the value of <else-exp>. (<else-exp> is optional; if it's left out,
pretend there's a nil there.) Example: (if nil 'yes 'no) returns no.
To evaluate (begin <expr-1> <expr-2>...), evaluate each of the subexpressions
in order, returning the value of the last one.
To evaluate (while <test> <expr-1> <expr-2>...), first evaluate <test>. If
it's nil, return nil. Otherwise, evaluate <expr-1>, <expr-2>,... in order,
and then repeat.
To evaluate (set! <variable> <expr>), evaluate <expr>, and then set the value
of <variable> in the current environment to the result. If the variable is
currently unbound, that's an error. The value of the whole set! expression
is the value of <expr>.
(define <variable> <expr>) is like set!, except it's used to introduce new
bindings, and the value returned is <variable>.
It's possible to define new special forms using the macro facility provided in
the startup file. The macros defined there are:
(let ((<var> <expr>)...)
<body>...)
Bind each <var> to its corresponding <expr> (evaluated in the current
environment), and evaluate <body> in the resulting environment.
(cond (<test-expr> <result-expr>...)... (else <result-expr>...))
where the final else clause is optional. Evaluate each <test-expr> in
turn, and for the first non-nil result, evaluate its <result-expr>. If
none are non-nil, and there's no else clause, return nil.
(and <expr>...)
Evaluate each <expr> in order, until one returns nil; then return nil.
If none are nil, return the value of the last <expr>.
(or <expr>...)
Evaluate each <expr> in order, until one returns non-nil; return that value.
If all are nil, return nil.
4. Built-in procedures
List operations:
(null? <x>) returns true (non-nil) when <x> is nil.
(atom? <x>) returns true when <x> is an atom.
(pair? <x>) returns true when <x> is a pair.
(car <pair>) returns the car of <pair>.
(cdr <pair>) returns the cdr of <pair>.
(cadr <pair>) returns the car of the cdr of <pair>. (i.e., the second element.)
(cddr <pair>) returns the cdr of the cdr of <pair>.
(cons <x> <y>) returns a new pair whose car is <x> and whose cdr is <y>.
(list <x>...) returns a list of its arguments.
(set-car! <pair> <x>) changes the car of <pair> to <x>.
(set-cdr! <pair> <x>) changes the cdr of <pair> to <x>.
(reverse! <list>) reverses <list> in place, returning the result.
Numbers:
(number? <x>) returns true when <x> is a number.
(+ <n> <n>) returns the sum of its arguments.
(- <n> <n>) returns the difference of its arguments.
(* <n> <n>) returns the product of its arguments.
(quotient <n> <n>) returns the quotient. Rounding is towards zero.
(remainder <n> <n>) returns the remainder.
(< <n1> <n2>) returns true when <n1> is less than <n2>.
I/O:
(write <x>) writes <x> followed by a space.
(newline) writes the newline character.
(read) reads the next expression from standard input and returns it.
Meta-operations:
(eval <x>) evaluates <x> in the current environment, returning the result.
(apply <proc> <list>) calls <proc> with arguments <list>, returning the result.
Miscellany:
(eq? <x> <y>) returns true when <x> and <y> are the same object. Be careful
using eq? with lists, because (eq? (cons <x> <y>) (cons <x> <y>)) is false.
(put <x> <y> <z>)
(get <x> <y>) returns the last value <z> that was put for <x> and <y>, or nil
if there is no such value.
(symbol? <x>) returns true when <x> is a symbol.
(gensym) returns a new symbol distinct from all symbols that can be read.
(random <n>) returns a random integer between 0 and <n>-1 (if <n> is positive).
(error <x>...) writes its arguments and aborts with error code 1.
5. References
Harold Abelson and Gerald J. Sussman, with Julie Sussman.
Structure and Interpretation of Computer Programs. MIT Press, 1985.
John Allen. Anatomy of Lisp. McGraw-Hill, 1978.
Daniel P. Friedman and Matthias Felleisen. The Little LISPer. Macmillan, 1989.
Roger Rohrbach wrote a Lisp interpreter, in old awk (which has no
procedures!), called walk . It can't do as much as this Lisp, but it
certainly has greater hack value. Cooler name, too. It's available at
http://www-2.cs.cmu.edu/afs/cs/project/ai-repository/ai/lang/lisp/impl/awk/0.html
6. Bugs
Eval doesn't check the syntax of expressions. This is a probably-misguided
attempt to bump up the speed a bit, that also simplifies some of the code.
The macroexpander in the startup file would be the best place to add syntax-
checking.