awklisp: a Lisp interpreter in awk
version 1.2

Darius Bacon
darius@accesscom.com
http://www.accesscom.com/~darius/


1. Usage

    mawk [-v profiling=1] -f awklisp <optional-Lisp-source-files>

The -v profiling=1 option turns call-count profiling on.

If you want to use it interactively, be sure to include '-' (for the standard
input) among the source files. For example:

    mawk -f awklisp startup -

It should work with nawk and gawk, too, but even less quickly.


2. Overview

This program arose out of one-upmanship. At my previous job I had to
use MapBasic, an interpreter so astoundingly slow (around 100 times
slower than GWBASIC) that one must wonder if it itself is implemented
in an interpreted language. I still wonder, but it clearly could be:
a bare-bones Lisp in awk, hacked up in a few hours, ran substantially
faster. Since then I've added features and polish, in the hope of
taking over the burgeoning market for stately language
implementations.

This version tries to deal with as many of the essential issues in
interpreter implementation as is reasonable in awk (though most would
call this program utterly unreasonable from start to finish, perhaps...).
Awk's impoverished control structures put error recovery and tail-call
optimization out of reach, in that I can't see a non-painful way to code
them. The scope of variables is dynamic because that was easier to
implement efficiently. Subject to all those constraints, the language
is as Schemely as I could make it: it has a single namespace with
uniform evaluation of expressions in the function and argument positions,
and the Scheme names for primitives and special forms.

The rest of this file is a reference manual. My favorite tutorial would be
_The Little LISPer_ (see section 5, References); don't let the cute name
and the cartoons turn you off, because it's a really excellent book with
some mind-stretching material towards the end. All of its code will work
with awklisp, except for the last two chapters. (You'd be better off
learning with a serious Lisp implementation, of course.)

The file Impl-notes in this distribution gives an overview of the
implementation.


3. Expressions and their evaluation

Lisp evaluates expressions, which can be simple (atoms) or compound (lists).

An atom is a string of characters, which can be letters, digits, and most
punctuation; the characters may -not- include spaces, quotes, parentheses,
brackets, '.', '#', or ';' (the comment character). In this Lisp, case is
significant (X is different from x).

    Atoms:      atom 42 1/137 + ok? hey:names-with-dashes-are-easy-to-read
    Not atoms:  don't-include-quotes (or spaces or parentheses)

A list is a '(', followed by zero or more objects (each of which is an atom
or a list), followed by a ')'.

    Lists:      () (a list of atoms) ((a list) of atoms (and lists))
    Not lists:  ) ((()) (two) (lists)

The special object nil is both an atom and the empty list. That is,
nil = (). A non-nil list is called a -pair-, because it is represented by a
pair of pointers, one to the first element of the list (its -car-), and one to
the rest of the list (its -cdr-). For example, the car of ((a list) of stuff)
is (a list), and the cdr is (of stuff). It's also possible to have a pair
whose cdr is not a list; the pair with car A and cdr B is printed as (A . B).

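The pair representation can be modelled in a few lines. What follows is a
minimal sketch in Python, not the awk code that awklisp itself uses; the names
Pair, NIL, from_items and show are made up for the illustration, and printing
the empty list as () is a choice of this sketch.

```python
class Pair:
    """A cons cell: one pointer to the car, one to the cdr."""
    def __init__(self, car, cdr):
        self.car = car
        self.cdr = cdr

NIL = None  # stands in for nil, the empty list (an assumption of this sketch)

def from_items(items):
    """Build a proper list: a chain of pairs whose last cdr is nil."""
    result = NIL
    for item in reversed(items):
        result = Pair(item, result)
    return result

def show(obj):
    """Render atoms, lists, and pairs whose cdr is not a list."""
    if obj is NIL:
        return "()"
    if not isinstance(obj, Pair):
        return str(obj)
    parts = []
    while isinstance(obj, Pair):
        parts.append(show(obj.car))
        obj = obj.cdr
    if obj is not NIL:          # the chain ended in a non-list cdr
        parts.extend([".", show(obj)])
    return "(" + " ".join(parts) + ")"

stuff = from_items([from_items(["a", "list"]), "of", "stuff"])
show(stuff.car)        # "(a list)"
show(stuff.cdr)        # "(of stuff)"
show(Pair("A", "B"))   # "(A . B)"
```
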
That's the syntax of programs and data. Now let's consider their meaning. You
can use Lisp like a calculator: type in an expression, and Lisp prints its
value. If you type 25, it prints 25. If you type (+ 2 2), it prints 4. In
general, Lisp evaluates a particular expression in a particular environment
(set of variable bindings) by following this algorithm:

If the expression is a number, return that number.

If the expression is a non-numeric atom (a -symbol-), return the value of that
symbol in the current environment. If the symbol is currently unbound, that's
an error.

Otherwise the expression is a list. If its car is one of the symbols: quote,
lambda, if, begin, while, set!, or define, then the expression is a -special-
-form-, handled by special rules. Otherwise it's just a procedure call,
handled like this: evaluate each element of the list in the current environment,
and then apply the operator (the value of the car) to the operands (the values
of the rest of the list's elements). For example, to evaluate (+ 2 3), we
first evaluate each of its subexpressions: the value of + is (at least in the
initial environment) the primitive procedure that adds, the value of 2 is 2,
and the value of 3 is 3. Then we call the addition procedure with 2 and 3 as
arguments, yielding 5. For another example, take (- (+ 2 3) 1). Evaluating
each subexpression gives the subtraction procedure, 5, and 1. Applying the
procedure to the arguments gives 4.

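The three rules above, minus the special forms, fit in a short function. This
is a sketch in Python under stated assumptions, not awklisp's own code: Lisp
lists are modelled as Python lists, symbols as strings, numbers as ints, and
the environment as a dict. The names evaluate and initial_env are hypothetical.

```python
import operator

def evaluate(expr, env):
    """Evaluate expr in env, covering numbers, symbols and procedure calls."""
    if isinstance(expr, int):            # a number evaluates to itself
        return expr
    if isinstance(expr, str):            # a symbol: look up its binding
        if expr not in env:
            raise NameError("unbound symbol: " + expr)
        return env[expr]
    # A procedure call: evaluate every element, then apply the operator
    # (the value of the car) to the operands (the remaining values).
    values = [evaluate(element, env) for element in expr]
    operator_value, operands = values[0], values[1:]
    return operator_value(*operands)

# Only two primitives are bound here; the real initial environment has more.
initial_env = {"+": operator.add, "-": operator.sub}

evaluate(["+", 2, 3], initial_env)               # 5
evaluate(["-", ["+", 2, 3], 1], initial_env)     # 4
```
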
We'll see all the primitive procedures in the next section. A user-defined
procedure is represented as a list of the form (lambda <parameters> <body>),
such as (lambda (x) (+ x 1)). To apply such a procedure, evaluate its body
in the environment obtained by extending the current environment so that the
parameters are bound to the corresponding arguments. Thus, to apply the above
procedure to the argument 41, evaluate (+ x 1) in the same environment as the
current one except that x is bound to 41.

If the procedure's body has more than one expression -- e.g.,
(lambda () (write 'Hello) (write 'world!)) -- evaluate them each in turn, and
return the value of the last one.

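Because the body runs in an extension of the current (the caller's)
environment, a free variable in a procedure sees whatever binding is in force
at the time of the call. That is the dynamic scope mentioned in the Overview.
Here is a sketch in Python, again not awklisp's own code: lists are Python
lists, symbols are strings, and copying the dict is a shortcut this sketch
takes to get "the same environment except for the parameters". The names
evaluate and apply_procedure are hypothetical.

```python
import operator

def evaluate(expr, env):
    if isinstance(expr, int):
        return expr
    if isinstance(expr, str):
        return env[expr]          # KeyError stands in for "unbound symbol"
    if expr[0] == "lambda":
        return expr               # a lambda expression evaluates to itself
    proc, *args = [evaluate(element, env) for element in expr]
    return apply_procedure(proc, args, env)

def apply_procedure(proc, args, env):
    if callable(proc):            # a primitive such as +
        return proc(*args)
    _, params, *body = proc       # (lambda <parameters> <body>...)
    extended = dict(env)          # start from the caller's environment
    extended.update(zip(params, args))
    result = None
    for body_expr in body:        # evaluate each in turn, keep the last value
        result = evaluate(body_expr, extended)
    return result

env = {"+": operator.add, "y": 100,
       "f": ["lambda", ["x"], ["+", "x", "y"]]}

evaluate([["lambda", ["x"], ["+", "x", 1]], 41], env)    # 42
# f is called while y is bound to 10, so its free y is 10, not 100:
evaluate([["lambda", ["y"], ["f", 1]], 10], env)         # 11
```

With lexical scope, as in Scheme proper, the second call would see the y that
was visible where f was written and give 101 instead.
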
We still need the rules for special forms. They are:

The value of (quote <x>) is <x>. There's a shorthand for this form: '<x>.
E.g., the value of '(+ 2 2) is (+ 2 2), -not- 4.

(lambda <parameters> <body>) returns itself: e.g., the value of (lambda (x) x)
is (lambda (x) x).

To evaluate (if <test-expr> <then-exp> <else-exp>), first evaluate <test-expr>.
If the value is true (non-nil), then return the value of <then-exp>, otherwise
return the value of <else-exp>. (<else-exp> is optional; if it's left out,
pretend there's a nil there.) Example: (if nil 'yes 'no) returns no.

To evaluate (begin <expr-1> <expr-2>...), evaluate each of the subexpressions
in order, returning the value of the last one.

To evaluate (while <test> <expr-1> <expr-2>...), first evaluate <test>. If
it's nil, return nil. Otherwise, evaluate <expr-1>, <expr-2>,... in order,
and then repeat.

To evaluate (set! <variable> <expr>), evaluate <expr>, and then set the value
of <variable> in the current environment to the result. If the variable is
currently unbound, that's an error. The value of the whole set! expression
is the value of <expr>.

(define <variable> <expr>) is like set!, except it's used to introduce new
bindings, and the value returned is <variable>.

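The special-form rules above can be read as extra cases tried before the
procedure-call case. The following Python sketch covers quote, if, begin,
while, set! and define (lambda is left out to keep it short). It is not
awklisp's own code, and it rests on several assumptions: lists are Python
lists, nil is the empty Python list, the true value returned by < is the
symbol t, and a begin with no subexpressions yields nil.

```python
import operator

NIL = []    # nil, the empty list (a modelling choice of this sketch)

def is_true(value):
    return value != NIL           # anything non-nil counts as true

def evaluate(expr, env):
    if isinstance(expr, int):
        return expr
    if isinstance(expr, str):
        if expr not in env:
            raise NameError("unbound symbol: " + expr)
        return env[expr]
    head, *rest = expr
    if head == "quote":
        return rest[0]
    if head == "if":
        test, then_expr, *else_part = rest
        if is_true(evaluate(test, env)):
            return evaluate(then_expr, env)
        return evaluate(else_part[0], env) if else_part else NIL
    if head == "begin":
        result = NIL
        for sub in rest:
            result = evaluate(sub, env)
        return result
    if head == "while":
        test, *body = rest
        while is_true(evaluate(test, env)):
            for sub in body:
                evaluate(sub, env)
        return NIL
    if head == "set!":
        name, value_expr = rest
        if name not in env:
            raise NameError("unbound symbol: " + name)
        env[name] = evaluate(value_expr, env)
        return env[name]          # the value of <expr>
    if head == "define":
        name, value_expr = rest
        env[name] = evaluate(value_expr, env)
        return name               # define returns the variable itself
    proc, *args = [evaluate(element, env) for element in expr]
    return proc(*args)

env = {"nil": NIL, "+": operator.add,
       "<": lambda a, b: "t" if a < b else NIL}

evaluate(["if", "nil", ["quote", "yes"], ["quote", "no"]], env)   # "no"
evaluate(["define", "i", 0], env)                                 # "i"
evaluate(["while", ["<", "i", 3], ["set!", "i", ["+", "i", 1]]], env)
env["i"]                                                          # 3
```
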
It's possible to define new special forms using the macro facility provided in
the startup file. The macros defined there are:

(let ((<var> <expr>)...)
  <body>...)

    Bind each <var> to its corresponding <expr> (evaluated in the current
    environment), and evaluate <body> in the resulting environment.

(cond (<test-expr> <result-expr>...)... (else <result-expr>...))

    where the final else clause is optional. Evaluate each <test-expr> in
    turn, and for the first non-nil result, evaluate its <result-expr>. If
    none are non-nil, and there's no else clause, return nil.

(and <expr>...)

    Evaluate each <expr> in order, until one returns nil; then return nil.
    If none are nil, return the value of the last <expr>.

(or <expr>...)

    Evaluate each <expr> in order, until one returns non-nil; return that value.
    If all are nil, return nil.

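A macro rewrites one expression into another before it is evaluated. As an
illustration, here are the conventional expansions of let and and, written as
Python functions over Python lists. This shows the general technique only;
the startup file may well expand these forms differently, and the names
expand_let and expand_and are hypothetical.

```python
def expand_let(form):
    """(let ((v e)...) body...)  =>  ((lambda (v...) body...) e...)"""
    _, bindings, *body = form
    names = [name for name, _ in bindings]
    inits = [init for _, init in bindings]
    # The inits become arguments, so they are evaluated in the current
    # environment before the lambda's parameters are bound.
    return [["lambda", names, *body], *inits]

def expand_and(form):
    """(and e1 e2 ...)  =>  (if e1 (and e2 ...) nil);  (and e)  =>  e"""
    _, *exprs = form
    if not exprs:
        raise ValueError("(and) with no arguments is not covered by the text")
    if len(exprs) == 1:
        return exprs[0]
    return ["if", exprs[0], expand_and(["and", *exprs[1:]]), "nil"]

expand_let(["let", [["x", 1], ["y", 2]], ["+", "x", "y"]])
# [["lambda", ["x", "y"], ["+", "x", "y"]], 1, 2]

expand_and(["and", "a", "b", "c"])
# ["if", "a", ["if", "b", "c", "nil"], "nil"]
```
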

4. Built-in procedures

List operations:
(null? <x>) returns true (non-nil) when <x> is nil.
(atom? <x>) returns true when <x> is an atom.
(pair? <x>) returns true when <x> is a pair.
(car <pair>) returns the car of <pair>.
(cdr <pair>) returns the cdr of <pair>.
(cadr <pair>) returns the car of the cdr of <pair> (i.e., the second element).
(cddr <pair>) returns the cdr of the cdr of <pair>.
(cons <x> <y>) returns a new pair whose car is <x> and whose cdr is <y>.
(list <x>...) returns a list of its arguments.
(set-car! <pair> <x>) changes the car of <pair> to <x>.
(set-cdr! <pair> <x>) changes the cdr of <pair> to <x>.
(reverse! <list>) reverses <list> in place, returning the result.

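Reversing "in place" means the existing pairs are relinked rather than copied.
A sketch of the usual technique in Python follows; awklisp's own version is
written in awk and may differ in detail, and the names Pair, NIL and
reverse_in_place are made up for the illustration.

```python
class Pair:
    """A cons cell with mutable car and cdr."""
    def __init__(self, car, cdr):
        self.car = car
        self.cdr = cdr

NIL = None  # stands in for nil

def reverse_in_place(lst):
    """Reverse a chain of pairs by re-pointing each cdr; allocates nothing."""
    reversed_so_far = NIL
    while lst is not NIL:
        rest = lst.cdr               # remember the unprocessed tail
        lst.cdr = reversed_so_far    # point this pair back at the done part
        reversed_so_far = lst
        lst = rest
    return reversed_so_far

third = Pair(3, NIL)
second = Pair(2, third)
first = Pair(1, second)              # the list (1 2 3)

result = reverse_in_place(first)     # the list (3 2 1)
result is third                      # True: the old last pair is the new head
first.cdr is NIL                     # True: the old head is now the last pair
```

In this model a variable that held the original list ends up pointing at the
last pair of the reversed one, which is why the returned value is the thing
to keep.
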
Numbers:
(number? <x>) returns true when <x> is a number.
(+ <n> <n>) returns the sum of its arguments.
(- <n> <n>) returns the difference of its arguments.
(* <n> <n>) returns the product of its arguments.
(quotient <n> <n>) returns the quotient. Rounding is towards zero.
(remainder <n> <n>) returns the remainder.
(< <n1> <n2>) returns true when <n1> is less than <n2>.

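Rounding towards zero only matters for negative operands, where it differs
from rounding down. The Python sketch below spells that out. The sign of
remainder is not stated above, so the version here assumes the usual pairing
with a truncating quotient, n = d * quotient + remainder; treat that as an
assumption rather than a description of awklisp.

```python
def quotient(n, d):
    """Integer division that rounds towards zero."""
    q = abs(n) // abs(d)
    return q if (n < 0) == (d < 0) else -q

def remainder(n, d):
    """Assumed to satisfy n == d * quotient(n, d) + remainder(n, d)."""
    return n - d * quotient(n, d)

quotient(7, 2)      # 3
quotient(-7, 2)     # -3  (Python's own -7 // 2 rounds down and gives -4)
remainder(-7, 2)    # -1
remainder(7, -2)    # 1
```
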
I/O:
(write <x>) writes <x> followed by a space.
(newline) writes the newline character.
(read) reads the next expression from standard input and returns it.

Meta-operations:
(eval <x>) evaluates <x> in the current environment, returning the result.
(apply <proc> <list>) calls <proc> with arguments <list>, returning the result.

Miscellany:
(eq? <x> <y>) returns true when <x> and <y> are the same object. Be careful
  using eq? with lists, because (eq? (cons <x> <y>) (cons <x> <y>)) is false.
(put <x> <y> <z>) stores <z> under the keys <x> and <y>, for retrieval by get.
(get <x> <y>) returns the last value <z> that was put for <x> and <y>, or nil
  if there is no such value.
(symbol? <x>) returns true when <x> is a symbol.
(gensym) returns a new symbol distinct from all symbols that can be read.
(random <n>) returns a random integer between 0 and <n>-1 (if <n> is positive).
(error <x>...) writes its arguments and aborts with error code 1.


5. References

Harold Abelson and Gerald J. Sussman, with Julie Sussman.
Structure and Interpretation of Computer Programs. MIT Press, 1985.

John Allen. Anatomy of Lisp. McGraw-Hill, 1978.

Daniel P. Friedman and Matthias Felleisen. The Little LISPer. Macmillan, 1989.

Roger Rohrbach wrote a Lisp interpreter, in old awk (which has no
procedures!), called walk. It can't do as much as this Lisp, but it
certainly has greater hack value. Cooler name, too. It's available at
http://www-2.cs.cmu.edu/afs/cs/project/ai-repository/ai/lang/lisp/impl/awk/0.html


6. Bugs

Eval doesn't check the syntax of expressions. This is a probably-misguided
attempt to bump up the speed a bit, that also simplifies some of the code.
The macroexpander in the startup file would be the best place to add
syntax-checking.