awklisp: a Lisp interpreter in awk
version 1.2

Darius Bacon
darius@accesscom.com
http://www.accesscom.com/~darius/


1. Usage

    mawk [-v profiling=1] -f awklisp <source-files>

The -v profiling=1 option turns call-count profiling on.

If you want to use it interactively, be sure to include '-' (for the
standard input) among the source files. For example:

    mawk -f awklisp startup -

It should work with nawk and gawk, too, but even less quickly.


2. Overview

This program arose out of one-upmanship. At my previous job I had to
use MapBasic, an interpreter so astoundingly slow (around 100 times
slower than GWBASIC) that one must wonder if it itself is implemented
in an interpreted language. I still wonder, but it clearly could be:
a bare-bones Lisp in awk, hacked up in a few hours, ran substantially
faster. Since then I've added features and polish, in the hope of
taking over the burgeoning market for stately language
implementations.

This version tries to deal with as many of the essential issues in
interpreter implementation as is reasonable in awk (though most would
call this program utterly unreasonable from start to finish,
perhaps...). Awk's impoverished control structures put error recovery
and tail-call optimization out of reach, in that I can't see a
non-painful way to code them. The scope of variables is dynamic
because that was easier to implement efficiently. Subject to all
those constraints, the language is as Schemely as I could make it: it
has a single namespace with uniform evaluation of expressions in the
function and argument positions, and the Scheme names for primitives
and special forms.

The rest of this file is a reference manual. My favorite tutorial
would be _The Little LISPer_ (see section 5, References); don't let
the cute name and the cartoons turn you off, because it's a really
excellent book with some mind-stretching material towards the end.
All of its code will work with awklisp, except for the last two
chapters.
(You'd be better off learning with a serious Lisp implementation, of
course.)

The file Impl-notes in this distribution gives an overview of the
implementation.


3. Expressions and their evaluation

Lisp evaluates expressions, which can be simple (atoms) or compound
(lists).

An atom is a string of characters, which can be letters, digits, and
most punctuation; the characters may -not- include spaces, quotes,
parentheses, brackets, '.', '#', or ';' (the comment character). In
this Lisp, case is significant (X is different from x).

Atoms:      atom 42 1/137 + ok? hey:names-with-dashes-are-easy-to-read
Not atoms:  don't-include-quotes (or spaces or parentheses)

A list is a '(', followed by zero or more objects (each of which is an
atom or a list), followed by a ')'.

Lists:      () (a list of atoms) ((a list) of atoms (and lists))
Not lists:  ) ((()) (two) (lists)

The special object nil is both an atom and the empty list. That is,
nil = (). A non-nil list is called a -pair-, because it is
represented by a pair of pointers, one to the first element of the
list (its -car-), and one to the rest of the list (its -cdr-). For
example, the car of ((a list) of stuff) is (a list), and the cdr is
(of stuff). It's also possible to have a pair whose cdr is not a
list; the pair with car A and cdr B is printed as (A . B).

That's the syntax of programs and data. Now let's consider their
meaning. You can use Lisp like a calculator: type in an expression,
and Lisp prints its value. If you type 25, it prints 25. If you type
(+ 2 2), it prints 4. In general, Lisp evaluates a particular
expression in a particular environment (set of variable bindings) by
following this algorithm:

If the expression is a number, return that number.

If the expression is a non-numeric atom (a -symbol-), return the value
of that symbol in the current environment. If the symbol is currently
unbound, that's an error.

Otherwise the expression is a list.
If its car is one of the symbols: quote, lambda, if, begin, while,
set!, or define, then the expression is a -special- -form-, handled by
special rules. Otherwise it's just a procedure call, handled like
this: evaluate each element of the list in the current environment,
and then apply the operator (the value of the car) to the operands
(the values of the rest of the list's elements). For example, to
evaluate (+ 2 3), we first evaluate each of its subexpressions: the
value of + is (at least in the initial environment) the primitive
procedure that adds, the value of 2 is 2, and the value of 3 is 3.
Then we call the addition procedure with 2 and 3 as arguments,
yielding 5. For another example, take (- (+ 2 3) 1). Evaluating each
subexpression gives the subtraction procedure, 5, and 1. Applying the
procedure to the arguments gives 4.

We'll see all the primitive procedures in the next section. A
user-defined procedure is represented as a list of the form
(lambda <parameters> <body>), such as (lambda (x) (+ x 1)). To apply
such a procedure, evaluate its body in the environment obtained by
extending the current environment so that the parameters are bound to
the corresponding arguments. Thus, to apply the above procedure to
the argument 41, evaluate (+ x 1) in the same environment as the
current one except that x is bound to 41.

If the procedure's body has more than one expression -- e.g.,
(lambda () (write 'Hello) (write 'world!)) -- evaluate them each in
turn, and return the value of the last one.

We still need the rules for special forms. They are:

The value of (quote <x>) is <x>. There's a shorthand for this form:
'<x>. E.g., the value of '(+ 2 2) is (+ 2 2), -not- 4.

(lambda <parameters> <body>) returns itself: e.g., the value of
(lambda (x) x) is (lambda (x) x).

To evaluate (if <condition> <consequent> <alternative>), first
evaluate <condition>. If the value is true (non-nil), then return the
value of <consequent>, otherwise return the value of <alternative>.
(<alternative> is optional; if it's left out, pretend there's a nil
there.) Example: (if nil 'yes 'no) returns no.

To evaluate (begin <expr-1> <expr-2>...), evaluate each of the
subexpressions in order, returning the value of the last one.

To evaluate (while <condition> <expr-1> <expr-2>...), first evaluate
<condition>. If it's nil, return nil. Otherwise, evaluate <expr-1>,
<expr-2>,... in order, and then repeat.

To evaluate (set! <variable> <expr>), evaluate <expr>, and then set
the value of <variable> in the current environment to the result. If
the variable is currently unbound, that's an error. The value of the
whole set! expression is the value of <expr>.

(define <variable> <expr>) is like set!, except it's used to introduce
new bindings, and the value returned is <variable>.

It's possible to define new special forms using the macro facility
provided in the startup file. The macros defined there are:

(let ((<variable> <value>)...) <body>...)

Bind each <variable> to its corresponding <value> (evaluated in the
current environment), and evaluate <body> in the resulting
environment.

(cond (<condition> <expr>...)... (else <expr>...))

where the final else clause is optional. Evaluate each <condition> in
turn, and for the first non-nil result, evaluate its <expr>. If none
are non-nil, and there's no else clause, return nil.

(and <expr>...)

Evaluate each <expr> in order, until one returns nil; then return nil.
If none are nil, return the value of the last <expr>.

(or <expr>...)

Evaluate each <expr> in order, until one returns non-nil; return that
value. If all are nil, return nil.


4. Built-in procedures

List operations:

(null? <x>) returns true (non-nil) when <x> is nil.
(atom? <x>) returns true when <x> is an atom.
(pair? <x>) returns true when <x> is a pair.
(car <pair>) returns the car of <pair>.
(cdr <pair>) returns the cdr of <pair>.
(cadr <pair>) returns the car of the cdr of <pair> (i.e., the second
    element).
(cddr <pair>) returns the cdr of the cdr of <pair>.
(cons <x> <y>) returns a new pair whose car is <x> and whose cdr is
    <y>.
(list <x>...) returns a list of its arguments.
(set-car! <pair> <x>) changes the car of <pair> to <x>.
(set-cdr! <pair> <x>) changes the cdr of <pair> to <x>.
(reverse! <list>) reverses <list> in place, returning the result.

Numbers:

(number? <x>) returns true when <x> is a number.
(+ <n> <m>) returns the sum of its arguments.
(- <n> <m>) returns the difference of its arguments.
(* <n> <m>) returns the product of its arguments.
(quotient <n> <m>) returns the quotient.
    Rounding is towards zero.
(remainder <n> <m>) returns the remainder.
(< <n> <m>) returns true when <n> is less than <m>.

I/O:

(write <expr>) writes <expr> followed by a space.
(newline) writes the newline character.
(read) reads the next expression from standard input and returns it.

Meta-operations:

(eval <expr>) evaluates <expr> in the current environment, returning
    the result.
(apply <procedure> <arguments>) calls <procedure> with arguments
    <arguments>, returning the result.

Miscellany:

(eq? <x> <y>) returns true when <x> and <y> are the same object. Be
    careful using eq? with lists, because
    (eq? (cons <x> <y>) (cons <x> <y>)) is false.
(put <x> <y> <value>)
(get <x> <y>) returns the last value that was put for <x> and <y>, or
    nil if there is no such value.
(symbol? <x>) returns true when <x> is a symbol.
(gensym) returns a new symbol distinct from all symbols that can be
    read.
(random <n>) returns a random integer between 0 and <n>-1 (if <n> is
    positive).
(error <expr>...) writes its arguments and aborts with error code 1.


5. References

Harold Abelson and Gerald J. Sussman, with Julie Sussman.
Structure and Interpretation of Computer Programs. MIT Press, 1985.

John Allen. Anatomy of Lisp. McGraw-Hill, 1978.

Daniel P. Friedman and Matthias Felleisen. The Little LISPer.
Macmillan, 1989.

Roger Rohrbach wrote a Lisp interpreter, in old awk (which has no
procedures!), called walk. It can't do as much as this Lisp, but it
certainly has greater hack value. Cooler name, too. It's available
at
http://www-2.cs.cmu.edu/afs/cs/project/ai-repository/ai/lang/lisp/impl/awk/0.html


6. Bugs

Eval doesn't check the syntax of expressions. This is a
probably-misguided attempt to bump up the speed a bit, which also
simplifies some of the code. The macroexpander in the startup file
would be the best place to add syntax-checking.
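
As a rough illustration of what a syntax check could look like, here is
a minimal sketch of a checker for if expressions, written in awklisp
itself. The name check-if is hypothetical: it is not part of this
distribution, and nothing here shows how it would be wired into the
macroexpander. It assumes the startup file is loaded (for let and
cond) and that the expression it is given is a proper list whose car
is if.

```lisp
; Hypothetical helper, not part of awklisp.  Returns expr unchanged
; when it has two or three operands; otherwise aborts through error.
(define check-if
  (lambda (expr)
    (let ((operands (cdr expr)))
      (cond ((null? operands)
             (error 'if 'needs 'a 'condition))
            ((null? (cdr operands))
             (error 'if 'needs 'a 'consequent))
            ((null? (cddr operands)) expr)        ; condition, consequent
            ((null? (cdr (cddr operands))) expr)  ; plus an alternative
            (else (error 'if 'has 'too 'many 'subexpressions))))))
```

With this definition, (check-if '(if nil 'yes 'no)) should return the
expression unchanged, while (check-if '(if nil)) should abort with an
error message. Checks for the other special forms could follow the
same pattern of counting operands with null?, cdr, and cddr.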