diff --git a/hawk/README.md b/hawk/README.md index 3e2d2e1a..b5ad047b 100644 --- a/hawk/README.md +++ b/hawk/README.md @@ -1,26 +1,137 @@ # Hawk -- [Language](#language) -- [Basic Modules](#basic-modules) -- [Embedding Guide](#embedding-guide) + - [Language](#language) + - [Basic Modules](#basic-modules) + - [Embedding Guide](#embedding-guide) ## Language Hawk implements most of the AWK programming language elements with extensions. +### Program Structure + +A Hawk program is composed of the following elements at the top level. + + - pattern-action block pair + - BEGIN action block pair + - END action block pair + - action block without a pattern + - pattern without an action block + - user-defined function + - @global variable declaration + - @include directive + - @pragma directive + +However, none of the above is mandatory. Hawk accepts an empty program. + +### Pattern-Action Block Pair + +A pattern-action pair is composed of a pattern and an action block as shown below: + + pattern { + statement + statement + ... + } + +A pattern can be one of the followings when specified: + + - expression + - first-expression, last-expression + - *BEGIN* + - *END* + +An action block is a series of statements enclosed in a curly bracket pair. The *BEGIN* and *END* patterns require an action block while normal patterns don't. When no action block is specified for a normal pattern, it is treated +as if `{ print $0; }` is specified. + +Hawk executes the action block for the *BEGIN* pattern when it starts executing a program; No start-up action is taken if no *BEGIN* pattern-action pair is specified. If a normal pattern-action pair and/or the *END* +pattern-action is specified, it reads the standard input stream. For each input line it reads, it checks if a normal pattern expression evaluates to true. For each pattern that evaluates to true, it executes the action block specified for +the pattern. When it reaches the end of the input stream, it executes the action block for the *END* pattern. + +Hawk allows zero or more *BEGIN* patterns. When multiple *BEGIN* patterns are specified, it executes their action blocks in their appearance order in the program. The same applies to the *END* patterns and their action blocks. It +doesn't read the standard input stream for programs composed of BEGIN blocks only whereas it reads the stream as long as there is an action block for the END pattern or a normal pattern. It evaluates an empty pattern to true; +As a result, the action block for an empty pattern is executed for all input lines read. + +You can compose a pattern range by putting 2 patterns separated by a comma. The pattern range evaluates to true once the first expression evaluates to true until the last expression evaluates to true. + +The following code snippet is a valid Hawk program that prints the string *hello, world* to the console and exits. + + + BEGIN { + print "hello, world"; + } + +This program prints "hello, world" followed by "hello, all" to the console. + + BEGIN { + print "hello, world"; + } + BEGIN { + print "hello, all"; + } + +For the following text input, + + abcdefgahijklmn + 1234567890 + opqrstuvwxyzabc + 9876543210 + +this program + + BEGIN { mr=0; my_nr=0; } + /abc/ { print "[" $0 "]"; mr++; } + { my_nr++; } + END { + print "total records: " NR; + print "total records selfcounted: " my_nr; + print "matching records: " mr; + } + +produces the output text like this: + + [abcdefgahijklmn] + [opqrstuvwxyzabc] + total records: 4 + total records selfcounted: 4 + matching records: 2 + +See the table for the order of execution indicated by the number and the result +of pattern evaluation enclosed in parenthesis. The action block is executed if +the evaluation result is true. + +| | START-UP | abcdefgahijklmn | 1234567890 | opqrstuvwxyzabc | 9876543210 | SHUTDOWN | +|-------------------------------------|----------|-----------------|------------|-----------------|------------|----------| +| `BEGIN { mr = 0; my_nr=0; }` | 1(true) | | | | | | +| `/abc/ { print "[" $0 "]"; mr++; }` | | 2(true) | 4(false) | 6(true) | 8(false) | | +| `{ my_nr++; }` | | 3(true) | 5(true) | 7(true) | 9(true) | | +| `END { print ... }` | | | | | | 10(true) | + +For the same input, this program shows how to use a ranged pattern. + + /abc/,/stu/ { print "[" $0 "]"; } + +It produces the output text like this: + + [abcdefgahijklmn] + [1234567890] + [opqrstuvwxyzabc] + +The regular expression `/abc/` matches the first input line and `/stu/` matches the +third input line. So the range is true between the first input line and the +third input line inclusive. ### Entry Point -You may change the entry point of your script by setting a function name with @pragma entry. +The typical execution begins with the BEGIN block, goes through pattern-action blocks, and eaches the END block. If you like to use a function as an entry point, you may set a function name with @pragma entry. -``` -@pragma entry main + @pragma entry main + + function main () + { + print "hello, world"; + } -function main () -{ - print "hello, world"; -} -``` ### Value @@ -41,24 +152,20 @@ function main () In AWK, the caller can pass an uninitialized variable as a function parameter and get a changed value if the callled function sets it to an array. -``` -function q(a) {a[1]=20; a[2]=30;} -BEGIN { q(x); for (i in x) print i, x[i]; } -``` + + function q(a) {a[1]=20; a[2]=30;} + BEGIN { q(x); for (i in x) print i, x[i]; } In Hawk, you can prefix the pramater name with & to indicate call-by-reference for the same effect. -``` -function q(&a) {a[1]=20; a[2]=30;} -BEGIN { q(x); for (i in x) print i, x[i]; } -``` + function q(&a) {a[1]=20; a[2]=30;} + BEGIN { q(x); for (i in x) print i, x[i]; } + Alternatively, you may form an array before passing it to a function. -``` -function q(a) {a[1]=20; a[2]=30;} -BEGIN { x[3]=99; q(x); for (i in x) print i, x[i]; }' -``` + function q(a) {a[1]=20; a[2]=30;} + BEGIN { x[3]=99; q(x); for (i in x) print i, x[i]; }' ## Basic Modules @@ -71,79 +178,92 @@ BEGIN { x[3]=99; q(x); for (i in x) print i, x[i]; }' ## Embedding Guide +To use hawk in your program, do the followings: -``` -#include -#include -#include +- create a hawk instance +- parse a source script +- create a runtime context +- trigger execution on the runtime context +- destroy the runtime context +- destroy the hawk instance -static const hawk_bch_t* src = - "BEGIN {" - " for (i=2;i<=9;i++)" - " {" - " for (j=1;j<=9;j++)" - " print i \"*\" j \"=\" i * j;" - " print \"---------------------\";" - " }" - "}"; +The following sample illustrates the basic steps hightlighed above. -int main () -{ - hawk_t* hawk = HAWK_NULL; - hawk_rtx_t* rtx = HAWK_NULL; - hawk_val_t* retv; + #include + #include + #include - hawk_parsestd_t psin[2]; + static const hawk_bch_t* src = + "BEGIN {" + " for (i=2;i<=9;i++)" + " {" + " for (j=1;j<=9;j++)" + " print i \"*\" j \"=\" i * j;" + " print \"---------------------\";" + " }" + "}"; - int ret; - - hawk = hawk_openstd(0, HAWK_NULL); - if (!hawk) + int main () { - fprintf (stderr, "ERROR: cannot open hawk\n"); - ret = -1; goto oops; + hawk_t* hawk = HAWK_NULL; + hawk_rtx_t* rtx = HAWK_NULL; + hawk_val_t* retv; + hawk_parsestd_t psin[2]; + int ret; + + hawk = hawk_openstd(0, HAWK_NULL); /* create a hawk instance */ + if (!hawk) + { + fprintf (stderr, "ERROR: cannot open hawk\n"); + ret = -1; goto oops; + } + + /* set up source script file to read in */ + memset (&psin, 0, HAWK_SIZEOF(psin)); + psin[0].type = HAWK_PARSESTD_BCS; /* specify the first script path */ + psin[0].u.bcs.ptr = (hawk_bch_t*)src; + psin[0].u.bcs.len = hawk_count_bcstr(src); + psin[1].type = HAWK_PARSESTD_NULL; /* indicate the no more script to read */ + + ret = hawk_parsestd(hawk, psin, HAWK_NULL); /* parse the script */ + if (ret <= -1) + { + hawk_logbfmt (hawk, HAWK_LOG_STDERR, "ERROR(parse): %js\n", hawk_geterrmsg(hawk)); + ret = -1; goto oops; + } + + /* create a runtime context needed for execution */ + rtx = hawk_rtx_openstd ( + hawk, + 0, + HAWK_T("hawk02"), + HAWK_NULL, /* stdin */ + HAWK_NULL, /* stdout */ + HAWK_NULL /* default cmgr */ + ); + if (!rtx) + { + hawk_logbfmt (hawk, HAWK_LOG_STDERR, "ERROR(rtx_open): %js\n", hawk_geterrmsg(hawk)); + ret = -1; goto oops; + } + + /* execute the BEGIN/pattern-action/END blocks */ + retv = hawk_rtx_loop(rtx); + if (!retv) + { + hawk_logbfmt (hawk, HAWK_LOG_STDERR, "ERROR(rtx_loop): %js\n", hawk_geterrmsg(hawk)); + ret = -1; goto oops; + } + + /* lowered the reference count of the returned value */ + hawk_rtx_refdownval (rtx, retv); + ret = 0; + + oops: + if (rtx) hawk_rtx_close (rtx); /* destroy the runtime context */ + if (hawk) hawk_close (hawk); /* destroy the hawk instance */ + return -1; } - memset (&psin, 0, HAWK_SIZEOF(psin)); - psin[0].type = HAWK_PARSESTD_BCS; - psin[0].u.bcs.ptr = (hawk_bch_t*)src; - psin[0].u.bcs.len = hawk_count_bcstr(src); - psin[1].type = HAWK_PARSESTD_NULL; - ret = hawk_parsestd(hawk, psin, HAWK_NULL); - if (ret <= -1) - { - hawk_logbfmt (hawk, HAWK_LOG_STDERR, "ERROR(parse): %js\n", hawk_geterrmsg(hawk)); - ret = -1; goto oops; - } - - rtx = hawk_rtx_openstd ( - hawk, - 0, - HAWK_T("hawk02"), - HAWK_NULL, /* stdin */ - HAWK_NULL, /* stdout */ - HAWK_NULL /* default cmgr */ - ); - if (!rtx) - { - hawk_logbfmt (hawk, HAWK_LOG_STDERR, "ERROR(rtx_open): %js\n", hawk_geterrmsg(hawk)); - ret = -1; goto oops; - } - - retv = hawk_rtx_loop(rtx); - if (!retv) - { - hawk_logbfmt (hawk, HAWK_LOG_STDERR, "ERROR(rtx_loop): %js\n", hawk_geterrmsg(hawk)); - ret = -1; goto oops; - } - - hawk_rtx_refdownval (rtx, retv); - ret = 0; - -oops: - if (rtx) hawk_rtx_close (rtx); - if (hawk) hawk_close (hawk); - return -1; -} -``` +If you prefer C++, you may use the Hawk/HawkStd wrapper classes to simplify the task. The C++ classes are inferior to the C equivalents in that they don't allow creation of multiple runtime contexts over a single hawk instance. diff --git a/hawk/samples/hawk02.c b/hawk/samples/hawk02.c index 6005582f..9acb0925 100644 --- a/hawk/samples/hawk02.c +++ b/hawk/samples/hawk02.c @@ -17,31 +17,31 @@ int main () hawk_t* hawk = HAWK_NULL; hawk_rtx_t* rtx = HAWK_NULL; hawk_val_t* retv; - hawk_parsestd_t psin[2]; - int ret; - hawk = hawk_openstd(0, HAWK_NULL); + hawk = hawk_openstd(0, HAWK_NULL); /* create a hawk instance */ if (!hawk) { fprintf (stderr, "ERROR: cannot open hawk\n"); ret = -1; goto oops; } + /* set up source script file to read in */ memset (&psin, 0, HAWK_SIZEOF(psin)); - psin[0].type = HAWK_PARSESTD_BCS; + psin[0].type = HAWK_PARSESTD_BCS; /* specify the first script path */ psin[0].u.bcs.ptr = (hawk_bch_t*)src; psin[0].u.bcs.len = hawk_count_bcstr(src); - psin[1].type = HAWK_PARSESTD_NULL; + psin[1].type = HAWK_PARSESTD_NULL; /* indicate the no more script to read */ - ret = hawk_parsestd(hawk, psin, HAWK_NULL); + ret = hawk_parsestd(hawk, psin, HAWK_NULL); /* parse the script */ if (ret <= -1) { hawk_logbfmt (hawk, HAWK_LOG_STDERR, "ERROR(parse): %js\n", hawk_geterrmsg(hawk)); ret = -1; goto oops; } + /* create a runtime context needed for execution */ rtx = hawk_rtx_openstd ( hawk, 0, @@ -56,6 +56,7 @@ int main () ret = -1; goto oops; } + /* execute the BEGIN/pattern-action/END blocks */ retv = hawk_rtx_loop(rtx); if (!retv) { @@ -63,12 +64,12 @@ int main () ret = -1; goto oops; } + /* lowered the reference count of the returned value */ hawk_rtx_refdownval (rtx, retv); ret = 0; oops: - if (rtx) hawk_rtx_close (rtx); - if (hawk) hawk_close (hawk); + if (rtx) hawk_rtx_close (rtx); /* destroy the runtime context */ + if (hawk) hawk_close (hawk); /* destroy the hawk instance */ return -1; } -