Latest commit: 46f01ff267 by hyung-hwan, 2025-12-26 19:54:13 +09:00 (fixed a double-free issue in eval_getbline())

Hawk - Embeddable AWK Interpreter in C/C++

Hawk is an embeddable AWK interpreter written in C. It can run AWK scripts inside your own applications or as a standalone AWK engine. The library is stable, portable, and designed for projects that need a scripting engine with a small footprint.

Features

  • Full AWK interpreter - mostly POSIX AWK compatible, with additional extensions.
  • Embeddable library - integrate AWK scripting into C or C++ projects as an execution engine.
  • C and C++ APIs - core functions exposed in C, with convenient C++ wrapper classes available.
  • Flexible usage - usable as both a standalone command-line interpreter and a library.
  • Portable core - the base library depends only on the standard C library.
  • Optional extensions - loadable modules (e.g. MySQL access, FFI) can be built in or used via shared objects.
  • Mature and stable - developed and maintained for many years with proven reliability.
  • Embedded sed functionality - includes a sed engine that can be used from C/C++ or invoked via the CLI using --sed.

Building Hawk From Source Code

Hawk uses autoconf and automake for building. Run the following commands to configure and compile Hawk:

$ ./configure ## This step offers various build options
$ make
$ make install

Embedding Hawk in C Applications

Here's an example of how Hawk can be embedded within a C application:

#include <hawk.h>
#include <stdio.h>
#include <string.h>

static const hawk_bch_t* src =
	"BEGIN { print ARGV[0];"
	"   for (i=2;i<=9;i++)"
	"   {"
	"       for (j=1;j<=9;j++)"
	"           print i \"*\" j \"=\" i * j;"
	"       print \"---------------------\";"
	"   }"
	"}";

int main ()
{
	hawk_t* hawk = HAWK_NULL;
	hawk_rtx_t* rtx = HAWK_NULL;
	hawk_val_t* retv;
	hawk_parsestd_t psin[2];
	int ret;

	hawk = hawk_openstd(0, HAWK_NULL); /* create a hawk instance */
	if (!hawk)
	{
		fprintf(stderr, "ERROR: cannot open hawk\n");
		ret = -1; goto oops;
	}

	/* set up the source script to read in */
	memset(&psin, 0, HAWK_SIZEOF(psin));
	psin[0].type = HAWK_PARSESTD_BCS;  /* the first source is an in-memory byte string */
	psin[0].u.bcs.ptr = (hawk_bch_t*)src;
	psin[0].u.bcs.len = hawk_count_bcstr(src);
	psin[1].type = HAWK_PARSESTD_NULL; /* indicate that there are no more sources to read */

	ret = hawk_parsestd(hawk, psin, HAWK_NULL); /* parse the script */
	if (ret <= -1)
	{
		hawk_logbfmt(hawk, HAWK_LOG_STDERR, "ERROR(parse): %js\n", hawk_geterrmsg(hawk));
		ret = -1; goto oops;
	}

	/* create a runtime context needed for execution */
	rtx = hawk_rtx_openstd(
		hawk,
		0,
		HAWK_T("hawk02"), /* ARGV[0] */
		HAWK_NULL,  /* stdin */
		HAWK_NULL,  /* stdout */
		HAWK_NULL   /* default cmgr */
	);
	if (!rtx)
	{
		hawk_logbfmt(hawk, HAWK_LOG_STDERR, "ERROR(rtx_open): %js\n", hawk_geterrmsg(hawk));
		ret = -1; goto oops;
	}

	/* execute the BEGIN/pattern-action/END blocks */
	retv = hawk_rtx_loop(rtx); /* alternatively, hawk_rtx_exec(rtx, HAWK_NULL, 0) */
	if (!retv)
	{
		hawk_logbfmt(hawk, HAWK_LOG_STDERR, "ERROR(rtx_loop): %js\n", hawk_geterrmsg(hawk));
		ret = -1; goto oops;
	}

	/* lower the reference count of the returned value */
	hawk_rtx_refdownval(rtx, retv);
	ret = 0;

oops:
	if (rtx) hawk_rtx_close(rtx); /* destroy the runtime context */
	if (hawk) hawk_close(hawk); /* destroy the hawk instance */
	return ret; /* 0 on success, -1 on failure */
}

Embedding Hawk within an application involves a few key steps:

  • Creating a Hawk Instance: The hawk_openstd() function is used to create a new instance of the Hawk interpreter, which serves as the entry point for interacting with Hawk from within the application.
  • Parsing Scripts: The application can provide Hawk scripts as string literals or read them from files using the hawk_parsestd() function. This associates the scripts with the Hawk instance for execution.
  • Creating a Runtime Context: A runtime context is created using hawk_rtx_openstd(), encapsulating the state and configuration required for script execution, such as input/output streams and other settings.
  • Executing the Script: The hawk_rtx_loop() or hawk_rtx_exec() functions are used to execute the Hawk script within the created runtime context, returning a value representing the result of the execution.
  • Handling the Result: The application can check the returned value for successful execution and handle any errors or results as needed.
  • Cleaning Up: Finally, the application cleans up by closing the runtime context and destroying the Hawk instance using hawk_rtx_close() and hawk_close(), respectively.

By following this pattern, an application can embed the Hawk interpreter and use its scripting and data-manipulation capabilities while keeping the library's portability and small footprint.

Assuming the above sample code is stored in hawk02.c and the built Hawk library has been installed properly, you may compile the sample code by running the following commands:

$ gcc -Wall -O2 -o hawk02 hawk02.c -lhawk

The actual command may vary depending on the compiler used and the configure options used.

Embedding Hawk in C++ Applications

Hawk can also be embedded in C++ applications. Here's an example:

#include <Hawk.hpp>
#include <stdio.h>

int main ()
{
	HAWK::HawkStd hawk;

	if (hawk.open() <= -1)
	{
		fprintf(stderr, "unable to open hawk - %s\n", hawk.getErrorMessageB());
		return -1;
	}

	HAWK::HawkStd::SourceString s("BEGIN { print \"hello, world\"; }");
	if (hawk.parse(s, HAWK::HawkStd::Source::NONE) == HAWK_NULL)
	{
		fprintf(stderr, "unable to parse - %s\n", hawk.getErrorMessageB());
		hawk.close();
		return -1;
	}

	HAWK::Hawk::Value vr;
	hawk.loop(&vr);  // alternatively, hawk.exec(&vr, HAWK_NULL, 0);

	hawk.close();
	return 0;
}

Embedding Hawk within a C++ application involves the following key steps:

  • Creating a Hawk Instance: Create a new instance of the Hawk interpreter using the HAWK::HawkStd class.
  • Parsing Scripts: Provide Hawk scripts as strings using the HAWK::HawkStd::SourceString class, and parse them using the hawk.parse() method.
  • Executing the Script: Use the hawk.loop() or hawk.exec() methods to execute the Hawk script, returning a value representing the result of the execution.
  • Handling the Result: Handle the returned value or any errors that occurred during execution.
  • Cleaning Up: Clean up by calling hawk.close() to destroy the Hawk instance.

Unlike the C API, the C++ classes do not allow creating multiple runtime contexts over a single Hawk instance.

Language

What Hawk Is

Hawk is an embeddable awk interpreter with extensions. It can run awk scripts from the CLI or from C/C++ and provides modules like str::, sys::, ffi::, mysql::, and sqlite::.

Running Hawk

Run a script file:

$ hawk -f script.hawk input.txt

Run an inline program:

$ echo "a,b,c" | hawk 'BEGIN{FS=","} {print $2}'

Execution Model

Hawk follows the awk pipeline:

  • Input is read as records (usually lines). RS controls record separation.
  • Each record ($0) is split into fields $1, $2, ... by FS.
  • A script is a sequence of pattern { action } blocks.
  • BEGIN runs before input; END runs after input.

Example:

BEGIN { FS=","; print "start" }
$3 ~ /ERR/ { print NR, $1, $3 }
END { print "done", NR }
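To see the pipeline end to end, the script above can be fed a few comma-separated records. The shell sketch below does that. The three sample records are made up for illustration, and the session assumes the script relies only on POSIX awk features, so it falls back to the system awk when hawk is not on PATH.

```shell
# Prefer hawk when installed; otherwise fall back to the system awk
# (the program below uses only POSIX awk features).
AWK=$(command -v hawk || command -v awk)

# Three made-up records; only the first and third have ERR in field 3.
out=$(printf 'a,x,ERR1\nb,y,OK\nc,z,ERR2\n' | "$AWK" '
BEGIN { FS = ","; print "start" }
$3 ~ /ERR/ { print NR, $1, $3 }
END { print "done", NR }')

printf '%s\n' "$out"
```

BEGIN prints start before any input is read, the pattern selects records 1 and 3, and END still sees NR set to 3, the number of records read.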

@pragma entry

Hawk can override the default BEGIN/pattern/END flow with a custom entry point:

@pragma entry main
function main(a, b) {
	print "entry:", a, b
}

Run:

$ hawk -f script.hawk one two
entry: one two

Values and Types

Hawk is dynamically typed:

  • Numbers: integer and floating-point.
  • Strings: Unicode text.
  • Characters can be written with single quotes (e.g., 'A') and are Unicode.
  • Byte strings: raw bytes (@b"...").
  • Byte characters use @b'X' and must fit in a single byte.
  • Containers: array, map.
  • @nil represents null.

Examples:

BEGIN {
	a = 10
	b = 3.14
	s = "hello"
	c = 'X'
	bc = @b'x'
	bs = @b"\x00\x01"
	m = @{"k": 1}
	arr = @["x", "y"]
}

Expressions and Operators

Arithmetic and Comparison

  • Arithmetic: +, -, *, /, %, ** (exponentiation), ++, --.
  • Comparisons: ==, !=, <, <=, >, >=.
  • Type-precise compare: === and !==.

Example:

BEGIN {
	x = 10 + 5 * 2
	if (x >= 20) print x
	if ("10" === 10) print "no"
}

Strings and Regex

  • Concatenation by adjacency: "a" "b".
  • Explicit concatenation: "a" %% "b".
  • Regex match: ~ and !~.

Example:

BEGIN {
	print "hi" %% "!"
	if ("A" ~ /^[A-Z]$/) print "regex ok"
}

Logical Operators

  • Logical AND/OR: &&, ||.
  • Boolean results are numeric (0 or 1).

Example:

BEGIN {
	if (1 && 0) print "no"; else print "ok"
}

Bitwise Operators

  • Bitwise AND/OR: &, |.
  • | also denotes pipes, so use parentheses when you mean bitwise OR.

Bitwise OR vs pipe example:

BEGIN {
	print (1 | 2)  # bitwise OR => 3
	print 1 | 2    # pipe to external command "2"
}

Variables and Scope

  • Variables are created on assignment.
  • @local and @global declare scope explicitly.

Example:

@global g
BEGIN {
	@local x
	x = 1
	g = 2
}

Arrays and Maps

Hawk supports arrays and maps.

  • Arrays are indexed by numbers.
  • Maps accept string and numeric keys.
  • Constructors: @[], @{}, hawk::array(), hawk::map().
  • All constructors accept initial values.

Example:

BEGIN {
	arr = @["a", "b", "c"]
	m = @{"k": "v", 10: "ten"}
	arr[4] = "d"
	m["x"] = 99
	print arr[1], m["k"], m[10]
}

Functions

Define functions with function name(...) { ... }.

  • Missing args are @nil.
  • Use & for call-by-reference.
  • Use ... for varargs and access them via @argc and @argv.
  • Functions are first-class values and can be passed as parameters (e.g., a comparator for asort).

Example:

function inc(&x) { x += 1 }
function greet(name) { if (name == "") name = "world"; print "hi", name }
BEGIN { n = 1; inc(n); greet(); greet("hawk"); print n }

Varargs example:

function dump(...) {
	@local i
	for (i = 0; i < @argc; i++) print @argv[i]
}
BEGIN { dump("a", 10, "b") }

Function-parameter example:

function desc(a, b) { return b - a }
BEGIN {
	@local a, b, i
	a = @[3, 1, 2]
	asort(a, b, desc)
	for (i in b) print i, b[i]
}

Control Flow

Hawk supports standard awk control flow.

if / else

{ if ($1 > 0) print $1; else print "skip" }

while

BEGIN {
	i = 1
	while (i <= 3) { print i; i++ }
}

do ... while

BEGIN {
	i = 0
	do { print i; i++ } while (i < 3)
}

for

BEGIN {
	for (i = 1; i <= 3; i++) print i
}

for (i in array)

BEGIN {
	arr = @["x", "y"]
	for (i in arr) print i, arr[i]
}

in operator (key existence)

Use x in b to test if a key/index exists in a map or array.

BEGIN {
	b = @{"k": 1}
	if ("k" in b) print "yes"
}

switch

BEGIN {
	x = 2
	switch (x) {
	case 1: print "one"; break;
	case 2: print "two"; break;
	default: print "other";
	}
}

break / continue / return / exit

BEGIN {
	for (i = 1; i <= 5; i++) {
		if (i == 3) continue
		if (i == 5) break
		print i
	}
	exit 0
}

Note: In addition to functions, Hawk allows return inside BEGIN blocks, END blocks, and pattern-action blocks.

nextfile / nextofile

nextfile skips the rest of the current input file (standard awk behavior). nextofile advances to the next output file specified with -t.

Example:

$ hawk -t /tmp/1 -t /tmp/2 'BEGIN { print 10; nextofile; print 20 }'

This writes 10 to /tmp/1 and 20 to /tmp/2.

Input, Output, and Pipes

  • getline reads records.
  • getbline reads records as bytes.
  • getline/getbline return 1 on success, 0 on EOF, and -1 on error.
  • Redirection works with <, >, and >>.
  • Pipes: cmd | getline var and print x | "cmd".
  • Two-way pipes: |& when @pragma rwpipe on.
  • CSV-style field splitting is supported when FS begins with ? followed by four characters (separator, escaper, left quote, right quote).

Example:

BEGIN { "ls" | getline x; print x }

Two-way pipe example:

@pragma rwpipe on
BEGIN {
	cmd = "sort"
	print "b" |& cmd
	print "d" |& cmd
	print "c" |& cmd
	print "a" |& cmd
	close(cmd, "to")
	while ((cmd |& getline line) > 0) print line
}

Redirection examples:

BEGIN {
	while ((getline line < "input.txt") > 0) print line > "out.txt"
	print "more" >> "out.txt"
}
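The return values of getline explain why read loops compare against 0: a bare while (getline ...) would also treat -1 (error) as true and never terminate. The shell sketch below shows all three outcomes. The file names input.txt and missing.txt and the scratch directory are hypothetical, and the session falls back to the system awk when hawk is not on PATH, on the assumption that only POSIX awk features are used.

```shell
# Prefer hawk when installed; otherwise fall back to the system awk.
AWK=$(command -v hawk || command -v awk)

# Hypothetical scratch directory holding a two-line input.txt and no missing.txt.
dir=$(mktemp -d)
printf 'one\ntwo\n' > "$dir/input.txt"

out=$(cd "$dir" && "$AWK" 'BEGIN {
	n = 0
	# loop only while getline reports success (1); stop on 0 (EOF) or -1 (error)
	while ((rc = (getline line < "input.txt")) > 0) n++
	print "lines=" n, "last rc=" rc
	close("input.txt")
	# a file that cannot be opened yields -1, not 0
	rc = (getline line < "missing.txt")
	print "missing rc=" rc
}')

rm -rf "$dir"
printf '%s\n' "$out"
```

With a POSIX awk the loop ends with a return value of 0 after two lines, and the read from the missing file reports -1.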

Byte-record example:

BEGIN { getbline b < "bin.dat"; print str::tohex(b) }

CSV-style FS example:

BEGIN {
	FS = "?," "\"\""
}
{ print $1, $2 }

Built-in Variables

Common built-ins:

  • NR, FNR, NF
  • FS, RS, OFS, ORS
  • FILENAME, OFILENAME

Example:

{ print NR, NF, $0 }
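The difference between NR and FNR only shows up with more than one input file: in awk, NR counts records across all input while FNR restarts for each file. The shell sketch below illustrates this with two made-up files, one.txt and two.txt. It assumes Hawk follows the standard awk behavior here and falls back to the system awk when hawk is not on PATH.

```shell
# Prefer hawk when installed; otherwise fall back to the system awk.
AWK=$(command -v hawk || command -v awk)

# Two hypothetical input files in a scratch directory.
dir=$(mktemp -d)
printf 'a b\nc\n' > "$dir/one.txt"
printf 'd e f\n' > "$dir/two.txt"

# Columns: file name, overall record number, per-file record number, field count.
out=$(cd "$dir" && "$AWK" '{ print FILENAME, NR, FNR, NF }' one.txt two.txt)

rm -rf "$dir"
printf '%s\n' "$out"
```

The last line of output shows NR continuing at 3 while FNR is back to 1 for the first record of two.txt.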

Built-in Functions

Hawk includes awk built-ins (e.g., length, substr, split, index) plus extensions in modules (see below).

Example:

BEGIN { print length("hawk"), substr("hawk", 2, 2) }
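A slightly larger sketch of the standard built-ins named above, run from the shell. The input strings are arbitrary, and the session falls back to the system awk when hawk is not on PATH, on the assumption that these four functions behave as in POSIX awk.

```shell
# Prefer hawk when installed; otherwise fall back to the system awk.
AWK=$(command -v hawk || command -v awk)

out=$("$AWK" 'BEGIN {
	# split returns the number of pieces and fills part[1..n]
	n = split("2025-12-26", part, "-")
	print n, part[1], part[3]
	# index and substr use 1-based positions
	print index("embeddable", "bed"), length("hawk"), substr("hawk", 2, 2)
}')

printf '%s\n' "$out"
```

With a POSIX awk this prints 3 2025 26 on the first line and 3 4 aw on the second.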

Pragmas

@pragma controls parser/runtime behavior. File-scope pragmas apply per file; global-scope pragmas may appear only once across all files.

Name          Scope   Values         Default  Description
entry         global  function name  -        change the program entry point
implicit      file    on, off        on       allow undeclared variables
multilinestr  file    on, off        off      allow a multiline string literal without continuation
rwpipe        file    on, off        on       allow the two-way pipe operator |&
striprecspc   global  on, off        off      remove leading and trailing blank fields when splitting a record if FS is a regular expression matching all spaces
stripstrspc   global  on, off        on       trim leading and trailing spaces when converting a string to a number
numstrdetect  global  on, off        on       detect a numeric string and convert it to a number
stack_limit   global  number         5120     specify the runtime stack size, measured in the number of values

@pragma entry

Sets a custom entry function instead of the default BEGIN/pattern/END flow.

@pragma entry main;
function main () { print "hello, world"; }

Arguments passed on the command line are provided to the entry function:

@pragma entry main
function main(arg1, arg2) {
	print "Arguments:", arg1, arg2
}

$ hawk -f main.hawk arg1_value arg2_value

If you don't know the number of arguments in advance, use ... and @argv/@argc:

@pragma entry main
function main(...) {
	@local i
	for (i = 0; i < @argc; i++) printf("%s:", @argv[i])
	print ""
}

$ hawk -f main.hawk 10 20 30 40 50

Named arguments can be combined with ... to require a minimum number of parameters:

function x(a, b, ...) {
	print "a=", a, "b=", b, "rest=", (@argc - 2)
}
BEGIN { x(1, 2, 3, 4) }

@pragma implicit

Controls implicit variable declaration. off requires @local/@global.

@pragma implicit off;
BEGIN {
    a = 10; ## syntax error - undefined identifier 'a'
}

In the example above, the @pragma implicit off directive is used to turn off implicit variable declaration. As a result, attempting to use the undeclared variable a will result in a syntax error.

@pragma implicit off;
BEGIN {
    @local a;
    a = 10; ## syntax ok - 'a' is declared before use
}

@pragma striprecspc

When FS is a space-matching regex, this controls whether leading/trailing blank fields are removed.

  • @pragma striprecspc on
$ echo '  a  b  c  d  ' | hawk '@pragma striprecspc on;
BEGIN { FS="[[:space:]]+"; }
{
    print "NF=" NF;
    for (i = 0; i < NF; i++) print i " [" $(i+1) "]";
}'
NF=4
0 [a]
1 [b]
2 [c]
3 [d]
  • @pragma striprecspc off
$ echo '  a  b  c  d  ' | hawk '@pragma striprecspc off;
BEGIN { FS="[[:space:]]+"; }
{
    print "NF=" NF;
    for (i = 0; i < NF; i++) print i " [" $(i+1) "]";
}'
NF=6
0 []
1 [a]
2 [b]
3 [c]
4 [d]
5 []

@include and @include_once

@include inserts another file at parse time; the semicolon is optional. @include_once avoids duplicate inclusion.

function print_hello() { print "hello\n"; }  # contents of hello.inc

@include "hello.inc";
BEGIN { print_hello(); }

@include_once "hello.inc";
@include_once "hello.inc";
BEGIN { print_hello(); }

You can use them inside a block or at the top level:

BEGIN {
	@include "init.inc";
	...
}

Comments

Hawk supports a single-line comment that begins with a hash sign # and the C-style multi-line comment.

x = y; # assign y to x.
/*
this line is ignored.
this line is ignored too.
*/

Reserved Words

The following words are reserved and cannot be used as a variable name, a parameter name, or a function name.

  • @abort
  • @argc
  • @argv
  • @global
  • @include
  • @include_once
  • @local
  • @nil
  • @pragma
  • @reset
  • BEGIN
  • END
  • break
  • case
  • continue
  • default
  • delete
  • do
  • else
  • exit
  • for
  • function
  • getbline
  • getline
  • if
  • in
  • next
  • nextfile
  • nextofile
  • print
  • printf
  • return
  • while
  • switch

However, some of these words not beginning with @ can be used as normal names in the context of a module call. For example, mymod::break. In practice, the predefined names used for built-in commands, functions, and variables are treated as if they are reserved since you can't create another definition with the same name.

More Examples

  • Print the first 10 even numbers
BEGIN {
	i = 0
	n = 1
	while (i < 10) {
		if (n % 2 == 0) {
			print n
			i++
		}
		n++
	}
}
  • Prompt the user for a positive number
BEGIN {
	do {
		printf "Enter a positive number: "
		getline num
	} while (num <= 0)
	print "You entered:", num
}
  • Print the multiplication table
BEGIN {
	for (i = 1; i <= 10; i++) {
		for (j = 1; j <= 10; j++) {
			printf "%4d", i * j
		}
		printf "\n"
	}
}
  • Print only the even numbers from 1 to 16
BEGIN {
	for (i = 1; i <= 20; i++) {
		if (i % 2 != 0) {
			continue
		}
		print i
		if (i >= 16) {
			break
		}
	}
}
  • Count the frequency of words in a file
{
	n = split($0, words, /[^[:alnum:]_]+/)
	for (i = 1; i <= n; i++) {
		freq[words[i]]++
	}
}

END {
	for (w in freq) {
		printf "%s: %d\n", w, freq[w]
	}
}
  • Read the output of an external command
BEGIN {
	while (("ls -laF" | getline x) > 0) print "\t", x;
	close ("ls -laF");
}
  • Pipe every input record to an external command
{ print $0 | "cat" }
END { close("cat"); print "ENDED"; }
  • Sort data through a two-way pipe
BEGIN {
	cmd = "sort";
	data = hawk::array("hello", "world", "two-way pipe", "testing");

	for (i = 1; i <= length(data); i++) print data[i] |& cmd;
	close(cmd, "to");

	while ((cmd |& getline line) > 0) print line;
	close(cmd);
}

Garbage Collection

Values are managed primarily by reference counting. Map and array values are additionally garbage-collected.

Modules

Hawk supports various modules.

Hawk

  • hawk::array
  • hawk::call
  • hawk::cmgr_exists
  • hawk::function_exists
  • hawk::gc
  • hawk::gc_get_threshold
  • hawk::gc_set_threshold
  • hawk::gcrefs
  • hawk::hash
  • hawk::isarray
  • hawk::ismap
  • hawk::isnil
  • hawk::map
  • hawk::modlibdirs
  • hawk::type
  • hawk::typename
  • hawk::GC_NUM_GENS

String

The str module provides an extensive set of string manipulation functions.

  • str::frombase64 - decode a base64-encoded byte string
  • str::fromcharcode
  • str::fromhex
  • str::gsub - equivalent to gsub
  • str::index
  • str::isalnum
  • str::isalpha
  • str::isblank
  • str::iscntrl
  • str::isdigit
  • str::isgraph
  • str::islower
  • str::isprint
  • str::ispunct
  • str::isspace
  • str::isupper
  • str::isxdigit
  • str::length - equivalent to length
  • str::ltrim
  • str::match - similar to match. The optional third argument is the search start index. The optional fourth argument is equivalent to the third argument to match().
  • str::normspace
  • str::printf - equivalent to sprintf
  • str::rindex
  • str::rtrim
  • str::split - equivalent to split
  • str::sub - equivalent to sub
  • str::substr - equivalent to substr
  • str::tobase64 - encode data to a base64 byte string
  • str::tocharcode - get the numeric value of the first character
  • str::tohex
  • str::tolower - equivalent to tolower
  • str::tonum - convert a string to a number. A numeric value passed as a parameter is returned as it is. A leading prefix of 0b, 0, or 0x specifies radix 2, 8, or 16 respectively. Conversion stops when the end of the string is reached or the first character invalid for the conversion is encountered.
  • str::toupper - equivalent to toupper
  • str::trim

System

The sys module provides various functions concerning the underlying operating system.

  • sys::basename
  • sys::chmod
  • sys::close
  • sys::closedir
  • sys::dirname
  • sys::dup
  • sys::errmsg
  • sys::fork
  • sys::getegid
  • sys::getenv
  • sys::geteuid
  • sys::getgid
  • sys::getpid
  • sys::getppid
  • sys::gettid
  • sys::gettime
  • sys::getuid
  • sys::kill
  • sys::mkdir
  • sys::mktime
  • sys::open
  • sys::opendir
  • sys::openfd
  • sys::pipe
  • sys::read
  • sys::readdir
  • sys::settime
  • sys::sleep
  • sys::strftime
  • sys::system
  • sys::unlink
  • sys::wait
  • sys::write

You can read a file as raw bytes.

BEGIN {
	f = sys::open("/etc/sysctl.conf", sys::O_RDONLY);
	if (f >= 0) {
		while (sys::read(f, x, 10) > 0) printf (@b"%s", x);
		sys::close(f);
	}
}

You can map a raw file descriptor to a handle created by this module and use it.

BEGIN {
	a = sys::openfd(1);
	sys::write(a, @b"let me write something here\n");
	sys::close(a, sys::C_KEEPFD); ## set C_KEEPFD to release 1 without closing it.
	##sys::close(a);
	print "done\n";
}

Creating pipes and sharing them with a child process is straightforward.

BEGIN {
	if (sys::pipe(p0, p1, sys::O_CLOEXEC | sys::O_NONBLOCK) <= -1)
	##if (sys::pipe(p0, p1, sys::O_CLOEXEC) <= -1)
	##if (sys::pipe(p0, p1) <= -1)
	{
		print "pipe error";
		return -1;
	}
	a = sys::fork();
	if (a <= -1) 
	{
		print "fork error";
		sys::close (p0);
		sys::close (p1);
	}
	else if (a == 0)
	{
		## child
		printf ("child.... %d %d %d\n", sys::getpid(), p0, p1);
		sys::close (p1);
		while (1)
		{
			n = sys::read (p0, k, 3);
			if (n <= 0) 
			{
				if (n == sys::RC_EAGAIN) continue; ## nonblock but data not available
				if (n != 0) print "ERROR: " sys::errmsg();
				break;
			}
			print k;
		}
		sys::close (p0);
		return 123;
	}
	else
	{
		## parent
		printf ("parent.... %d %d %d\n", sys::getpid(), p0, p1);
		sys::close (p0);
		sys::write (p1, @b"hello");
		sys::write (p1, @b"world");
		sys::close (p1);

		##sys::wait(a, status, sys::WNOHANG);
		while (sys::wait(a, status) != a);
		if (sys::WIFEXITED(status)) print "Exit code: " sys::WEXITSTATUS(status);
		else print "Child terminated abnormally"
	}
}

You can read the standard output of a child process in the parent process.

BEGIN {
	if (sys::pipe(p0, p1, sys::O_NONBLOCK | sys::O_CLOEXEC) <= -1)
	{
		print "pipe error";
		return -1;
	}
	a = sys::fork();
	if (a <= -1)
	{
		print "fork error";
		sys::close (p0);
		sys::close (p1);
	}
	else if (a == 0)
	{
		## child
		sys::close (p0);

		stdout = sys::openfd(1);
		sys::dup(p1, stdout);

		print @b"hello world";
		print @b"testing sys::dup()";
		print @b"writing to standard output..";

		sys::close (p1);
		sys::close (stdout);
	}
	else
	{
		sys::close (p1);
		while (1)
		{
			n = sys::read(p0, k, 10);
			if (n <= 0)
			{
				if (n == sys::RC_EAGAIN) continue; ## nonblock but data not available
				if (n != 0) print "ERROR: " sys::errmsg();
				break;
			}
			print "[" k "]";
		}
		sys::close (p0);
		sys::wait(a);
	}
}

You can duplicate file handles as necessary.

BEGIN {
	a = sys::open("/etc/inittab", sys::O_RDONLY);
	x = sys::open("/etc/fstab", sys::O_RDONLY);

	b = sys::dup(a);
	sys::close(a);

	while (sys::read(b, abc, 100) > 0) printf (@b"%s", abc);

	print "-------------------------------";

	c = sys::dup(x, b, sys::O_CLOEXEC);
	## assertion: b == c
	sys::close (x);

	while (sys::read(c, abc, 100) > 0) printf (@b"%s", abc);
	sys::close (c);
}

Directory traversal is easy.

BEGIN {
	d = sys::opendir("/etc", sys::DIR_SORT);
	if (d >= 0)
	{
		while (sys::readdir(d,a) > 0)
		{
			print a;
			sys::stat("/etc/" %% a, b);
			for (i in b) print "\t", i, b[i];
		}
		sys::closedir(d);
	} 
}

You can query the configuration of a network interface.

BEGIN { 
	if (sys::getnwifcfg("lo", sys::NWIFCFG_IN6, x) <= -1)
		print sys::errmsg();
	else
		for (i in x) print i, x[i]; 
}

Socket functions are available.

BEGIN {
	s = sys::socket();
	...
	sys::close (s);
}

ffi

  • ffi::open
  • ffi::close
  • ffi::call
  • ffi::errmsg
BEGIN {
	ffi = ffi::open();
	if (ffi::call(ffi, r, @B"getenv", @B"s>s", "PATH") <= -1) print ffi::errmsg();
	else print r;
	ffi::close (ffi);
}

mysql

BEGIN {
	mysql = mysql::open();

	if (mysql::connect(mysql, "localhost", "username", "password", "mysql") <= -1)
	{
		print "connect error -", mysql::errmsg();
	}

	if (mysql::query(mysql, "select * from user") <= -1)
	{
		print "query error -", mysql::errmsg();
	}

	result = mysql::store_result(mysql);
	if (result <= -1)
	{
		print "store result error - ", mysql::errmsg();
	}

	while (mysql::fetch_row(result, row) > 0)
	{
		ncols = length(row);
		for (i = 0; i < ncols; i++) print row[i];
		print "----";
	}

	mysql::free_result(result);

	mysql::close(mysql);
}

sqlite

Assuming /tmp/test.db has the following schema:

sqlite> .schema
CREATE TABLE a(x int, y varchar(255));

You can retrieve all rows as shown below:

@pragma entry main
@pragma implicit off

function main() {
	@local db, stmt, row, i, ncols;

	db = sqlite::open();
	if (db <= -1) {
		print "open error -", sqlite::errmsg();
		return;
	}

	if (sqlite::connect(db, "/tmp/test.db", sqlite::CONNECT_READWRITE) <= -1) {
		print "connect error -", sqlite::errmsg();
		sqlite::close(db);
		return;
	}

	sqlite::exec(db, "begin transaction");
	sqlite::exec(db, "delete from a");
	for (i = 0; i < 10; i++) {
		@local sql, fld;
		if (sqlite::escape_string(db, ((i % 2)? @b"'STXETX'": "'␂␃'") %% (math::rand() * 100), fld) <= -1) {
			print "escape_string error -", sqlite::errmsg();
			sqlite::exec(db, "rollback");
			sqlite::close(db);
			return;
		}
		sql=sprintf("insert into a(x,y) values(%d,'%s')", math::rand() * 100, fld);
		print sql;
		if (sqlite::exec(db, sql) <= -1) {
			print "exec error -", sqlite::errmsg();
			sqlite::exec(db, "rollback");
			sqlite::close(db);
			return;
		}
	}
	sqlite::exec(db, "commit");

	stmt = sqlite::prepare(db, "select x,y from a where x>?");
	if (stmt <= -1) {
		print "prepare error -", sqlite::errmsg();
		sqlite::close(db);
		return;
	}

	if (sqlite::bind(stmt, 1, 10) <= -1) {
		print "bind error -", sqlite::errmsg();
		sqlite::finalize(stmt);
		sqlite::close(db);
		return;
	}

	ncols = sqlite::column_count(stmt);
	printf ("TOTAL %d COLUMNS:\n", ncols);
	for (i = 1; i <= ncols; i++) {
		print "-", i, sqlite::column_name(stmt, i);
	}
	while (sqlite::fetch_row(stmt, row, sqlite::FETCH_ROW_ARRAY) > 0) {
		print "[id]", row[1], "[name]", row[2];
	}

	sqlite::finalize(stmt);
	sqlite::close(db);
}

Incompatibility with AWK

Parameter passing

In AWK, it is possible for the caller to pass an uninitialized variable as a function parameter and obtain a modified value if the called function sets it to an array.

function q(a) {
  a[1] = 20;
  a[2] = 30;
}

BEGIN {
  q(x);
  for (i in x)
    print i, x[i];
}

In Hawk, to achieve the same effect, you can indicate call-by-reference by prefixing the parameter name with an ampersand (&).

function q(&a) {
  a[1] = 20;
  a[2] = 30;
}

BEGIN {
  q(x);
  for (i in x)
    print i, x[i];
}

Alternatively, you may create an array or a map before passing it to a function.

function q(a) {
  a[1] = 20;
  a[2] = 30;
}

BEGIN {
  x[3] = 99; delete (x[3]);  ## x = hawk::array() or x = hawk::map() also will do
  q(x);
  for (i in x)
    print i, x[i];
}

Positional variable expression

There are subtle differences in handling expressions for positional variables. In Hawk, many of the ambiguity issues can be resolved by enclosing the expression in parentheses.

Expression    Hawk          AWK
$++$++i       syntax error  OK
$(++$(++i))   OK            syntax error

Return value of getline

Others

  • return is allowed in BEGIN blocks, END blocks, and pattern-action blocks.