`Hawk` is a powerful and embeddable scripting engine inspired by the traditional awk programming language. While it maintains compatibility with awk, Hawk is designed to be seamlessly integrated into other applications, providing a versatile and efficient solution for various scripting and data manipulation tasks.
As an embeddable interpreter, Hawk offers several advantages:
- Highly Portable: Implemented in portable C, Hawk can be easily integrated into applications running on diverse platforms and architectures.
- Extensible Architecture: Hawk features an extensible architecture, allowing developers to create and integrate custom extensions tailored to specific application requirements.
While mostly compatible with awk, Hawk introduces several enhancements and extensions, including:
- Improved Variable Handling: Enhanced mechanisms for working with complex data structures and performing advanced data manipulation.
- Additional Built-in Functions: A rich set of built-in functions that extend the capabilities of awk for string manipulation, array handling, and more.
- External Modules: Hawk supports external modules that provide additional functionality and extensibility.
Hawk's embeddable nature and extensible design make it a versatile choice for integrating scripting capabilities into a wide range of applications, from system utilities and tools to data processing pipelines and beyond.
In the following sections, we'll explore Hawk's features in detail, covering its embeddable nature, awk compatibility, extensions, and usage examples to help you effectively integrate and leverage this powerful scripting engine within your applications.
    /* lower the reference count of the returned value */
    hawk_rtx_refdownval (rtx, retv);
    ret = 0;

oops:
    if (rtx) hawk_rtx_close (rtx); /* destroy the runtime context */
    if (hawk) hawk_close (hawk); /* destroy the hawk instance */
    return ret;
}
```
Embedding Hawk within an application involves a few key steps:
- Creating a Hawk Instance: The `hawk_openstd()` function is used to create a new instance of the Hawk interpreter, which serves as the entry point for interacting with Hawk from within the application.
- Parsing Scripts: The application can provide Hawk scripts as string literals or read them from files using the `hawk_parsestd()` function. This associates the scripts with the Hawk instance for execution.
- Creating a Runtime Context: A runtime context is created using `hawk_rtx_openstd()`, encapsulating the state and configuration required for script execution, such as input/output streams and other settings.
- Executing the Script: The `hawk_rtx_loop()` or `hawk_rtx_exec()` functions are used to execute the Hawk script within the created runtime context, returning a value representing the result of the execution.
- Handling the Result: The application can check the returned value for successful execution and handle any errors or results as needed.
- Cleaning Up: Finally, the application cleans up by closing the runtime context and destroying the Hawk instance using `hawk_rtx_close()` and `hawk_close()`, respectively.
By following this pattern, applications can seamlessly embed the Hawk interpreter, leveraging its scripting capabilities and data manipulation functionality while benefiting from its portability, efficiency, and extensibility.
Assuming the above sample code is stored in `hawk02.c` and the built Hawk library has been installed properly, you may compile the sample code with commands along the following lines (adjust the paths and the library name to match your installation):
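```sh
$ cc -Wall -O2 -o hawk02 hawk02.c -lhawk    # assumes the library is installed as libhawk in a standard location
$ ./hawk02
```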
Embedding Hawk within a C++ application involves the following key steps:
- Creating a Hawk Instance: Create a new instance of the Hawk interpreter using the `HAWK::HawkStd` class.
- Parsing Scripts: Provide Hawk scripts as strings using the `HAWK::HawkStd::SourceString` class, and parse them using the `hawk.parse()` method.
- Executing the Script: Use the `hawk.loop()` or `hawk.exec()` methods to execute the Hawk script, returning a value representing the result of the execution.
- Handling the Result: Handle the returned value or any errors that occurred during execution.
- Cleaning Up: Clean up by calling `hawk.close()` to destroy the Hawk instance.
The C++ classes are more limited than the C equivalents in that they do not allow creation of multiple runtime contexts over a single Hawk instance.
Hawk is an AWK interpreter created by an individual whose name starts with `H`, hence the `H` in the name. It serves a dual purpose: to be an easy-to-embed implementation within other applications and a standalone tool for users. At its core, Hawk largely supports all the fundamental features of AWK, ensuring compatibility with existing AWK programs and scripts. However, it introduces subtle differences in behavior compared to traditional AWK implementations, which will be explained in the [Incompatibility with AWK](#incompatibility-with-awk) section.
In Hawk, as in traditional awk, the execution flow follows a specific order: the `BEGIN` block is executed first, followed by the pattern-action blocks, and finally the `END` block.
1. `BEGIN` Block: The `BEGIN` block is executed before any input is processed. It is typically used for initialization tasks, such as setting variable values or printing headers, before any records are read.
1. Pattern-Action Blocks: After the `BEGIN` block, Hawk reads the input line by line (or record by record, depending on the record separator `RS`). For each input line or record, Hawk checks if it matches the specified pattern. If a match is found, the associated action block is executed.
1. `END` Block: After processing all input lines or records, the `END` block is executed. It is typically used for performing final operations, such as printing summaries or closing files.
Here's a sample code that demonstrates the basic `BEGIN`, pattern-action, and `END` loop in Hawk:
```awk
BEGIN {
    print "Starting the script..."
    total = 0
}

/^[0-9]+$/ {        # Pattern-action block to sum up the numbers
    total += $0     # Add the current line (which is a number) to the total
}

END {
    print "The sum of all numbers is:", total
}
```
In this example:
1. The `BEGIN` block is executed first, printing the message "Starting the script..." and initializing the total variable to 0.
1. For each input line, Hawk checks if it matches the regular expression `/^[0-9]+$/` (which matches lines containing only digits). If a match is found, the action block `{ total += $0 }` is executed, adding the current line (treated as a number) to the total variable.
1. After processing all input lines, the `END` block is executed, printing the final message "The sum of all numbers is: `total`", where `total` is the accumulated sum of all numbers from the input.
You can provide input to this script in various ways, such as piping from another command, reading from a file, or entering input interactively. For example:
```sh
$ echo -e "42\n3.14\n100" | hawk -f sum.hawk
Starting the script...
The sum of all numbers is: 142
```
In this example, the `sum.hawk` file contains the Hawk script that sums up the numbers from the input. The input is provided via the `echo` command, which outputs three lines: 42, 3.14 (ignored because it doesn't match the pattern), and 100. The script sums up the numbers 42 and 100, resulting in a total of 142.
It's important to note that if there is no pattern-action block or `END` block present in the Hawk script, the interpreter will not wait for input records. In this case, the script executes only the `BEGIN` block (if present) and then terminates immediately.
However, if a pattern-action block or an `END` block is present, Hawk (and awk) will wait for input records or lines, even when the script contains nothing but an `END` block. This behavior is consistent with the way awk was designed to operate: it expects input data to process unless the script explicitly indicates that no input is required.
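For example, you can pipe the output of `ls -l` into a script that consists of nothing but an `END` block (the number printed depends on your directory contents):
```sh
$ ls -l | hawk 'END { print NR }'
```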
In this case, the Hawk script contains only an `END` block that prints the value of the `NR` (Number of Records) variable, which keeps track of the number of input records processed. Since there is an `END` block present, Hawk will wait for input records from the `ls -l` command, process them (though no action is taken for each record), and finally execute the `END` block, printing the total number of records processed.
Additionally, Hawk introduces the `@pragma entry` feature, which allows you to change the entry point of your script to a custom function instead of the default `BEGIN` block. This feature will be covered in the [Pragmas](#pragmas) section.
The `@pragma` keyword enables you to modify Hawk's behavior. You can place a pragma item at the file scope in any source file. Additionally, a pragma item at the global scope can appear only once across all source files.
| Pragma | Scope | Values | Default | Description |
|--------------|--------|---------|---------|-------------|
| striprecspc | global | on, off | off | Removes leading and trailing blank fields when splitting a record if FS is a regular expression matching all spaces |
In addition to the standard `BEGIN` and `END` blocks found in awk, Hawk introduces the `@pragma entry` feature, which allows you to specify a custom entry point function. This can be useful when you want to bypass the default `BEGIN` block behavior and instead start executing your script from a specific function.
The `@pragma entry` pragma is used to define the entry point function, like this:
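```awk
@pragma entry main;

function main () {
    ## illustrative body; put the code to run at startup here
    print "hello from main"
}
```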
In this example, the `main` function is set as the entry point for script execution. When the script is run, Hawk will execute the code inside the main function instead of the `BEGIN` block.
You can also pass arguments to the entry point function by defining it with parameters:
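```awk
@pragma entry main;

function main (arg1, arg2) {
    ## arg1 and arg2 receive the values supplied when the script is executed
    print "arg1:", arg1
    print "arg2:", arg2
}
```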
In this example, let's assume the script is saved as `main.hawk`. The `main` function is set as the entry point for script execution, and it accepts two arguments, `arg1` and `arg2`. Then, when executing the `main.hawk` script, you can provide the arguments like this:
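```sh
$ hawk -f main.hawk arg1_value arg2_value    # arg1_value and arg2_value are placeholder values
```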
This will cause Hawk to execute the code inside the main function, passing `arg1_value` and `arg2_value` as the respective values for `arg1` and `arg2`.
This flexibility in specifying the entry point can be useful in various scenarios, such as:
- Modular Script Design: You can organize your script into multiple functions and specify the entry point function, making it easier to manage and maintain your code.
- Command-line Arguments: By defining the entry point function with parameters, you can easily accept and process command-line arguments passed to your script.
- Testing and Debugging: When working on specific parts of your script, you can temporarily set the entry point to a different function, making it easier to test and debug that particular functionality.
- Integration with Other Systems: If you need to embed Hawk scripts within a larger application or system, you can use the `@pragma entry` feature to specify the function that should be executed as the entry point, enabling better integration and control over the script execution flow.
It's important to note that if you don't define an entry point function using `@pragma entry`, Hawk will default to the standard awk behavior and execute the `BEGIN` block first, followed by the pattern-action blocks, and finally the `END` block.
Overall, the `@pragma entry` feature in Hawk provides you with greater flexibility and control over the execution flow of your scripts, allowing you to tailor the entry point to your specific needs and requirements.
Hawk also introduces the `@pragma implicit` feature, which allows you to enforce variable declarations. Unlike traditional awk, where local variable declarations are not necessary, Hawk can require you to declare variables before using them. This is controlled by the `@pragma implicit` pragma:
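```awk
@pragma implicit off;

BEGIN {
    a = 10   ## syntax error: 'a' has not been declared
    print a
}
```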
In the example above, the `@pragma implicit off` directive is used to turn off implicit variable declaration. As a result, attempting to use the undeclared variable `a` results in a syntax error.
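The corrected version declares the variable before using it:
```awk
@pragma implicit off;

BEGIN {
    @local a;
    a = 10
    print a
}
```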
With the `@local` declaration, the variable `a` is explicitly declared, allowing it to be used without triggering a syntax error.
This feature can be beneficial for catching potential variable misspellings or unintended uses of global variables, promoting better code quality and maintainability.
If you don't want to enforce variable declarations, you can simply omit the `@pragma implicit off` directive or specify `@pragma implicit on`, and Hawk will behave like traditional awk, allowing implicit variable declarations.
The `@pragma striprecspc` directive in Hawk controls how the interpreter handles leading and trailing blank fields in input records when using a regular expression as the field separator (FS).
When you set `FS` to a regular expression that matches one or more whitespace characters (e.g., `FS="[[:space:]]+"`), Hawk will split the input records into fields based on that pattern. By default, Hawk follows the behavior of traditional awk, which means that leading and trailing blank fields are preserved.
However, Hawk introduces the `@pragma striprecspc` directive, which allows you to change this behavior. Here's how it works:
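Suppose the input line ` a b c d ` (note the leading and trailing spaces) is fed to the following script, a minimal sketch that just reports the field count and the fields:
```awk
@pragma striprecspc on;

BEGIN { FS = "[[:space:]]+" }
{
    print NF
    for (i = 1; i <= NF; i++) print "[" $i "]"
}
```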
When `@pragma striprecspc on` is set, Hawk will automatically remove any leading and trailing blank fields from the input records. In the example above, the input string ' a b c d ' has a leading and trailing space, which would normally result in two additional blank fields. However, with `@pragma striprecspc on`, these blank fields are stripped, and the resulting `NF`(number of fields) is 4, corresponding to the fields "a", "b", "c", and "d".
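Running the same input through an otherwise identical script with the pragma turned off shows the default behavior:
```awk
@pragma striprecspc off;

BEGIN { FS = "[[:space:]]+" }
{
    print NF
    for (i = 1; i <= NF; i++) print "[" $i "]"
}
```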
When `@pragma striprecspc off` is set (or the directive is omitted, as this is the default behavior), Hawk preserves any leading and trailing blank fields in the input records. In the example above, the input string ' a b c d ' has a leading and trailing space, resulting in two additional blank fields. The `NF`(number of fields) is now 6, with the first and last fields being empty, and the remaining fields containing "a", "b", "c", and "d".
The following words are reserved and cannot be used as a variable name, a parameter name, or a function name.
- @abort
- @global
- @include
- @include_once
- @local
- @pragma
- @reset
- BEGIN
- END
- break
- continue
- delete
- do
- else
- exit
- for
- function
- getbline
- getline
- if
- in
- next
- nextfile
- nextofile
- print
- printf
- return
- while
However, these words can be used as normal names in the context of a module call, for example, `mymod::break`. In practice, the predefined names used for built-in commands, functions, and variables are treated as if they were reserved, since you cannot create another definition with the same name.
A regular expression literal is special in that it never appears as an independent value; used on its own, it entails a match operation against `$0` without an explicit match operator.
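For example, the following two rules are equivalent; the bare regular expression literal implies a match against `$0`:
```awk
/^[0-9]+$/        { print "all digits:", $0 }   # bare regular expression literal
$0 ~ /^[0-9]+$/   { print "all digits:", $0 }   # explicit match operator, same effect
```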
Hawk supports various control structures for flow control and iteration, similar to those found in awk.
The `if` statement in Hawk follows the same syntax as in awk and other programming languages. It allows you to execute a block of code conditionally based on a specified condition.
```awk
if (condition) {
    ## statements
} else if (another_condition) {
    ## other statements
} else {
    ## default statements
}
```
The `while` loop in Hawk is used to repeatedly execute a block of code as long as a specific condition is true.
```awk
while (condition) {
    # statements
}
```
The `do`-`while` loop is similar to the `while` loop, but it guarantees that the code block will be executed at least once, as the condition is evaluated after the first iteration.
```awk
do {
    # statements
} while (condition)
```
The `for` loop in Hawk follows the same syntax as in awk and allows you to iterate over a range of values or an array.
```awk
for (initialization; condition; increment/decrement) {
    ## statements
}
```
You can also use the for loop to iterate over the elements of an array:
```awk
for (index in array) {
    ## statements using array[index]
}
```
Hawk also supports the `break` and `continue` statements, which work the same way as in awk and other programming languages. The `break` statement is used to exit a loop prematurely, while `continue` skips the remaining statements in the current iteration and moves to the next iteration.
The syntax and behavior of these structures are largely consistent with awk, making it easy for awk users to transition to Hawk and leverage their existing knowledge.
Hawk supports user-defined functions, enabling developers to break down complex logic into modular components for reuse. Hawk also provides a wide range of built-in functions that extend its capabilities for various tasks, such as string manipulation, array handling, and more.
To define a function in Hawk, you use the `function` keyword followed by the function name and a set of parentheses enclosing the optional function parameters:
```awk
function function_name(parameter1, parameter2, ...) {
    ## function body
    ## statements
    return value
}
```
Functions in Hawk can accept parameters, perform operations, and optionally return a value using the `return` statement.
Here's an example of a recursive function that calculates a factorial; the `BEGIN` block below invokes it with 10:
```awk
function factorial(n) {
    if (n <= 1) {
        return 1
    } else {
        return n * factorial(n - 1)
    }
}

BEGIN {
    num = 10
    result = factorial(num)
    print "The factorial of", num, "is", result
}
```
If no `return` statement is encountered, the function returns `@nil`, which is Hawk's equivalent of `nil` or `null` in other programming languages.
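For example, the following sketch calls a function whose body contains no `return` statement and compares the result against `@nil` and the empty string:
```awk
function noop() {
    ## no return statement, so the caller receives @nil
}

BEGIN {
    k = noop()
    print (k === @nil), (k === ""), (k == "")
}
```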
The expected output of the above example code is `1 0 1`.
- `k === @nil`: This expression evaluates to 1 (true) because `k` is indeed equal to `@nil` when using the type-precise `===` operator.
- `k === ""`: This expression evaluates to 0 (false) because `k` is not equal to an empty string when using the type-precise `===` operator.
- `k == ""`: This expression evaluates to 1 (true) because `@nil` is considered equal to an empty string when using the double equal sign `==` operator.
Functions can be called from various contexts, including `BEGIN`, pattern-action blocks, and `END` blocks, as well as from other functions. They can be defined before or after they are used, as Hawk resolves function references.
You can pass fewer arguments than the number of declared parameters to a function. In such cases, the missing parameters are treated as having the value `@nil`.
Here's an example to illustrate this behavior:
```awk
@function greet(name, greeting) {
    if (greeting == "") {
        greeting = "Hello"
    }
    print greeting, name
}

BEGIN {
    greet("Alice", "Hi") ## Output: Hi Alice
    greet("Bob")         ## Output: Hello Bob
    greet()              ## Output: Hello
}
```
In the above example:
1. The `greet` function is defined with two parameters: `name` and `greeting`.
1. In the first function call `greet("Alice", "Hi")`, both arguments are provided, so `name` is assigned `"Alice"`, and `greeting` is assigned `"Hi"`.
1. In the second function call `greet("Bob")`, only one argument is provided. Therefore, `name` is assigned `"Bob"`, and `greeting` is assigned `@nil`. The function checks if `greeting` is empty and assigns the default value `"Hello"`.
1. In the third function call `greet()`, no arguments are provided. Both `name` and `greeting` are assigned `@nil`. The function then assigns the default value `"Hello"` to `greeting` and prints `"Hello"`.
However, it's important to note that you cannot pass more arguments than the number of declared parameters in a function. If you attempt to do so, Hawk will raise an error.
## Variable
Variables can be used to store and manipulate data. There are two types of variables:
- Built-in Variables: These are predefined variables provided by the awk language itself. They are used for specific purposes and contain information about the input data or the state of the program.
- User-defined Variables: These are variables created and used by the programmer to store and manipulate data as needed within the program.
You can declare variables explicitly using the following syntax:
1. Local Variables:
- Declared using `@local var_1, var_2, ...`
- These variables are scoped within the current block or function.
1. Global Variables:
- Declared using `@global var_1, var_2, ...`
- These variables are accessible throughout the entire Hawk program.
While explicit variable declaration is supported, Hawk also maintains compatibility with awk by allowing implicit variable creation and usage. See [@pragma implicit](#pragma-implicit) on how to control this behavior.
```awk
@global count, total; ## Global variables

BEGIN {
    @local i, j; ## Local variables in the BEGIN block
    count = 0;
    total = 0;
}

{
    @local value; ## Local variable in the main block
    value = $1 + $2;
    count++;
    total += value;
}

END {
    print "Total count:", count;
    print "Sum of values:", total;
}
```
In this example:
- `count` and `total` are global variables declared using `@global`.
- `i` and `j` are local variables declared in the `BEGIN` block using `@local`.
- `value` is a local variable declared in the main block using `@local`.
### Built-in Variable
| Variable | Description |
|--------------|-------------|
| CONVFMT | Controls how numbers are converted to strings in expressions and concatenations |
| FILENAME | Name of the current input file being processed |
| FNR | File Number of Records, reset to 1 for each new input file |
| FS | Field Separator, specifies the character(s) that separate fields (columns) in an input record. Default is whitespace |
| IGNORECASE | |
| NF | Number of Fields (columns) in the current input record |
| NR | Number of Records processed so far |
| NUMSTRDETECT | |
| OFILENAME | |
| OFMT | Output format used when numbers are printed with `print` |
| OFS | Output Field Separator placed between the arguments of `print`. Default is a single space |
| ORS | Output Record Separator appended after each `print` statement. Default is newline `"\n"` |
| RLENGTH | Length of the substring matched by the most recent `match()` call |
| RS | Record Separator, specifies the character(s) that separate input records (lines). Default is newline `"\n"` |
- `str::match` - Similar to `match()`. The optional third argument is the search start index. The optional fourth argument is equivalent to the third argument to `match()`.
- `str::tonum` - Converts a string to a number. A numeric value passed as a parameter is returned as is. A leading prefix of `0b`, `0`, or `0x` specifies a radix of 2, 8, or 16, respectively. Conversion stops when the end of the string is reached or when the first character invalid for the conversion is encountered.
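A brief sketch of `str::tonum` based on the rules above (this assumes the `str` module is available to the interpreter; the literal values are illustrative):
```awk
BEGIN {
    print str::tonum("0x1F")    ## 31  - the 0x prefix selects radix 16
    print str::tonum("0b101")   ## 5   - the 0b prefix selects radix 2
    print str::tonum("017")     ## 15  - the leading 0 selects radix 8
    print str::tonum("12abc")   ## 12  - conversion stops at the first invalid character
    print str::tonum(7.5)       ## 7.5 - a numeric argument is returned as is
}
```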
In AWK, it is possible for the caller to pass an uninitialized variable as a function parameter and obtain a modified value if the called function sets it to an array.
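For instance, in a traditional awk implementation, a call like the following lets the callee populate an array through an uninitialized argument (a minimal sketch; the names are illustrative):
```awk
function fill(arr) {
    arr["greeting"] = "hello"   ## turns the uninitialized argument into an array
}

BEGIN {
    fill(tmp)               ## 'tmp' is uninitialized at the point of the call
    print tmp["greeting"]   ## prints "hello" in traditional awk
}
```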
There are subtle differences in handling expressions for positional variables. In Hawk, many of the ambiguity issues can be resolved by enclosing the expression in parentheses.
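For example, parenthesizing the field expression makes it explicit what the `$` operator applies to (a minimal illustration):
```awk
{
    print $(NF - 1)   ## unambiguous: the next-to-last field
    print $NF - 1     ## how this parses can differ between implementations; prefer the parenthesized form
}
```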