updated README.md with more text

hyung-hwan 2024-04-28 21:50:34 +09:00
parent db2651d811
commit 21fc3dc86e
2 changed files with 134 additions and 50 deletions

README.md

@ -3,10 +3,28 @@
█░█ ▄▀█ █░█░█ █▄▀
█▀█ █▀█ ▀▄▀▄▀ █░█
- [Embedding Hawk in C Applications](#embedding-in-c-app)
- [Embedding Hawk in C++ Applications](#embedding-in-cxx-app)
- [Hawk](#hawk)
- [Embedding Hawk in C Applications](#embedding-hawk-in-c-applications)
- [Embedding Hawk in C++ Applications](#embedding-hawk-in-c-applications-1)
- [Language](#language)
- [Modules](#modules)
- [Pragmas](#pragmas)
- [@pragma entry](#pragma-entry)
- [@pragma implicit](#pragma-implicit)
- [@pragma striprecspc](#pragma-striprecspc)
- [@include and @include\_once](#include-and-include_once)
- [Comments](#comments)
- [Reserved Words](#reserved-words)
- [Values](#values)
- [Numbers](#numbers)
- [Modules](#modules)
- [Hawk](#hawk-1)
- [String](#string)
- [System](#system)
- [ffi](#ffi)
- [mysql](#mysql)
- [Incompatibility with AWK](#incompatibility-with-awk)
- [Parameter passing](#parameter-passing)
- [Positional variable expression](#positional-variable-expression)
`Hawk` is a powerful and embeddable scripting engine inspired by the traditional awk programming language. While it maintains compatibility with awk, Hawk is designed to be seamlessly integrated into other applications, providing a versatile and efficient solution for various scripting and data manipulation tasks.
@ -27,11 +45,11 @@ Hawk's embeddable nature and extensible design make it a versatile choice for in
In the following sections, we'll explore Hawk's features in detail, covering its embeddable nature, awk compatibility, extensions, and usage examples to help you effectively integrate and leverage this powerful scripting engine within your applications.
## Embedding Hawk in C Applications <div id="embedding-in-c-app"/>
# Embedding Hawk in C Applications
Here's an example of how Hawk can be embedded within a C application:
```
```c
#include <hawk-std.h>
#include <stdio.h>
#include <string.h>
@ -120,11 +138,11 @@ Embedding Hawk within an application involves a few key steps:
By following this pattern, applications can seamlessly embed the Hawk interpreter, leveraging its scripting capabilities and data manipulation functionality while benefiting from its portability, efficiency, and extensibility.
## Embedding Hawk in C++ Applications <div id="embedding-in-cxx-app"/>
# Embedding Hawk in C++ Applications
Hawk can also be embedded in C++ applications. Here's an example:
```
```c++
#include <Hawk.hpp>
#include <stdio.h>
@ -164,42 +182,83 @@ Embedding Hawk within a C++ application involves the following key steps:
The C++ classes are inferior to the C equivalents in that they don't allow creation of multiple runtime contexts over a single hawk instance.
## Language
# Language
`Hawk` is an `AWK` interpreter with many extended features implemented by its creator, with 'H' representing the initial of the creator's name. It aims to be an easy-to-embed implementation as well as a standalone tool.
Hawk is an AWK interpreter created by an individual whose name starts with `H`, hence the `H` in the name. It serves a dual purpose: to be an easy-to-embed implementation within other applications and a standalone tool for users. At its core, Hawk largely supports all the fundamental features of AWK, ensuring compatibility with existing AWK programs and scripts. However, it introduces subtle differences in behavior compared to traditional AWK implementations, which will be explained in the [Incompatibility with AWK](#incompatibility-with-awk) section.
At its core, `Hawk` largely supports all the fundamental features of `AWK`, ensuring compatibility with existing AWK programs and scripts, while introducing subtle differences in behavior, which will be explained in the [incompatibility](#incompatibility-with-awk) section. The following is an overview of the basic AWK features supported by `Hawk`:
In Hawk, as in traditional awk, the execution flow follows a specific order: the `BEGIN` block is executed first, followed by the pattern-action blocks, and finally the `END` block.
1. Pattern-Action Statements: Hawk operates on a series of pattern-action statements. Each statement consists of a pattern that matches input records and an associated action that is executed when the pattern matches.
1. Record Processing: Hawk processes input data by splitting it into records and fields. Records are typically lines in a file, while fields are segments of each record separated by a field separator (by default, whitespace). This enables powerful text processing capabilities.
1. Built-in Variables: Hawk provides a set of built-in variables that facilitate data manipulation. Commonly used variables include `NF` (number of fields in the current record), `NR` (current record number), and `$n` (the value of the nth field in the current record).
1. Built-in Functions: Hawk offers a rich set of built-in functions to perform various operations on data. These functions include string manipulation, numeric calculations, regular expression matching, and more. You can harness their power to transform and analyze your input data effectively.
1. Output Formatting: Hawk provides flexible control over the formatting and presentation of output. You can customize the field and record separators, control the output field width, and apply formatting rules to align columns.
1. `BEGIN` Block: The `BEGIN` block is executed before any input is processed. It is typically used for initializations, such as setting variable values or defining functions that will be used later in the script.
1. Pattern-Action Blocks: After the `BEGIN` block, Hawk reads the input line by line (or record by record, depending on the record separator `RS`). For each input line or record, Hawk checks if it matches the specified pattern. If a match is found, the associated action block is executed.
1. `END` Block: After processing all input lines or records, the `END` block is executed. It is typically used for performing final operations, such as printing summaries or closing files.
With these foundational features, Hawk ensures compatibility with existing AWK scripts and enables you to utilize the vast range of AWK resources available.
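To make these items concrete, here is a minimal sketch that prints the built-in `NR` and `NF` variables with printf-style formatting, relying only on standard awk constructs that Hawk also supports:

```awk
{
    ## NR is the current record number, NF the number of fields in that record
    printf "record %d has %d fields; the first field is %s\n", NR, NF, $1;
}
```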
Here's a sample code that demonstrates the basic `BEGIN`, pattern-action, and `END` loop in Hawk:
### Pragmas
```awk
BEGIN {
print "Starting the script..."
total = 0
}
/^[0-9]+$/ { # Pattern-action block to sum up the numbers
total += $0 # Add the current line (which is a number) to the total
}
END {
print "The sum of all numbers is:", total
}
```
The `@pragma` keyword allows you to change Hawk's behavior. A pragma item of file scope can be placed in any source file. A pragma item of global scope can appear only once throughout all source files.
In this example:
1. The `BEGIN` block is executed first, printing the message "Starting the script..." and initializing the total variable to 0.
1. For each input line, Hawk checks if it matches the regular expression `/^[0-9]+$/` (which matches lines containing only digits). If a match is found, the action block `{ total += $0 }` is executed, adding the current line (treated as a number) to the total variable.
1. After processing all input lines, the `END` block is executed, printing the final message "The sum of all numbers is: `total`", where `total` is the accumulated sum of all numbers from the input.
You can provide input to this script in various ways, such as piping from another command, reading from a file, or entering input interactively. For example:
```sh
$ echo -e "42\n3.14\n100" | hawk -f sum.hawk
Starting the script...
The sum of all numbers is: 142
```
In this example, the `sum.hawk` file contains the Hawk script that sums up the numbers from the input. The input is provided via the `echo` command, which outputs three lines: 42, 3.14 (ignored because it doesn't match the pattern), and 100. The script sums up the numbers 42 and 100, resulting in a total of 142.
It's important to note that if there is no pattern-action block or `END` block present in the Hawk script, the interpreter will not wait for input records. In this case, the script will execute only the `BEGIN` block (if present) and then immediately terminate.
However, if a pattern-action block or an `END` block is present in the script (even if it is only an `END` block with no pattern-action blocks), Hawk (and awk) will wait for input records or lines. This behavior is consistent with how awk was designed to operate: it expects input data to process unless the script explicitly indicates that no input is required.
For example, consider the following command:
```sh
$ ls -l | hawk 'END { print NR; }'
```
In this case, the Hawk script contains only an `END` block that prints the value of the `NR` (number of records) variable, which keeps track of the number of input records processed. Since there is an `END` block present, Hawk will wait for input records from the `ls -l` command, process them (though no action is taken for each record), and finally execute the `END` block, printing the total number of records processed.
Additionally, Hawk introduces the `@pragma entry` feature, which allows you to change the entry point of your script to a custom function instead of the default `BEGIN` block. This feature will be covered in the [Pragmas](#pragmas) section.
## Pragmas
The `@pragma` keyword enables you to modify Hawk's behavior. You can place a pragma item at the file scope within any source file. Additionally, a pragma item at the global scope can appear only once across all source files.
| Name | Scope | Values | Default | Description |
|---------------|--------|---------------|---------|--------------------------------------------------------|
| entry | global | function name | | change the program entry point |
| implicit | file | on, off | on | allow undeclared variables |
| multilinestr | file | on, off | off | allow a multiline string literal without continuation |
| striprecspc | global | on, off | off | removes empty fields in splitting a record if FS is a regular expression matching all spaces |
| striprecspc | global | on, off | off | removes leading and trailing blank fields in splitting a record if FS is a regular expression matching all spaces |
| stripstrspc | global | on, off | on | trim leading and trailing spaces when converting a string to a number |
| numstrdetect | global | on, off | on | trim leading and trailing spaces when converting a string to a number |
| stack_limit | global | number | 5120 | specify the runtime stack size measured in the number of values |
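As a quick sketch of how these pragmas look in a script (assuming the value simply follows the pragma name, as in the examples shown later in this document; the `stack_limit` value below is an arbitrary illustration):

```awk
@pragma implicit off;       ## file-scope pragma: require variable declarations in this file
@pragma stack_limit 10240;  ## global-scope pragma: enlarge the runtime value stack (value chosen arbitrarily)

BEGIN {
    @local msg;             ## declaration required because implicit is off
    msg = "pragmas applied";
    print msg;
}
```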
#### entry
### @pragma entry
In addition to the standard `BEGIN` and `END` blocks found in awk, Hawk introduces the `@pragma entry` feature, which allows you to specify a custom entry point function. This can be useful when you want to bypass the default `BEGIN` block behavior and instead start executing your script from a specific function.
The `@pragma entry` pragma is used to define the entry point function, like this:
```awk
@pragma entry main
@pragma entry main;
function main () { print "hello, world"; }
```
@ -208,26 +267,37 @@ In this example, the `main` function is set as the entry point for script execut
You can also pass arguments to the entry point function by defining it with parameters:
```awk
@pragma entry main
@pragma entry main;
function main(arg1, arg2) {
print "Arguments:", arg1, arg2
}
```
In this example, let's assume the script is saved as `script.awk`. The `main` function is set as the entry point for script execution, and it accepts two arguments, `arg1` and `arg2`. Then, when executing the `script.awk` script, you can provide the arguments like this:
In this example, let's assume the script is saved as `main.hawk`. The `main` function is set as the entry point for script execution, and it accepts two arguments, `arg1` and `arg2`. Then, when executing the `main.hawk` script, you can provide the arguments like this:
```sh
$ hawk script.awk arg1_value arg2_value
$ hawk -f main.hawk arg1_value arg2_value
```
This will cause Hawk to execute the code inside the main function, passing `arg1_value` and `arg2_value` as the respective values for `arg1` and `arg2`.
#### implicit
This flexibility in specifying the entry point can be useful in various scenarios, such as:
- Modular Script Design: You can organize your script into multiple functions and specify the entry point function, making it easier to manage and maintain your code.
- Command-line Arguments: By defining the entry point function with parameters, you can easily accept and process command-line arguments passed to your script.
- Testing and Debugging: When working on specific parts of your script, you can temporarily set the entry point to a different function, making it easier to test and debug that particular functionality.
- Integration with Other Systems: If you need to embed Hawk scripts within a larger application or system, you can use the `@pragma entry` feature to specify the function that should be executed as the entry point, enabling better integration and control over the script execution flow.
It's important to note that if you don't define an entry point function using `@pragma entry`, Hawk will default to the standard awk behavior and execute the `BEGIN` block first, followed by the pattern-action blocks, and finally the `END` block.
Overall, the `@pragma entry` feature in Hawk provides you with greater flexibility and control over the execution flow of your scripts, allowing you to tailor the entry point to your specific needs.
### @pragma implicit
Hawk also introduces the `@pragma implicit` feature, which allows you to enforce variable declarations. Unlike traditional awk, where local variable declarations are not necessary, Hawk can require you to declare variables before using them. This is controlled by the `@pragma implicit` pragma:
```awk
@pragma implicit off
@pragma implicit off;
BEGIN {
a = 10; ## syntax error - undefined identifier 'a'
}
@ -236,7 +306,7 @@ BEGIN {
In the example above, the `@pragma implicit off` directive is used to turn off implicit variable declaration. As a result, attempting to use the undeclared variable `a` will result in a syntax error.
```awk
@pragma implicit off
@pragma implicit off;
BEGIN {
@local a;
a = 10; ## syntax ok - 'a' is declared before use
@ -247,9 +317,17 @@ This feature can be beneficial for catching potential variable misspellings or u
If you don't want to enforce variable declarations, you can simply omit the `@pragma implicit off` directive or specify `@pragma implicit on`, and Hawk will behave like traditional awk, allowing implicit variable declarations.
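For reference, the permissive default can also be spelled out explicitly; this minimal sketch assumes `on` is accepted in the same position as `off`:

```awk
@pragma implicit on;
BEGIN {
    a = 10;    ## accepted - undeclared variables are allowed, as in traditional awk
    print a;
}
```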
#### striprecspc
```
$ echo ' a b c d ' | hawk '@pragma striprecspc on
### @pragma striprecspc
The `@pragma striprecspc` directive in Hawk controls how the interpreter handles leading and trailing blank fields in input records when using a regular expression as the field separator (FS).
When you set `FS` to a regular expression that matches one or more whitespace characters (e.g., FS="[[:space:]]+"), Hawk will split the input records into fields based on that pattern. By default, Hawk follows the behavior of traditional awk, which means that leading and trailing blank fields are preserved.
However, Hawk introduces the `@pragma striprecspc` directive, which allows you to change this behavior. Here's how it works:
1. `@pragma striprecspc on`
```sh
$ echo ' a b c d ' | hawk '@pragma striprecspc on;
BEGIN { FS="[[:space:]]+"; }
{
print "NF=" NF;
@ -262,8 +340,12 @@ NF=4
3 [d]
```
```
echo ' a b c d ' | hawk '@pragma striprecspc off
When `@pragma striprecspc on` is set, Hawk will automatically remove any leading and trailing blank fields from the input records. In the example above, the input string ' a b c d ' has a leading and a trailing space, which would normally result in two additional blank fields. However, with `@pragma striprecspc on`, these blank fields are stripped, and the resulting `NF` (number of fields) is 4, corresponding to the fields "a", "b", "c", and "d".
2. `@pragma striprecspc off`
```sh
$ echo ' a b c d ' | hawk '@pragma striprecspc off;
BEGIN { FS="[[:space:]]+"; }
{
print "NF=" NF;
@ -278,19 +360,21 @@ NF=6
5 []
```
### @include and @include_once
When `@pragma striprecspc off` is set (or the directive is omitted, as this is the default behavior), Hawk preserves any leading and trailing blank fields in the input records. In the example above, the input string ' a b c d ' has a leading and a trailing space, resulting in two additional blank fields. The `NF` (number of fields) is now 6, with the first and last fields being empty, and the remaining fields containing "a", "b", "c", and "d".
## @include and @include_once
The `@include` directive inserts the contents of the file specified in the following string as if they appeared in the source stream being processed.
Assuming the `hello.inc` file contains the print_hello() function as shown below,
```
```awk
function print_hello() { print "hello\n"; }
```
You may include the file and use the function.
```
```awk
@include "hello.inc";
BEGIN { print_hello(); }
```
@ -299,7 +383,7 @@ The semicolon after the included file name is optional. You could write `@includ
`@include_once` is similar to `@include` except it doesn't include the same file multiple times.
```
```awk
@include_once "hello.inc";
@include_once "hello.inc";
BEGIN { print_hello(); }
@ -309,18 +393,18 @@ In this example, `print_hello()` is not included twice.
You may use `@include` and `@include_once` inside a block as well as at the top level.
```
```awk
BEGIN {
@include "init.inc";
...
}
```
### Comments
## Comments
`Hawk` supports a single-line comment that begins with a hash sign (`#`) and the C-style multi-line comment.
```
```awk
x = y; # assign y to x.
/*
this line is ignored.
@ -328,7 +412,7 @@ this line is ignored too.
*/
```
### Reserved Words
## Reserved Words
The following words are reserved and cannot be used as a variable name, a parameter name, or a function name.
@ -363,7 +447,7 @@ The following words are reserved and cannot be used as a variable name, a parame
However, these words can be used as normal names in the context of a module call. For example, `mymod::break`. In practice, the predefined names used for built-in commands, functions, and variables are treated as if they are reserved since you can't create another definition with the same name.
### Values
## Values
- uninitialized value
- integer
@ -394,7 +478,7 @@ BEGIN { $0="ab"; print /ab/, hawk::typename(/ab/); }
For this reason, there is no way to get the type name of a regular expression literal.
### Numbers ###
## Numbers
An integer begins with a numeric digit between 0 and 9 inclusive and can be
followed by more numeric digits. If an integer is immediately followed by a
@ -428,7 +512,7 @@ and represents the value of 0.
- `0x` # 0x0 but not desirable.
- `0b` # 0b0 but not desirable.
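As a small illustration of the literal forms described above (the printed values assume the usual decimal output of `print`):

```awk
BEGIN {
    print 0x10;    ## hexadecimal literal - prints 16
    print 0b101;   ## binary literal - prints 5
    print 98.76;   ## floating-point literal
}
```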
### Modules
## Modules
Hawk supports various modules.
@ -451,7 +535,7 @@ Hawk supports various modules.
- hawk::typename
- hawk::GC_NUM_GENS
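As a brief, hedged sketch (assuming `hawk::typename` accepts an arbitrary value and `hawk::GC_NUM_GENS` can be read like an ordinary value), the intrinsic `hawk` module is called like any other module:

```awk
BEGIN {
    print hawk::typename(10);      ## type name of an integer value
    print hawk::typename("text");  ## type name of a string value
    print hawk::GC_NUM_GENS;       ## assumed to be a readable constant
}
```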
#### String
### String
The `str` module provides an extensive set of string manipulation functions.
- str::fromcharcode
@ -486,7 +570,7 @@ The `str` module provides an extensive set of string manipulation functions.
- str::trim
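For instance, a minimal sketch using `str::trim` from the list above (assuming a single-argument form that returns the input with surrounding whitespace removed):

```awk
BEGIN {
    s = "   hello, hawk   ";
    print "[" str::trim(s) "]";   ## assumed to print [hello, hawk]
}
```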
#### System
### System
The `sys` module provides various functions concerning the underlying operating system.
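As a small, hedged example (assuming a `sys::getpid` function is available, which is typical for this kind of module but not shown in the excerpt above):

```awk
BEGIN {
    print "pid:", sys::getpid();   ## sys::getpid is assumed to return the current process id
}
```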
@ -765,9 +849,9 @@ BEGIN {
}
```
### Incompatibility with AWK
## Incompatibility with AWK
#### Parameter passing
### Parameter passing
In AWK, it is possible for the caller to pass an uninitialized variable as a function parameter and obtain a modified value if the called function sets it to an array.
@ -815,7 +899,7 @@ BEGIN {
}
```
### Positional variable expression
## Positional variable expression
There are subtle differences in handling expressions for positional variables. In Hawk, many of the ambiguity issues can be resolved by enclosing the expression in parentheses.
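As a brief illustration (assuming the parenthesized form is handled identically in this simple case), parentheses make the intended positional expression explicit:

```awk
BEGIN {
    $0 = "alpha beta gamma";
    print $(NF - 1);   ## unambiguously the second-to-last field, "beta"
}
```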


@ -1261,8 +1261,8 @@ enum hawk_trait_t
HAWK_NEWLINE = (1 << 5),
/**
* removes empty fields when splitting a record if FS is a regular
* expression and the match is all spaces.
* removes leading and trailing blank fields when splitting a record if FS
* is a regular expression and the match is all spaces.
*
* \code
* BEGIN { FS="[[:space:]]+"; }