hyung-hwan 2007-12-24 01:12:53 +00:00
parent 397ea9dd43
commit 23c849e75a
5 changed files with 220 additions and 101 deletions


@ -1,11 +0,0 @@
== Difference From The Standard AWK ==
== line terminator ==
It does not accept a new line as a statement terminator.
== print/printf ==
If the statement succeeds, it sets ERRNO to 0. Otherwise, it sets ERRNO to -1.


@ -1,47 +0,0 @@
.title Embedding AWK
To embed the AWK interpreter in an application, the developer should provide the following routines:
* System functions including memory management
* Source code I/O functions
* I/O functions to interface with the console, files, and pipes.
ase_awk_open creates an AWK object and requires a pointer to a structure holding the system functions. The structure type is ase_awk_prmfns_t, defined as follows:
{{{
struct ase_awk_prmfns_t
{
/* memory */
void* (*malloc) (ase_size_t n, void* custom_data);
void* (*realloc) (void* ptr, ase_size_t n, void* custom_data);
void (*free) (void* ptr, void* custom_data);
/* character class */
ase_bool_t (*is_upper) (ase_cint_t c);
ase_bool_t (*is_lower) (ase_cint_t c);
ase_bool_t (*is_alpha) (ase_cint_t c);
ase_bool_t (*is_digit) (ase_cint_t c);
ase_bool_t (*is_xdigit) (ase_cint_t c);
ase_bool_t (*is_alnum) (ase_cint_t c);
ase_bool_t (*is_space) (ase_cint_t c);
ase_bool_t (*is_print) (ase_cint_t c);
ase_bool_t (*is_graph) (ase_cint_t c);
ase_bool_t (*is_cntrl) (ase_cint_t c);
ase_bool_t (*is_punct) (ase_cint_t c);
ase_cint_t (*to_upper) (ase_cint_t c);
ase_cint_t (*to_lower) (ase_cint_t c);
/* utilities */
void* (*memcpy) (void* dst, const void* src, ase_size_t n);
void* (*memset) (void* dst, int val, ase_size_t n);
ase_real_t (*pow) (ase_real_t x, ase_real_t y);
int (*sprintf) (ase_char_t* buf, ase_size_t size, const ase_char_t* fmt, ...);
void (*aprintf) (const ase_char_t* fmt, ...); /* assertion */
void (*dprintf) (const ase_char_t* fmt, ...); /* debug */
void (*abort) (void);
void* custom_data;
};
}}}


@ -1,9 +1,16 @@
.title ASEAWK
.title Embedding ASEAWK
.tabstop 6
= ASEAWK =
ASE provides an embeddable processor of a dialect of the AWK programming language. The language implemented is slightly different from {the version developed by Brian W. Kernighan, http://cm.bell-labs.com/cm/cs/awkbook/index.html} and has been adjusted to the author's preference.
== Overview ==
To embed the AWK interpreter in an application, the developer should provide the following routines:
* System functions including memory management
* Source code I/O functions
* I/O functions to interface with the console, files, and pipes.
The following code fragment illustrates the basic steps of embedding the processor.
{{{
@ -26,15 +33,15 @@ The following code fragment illustrates the basic steps of embedding the process
}}}
(((
* Most of the functions and data types needed are defined in the header file [[ase/awk/awk.h]].
* [[ase_awk_t]] represents the processor. However, the internal representation is not exposed.
* [[ase_awk_open]] creates the processor instance.
* [[ase_awk_parse]] parses an AWK script.
* [[ase_awk_run]] executes the script parsed.
* [[ase_awk_close]] destroys the processor instance.
* Most of the functions and data types needed are defined in the header file ##=ase/awk/awk.h=##.
* ##=ase_awk_t=## represents the processor. However, the internal representation is not exposed.
* ##=ase_awk_open=## creates the processor instance.
* ##=ase_awk_parse=## parses an AWK script.
* ##=ase_awk_run=## executes the script parsed.
* ##=ase_awk_close=## destroys the processor instance.
)))
A more detailed description is available {here,awk-mini-en.html}. You may also refer to other sample files such as [[ase/test/awk/awk.c]] and [[ase/awk/jni.c]].
A more detailed description is available {here,awk-mini-en.html}. You may also refer to other sample files such as ##=ase/test/awk/awk.c=## and ##=ase/awk/jni.c=##.
== Primitive Functions ==
A set of primitive functions is needed to create an instance of the AWK processor. A primitive function is a user-defined function that helps the library perform system-dependent operations such as memory allocation and character class handling.
@ -57,7 +64,7 @@ struct ase_awk_prmfns_t
};
}}}
A caller of [[ase_awk_open]] should fill in most of the fields of an [[ase_awk_prmfns_t]] structure and pass the structure to it. The function pointers in the miscellaneous group labeled [misc] are defined as follows:
A caller of ##=ase_awk_open=## should fill in most of the fields of an ##=ase_awk_prmfns_t=## structure and pass the structure to it. The function pointers in the miscellaneous group labeled [misc] are defined as follows:
{{{
/* returns the value of x raised to the power of y */
@ -71,7 +78,7 @@ typedef int (*ase_awk_sprintf_t) (void* custom, ase_char_t* buf, ase_size_t size
typedef void (*ase_awk_dprintf_t) (void* custom, const ase_char_t* fmt, ...);
}}}
The fourth field of the group is passed to its member functions as the first argument on invocation. The function pointed to by the [[sprintf]] field should ensure that the resulting string is null-terminated and that [[%s]] and [[%c]] are accepted for the [[ase_char_t*]] and [[ase_char_t]] types respectively, regardless of the character mode.
The fourth field of the group is passed to its member functions as the first argument on invocation. The function pointed to by the ##=sprintf=## field should ensure that the resulting string is null-terminated and that ##=%s=## and ##=%c=## are accepted for the ##=ase_char_t*=## and ##=ase_char_t=## types respectively, regardless of the character mode.
The memory manager group labeled [mmgr] and the character class group labeled [ccls] are defined as follows:
@ -110,7 +117,7 @@ struct ase_ccls_t
};
}}}
The functions in these groups perform memory operations and character class operations respectively. They follow the style of the memory allocation and character class handling functions of the standard C library, except that they accept a pointer to user-defined data as the first argument, which provides more flexibility. The pointer to the user-defined data is specified in the [[custom_data]] field of each group. The [[realloc]] field, however, can be [[ASE_NULL]], in which case the functions pointed to by the free and malloc fields take over the role of the function pointed to by the [[realloc]] field.
The functions in these groups perform memory operations and character class operations respectively. They follow the style of the memory allocation and character class handling functions of the standard C library, except that they accept a pointer to user-defined data as the first argument, which provides more flexibility. The pointer to the user-defined data is specified in the ##=custom_data=## field of each group. The ##=realloc=## field, however, can be ##=ASE_NULL=##, in which case the functions pointed to by the free and malloc fields take over the role of the function pointed to by the ##=realloc=## field.
== Source IO Handler ==
@ -131,11 +138,11 @@ struct ase_awk_srcios_t
};
}}}
The [[in]] field of the ase_awk_srcios_t is mandatory and should be filled in. The [[out]] field can be set to [[ASE_NULL]] or can point to a source output handling function. The [[custom_data]] field is passed to the source handlers as the second argument. The first parameter [[cmd]] of the source input handler is one of [[ASE_AWK_IO_OPEN]], [[ASE_AWK_IO_CLOSE]], [[ASE_AWK_IO_READ]]. The first parameter [[cmd]] of the source output handler is one of [[ASE_AWK_IO_OPEN]], [[ASE_AWK_IO_CLOSE]], [[ASE_AWK_IO_WRITE]]. The third parameter [[data]] and the fourth parameter [[count]] are the pointer to the buffer to read data into and its size if the first parameter [[cmd]] is [[ASE_AWK_IO_READ]] while they are the pointer to the data and its size if [[cmd]] is [[ASE_AWK_IO_WRITE]].
The ##=in=## field of the ase_awk_srcios_t is mandatory and should be filled in. The ##=out=## field can be set to ##=ASE_NULL=## or can point to a source output handling function. The ##=custom_data=## field is passed to the source handlers as the second argument. The first parameter ##=cmd=## of the source input handler is one of ##=ASE_AWK_IO_OPEN=##, ##=ASE_AWK_IO_CLOSE=##, ##=ASE_AWK_IO_READ=##. The first parameter ##=cmd=## of the source output handler is one of ##=ASE_AWK_IO_OPEN=##, ##=ASE_AWK_IO_CLOSE=##, ##=ASE_AWK_IO_WRITE=##. The third parameter ##=data=## and the fourth parameter ##=count=## are the pointer to the buffer to read data into and its size if the first parameter ##=cmd=## is ##=ASE_AWK_IO_READ=## while they are the pointer to the data and its size if ##=cmd=## is ##=ASE_AWK_IO_WRITE=##.
The source handler should return a negative value for an error and zero or a positive value otherwise. However, there is a subtle difference in the meaning of the return value depending on the value of the first parameter [[cmd]].
The source handler should return a negative value for an error and zero or a positive value otherwise. However, there is a subtle difference in the meaning of the return value depending on the value of the first parameter ##=cmd=##.
When [[cmd]] is [[ASE_AWK_IO_OPEN]], return values of -1 and 1 indicate failure and success respectively. In addition, a return value of 0 indicates that the operation succeeded but has reached the end of the stream. The library calls the handler with [[ASE_AWK_IO_CLOSE]] for deinitialization if the return value is 0 or 1. When [[cmd]] is [[ASE_AWK_IO_CLOSE]], return values of -1 and 0 indicate failure and success respectively. When [[cmd]] is [[ASE_AWK_IO_READ]] or [[ASE_AWK_IO_WRITE]], a return value of -1 indicates failure, 0 the end of the stream, and any other positive value the number of characters read or written.
When ##=cmd=## is ##=ASE_AWK_IO_OPEN=##, return values of -1 and 1 indicate failure and success respectively. In addition, a return value of 0 indicates that the operation succeeded but has reached the end of the stream. The library calls the handler with ##=ASE_AWK_IO_CLOSE=## for deinitialization if the return value is 0 or 1. When ##=cmd=## is ##=ASE_AWK_IO_CLOSE=##, return values of -1 and 0 indicate failure and success respectively. When ##=cmd=## is ##=ASE_AWK_IO_READ=## or ##=ASE_AWK_IO_WRITE=##, a return value of -1 indicates failure, 0 the end of the stream, and any other positive value the number of characters read or written.
The typical source handler will look as follows:
{{{
@ -182,7 +189,7 @@ ase_ssize_t awk_srcio_out (int cmd, void* arg, ase_char_t* data, ase_size_t size
}
}}}
Once you have the source handler ready, you can fill in the fields of an [[ase_awk_srcios_t]] structure and pass it to [[ase_awk_parse]].
Once you have the source handler ready, you can fill in the fields of an ##=ase_awk_srcios_t=## structure and pass it to ##=ase_awk_parse=##.
{{{
ase_awk_srcios_t srcios;

ase/doc/en/awk-lang.man

@ -0,0 +1,198 @@
.title AWK Language
.tabstop 6
Most of the AWK language features are supported. This document shows notable language features that might differ from other implementations.
== Variable ==
Local and global variables are supported if ASE_AWK_EXPLICIT is enabled. ASE_AWK_IMPLICIT enables named variables, which need no declaration. You may enable both options to support both kinds of variables; at least one of them should be enabled for the language to be useful.
A local variable can be declared at the top of each block before any statements. A global variable can be declared anywhere outside a function or a pattern-action block.
{{{
global a, b;
global c;
BEGIN {
local x, y;
a = 30; x = 30; x = a + 40; print x;
}
}}}
{{|
! Code
! Description
|-
| function a() { }
BEGIN { ##-a=20;-## }
| A function and a named variable cannot have the same name. A named variable requires ASE_AWK_IMPLICIT to be enabled.
|-
| function a() { }
BEGIN {
local a;
a = 20;
}
| A local variable can shadow a function of the same name. The deparsed output shows this.
function a ()
{
}
BEGIN {
local __local0;
__local0 = 20;
}
Local variable declaration requires ASE_AWK_EXPLICIT, though.
|-
| global a;
function ##-a()-## { }
function a() { }
global ##-a-##;
| A function and a global variable cannot have the same name.
|-
| function fn () {
x = 20;
return x;
}
global x;
BEGIN {
x = 30;
print fn ();
print x;
}
| A global variable is visible only in the part of the program that follows its declaration. Thus x inside fn is a named variable while x in BEGIN is the global variable.
global __global17;
function fn ()
{
x = 20;
return x;
}
BEGIN {
__global17 = 30;
print fn ();
print __global17;
}
|-
| global x;
BEGIN {
x = 1;
{
local x;
x = 2;
{
local x;
x = 3;
print x;
}
print x;
}
print x;
}
| A local variable can shadow a global variable as well as a local variable in an outer scope.
global __global17;
BEGIN {
local __local0, __local1;
__global17 = 1;
{
__local0 = 2;
{
__local1 = 3;
print __local1;
}
print __local0;
}
print __global17;
}
|}}
== Parameter ==
A parameter name can shadow the name of the enclosing function. The following table shows the details.
{{|
! Code
! Description
|-
| function f(f) { print f; }
| A parameter name can be the same as the enclosing function name.
|-
| function f(f) { ##-f("hello")-##; }
| A recursive call to the function f is not possible because the third f refers to the parameter f, not the function.
|-
| function fn(f) {
f = 20;
}
BEGIN {
f = 50;
fn(100);
print f;
}
| 50 is printed. The parameter f in fn doesn't affect the named variable f in BEGIN. The deparsed output shows this clearly.
function fn (__param0)
{
__param0 = 20;
}
BEGIN {
f = 50;
fn (100);
print f;
}
|}}
== Statement Terminator ==
A statement must end with a semicolon. A new-line character is treated as whitespace. For this reason, the backslash line continuator is not supported.
{{{
BEGIN { print "hello, world"; }
}}}
== Function ==
A blank is allowed between a function name and a left parenthesis. The left brace of the function body does not have to be on the same line as the function name and parameters.
{{{
function fn (x, y)
{
return x + y;
}
BEGIN { print fn (10, 20); }
}}}
== Return ==
A return statement is allowed in BEGIN and END.
{{{
END { return 20; }
}}}
== Pattern-Action Block ==
ASE_AWK_BLOCKLESS enables the use of an action-less pattern-action block. Turning it off changes how the parser treats a block that does not follow a pattern, BEGIN, or END.
{{{
BEGIN
{ print "hello"; }
{ print "hello2"; }
}}}
In the code snippet above, the first block is associated with BEGIN while the second block is a patternless pattern-action block that matches every line of input. It is equivalent to the following.
{{{
BEGIN {
print "hello";
}
{
print "hello2";
}
}}}


@ -41,7 +41,6 @@ $ make
The following table shows the output locations of generated files.
{{|
|-
! Mode
! Executable Files
! Library Files
@ -63,7 +62,6 @@ The following table shows the output locations of generated files.
If you prefer a particular compiler and particular flags, you may specify them explicitly when you run the [[configure]] script. A few such examples are presented here.
{{|
|-
!
! 32 Bits
! 64 Bits
@ -140,7 +138,6 @@ error string customization
== Languages ==
{{|
|-
! Language
! Status
! Bindings
@ -157,28 +154,3 @@ error string customization
| Planned
| C
|}}
== AWK ==
Language Difference
{{|
|-
! ASE
! NAWK
! REMARKS
|-
| Statement terminator Semicolon
| n/a
| n/a
|-
| function abc()
this is very nice hello world...
| function abc ()
| n/a
|---------------------------------------
| return in BEGIN or END
|
|
|}}