qse/ase/doc/awk-en.man

203 lines
7.9 KiB
Groff

.title ASEAWK
== ASEAWK ==
ASE provides an embeddable processor of a dialect of the AWK programming language. The language implemented is slightly different from {the version developed by Brian W. Kernighan, http://cm.bell-labs.com/cm/cs/awkbook/index.html} and has been adjusted to the author's preference.
=== Overview ===
The following code fragment illustrates the basic steps of embedding the processor.
{{{
1) #include <ase/awk/awk.h>
2) ase_awk_t* awk;
3) awk = ase_awk_open (...);
4) if (ase_awk_parse (awk, ...) == -1)
{
/* parse error */
}
else
{
5) if (ase_awk_run (awk, ...) == -1)
{
/* run-time error */
}
}
6) ase_awk_close (awk);
}}}
(((
* Most of the functions and data types needed are defined in the header file [[ase/awk/awk.h]].
* [[ase_awk_t]] represents the processor. However, the internal representation is not exposed.
* [[ase_awk_open]] creates the processor instance.
* [[ase_awk_parse]] parses an AWK script.
* [[ase_awk_run]] executes the script parsed.
* [[ase_awk_close]] destroys the processor instance.
)))
More detailed description is available {here,awk-mini-en.html}. You may refer to other sample files such as [[ase/test/awk/awk.c]] and [[ase/awk/jni.c]].
=== Primitive Functions ===
A set of primitive functions is needed to create an instance of the AWK processor. A primitive function is a user-defined function to help the library perform system-dependent operations such as memory allocation, character class handling.
{{{
typedef struct ase_awk_prmfns_t ase_awk_prmfns_t;
struct ase_awk_prmfns_t
{
ase_mmgr_t mmgr;
ase_ccls_t ccls;
struct
{
ase_awk_pow_t pow;
ase_awk_sprintf_t sprintf;
ase_awk_dprintf_t dprintf;
void* custom_data;
} misc;
};
}}}
A caller of [[ase_awk_open]] should fill in most of the fields of a [[ase_awk_prmfns_t]] structure and pass the structure to it. The function pointers in the miscellaneous group labeled [misc] is defined as follows:
{{{
/* returns the value of x raised to the power of y */
typedef ase_real_t (*ase_awk_pow_t) (void* custom, ase_real_t x, ase_real_t y);
/* similar to snprintf of the standard C library. */
typedef int (*ase_awk_sprintf_t) (void* custom, ase_char_t* buf, ase_size_t size, const ase_char_t* fmt, ...);
/* similar to printf of the standard C library. called by a few uncommonly
* used output functions usually for debugging purpose */
typedef void (*ase_awk_dprintf_t) (void* custom, const ase_char_t* fmt, ...);
}}}
The fourth field of the group is passed to its member functions as the first argument on invocation. The function pointed by the [[sprintf]] field should ensure that the resuliting string is null-terminated and [[%s]] and [[%c]] are accepted for the [[ase_char_t*]] and [[ase_char_t]] type respectively regardless the character mode.
The memory manager group labeled [mmgr] and the character class group labled [ccls] are defined as follows:
{{{
typedef void* (*ase_malloc_t) (void* custom, ase_size_t n);
typedef void* (*ase_realloc_t) (void* custom, void* ptr, ase_size_t n);
typedef void (*ase_free_t) (void* custom, void* ptr);
typedef ase_bool_t (*ase_isccls_t) (void* custom, ase_cint_t c);
typedef ase_cint_t (*ase_toccls_t) (void* custom, ase_cint_t c);
struct ase_mmgr_t
{
ase_malloc_t malloc;
ase_realloc_t realloc;
ase_free_t free;
void* custom_data;
};
struct ase_ccls_t
{
ase_isccls_t is_upper;
ase_isccls_t is_lower;
ase_isccls_t is_alpha;
ase_isccls_t is_digit;
ase_isccls_t is_xdigit;
ase_isccls_t is_alnum;
ase_isccls_t is_space;
ase_isccls_t is_print;
ase_isccls_t is_graph;
ase_isccls_t is_cntrl;
ase_isccls_t is_punct;
ase_toccls_t to_upper;
ase_toccls_t to_lower;
void* custom_data;
};
}}}
The functions in these groups perform the memory operations and character class related operations respectively. They follow the style of the memory allocation functions and character class handling functions of the standard C library except that they accept a pointer to the user-defined data as the first argument, thus providing more flexibility. The pointer to the user-defined data is specified into the [[custom_data]] field of each group. The [[realloc]] field, however, can be [[ASE_NULL]], in which case the functions pointed by the free and the malloc field replace the role of the function pointed by the [[realloc]] field.
=== Source IO Handler ===
The source code is handled by a source input handler provided by the user. The optional source code output handler can be provided to have the internal parse tree converted back to the source code.
The source code handler is defined as follows:
{{{
typedef ase_ssize_t (*ase_awk_io_t) (int cmd, void* custom, ase_char_t* data, ase_size_t count);
typedef struct ase_awk_srcios_t ase_awk_srcios_t;
struct ase_awk_srcios_t
{
ase_awk_io_t in; /* source input */
ase_awk_io_t out; /* source output */
void* custom_data;
};
}}}
The [[in]] field of the ase_awk_srcios_t is mandatory and should be filled in. The [[out]] field can be set to [[ASE_NULL]] or can point to a source output handling function. The [[custom_data]] field is passed to the source handlers as the second argument. The first parameter [[cmd]] of the source input handler is one of [[ASE_AWK_IO_OPEN]], [[ASE_AWK_IO_CLOSE]], [[ASE_AWK_IO_READ]]. The first parameter [[cmd]] of the source output handler is one of [[ASE_AWK_IO_OPEN]], [[ASE_AWK_IO_CLOSE]], [[ASE_AWK_IO_WRITE]]. The third parameter [[data]] and the fourth parameter [[count]] are the pointer to the buffer to read data into and its size if the first parameter [[cmd]] is [[ASE_AWK_IO_READ]] while they are the pointer to the data and its size if [[cmd]] is [[ASE_AWK_IO_WRITE]].
The source handler should return a negative value for an error and zero or a positive value otherwise. However, there is a subtle difference in the meaning of the return value depending on the value of the first parameter [[cmd]].
When [[cmd]] is [[ASE_AWK_IO_OPEN]], the return value of -1 and 1 indicates the failure and the success respectively. In addition, the return value of 0 indicates that the operation is successful but has reached the end of the stream. The library calls the handler with [[ASE_AWK_IO_CLOSE]] for deinitialization if the return value is 0 or 1. When [[cmd]] is [[ASE_AWK_IO_CLOSE]], the return value of -1 and 0 indicate the failure and the success respectively. When [[cmd]] is [[ASE_AWK_IO_READ]] or [[ASE_AWK_IO_WRITE]], the return value of -1 indicates the failure, 0 the end of the stream, and other positive values the number of characters read or written.
The typical source handler will look as follows:
{{{
ase_ssize_t awk_srcio_in (int cmd, void* arg, ase_char_t* data, ase_size_t size)
{
if (cmd == ASE_AWK_IO_OPEN)
{
/* open the stream */
return 1;
}
else if (cmd == ASE_AWK_IO_CLOSE)
{
/* close the stream */
return 0;
}
else if (cmd == ASE_AWK_IO_READ)
{
/* read the stream */
return the number of characters read;
}
return -1;
}
ase_ssize_t awk_srcio_out (int cmd, void* arg, ase_char_t* data, ase_size_t size)
{
if (cmd == ASE_AWK_IO_OPEN)
{
/* open the stream */
return 1;
}
else if (cmd == ASE_AWK_IO_CLOSE)
{
/* close the stream after flushing it */
return 0;
}
else if (cmd == ASE_AWK_IO_WRITE)
{
/* write the stream */
return the number of characters written;
}
return -1;
}
}}}
Once you have the source handler ready, you can fill in the fields of a [[ase_awk_srcios_t]] structure and pass it to the call of [[ase_awk_parse]].
{{{
ase_awk_srcios_t srcios;
srcios.in = awk_srcio_in;
srcios.out = awk_srcio_out;
srcios.custom_data = point to the extra information necessary;
if (ase_awk_parse (awk, &srcios) == -1)
{
/* handle error */
}
}}}
=== External IO Handler ===
External IO handlers should be provided to support the AWK's built-in IO facilities.