gawk Internals

The truth is that gawk was not designed for simple extensibility.
The facilities for adding functions using shared libraries work, but
are something of a "bag on the side." Thus, this tour is
brief and simplistic; would-be gawk hackers are encouraged to
spend some time reading the source code before trying to write
extensions based on the material presented here. Of particular note
are the files awk.h, builtin.c, and eval.c.
Reading awk.y in order to see how the parse tree is built
would also be of use.

With the disclaimers out of the way, the following types, structure
members, functions, and macros are declared in awk.h and are of
use when writing extensions. The next section shows how they are used:
AWKNUM
     An AWKNUM is the internal type of awk floating-point numbers.
     Typically, it is a C double.
NODE
     Just about everything is done using objects of type NODE.
     These contain both strings and numbers, as well as variables and arrays.
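AWKNUM force_number(NODE *n)
     This macro forces a value to be numeric. It returns the actual
     numeric value contained in the node.
     It may end up calling an internal gawk function.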
void force_string(NODE *n)
     This macro guarantees that a NODE's string value is current.
     It may end up calling an internal gawk function.
     It also guarantees that the string is zero-terminated.
n->param_cnt
     The number of parameters actually passed in a function call at runtime.
n->stptr
n->stlen
     The data and length of a NODE's string value, respectively.
     The string is not guaranteed to be zero-terminated.
     If you need to pass the string value to a C library function, save
     the value in n->stptr[n->stlen], assign '\0' to it,
     call the routine, and then restore the value.
n->type
     The type of the NODE. This is a C enum. Values should
     be either Node_var or Node_var_array for function
     parameters.
n->vname
     The "variable name" of a node. This is not of much use inside
     externally written extensions.
void assoc_clear(NODE *n)
     Clears the associative array pointed to by n.
     Make sure that n->type == Node_var_array first.
NODE **assoc_lookup(NODE *symbol, NODE *subs, int reference)
     Finds, and installs if necessary, array elements.
     symbol is the array, subs is the subscript.
     The subscript is usually a value created with tmp_string (see below).
     reference should be TRUE if it is an error to use the
     value before it is created. Typically, FALSE is the
     correct value to use from extension functions.
NODE *make_string(char *s, size_t len)
     Take a C string and turn it into a pointer to a NODE that
     can be stored appropriately. This is permanent storage; understanding
     of gawk memory management is helpful.
NODE *make_number(AWKNUM val)
     Take an AWKNUM and turn it into a pointer to a NODE that
     can be stored appropriately. This is permanent storage; understanding
     of gawk memory management is helpful.
NODE *tmp_string(char *s, size_t len)
     Take a C string and turn it into a pointer to a NODE that
     can be stored appropriately. This is temporary storage; understanding
     of gawk memory management is helpful.
NODE *tmp_number(AWKNUM val)
     Take an AWKNUM and turn it into a pointer to a NODE that
     can be stored appropriately. This is temporary storage;
     understanding of gawk memory management is helpful.
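NODE *dupnode(NODE *n)
     Duplicate a node. In most cases, this increments an internal
     reference count instead of actually duplicating the entire NODE;
     understanding of gawk memory management is helpful.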
void free_temp(NODE *n)
     This macro releases the memory associated with a NODE
     allocated with tmp_string or tmp_number.
     Understanding of gawk memory management is helpful.
void make_builtin(char *name, NODE *(*func)(NODE *), int count)
     Register a C function pointed to by func as new built-in
     function name. name is a regular C string. count
     is the maximum number of arguments that the function takes.
     The function should be written in the following manner:

          /* do_xxx --- do xxx function for gawk */

          NODE *
          do_xxx(NODE *tree)
          {
              ...
          }
NODE *get_argument(NODE *tree, int i)
     This function is called from within a C extension function to get
     the i-th argument from the function call.
     The first argument is argument zero.
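void set_value(NODE *tree)
     This function is called from within a C extension function to set
     the return value from the extension function. This value is
     what the awk program sees as the return value from the
     new awk function.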
void update_ERRNO(void)
     This function is called from within a C extension function to set
     the value of gawk's ERRNO variable, based on the current
     value of the C errno variable.
     It is provided as a convenience.
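
The save, terminate, call, and restore advice given above for n->stptr and
n->stlen can be sketched as follows. This is a minimal illustration, not
gawk code: struct mock_node and mock_node_to_long are hypothetical names
invented here, the struct carries only the two members under discussion,
and the sketch assumes that the byte at stptr[stlen] is writable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for the two NODE members discussed above.
 * The real NODE declared in awk.h has many more fields.
 */
struct mock_node {
    char *stptr;    /* string data, not necessarily zero-terminated */
    size_t stlen;   /* number of bytes in the string value */
};

/*
 * mock_node_to_long --- parse the node's string value with strtol(),
 * which requires a zero-terminated string.
 * Assumption: the byte at stptr[stlen] may be written to.
 */
long
mock_node_to_long(struct mock_node *n)
{
    char save;
    long result;

    assert(n != NULL && n->stptr != NULL);

    save = n->stptr[n->stlen];      /* save the byte just past the string */
    n->stptr[n->stlen] = '\0';      /* terminate temporarily */
    result = strtol(n->stptr, NULL, 10);
    n->stptr[n->stlen] = save;      /* restore the original byte */

    return result;
}
```

Given a buffer holding "12345" and an stlen of 3, the helper parses only
"123" and leaves the buffer as it found it. Restoring the byte matters
because the storage after the string may belong to something else.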
An argument that is supposed to be an array needs to be handled with
some extra code, in case the array being passed in is actually
from a function parameter.
The following boilerplate code shows how to do this:
     NODE *the_arg;

     the_arg = get_argument(tree, 2); /* assume need 3rd arg, 0-based */

     /* if a parameter, get it off the stack */
     if (the_arg->type == Node_param_list)
         the_arg = stack_ptr[the_arg->param_cnt];

     /* parameter referenced an array, get it */
     if (the_arg->type == Node_array_ref)
         the_arg = the_arg->orig_array;

     /* check type */
     if (the_arg->type != Node_var && the_arg->type != Node_var_array)
         fatal("newfunc: third argument is not an array");

     /* force it to be an array, if necessary, clear it */
     the_arg->type = Node_var_array;
     assoc_clear(the_arg);
Again, you should spend time studying the gawk internals;
don't just blindly copy this code.
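
To make the two levels of indirection in the array-argument boilerplate
easier to follow, here is a small self-contained model of the same control
flow. It does not use awk.h: every type and name beginning with mock_ or
Mock_ is a hypothetical stand-in, the real NODE layout and stack differ,
and the model returns NULL at the point where the boilerplate calls fatal.

```c
#include <assert.h>
#include <stddef.h>

/* Mock node types; they only mirror the names used in the boilerplate. */
enum mock_type { Mock_var, Mock_var_array, Mock_param_list, Mock_array_ref };

struct mock_node {
    enum mock_type type;
    int param_cnt;                  /* index into the mock stack */
    struct mock_node *orig_array;   /* target when type is Mock_array_ref */
};

/* Mock stand-in for the stack that holds function-call parameters. */
static struct mock_node *mock_stack[4];

/*
 * mock_resolve_array --- apply the same checks as the boilerplate,
 * in the same order. Returns NULL where the boilerplate calls fatal().
 */
struct mock_node *
mock_resolve_array(struct mock_node *arg)
{
    assert(arg != NULL);

    /* if a parameter, get it off the (mock) stack */
    if (arg->type == Mock_param_list)
        arg = mock_stack[arg->param_cnt];

    /* parameter referenced an array, get it */
    if (arg->type == Mock_array_ref)
        arg = arg->orig_array;

    /* check type */
    if (arg->type != Mock_var && arg->type != Mock_var_array)
        return NULL;

    /* force it to be an array */
    arg->type = Mock_var_array;
    return arg;
}
```

In this model, a parameter node whose stack slot holds an array reference
resolves to the underlying array node, which is the case the extra code
exists to handle. Clearing the array is left out because assoc_clear has
no counterpart here.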