Future Extensions

AWK is a language similar to PERL, only considerably more elegant.
Arnold Robbins

Hey!
Larry Wall

This section briefly lists extensions and possible improvements that indicate the directions we are currently considering for gawk. The file FUTURES in the gawk distribution lists these extensions as well.

Following is a list of probable future changes visible at the awk language level:

Loadable module interface
It is not clear that the awk-level interface to the modules facility is as good as it should be. The interface needs to be redesigned, particularly to take namespace issues into account, and possibly also to address library search path order and versioning.

RECLEN variable for fixed-length records
A RECLEN variable, used along with FIELDWIDTHS, would speed up the processing of fixed-length records. PROCINFO["RS"] would be "RS" or "RECLEN", depending upon which kind of record processing is in effect.

Additional printf specifiers
The 1999 ISO C standard added a number of additional printf format specifiers. These should be evaluated for possible inclusion in gawk.

Databases
It may be possible to map a GDBM/NDBM/SDBM file into an awk array.

Large character sets
It would be nice if gawk could handle UTF-8 and other character sets whose characters do not fit in eight bits.

More lint warnings
There are more things that could be checked for portability.
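
The appeal of the proposed RECLEN variable is that fixed-length records need no separator scanning. Here is a minimal C sketch. It assumes records are simply consecutive RECLEN-byte slices of the input, with a possibly short final record. The names split_fixed_records and fixed_record_length are hypothetical and not part of gawk. Under these assumptions, each record boundary is plain arithmetic on the offset.

```c
#include <stddef.h>

/* Hypothetical sketch of RECLEN-style record splitting: records are
 * consecutive `reclen`-byte slices of `len` bytes of input. Stores up to
 * `max` record start offsets in `starts` and returns how many it stored.
 * No input byte is examined; boundaries are pure arithmetic. */
size_t split_fixed_records(size_t len, size_t reclen,
                           size_t *starts, size_t max)
{
    size_t n = 0;
    if (reclen == 0)
        return 0; /* a zero record length would never advance */
    for (size_t off = 0; off < len && n < max; off += reclen)
        starts[n++] = off;
    return n;
}

/* Length of the record starting at `start` (which must be < len):
 * a full `reclen` bytes, except possibly for the last record. */
size_t fixed_record_length(size_t len, size_t reclen, size_t start)
{
    size_t rest = len - start;
    return rest < reclen ? rest : reclen;
}
```

For a 10-byte input and a record length of 4, the records start at offsets 0, 4 and 8, and the last one is 2 bytes long. A record-separator scan, by contrast, must inspect every byte.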
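
Supporting character sets wider than eight bits means one character no longer corresponds to one byte, so byte-oriented string code reports byte counts rather than character counts. Below is a minimal C sketch of the basic adjustment for UTF-8. It assumes the input is valid UTF-8, and utf8_length is a hypothetical helper, not a gawk function. It counts characters by skipping continuation bytes instead of counting every byte.

```c
#include <stddef.h>

/* Count the characters (code points) in `len` bytes of UTF-8.
 * Every code point has exactly one lead byte, and continuation bytes
 * have the bit pattern 10xxxxxx, so counting the bytes that do NOT
 * match that pattern gives the character count.
 * Assumes valid UTF-8 input. */
size_t utf8_length(const unsigned char *s, size_t len)
{
    size_t count = 0;
    for (size_t i = 0; i < len; i++)
        if ((s[i] & 0xC0) != 0x80) /* not a continuation byte */
            count++;
    return count;
}
```

For the 6-byte string "h\xc3\xa9llo" ("héllo"), utf8_length returns 5, where a plain byte count gives 6.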

Following is a list of probable improvements that will make gawk's source code easier to work with:

Loadable module mechanics
The current extension mechanism works (see Dynamic Extensions), but it is rather primitive. It requires a fair amount of manual work to create and integrate a loadable module, and it is not as portable as might be desired. The GNU libtool package provides a number of features that would make using loadable modules much easier, so gawk should be changed to use libtool.

Loadable module internals
The API to its internals that gawk "exports" to loadable modules should be revised, because too many things are needlessly exposed. A new API should be designed and implemented to make module writing easier.

Better array subscript management
gawk's management of array subscript storage should be revamped so that a value used to index multiple arrays is stored only once.
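
Sharing a single copy of each subscript value is essentially string interning. Here is a minimal C sketch under that assumption. The intern() function and its fixed-size, linearly searched table are hypothetical simplifications, not gawk's data structures. Every array would store the pointer returned by intern() rather than its own copy of the subscript.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical intern table: at most one stored copy of each distinct
 * subscript string. Linear search keeps the sketch short. */
#define MAX_INTERNED 256

static char *interned[MAX_INTERNED];
static size_t n_interned = 0;

/* Return the shared copy of `s`, creating it on first use.
 * Returns NULL if the table is full or memory runs out. */
const char *intern(const char *s)
{
    for (size_t i = 0; i < n_interned; i++)
        if (strcmp(interned[i], s) == 0)
            return interned[i]; /* already stored: reuse it */

    if (n_interned == MAX_INTERNED)
        return NULL;

    size_t size = strlen(s) + 1; /* include the terminating NUL */
    char *copy = malloc(size);
    if (copy == NULL)
        return NULL;
    memcpy(copy, s, size);
    interned[n_interned++] = copy;
    return copy;
}
```

Calling intern("key") twice returns the same pointer, even when the two arguments are different buffers that both hold "key". A real implementation would hash the strings and keep a reference count, so that a value can be freed once no array uses it.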

Following is a list of probable improvements that will make gawk perform better:

An improved version of dfa
The dfa pattern matcher from GNU grep has some problems. Either a new version or a fixed one should resolve some important regexp matching issues.

Compilation of awk programs
gawk uses a Bison (YACC-like) parser to convert the script given to it into a syntax tree; the syntax tree is then executed by a simple recursive evaluator. This method incurs a lot of overhead, since the recursive evaluator performs many procedure calls to do even the simplest things.

It should be possible for gawk to convert the script's parse tree into a C program, which the user would then compile using the normal C compiler and a special gawk library to provide all the needed functions (regexps, fields, associative arrays, type coercion, and so on).

An easier possibility might be for an intermediate phase of gawk to convert the parse tree into a linear byte code form like the one used in GNU Emacs Lisp. The recursive evaluator would then be replaced by a straight-line byte code interpreter that would be intermediate in speed between running a compiled program and doing what gawk does now.
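
The difference between recursive evaluation and the byte code idea can be seen in a minimal C sketch. It assumes a toy expression tree of numbers, + and *. All type and function names are hypothetical; this is neither gawk's parse tree nor the instruction format of Emacs Lisp. eval_tree() walks the tree recursively, while compile() flattens it once into stack-machine instructions that run() executes in a single loop.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified syntax tree: numbers and the binary
 * operators + and *. This is not gawk's real node type. */
typedef enum { NODE_NUM, NODE_ADD, NODE_MUL } NodeType;

typedef struct Node {
    NodeType type;
    double value;              /* used by NODE_NUM */
    struct Node *left, *right; /* used by NODE_ADD and NODE_MUL */
} Node;

/* Recursive evaluator: one procedure call per node, on every
 * evaluation (five calls for 1 + 2 * 3). */
double eval_tree(const Node *n)
{
    switch (n->type) {
    case NODE_NUM:
        return n->value;
    case NODE_ADD:
        return eval_tree(n->left) + eval_tree(n->right);
    case NODE_MUL:
        return eval_tree(n->left) * eval_tree(n->right);
    }
    return 0.0; /* not reached for well-formed trees */
}

/* Linear byte code for a small stack machine. */
typedef enum { OP_PUSH, OP_ADD, OP_MUL, OP_END } OpCode;

typedef struct {
    OpCode op;
    double arg; /* operand of OP_PUSH */
} Instr;

/* Post-order walk: code for both operands, then the operator. This
 * recursion runs once, at compile time, not on every evaluation. */
static size_t emit(const Node *n, Instr *code, size_t pc)
{
    if (n->type == NODE_NUM) {
        code[pc++] = (Instr){OP_PUSH, n->value};
        return pc;
    }
    pc = emit(n->left, code, pc);
    pc = emit(n->right, code, pc);
    code[pc++] = (Instr){n->type == NODE_ADD ? OP_ADD : OP_MUL, 0.0};
    return pc;
}

/* Compile a tree into `code` (which must be large enough); returns the
 * number of instructions written, including the final OP_END. */
size_t compile(const Node *root, Instr *code)
{
    size_t pc = emit(root, code, 0);
    code[pc++] = (Instr){OP_END, 0.0};
    return pc;
}

/* Straight-line interpreter: a single loop over the instructions and an
 * operand stack, with no per-node procedure calls. Assumes the stack
 * depth never exceeds 64. */
double run(const Instr *code)
{
    double stack[64];
    size_t sp = 0;
    for (size_t pc = 0;; pc++) {
        switch (code[pc].op) {
        case OP_PUSH:
            stack[sp++] = code[pc].arg;
            break;
        case OP_ADD:
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case OP_MUL:
            sp--;
            stack[sp - 1] *= stack[sp];
            break;
        case OP_END:
            return stack[sp - 1];
        }
    }
}
```

For 1 + 2 * 3, compile() emits PUSH 1, PUSH 2, PUSH 3, MUL, ADD, END, and run() returns 7, the same result eval_tree() computes recursively. Once compiled, the byte code can be executed any number of times without walking the tree again.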
Finally, the programs in the test suite could use documenting in this Web page.

See Making Additions to gawk if you are interested in tackling any of these projects.