If the same file name or the same shell command is used with getline more than once during the execution of an awk program (see Explicit Input with getline), the file is opened (or the command is executed) the first time only. At that time, the first record of input is read from that file or command. The next time the same file or command is used with getline, another record is read from it, and so on.
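A minimal sketch of this behavior, assuming a POSIX awk and the mktemp utility are on PATH (the helper name read_twice is hypothetical). Two getline calls that name the same file read consecutive records, not the first record twice:

```shell
# Hypothetical helper; assumes a POSIX awk and mktemp on PATH.
read_twice() {
    f=$(mktemp)
    printf 'alpha\nbeta\ngamma\n' > "$f"
    awk -v f="$f" 'BEGIN {
        getline first < f     # first use: opens f, reads record 1
        getline second < f    # same name: reads record 2 from the open file
        print first, second
    }'
    rm -f "$f"
}

read_twice   # prints: alpha beta
```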
Similarly, when a file or pipe is opened for output, the file name or command associated with it is remembered by awk, and subsequent writes to the same file or command are appended to the previous writes. The file or pipe stays open until awk exits.
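The following sketch, again assuming a POSIX awk and mktemp on PATH (write_twice is a hypothetical name), shows that ">" truncates the file only when awk first opens it. A second print to the same, still-open file adds to it:

```shell
# Hypothetical helper; assumes a POSIX awk and mktemp on PATH.
write_twice() {
    f=$(mktemp)
    awk -v f="$f" 'BEGIN {
        print "first"  > f    # first use: opens f, truncating it
        print "second" > f    # f is still open, so this line is appended
    }'
    cat "$f"
    rm -f "$f"
}

write_twice   # prints "first", then "second"
```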
Because awk keeps files and pipes open in this way, special steps are necessary to read the same file again from the beginning, or to rerun a shell command (rather than reading more output from the same command). The close function makes these things possible:
close(filename)
or:
close(command)
The argument filename or command can be any expression. Its
value must exactly match the string that was used to open the file or
start the command (spaces and other "irrelevant" characters
included). For example, if you open a pipe with this:
"sort -r names" | getline foo
then you must close it with this:
close("sort -r names")
Once this function call is executed, the next getline from that file or command, or the next print or printf to that file or command, reopens the file or reruns the command.
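A sketch of both uses of close, assuming a POSIX awk and mktemp on PATH (reread_and_rerun is a hypothetical name). After close(f), the next getline starts again at the top of the file. After close(cmd), the next getline runs the command again instead of reporting end of file:

```shell
# Hypothetical helper; assumes a POSIX awk and mktemp on PATH.
reread_and_rerun() {
    f=$(mktemp)
    printf 'one\ntwo\n' > "$f"
    awk -v f="$f" 'BEGIN {
        getline a < f              # opens f and reads "one"
        close(f)
        getline b < f              # reopened from the start: "one" again

        cmd = "echo hello"
        n1 = (cmd | getline c)     # starts the command, reads "hello": 1
        n2 = (cmd | getline c)     # same pipe, no more output: 0
        close(cmd)
        n3 = (cmd | getline c)     # the command is run again: 1
        print a, b, n1, n2, n3, c
    }'
    rm -f "$f"
}

reread_and_rerun   # prints: one one 1 0 1 hello
```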
Because the expression that you use to close a file or pipeline must
exactly match the expression used to open the file or run the command,
it is good practice to use a variable to store the file name or command.
The sort example above then becomes the following:
sortcom = "sort -r names"
sortcom | getline foo
...
close(sortcom)
This helps avoid hard-to-find typographical errors in your awk programs.
Here are some of the reasons for closing an output file:
- To write a file and read it back later on in the same awk program. Close the file after writing it, then begin reading it with getline.
- To write numerous files, successively, in the same awk program. If the files aren't closed, eventually awk may exceed a system limit on the number of open files in one process. It is best to close each one when the program has finished writing it.
- To make a command finish. For example, if output is redirected to the mail program, the message is not actually sent until the pipe is closed.
For example, suppose a program pipes output to the mail program. If it outputs several lines redirected to this pipe without closing it, they make a single message of several lines. By contrast, if the program closes the pipe after each line of output, then each line makes a separate message.
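The first and third reasons can be tried from a shell. The sketch below assumes a POSIX awk and mktemp on PATH. The function names are hypothetical, and `sed -n '$='` stands in for mail. It prints the number of lines it received, so each separate "message" shows up as a count and nothing is actually sent:

```shell
# Hypothetical helpers; assume a POSIX awk and mktemp on PATH.

# Reason 1: write a file, close it, then read it back with getline.
write_then_read() {
    f=$(mktemp)
    awk -v f="$f" 'BEGIN {
        print "x=1" > f
        print "y=2" > f
        close(f)                  # flush and release the output side first
        n = 0
        while ((getline line < f) > 0)
            n++
        close(f)
        print n " lines read back"
    }'
    rm -f "$f"
}

# Reason 3: sed -n '$=' prints how many lines it read; it stands in for mail.
one_vs_many() {
    awk -v cmd="sed -n '\$='" 'BEGIN {
        print "line 1" | cmd
        print "line 2" | cmd
        print "line 3" | cmd
        close(cmd)                # one invocation received all 3 lines
        for (i = 1; i <= 3; i++) {
            print "line " i | cmd
            close(cmd)            # each line goes to a fresh invocation
        }
    }'
}

write_then_read   # prints: 2 lines read back
one_vs_many       # prints 3, then 1 three times
```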
If you use more files than the system allows you to have open, gawk attempts to multiplex the available open files among your data files. gawk's ability to do this depends upon the facilities of your operating system, so it may not always work. It is therefore both good practice and good portability advice to always use close on your files when you are done with them.
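This is a sketch of the portable habit, assuming a POSIX awk and mktemp on PATH (write_many is a hypothetical name). Each of 200 output files is closed as soon as it has been written, so at most one is open at a time. The sketch also shows one consequence of closing a file. Once a file has been closed, reopening it with ">" truncates it again, so appending later requires ">>":

```shell
# Hypothetical helper; assumes a POSIX awk and mktemp on PATH.
write_many() {
    dir=$(mktemp -d)
    awk -v dir="$dir" 'BEGIN {
        for (i = 1; i <= 200; i++) {
            f = dir "/part" i ".txt"
            print "record " i > f
            close(f)               # at most one output file open at a time
        }
        f = dir "/part1.txt"
        print "appended" >> f      # after close, ">" would truncate; ">>" appends
        close(f)
    }'
    set -- "$dir"/*.txt
    echo "$#"                      # number of files written
    cat "$dir/part1.txt"
    rm -rf "$dir"
}

write_many   # prints 200, then "record 1" and "appended"
```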
In fact, if you are using a lot of pipes, it is essential that
you close commands when done. For example, consider something like this:
{
    ...
    command = ("grep " $1 " /some/file | my_prog -q " $3)
    while ((command | getline) > 0) {
        # process output of command
    }
    # need close(command) here
}
This example creates a new pipeline based on data in each record. Without the call to close indicated in the comment, awk creates child processes to run the commands, until it eventually runs out of file descriptors for more pipelines.
Even though each command has finished (as indicated by the end-of-file return status from getline), the child process is not terminated;[1] more importantly, the file descriptor for the pipe is not closed and released until close is called or awk exits.
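The following sketch applies the fix from the comment in the example above: every record builds a new command string, and close runs as soon as that command's output has been read. It assumes a POSIX awk on PATH, and sum_via_pipes is a hypothetical name. A simple echo stands in for the grep pipeline. With only 300 records the sketch would also run without close, but with enough records the unclosed version eventually runs out of file descriptors:

```shell
# Hypothetical helper; assumes a POSIX awk on PATH.
sum_via_pipes() {
    awk 'BEGIN { for (i = 1; i <= 300; i++) print i }' |
    awk '{
        command = ("echo " $1)     # a new pipeline for every record
        while ((command | getline out) > 0)
            total += out
        close(command)             # reap the child, release the descriptor
    }
    END { print total }'
}

sum_via_pipes   # prints 45150 (1 + 2 + ... + 300)
```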
close silently does nothing if given an argument that does not represent a file, pipe, or coprocess that was opened with a redirection.
When using the |& operator to communicate with a coprocess, it is occasionally useful to be able to close one end of the two-way pipe without closing the other. This is done by supplying a second argument to close. As in any other call to close, the first argument is the name of the command or special file used to start the coprocess. The second argument should be a string, with either of the values "to" or "from". Case does not matter. As this is an advanced feature, a fuller discussion, with an example, is deferred to Two-Way Communications with Another Process.
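The sketch below shows the usual reason for closing only the "to" end. It requires gawk, because |& is a gawk extension, and sort_via_coprocess is a hypothetical name. sort cannot produce any output until it sees end of input, so the program closes the "to" end first and then reads the sorted lines from the still-open "from" end:

```shell
# Hypothetical helper; requires gawk, since |& is a gawk extension.
sort_via_coprocess() {
    gawk 'BEGIN {
        cmd = "sort"
        print "pear"   |& cmd
        print "apple"  |& cmd
        print "banana" |& cmd
        close(cmd, "to")           # sort sees end of input and can finish
        while ((cmd |& getline line) > 0)
            print line
        close(cmd)                 # close the remaining "from" end
    }'
}

sort_via_coprocess   # prints apple, banana, pear
```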
close's Return Value
In many versions of Unix awk, the close function is actually a statement. It is a syntax error to try to use the return value from close (d.c.):
command = "..."
command | getline info
retval = close(command)  # syntax error in most Unix awks
gawk treats close as a function. The return value is -1 if the argument names something that was never opened with a redirection, or if there is a system problem closing the file or process. In these cases, gawk sets the built-in variable ERRNO to a string describing the problem.
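This sketch checks the first case. It assumes gawk is on PATH (ERRNO is a gawk extension), and close_unopened is a hypothetical name. Closing a name that was never opened does not stop the program. close returns -1, and ERRNO holds a description of the problem:

```shell
# Hypothetical helper; requires gawk (ERRNO is a gawk extension).
close_unopened() {
    gawk 'BEGIN {
        r = close("never-opened")    # nothing was opened under this name
        print r
        print (ERRNO != "" ? "ERRNO set" : "ERRNO empty")
    }'
}

close_unopened   # prints -1, then "ERRNO set"
```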
In gawk, when closing a pipe or coprocess, the return value is the exit status of the command. Otherwise, it is the return value from the system's close or fclose C functions when closing input or output files, respectively. This value is zero if the close succeeds, or -1 if it fails.
The return value for closing a pipeline is particularly useful. It allows you to get the output from a command as well as its exit status.
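This sketch reads a command's output and then its status from the same pipe. It assumes a POSIX awk on PATH, and output_and_status is a hypothetical name. gawk returns the command's exit status itself (3 for the failing command here). Other implementations may encode a failure differently, so the sketch tests only zero versus nonzero, which POSIX guarantees:

```shell
# Hypothetical helper; assumes a POSIX awk on PATH.
output_and_status() {
    awk 'BEGIN {
        bad = "echo partial result; exit 3"    # prints a line, then fails
        bad | getline out
        status = close(bad)                     # nonzero: command failed
        print out, (status == 0 ? "succeeded" : "failed")

        good = "echo all good"
        good | getline out
        status = close(good)                    # zero: command succeeded
        print out, (status == 0 ? "succeeded" : "failed")
    }'
}

output_and_status   # prints "partial result failed", then "all good succeeded"
```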
For POSIX-compliant systems,
if the exit status is a number above 128, then the program
was terminated by a signal. Subtract 128 to get the signal number:
exit_val = close(command)
if (exit_val > 128)
    print command, "died with signal", exit_val - 128
else
    print command, "exited with code", exit_val
Currently, in gawk, this only works for commands piping into getline. For commands piped into from print or printf, the return value from close is that of the library's pclose function.
[1] The technical terminology is rather morbid. The finished child is called a "zombie," and cleaning up after it is referred to as "reaping."