The field separator, which is either a single character or a regular
expression, controls the way awk splits an input record into fields.
awk scans the input record for character sequences that match the
separator; the fields themselves are the text between the matches.
In the examples that follow, we use the bullet symbol (•) to
represent spaces in the output.
If the field separator is "oo", then the following line:
moo goo gai pan
is split into three fields: "m", "•g", and "•gai•pan".
Note the leading spaces in the values of the second and third fields.
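The split described above can be checked directly. The following is a
minimal sketch, assuming a POSIX-style awk is on the PATH; the variable
name fields and the bracketed output format are chosen here only for
illustration, so that the leading spaces are easy to see.

```shell
# Split the sample line on the separator "oo" and print the field count
# followed by each field in brackets.
fields=$(echo 'moo goo gai pan' |
  awk 'BEGIN { FS = "oo" } { printf "%d:[%s][%s][%s]\n", NF, $1, $2, $3 }')
echo "$fields"
# prints: 3:[m][ g][ gai pan]
```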
The field separator is represented by the built-in variable FS.
Shell programmers take note: awk does not use the name IFS that is
used by the POSIX-compliant shells (such as the Unix Bourne shell, sh,
or bash).
The value of FS can be changed in the awk program with the
assignment operator, = (see Assignment Expressions).
Often the right time to do this is at the beginning of execution
before any input has been processed, so that the very first record
is read with the proper separator. To do this, use the special
BEGIN pattern (see The BEGIN and END Special Patterns).
For example, here we set the value of FS to the string ",":
awk 'BEGIN { FS = "," } ; { print $2 }'
Given the input line:
John Q. Smith, 29 Oak St., Walamazoo, MI 42139
this awk program extracts and prints the string "•29•Oak•St.".
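To try this out, the input line can be piped into the program. The
sketch below also contrasts setting FS in BEGIN with setting it inside
an ordinary rule, which is why BEGIN is usually the right place. The
variable names (line, second, late) and the two-line sample input are
assumptions made for the illustration, and the second result assumes a
POSIX-conforming awk, where a new FS takes effect with the next record.

```shell
line='John Q. Smith, 29 Oak St., Walamazoo, MI 42139'

# FS is set in BEGIN, so the first record is already split on commas.
second=$(echo "$line" | awk 'BEGIN { FS = "," } ; { print $2 }')
echo "[$second]"
# prints: [ 29 Oak St.]

# FS is set inside an ordinary rule: the first record has already been
# split on whitespace by then, so "," only applies from the second record.
late=$(printf 'a,b c\nd,e f\n' | awk '{ FS = "," ; print $1 }')
echo "$late"
# prints "a,b" for the first record and "d" for the second
```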
Sometimes the input data contains separator characters that don't
separate fields the way you thought they would. For instance, the
person's name in the example we just used might have a title or
suffix attached, such as:
John Q. Smith, LXIX, 29 Oak St., Walamazoo, MI 42139
The same program would extract "•LXIX", instead of "•29•Oak•St.".
If you were expecting the program to print the
address, you would be surprised. The moral is to choose your data layout and
separator characters carefully to prevent such problems.
(If the data is not in a form that is easy to process, perhaps you
can massage it first with a separate awk program.)
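The surprise is easy to reproduce. The sketch below runs the same
program on the line with the suffix, then shows one possible workaround
that is not part of the original text: for this particular layout the
address is always the third field from the end, so $(NF - 2) finds it
in both lines. The variable names (titled, plain, wrong, addr) are
hypothetical.

```shell
titled='John Q. Smith, LXIX, 29 Oak St., Walamazoo, MI 42139'
plain='John Q. Smith, 29 Oak St., Walamazoo, MI 42139'

# The original program now picks up the suffix instead of the address.
wrong=$(echo "$titled" | awk 'BEGIN { FS = "," } ; { print $2 }')
echo "[$wrong]"
# prints: [ LXIX]

# Hypothetical workaround: count fields from the end of the record.
addr=$(printf '%s\n%s\n' "$plain" "$titled" |
  awk 'BEGIN { FS = "," } ; { print $(NF - 2) }')
echo "$addr"
# prints " 29 Oak St." once for each input line
```

This only helps when the trailing fields are reliable; it does not
replace choosing the data layout carefully.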
Fields are normally separated by whitespace sequences
(spaces, tabs, and newlines), not by single spaces. Two spaces in a row do not
delimit an empty field. The default value of the field separator FS
is a string containing a single space, " ". If awk
interpreted this value in the usual way, each space character would separate
fields, so two spaces in a row would make an empty field between them.
The reason this does not happen is that a single space as the value of
FS is a special case: it is taken to specify the default manner
of delimiting fields.
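The special case can be seen by counting fields with NF. This is a
small sketch with made-up input; the second command uses the regular
expression "[ ]" as the separator, which is one way (assumed here, not
taken from the text above) to make every individual space count.

```shell
# Default FS (" "): runs of whitespace separate fields, and leading and
# trailing whitespace is ignored.
default_nf=$(echo '  a  b  ' | awk '{ print NF }')
echo "$default_nf"
# prints: 2

# FS = "[ ]" matches exactly one space, so the two spaces between
# a and b delimit an empty field.
literal_nf=$(echo 'a  b' | awk 'BEGIN { FS = "[ ]" } { print NF }')
echo "$literal_nf"
# prints: 3
```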
If FS is any other single character, such as ",", then
each occurrence of that character separates two fields. Two consecutive
occurrences delimit an empty field. If the character occurs at the
beginning or the end of the line, that too delimits an empty field. The
space character is the only single character that does not follow these
rules.
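These rules can be illustrated with a line that has a separator at the
start, at the end, and twice in a row. The input string and the
variable name counts are invented for this sketch; each field is
printed in brackets so that the empty ones are visible.

```shell
# Print the number of fields, then every field wrapped in brackets.
counts=$(echo ',a,,b,' |
  awk 'BEGIN { FS = "," }
       { printf "%d:", NF
         for (i = 1; i <= NF; i++) printf "[%s]", $i
         print "" }')
echo "$counts"
# prints: 5:[][a][][b][]
```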