Typing and Comparison
The Guide is definitive. Reality is frequently inaccurate.
The Hitchhiker's Guide to the Galaxy
Unlike variables in many other programming languages, awk variables do not have a fixed type. Instead, they can be either a number or a string, depending upon the value that is assigned to them.
The 1992 POSIX standard introduced the concept of a numeric string, which is simply a string that looks like a number, for example, " +2". This concept is used for determining the type of a variable.
The type of the variable is important because the types of two variables
determine how they are compared.
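The following is a minimal sketch. It assumes gawk or another POSIX awk is on the PATH, and the variable name x is only illustrative. The same comparison, x < 9, is numeric while x holds the number 10, and it becomes a string comparison once x holds the string constant "10".

```shell
awk 'BEGIN {
    x = 10          # numeric value
    r1 = (x < 9)    # numeric comparison: 10 < 9 is false (0)
    x = "10"        # string constant: x now has the string attribute
    r2 = (x < 9)    # string comparison: "10" < "9" is true (1)
    print r1, r2    # prints: 0 1
}'
```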
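In gawk, variable typing follows these rules:
1. A numeric constant or the result of a numeric operation has the numeric attribute.
2. A string constant or the result of a string operation has the string attribute.
3. Fields, getline input, FILENAME, ARGV elements, ENVIRON elements, and the elements of an array created by split that are numeric strings have the strnum attribute. Otherwise, they have the string attribute. Uninitialized variables also have the strnum attribute.
4. Attributes propagate across assignments but are not changed by any use.
The last rule is particularly important. In the following program, a has numeric type, even though it is later used in a string operation: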
    BEGIN {
             a = 12.345
             b = a " is a cute number"
             print b
    }
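One way to check that a keeps its numeric attribute after the string operation, assuming a POSIX awk such as gawk, is to compare it with the number 9 afterwards. A numeric comparison makes 12.345 < 9 false, while a string comparison would make "12.345" < "9" true.

```shell
awk 'BEGIN {
    a = 12.345
    b = a " is a cute number"   # string use of a; a itself is unchanged
    print b                     # prints: 12.345 is a cute number
    r = (a < 9)                 # a is still numeric: 12.345 < 9 is false
    print r                     # prints: 0
}'
```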
When two operands are compared, either string comparison or numeric comparison
may be used. This depends upon the attributes of the operands, according to the
following symmetric matrix:
            +----------------------------------------------
            |       STRING          NUMERIC         STRNUM
    --------+----------------------------------------------
    STRING  |       string          string          string
    NUMERIC |       string          numeric         numeric
    STRNUM  |       string          numeric         numeric
    --------+----------------------------------------------
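A small sketch that exercises several cells of the matrix, assuming gawk or another POSIX awk. Both input fields look numeric, so they are STRNUM. The values 10 and 9 are chosen because they order differently as numbers than as strings, which makes the kind of comparison visible in the result.

```shell
echo 10 9 | awk '{
    r1 = ($1 < 9)     # STRNUM vs NUMERIC: numeric, 10 < 9 is false
    r2 = ($1 < "9")   # STRNUM vs STRING: string, "10" < "9" is true
    r3 = (10 < "9")   # NUMERIC vs STRING: string, "10" < "9" is true
    r4 = ($1 < $2)    # STRNUM vs STRNUM: numeric, 10 < 9 is false
    print r1, r2, r3, r4   # prints: 0 1 1 0
}'
```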
The basic idea is that user input that looks numeric, and only user input, should be treated as numeric, even though it is actually made of characters and is therefore also a string.
Thus, for example, the string constant " +3.14" is a string, even though it looks numeric, and is never treated as a number for comparison purposes.
In short, when one operand is a "pure" string, such as a string constant, then a string comparison is performed. Otherwise, a numeric comparison is performed.
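To see the difference between user input and a constant, the following sketch (assuming gawk or another POSIX awk) feeds " +3.14" in as a record and also writes it as a string constant. Only the input copy is a numeric string.

```shell
echo " +3.14" | awk '{
    r1 = ($0 == 3.14)        # $0 is user input that looks numeric: numeric comparison, true
    r2 = (" +3.14" == 3.14)  # string constant: " +3.14" vs "3.14" as strings, false
    print r1, r2             # prints: 1 0
}'
```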
Comparison expressions compare strings or numbers for relationships such as equality. They are written using relational operators, which are a superset of those in C. Here is a table of them:
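    Expression           Result
    x < y                True if x is less than y.
    x <= y               True if x is less than or equal to y.
    x > y                True if x is greater than y.
    x >= y               True if x is greater than or equal to y.
    x == y               True if x is equal to y.
    x != y               True if x is not equal to y.
    x ~ y                True if the string x matches the regexp denoted by y.
    x !~ y               True if the string x does not match the regexp denoted by y.
    subscript in array   True if the array array has an element with the subscript subscript.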
Comparison expressions have the value one if true and zero if false.
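The sketch below evaluates each operator from the table once, assuming a POSIX awk such as gawk. The names x, y, and arr are illustrative. Each result is stored in a variable before printing, so that no > inside a print statement can be mistaken for output redirection.

```shell
awk 'BEGIN {
    x = 3; y = 5
    # relational operators; each comparison yields 1 (true) or 0 (false)
    lt = (x < y); le = (x <= y); gt = (x > y)
    ge = (x >= y); eq = (x == y); ne = (x != y)
    print lt, le, gt, ge, eq, ne     # prints: 1 1 0 0 0 1
    # regexp matching operators
    m1 = ("foobar" ~ /foo/); m2 = ("foobar" !~ /foo/)
    print m1, m2                     # prints: 1 0
    # array membership; testing for "z" does not create that element
    arr["k"] = 1
    k_in = ("k" in arr); z_in = ("z" in arr)
    print k_in, z_in                 # prints: 1 0
}'
```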
When comparing operands of mixed types, numeric operands are converted
to strings using the value of CONVFMT
(see Conversion of Strings and Numbers).
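A minimal sketch of CONVFMT at work in a mixed comparison, assuming a POSIX awk such as gawk. Separate variables are used before and after CONVFMT changes, so the example does not depend on whether an implementation caches an earlier conversion.

```shell
awk 'BEGIN {
    x = 3.14159
    r1 = (x == "3.14159")   # x converted with the default CONVFMT "%.6g": "3.14159"
    CONVFMT = "%.2g"
    y = 3.14159
    r2 = (y == "3.1")       # y converted with "%.2g": "3.1"
    r3 = (y == "3.14159")   # so the longer string no longer matches
    print r1, r2, r3        # prints: 1 1 0
}'
```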
Strings are compared by comparing the first character of each, then the second character of each, and so on. Thus, "10" is less than "9". If there are two strings where one is a prefix of the other, the shorter string is less than the longer one. Thus, "abc" is less than "abcd".
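The character-by-character rule can be checked directly with string constants, assuming a POSIX awk such as gawk. The third comparison shows that the first differing character decides the order, regardless of length.

```shell
awk 'BEGIN {
    r1 = ("10" < "9")      # "1" sorts before "9", so true
    r2 = ("abc" < "abcd")  # a proper prefix is less than the longer string
    r3 = ("abd" < "abcd")  # third characters differ and "d" > "c", so false
    print r1, r2, r3       # prints: 1 1 0
}'
```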
It is very easy to accidentally mistype the == operator and leave off one of the = characters. The result is still valid awk code, but the program does not do what is intended:
    if (a = b)   # oops! should be a == b
       ...
    else
       ...
Unless b happens to be zero or the null string, the if part of the test always succeeds. Because the operators are so similar, this kind of error is very difficult to spot when scanning the source code.
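The following sketch, assuming a POSIX awk such as gawk, shows both effects of the typo. The test follows the value of b rather than comparing anything, and a is silently overwritten.

```shell
awk 'BEGIN {
    a = 1; b = 0
    if (a = b)          # assignment, not comparison: its value is 0, which is false
        print "then"
    else
        print "else"    # this branch runs
    print a             # a has been overwritten with the value of b: prints 0
}'
```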
The following table of expressions illustrates the kind of comparison gawk performs, as well as what the result of the comparison is:
    Expression                   Comparison   Result
    1.5 <= 2.0                   numeric      true
    "abc" >= "xyz"               string       false
    1.5 != " +2"                 string       true
    "1e2" < "3"                  string       true
    a = 2; b = "2"; a == b       string       true
    a = 2; b = " +2"; a == b     string       false
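The table's results can be reproduced with a short program, assuming gawk or another POSIX awk. Each result is printed as 1 (true) or 0 (false), in the same row order as the table.

```shell
awk 'BEGIN {
    r1 = (1.5 <= 2.0)        # numeric
    r2 = ("abc" >= "xyz")    # string
    r3 = (1.5 != " +2")      # string: "1.5" vs " +2"
    r4 = ("1e2" < "3")       # string: both are constants
    a = 2; b = "2"
    r5 = (a == b)            # string: b was assigned a string constant
    a = 2; b = " +2"
    r6 = (a == b)            # string: "2" vs " +2"
    print r1, r2, r3, r4, r5, r6   # prints: 1 0 1 1 1 0
}'
```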
In the next example:
    $ echo 1e2 3 | awk '{ print ($1 < $2) ? "true" : "false" }'
    -| false
the result is false because both $1 and $2 are user input. They are numeric strings, so both have the strnum attribute, which dictates a numeric comparison.
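As a related illustration, assuming gawk or another POSIX awk: concatenating the null string to a field is a string operation, so its result is a plain string rather than a numeric string, and the same two inputs then compare as strings. The variable names num and str are hypothetical, and this idiom is offered as a sketch rather than something the text above prescribes.

```shell
echo 1e2 3 | awk '{
    num = ($1 < $2)              # both STRNUM: numeric, 100 < 3 is false
    str = (($1 "") < ($2 ""))    # concatenation yields strings: "1e2" < "3" is true
    print num, str               # prints: 0 1
}'
```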
The purpose of the comparison rules and the use of numeric strings is
to attempt to produce the behavior that is "least surprising," while
still "doing the right thing."
String comparisons and regular expression comparisons are very different. For example:
    x == "foo"
has the value one, or is true, if the variable x is precisely "foo". By contrast:
    x ~ /foo/
has the value one if x contains "foo", such as "Oh, what a fool am I!".
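A quick check of the contrast, assuming a POSIX awk such as gawk. The string is not exactly "foo", but it contains "foo".

```shell
awk 'BEGIN {
    x = "Oh, what a fool am I!"
    eq = (x == "foo")   # exact string equality: false
    m = (x ~ /foo/)     # regexp match anywhere in x: true
    print eq, m         # prints: 0 1
}'
```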
The righthand operand of the ~ and !~ operators may be either a regexp constant (/.../) or an ordinary expression. In the latter case, the value of the expression as a string is used as a dynamic regexp (see How to Use Regular Expressions; also see Using Dynamic Regexps).
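A minimal sketch of dynamic regexps, assuming a POSIX awk such as gawk. The variable re is illustrative. Any string-valued expression, including a concatenation, can stand on the right of ~ or !~.

```shell
awk 'BEGIN {
    re = "^fo+"                     # an ordinary string value used as a dynamic regexp
    r1 = ("fooo" ~ re)              # true: starts with f followed by o characters
    r2 = ("bar" ~ re)               # false
    r3 = ("foobar" !~ ("b" "ar"))   # the expression yields "bar", which is found, so false
    print r1, r2, r3                # prints: 1 0 0
}'
```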
In modern implementations of awk, a constant regular expression in slashes by itself is also an expression. The regexp /regexp/ is an abbreviation for the following comparison expression:
    $0 ~ /regexp/
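The abbreviation works both as a rule pattern and inside an ordinary condition. The following sketch assumes a POSIX awk such as gawk, and the two input lines are arbitrary.

```shell
printf 'foo bar\nbaz\n' | awk '
    /foo/ { print "pattern:", $0 }   # same as $0 ~ /foo/
    { if (/ba/) print "if:", $0 }    # same as if ($0 ~ /ba/)
'
# prints:
# pattern: foo bar
# if: foo bar
# if: baz
```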
One special place where /foo/ is not an abbreviation for $0 ~ /foo/ is when it is the righthand operand of ~ or !~. See Using Regular Expression Constants, where this is discussed in more detail.
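A sketch that makes the exception visible, assuming a POSIX awk such as gawk. Here $0 is "nothing", so a standalone /foo/ evaluates to 0. If /foo/ on the right of ~ were also shorthand for $0 ~ /foo/, then x ~ /foo/ would test "foo" against the regexp "0" and fail. Instead it matches.

```shell
echo nothing | awk '{
    x = "foo"
    r1 = (x ~ /foo/)   # right operand of ~: /foo/ is the regexp itself, so true
    r2 = /foo/         # standalone: same as ($0 ~ /foo/), and $0 is "nothing"
    print r1, r2       # prints: 1 0
}'
```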
The POSIX standard is under revision. The revised standard's rules for typing and comparison are the same as just described for gawk.