Basic Data Typing
In a program,
you keep track of information and values in things called variables.
A variable is just a name for a given value, such as first_name,
last_name, address, and so on.
awk has several predefined variables, and it has
special names to refer to the current input record
and the fields of the record.
You may also group multiple
associated values under one name, as an array.
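As a minimal sketch of these ideas, the following awk program, run from the shell, uses ordinary variables, the current record ($0), its fields, the predefined variable NF, and an array. The input line and the array name person are made up for illustration.

```sh
printf 'Ada Lovelace London\n' | awk '{
    first_name = $1                  # a variable is a name for a value
    last_name  = $2
    address    = $3
    print "record:", $0              # $0 is the current input record
    print "fields:", NF              # NF is predefined: the number of fields
    person["first"] = first_name     # an array groups related values
    person["last"]  = last_name
    print person["last"] ", " person["first"] " (" address ")"
}'
```

This prints the whole record, then "fields: 3", then "Lovelace, Ada (London)".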
Data, particularly in awk, consists of either numeric
values, such as 42 or 3.1415927, or string values.
String values are essentially anything that's not a number, such as a name.
Strings are sometimes referred to as character data, since they
store the individual characters that comprise them.
Individual variables, whether they hold numeric or string values, are
referred to as scalars.
Groups of values, such as arrays, are not scalars.
Within computers, there are two kinds of numeric values: integers and floating-point. In school, integer values were referred to as "whole" numbers, that is, numbers without any fractional part, such as 1, 42, or -17. The advantage to integer numbers is that they represent values exactly. The disadvantage is that their range is limited. For the 32-bit integers that are common on modern systems, this range is -2,147,483,648 to 2,147,483,647.
Integer values come in two flavors: signed and unsigned. Signed values may be negative or positive, with the range of values just described. Unsigned values are never negative. For 32-bit integers, the range is from 0 to 4,294,967,295.
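The limits quoted above follow from the number of bits used to store the integer. The sketch below assumes a width of 32 bits (the variable name bits is only illustrative) and computes both ranges. It prints with %.0f because awk's own numbers are floating point; values this small are still represented exactly.

```sh
awk 'BEGIN {
    bits = 32    # assumed integer width
    # Signed: one bit is used for the sign, leaving bits - 1 for magnitude.
    printf "signed:   %.0f to %.0f\n", -(2 ^ (bits - 1)), 2 ^ (bits - 1) - 1
    # Unsigned: all bits hold magnitude.
    printf "unsigned: 0 to %.0f\n", 2 ^ bits - 1
}'
```

The output is "signed:   -2147483648 to 2147483647" followed by "unsigned: 0 to 4294967295".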
Floating-point numbers represent what are called "real" numbers; i.e.,
those that do have a fractional part, such as 3.1415927.
The advantage to floating-point numbers is that they
can represent a much larger range of values.
The disadvantage is that there are numbers that they cannot represent
exactly.
awk uses double-precision floating-point numbers, which
can hold more digits than single-precision
floating-point numbers.
Floating-point issues are discussed more fully in
Floating-Point Number Caveats.
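A small illustration of a value that cannot be represented exactly: neither 0.1 nor 0.2 has an exact binary floating-point form, so their sum differs slightly from 0.3. This sketch assumes the usual IEEE 754 double-precision arithmetic; the variable name sum is illustrative.

```sh
awk 'BEGIN {
    sum = 0.1 + 0.2
    printf "%.17g\n", sum      # show enough digits to expose the error
    if (sum == 0.3)
        print "equal"
    else
        print "not equal"
}'
```

On such systems this prints 0.30000000000000004 and then "not equal".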
At the very lowest level, computers store values as groups of binary digits,
or bits. Modern computers group bits into groups of eight, called bytes.
Advanced applications sometimes have to manipulate bits directly,
and gawk provides functions for doing so.
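For instance, gawk's built-in bit-manipulation functions include and(), or(), xor(), lshift(), and rshift(). The following sketch requires gawk specifically; other awk implementations may not provide these functions. The operands are arbitrary small values chosen so the bit patterns are easy to follow.

```sh
gawk 'BEGIN {
    print and(12, 10)      # 1100 AND 1010 = 1000, i.e. 8
    print or(12, 10)       # 1100 OR  1010 = 1110, i.e. 14
    print xor(12, 10)      # 1100 XOR 1010 = 0110, i.e. 6
    print lshift(1, 4)     # 1 shifted left 4 bits, i.e. 16
    print rshift(16, 2)    # 16 shifted right 2 bits, i.e. 4
}'
```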
While you are probably used to the idea of a number without a value (i.e., zero),
it takes a bit more getting used to the idea of zero-length character data.
Nevertheless, such a thing exists.
It is called the null string.
The null string is character data that has no value.
In other words, it is empty. It is written in awk programs
like this: "".
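A brief sketch of the null string in use: it has length zero, compares equal to "", and contributes nothing when concatenated. The variable name s is illustrative.

```sh
awk 'BEGIN {
    s = ""
    print length(s)            # the null string contains no characters
    if (s == "")
        print "s is the null string"
    print "[" s "]"            # nothing appears between the brackets
}'
```

This prints 0, then "s is the null string", then "[]".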
Humans are used to working in decimal; i.e., base 10. In base 10, each column holds a digit from 0 to 9, and counting past 9 "rolls over" into the next column. (Remember grade school? 42 is 4 times 10 plus 2.)
There are other number bases though. Computers commonly use base 2 or binary, base 8 or octal, and base 16 or hexadecimal. In binary, each column represents two times the value in the column to its right. Each column may contain either a 0 or a 1. Thus, binary 1010 represents 1 times 8, plus 0 times 4, plus 1 times 2, plus 0 times 1, or decimal 10. Octal and hexadecimal are discussed more in Octal and Hexadecimal Numbers.
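The column arithmetic for binary 1010 can be checked with a short awk loop. This is one possible sketch, not a built-in facility: it walks the digits from left to right, doubling the running total at each column and adding the digit found there. The variable names bits and value are illustrative.

```sh
awk 'BEGIN {
    bits = "1010"
    value = 0
    for (i = 1; i <= length(bits); i++)
        value = value * 2 + substr(bits, i, 1)   # shift one column, add digit
    print bits, "in binary is", value, "in decimal"
}'
```

This prints "1010 in binary is 10 in decimal".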
Programs are written in programming languages.
Hundreds, if not thousands, of programming languages exist.
One of the most popular is the C programming language.
The C language had a very strong influence on the design of
the awk language.
There have been several versions of C. The first is often referred to
as "K&R" C, after the initials of Brian Kernighan and Dennis Ritchie,
the authors of the first book on C. (Dennis Ritchie created the language,
and Brian Kernighan was one of the creators of awk.)
In the mid-1980s, an effort began to produce an international standard
for C. This work culminated in 1989, with the production of the ANSI
standard for C. This standard became an ISO standard in 1990.
Where it makes sense, POSIX awk is compatible with 1990 ISO C.
In 1999, a revised ISO C standard was approved and released.
Future versions of gawk will be as compatible as possible
with this standard.