Introduction

If you are like many computer users, you would frequently like to make changes in various text files wherever certain patterns appear, or extract data from parts of certain lines while discarding the rest. To write a program to do this in a language such as C or Pascal is a time-consuming inconvenience that may take many lines of code. The job may be easier with awk.

The awk utility interprets a special-purpose programming language that makes it possible to handle simple data-reformatting jobs with just a few lines of code.
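As a small illustration of such a reformatting job, here is a minimal sketch. The file name `names.txt' and its two lines of input are made up for this example; only the one-line awk program matters. It prints the second field of each line, a comma, and then the first field.

```shell
# Hypothetical sample input: one "first-name last-name" pair per line.
printf 'Ada Lovelace\nAlan Turing\n' > names.txt

# Reformat each line as "last-name, first-name".
awk '{ print $2 ", " $1 }' names.txt
# -| Lovelace, Ada
# -| Turing, Alan
```

The same job in C or Pascal would need code to open the file, read lines, and split them into words; awk does all of that implicitly.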

The GNU implementation of awk is called gawk; it is fully upward compatible with the System V Release 4 version of awk. gawk is also upward compatible with the POSIX specification of the awk language. This means that all properly written awk programs should work with gawk. Thus, we usually don't distinguish between gawk and other awk implementations.

Using awk you can:

Using This Book

The term awk refers to a particular program, and to the language you use to tell this program what to do. When we need to be careful, we call the program "the awk utility" and the language "the awk language." The term gawk refers to a version of awk developed as part of the GNU project. The purpose of this book is to explain both the awk language and how to run the awk utility.

The main purpose of the book is to explain the features of awk, as defined in the POSIX standard. It does so in the context of one particular implementation, gawk. While doing so, it will also attempt to describe important differences between gawk and other awk implementations. Finally, any gawk features that are not in the POSIX standard for awk will be noted.

This book has the difficult task of being both tutorial and reference. If you are a novice, feel free to skip over details that seem too complex. You should also ignore the many cross references; they are for the expert user, and for the on-line Info version of the document.

The term awk program refers to a program written by you in the awk programming language.

See section Getting Started with awk, for the bare essentials you need to know to start using awk.

Some useful "one-liners" are included to give you a feel for the awk language (see section Useful One Line Programs).

Many sample awk programs have been provided for you (see section A Library of awk Functions; also see section Practical awk Programs).

The entire awk language is summarized for quick reference in section gawk Summary. Look there if you just need to refresh your memory about a particular feature.

If you find terms that you aren't familiar with, try looking them up in the glossary (see section Glossary).

Most of the time complete awk programs are used as examples, but in some of the more advanced sections, only the part of the awk program that illustrates the concept being described is shown.

While this book is aimed principally at people who have not been exposed to awk, there is a lot of information here that even the awk expert should find useful. In particular, the description of POSIX awk and the example programs in section A Library of awk Functions and section Practical awk Programs should be of interest.

Dark Corners

Who opened that window shade?!?
Count Dracula

Until the POSIX standard (and The Gawk Manual), many features of awk were either poorly documented, or not documented at all. Descriptions of such features (often called "dark corners") are noted in this book with "(d.c.)". They also appear in the index under the heading "dark corner."

Typographical Conventions

This book is written using Texinfo, the GNU documentation formatting language. A single Texinfo source file is used to produce both the printed and on-line versions of the documentation. Because of this, the typographical conventions are slightly different from those in other books you may have read.

Examples you would type at the command line are preceded by the common shell primary and secondary prompts, `$' and `>'. Output from the command is preceded by the glyph "-|". This typically represents the command's standard output. Error messages, and other output on the command's standard error, are preceded by the glyph "error-->". For example:

$ echo hi on stdout
-| hi on stdout
$ echo hello on stderr 1>&2
error--> hello on stderr

In the text, command names appear in this font, while code segments appear in the same font and quoted, `like this'. Some things will be emphasized like this, and if a point needs to be made strongly, it will be done like this. The first occurrence of a new term is usually its definition, and appears in the same font as the previous occurrence of "definition" in this sentence. File names are indicated like this: `/path/to/ourfile'.

Characters that you type at the keyboard look like this. In particular, there are special characters called "control characters." These are characters that you type by holding down both the CONTROL key and another key, at the same time. For example, a Control-d is typed by first pressing and holding the CONTROL key, next pressing the d key, and finally releasing both keys.

Data Files for the Examples

Many of the examples in this book take their input from two sample data files. The first, called `BBS-list', represents a list of computer bulletin board systems together with information about those systems. The second data file, called `inventory-shipped', contains information about shipments on a monthly basis. In both files, each line is considered to be one record.

In the file `BBS-list', each record contains the name of a computer bulletin board, its phone number, the board's baud rate(s), and a code for the number of hours it is operational. An `A' in the last column means the board operates 24 hours a day. A `B' in the last column means the board operates only during evening and weekend hours. A `C' means the board operates only on weekends.

aardvark     555-5553     1200/300          B
alpo-net     555-3412     2400/1200/300     A
barfly       555-7685     1200/300          A
bites        555-1675     2400/1200/300     A
camelot      555-0542     300               C
core         555-2912     1200/300          C
fooey        555-1234     2400/1200/300     B
foot         555-6699     1200/300          B
macfoo       555-6480     1200/300          A
sdace        555-3430     2400/1200/300     A
sabafoo      555-2127     1200/300          C
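
To make the record layout concrete, the following sketch recreates `BBS-list' in the current directory (an assumption made only so the example is self-contained) and runs two short awk programs on it. The first counts the records; the second prints the name and phone number of every board whose hours code, the last field, is `A'.

```shell
# Recreate the sample file exactly as listed above.
cat > BBS-list <<'EOF'
aardvark     555-5553     1200/300          B
alpo-net     555-3412     2400/1200/300     A
barfly       555-7685     1200/300          A
bites        555-1675     2400/1200/300     A
camelot      555-0542     300               C
core         555-2912     1200/300          C
fooey        555-1234     2400/1200/300     B
foot         555-6699     1200/300          B
macfoo       555-6480     1200/300          A
sdace        555-3430     2400/1200/300     A
sabafoo      555-2127     1200/300          C
EOF

# Each line is one record, so NR in the END rule is the number of boards.
awk 'END { print NR }' BBS-list
# -| 11

# $NF is the last field of a record: here, the hours code.
awk '$NF == "A" { print $1, $2 }' BBS-list
# -| alpo-net 555-3412
# -| barfly 555-7685
# -| bites 555-1675
# -| macfoo 555-6480
# -| sdace 555-3430
```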

The second data file, called `inventory-shipped', represents information about shipments during the year. Each record contains the month of the year, the number of green crates shipped, the number of red boxes shipped, the number of orange bags shipped, and the number of blue packages shipped, respectively. There are 16 entries, covering the 12 months of one year and four months of the next year.

Jan  13  25  15 115
Feb  15  32  24 226
Mar  15  24  34 228
Apr  31  52  63 420
May  16  34  29 208
Jun  31  42  75 492
Jul  24  34  67 436
Aug  15  34  47 316
Sep  13  55  37 277
Oct  29  54  68 525
Nov  20  87  82 577
Dec  17  35  61 401

Jan  21  36  64 620
Feb  26  58  80 652
Mar  24  75  70 495
Apr  21  70  74 514
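
As a quick check of this description, the sketch below recreates `inventory-shipped' in the current directory (again an assumption, made only so the example runs on its own), counts the entries, and totals the green crates in the second field. The pattern `NF' skips the blank line that separates the two years, since that line has no fields.

```shell
# Recreate the sample file, including the blank line between the two years.
cat > inventory-shipped <<'EOF'
Jan  13  25  15 115
Feb  15  32  24 226
Mar  15  24  34 228
Apr  31  52  63 420
May  16  34  29 208
Jun  31  42  75 492
Jul  24  34  67 436
Aug  15  34  47 316
Sep  13  55  37 277
Oct  29  54  68 525
Nov  20  87  82 577
Dec  17  35  61 401

Jan  21  36  64 620
Feb  26  58  80 652
Mar  24  75  70 495
Apr  21  70  74 514
EOF

# For every non-blank record, count it and add up field 2 (green crates).
awk 'NF { n++; green += $2 }
     END { print n, "entries,", green, "green crates" }' inventory-shipped
# -| 16 entries, 331 green crates
```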

