These commands work with (or produce) sorted files.
sort: Sort text files
sort sorts, merges, or compares all the lines from the given files, or standard input if none are given or if a file is given as `-'. By default, sort writes the results to standard output. Synopsis:
sort [option]... [file]...
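sort has three modes of operation: sort (the default), merge, and check for sortedness. The following options change the operation mode: `-c' checks whether the given files are already sorted, and `-m' merges the given files, each of which must already be sorted.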
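A minimal sketch of the three modes, assuming the `-c' (check) and `-m' (merge) options of GNU sort. The file names under the temporary directory are made up for illustration, and the C locale is forced so that the ordering is plain byte order.

```shell
# Force byte-order collation so the results do not depend on the locale.
export LC_ALL=C
dir=$(mktemp -d)
printf 'b\na\nc\n' > "$dir/unsorted"
printf 'a\nc\n' > "$dir/left"
printf 'b\nd\n' > "$dir/right"

# Default mode: sort the input.
sorted=$(sort "$dir/unsorted")

# Check mode: a nonzero exit status means the input is not sorted.
if sort -c "$dir/unsorted" 2>/dev/null; then
  check=sorted
else
  check=unsorted
fi

# Merge mode: combine inputs that are each already sorted.
merged=$(sort -m "$dir/left" "$dir/right")

printf '%s\n' "$sorted" "$check" "$merged"
rm -rf "$dir"
```

Merge mode does not re-sort its inputs, so it only gives a sorted result when each input file is sorted to begin with.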
A pair of lines is compared as follows: if any key fields have been specified, sort compares each pair of fields, in the order specified on the command line, according to the associated ordering options, until a difference is found or no fields are left.
If any of the global options `Mbdfinr' are given but no key fields are specified, sort compares the entire lines according to the global options.
Finally, as a last resort when all keys compare equal (or if no ordering options were specified at all), sort compares the lines byte by byte in machine collating sequence. The last-resort comparison honors the `-r' global option. The `-s' (stable) option disables this last-resort comparison so that lines in which all fields compare equal are left in their original relative order. If no fields or global options are specified, `-s' has no effect.
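The effect of the last-resort comparison and of `-s' can be seen on a two-line input whose key fields tie. This is only a sketch on made-up data, run in the C locale so that "machine collating sequence" means byte order.

```shell
# Both lines have the same second field, so the key comparison is a tie.
# Without -s, the tie is broken by comparing whole lines byte by byte.
last_resort=$(printf 'b 1\na 1\n' | LC_ALL=C sort -k 2,2)

# With -s, tied lines keep their original relative order.
stable=$(printf 'b 1\na 1\n' | LC_ALL=C sort -s -k 2,2)

printf 'last resort:\n%s\nstable:\n%s\n' "$last_resort" "$stable"
```

In the first run `a 1' comes out ahead of `b 1'; in the second the input order is preserved.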
GNU sort (as specified for all GNU utilities) has no limits on input line length or restrictions on bytes allowed within lines. In addition, if the final byte of an input file is not a newline, GNU sort silently supplies one.
Upon any error, sort exits with a status of `2'.
If the environment variable TMPDIR is set, sort uses its value as the directory for temporary files instead of `/tmp'. The `-T tempdir' option in turn overrides the environment variable.
The following options affect the ordering of output lines. They may be specified globally or as part of a specific key field. If no key fields are specified, global options apply to comparison of entire lines; otherwise the global options are inherited by key fields that do not specify any special options of their own.
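`-g' sorts numerically, converting a prefix of each line to a double precision floating point number. This allows floating point numbers to be specified in scientific notation, like 1.0e-34 and 10e100. Use this option only if there is no alternative; it is much slower than `-n', and numbers with too many significant digits will be compared as if they had been truncated. In addition, numbers outside the range of representable double precision floating point numbers are treated as if they were zeroes; overflow and underflow are not reported.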
sort -n uses what might be considered an unconventional method to compare strings representing floating point numbers. Rather than first converting each string to the C double type and then comparing those values, sort aligns the decimal points in the two strings and compares the strings a character at a time. One benefit of this approach is its speed: in practice it is much more efficient than performing the two corresponding string-to-double (or even string-to-integer) conversions and then comparing doubles. In addition, there is no corresponding loss of precision. Converting each string to double before comparison would limit precision to about 16 digits on most systems.
Neither a leading `+' nor exponential notation is recognized. To compare such strings numerically, use the `-g' option.
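The difference between `-n' and `-g' shows up as soon as the input contains exponential notation. The following is a small sketch on made-up input, assuming a sort that supports `-g' (GNU sort does).

```shell
# -n does not recognize the exponent: `1e3' is read as the number 1.
numeric=$(printf '1e3\n20\n3\n' | LC_ALL=C sort -n)

# -g converts each line to a double, so `1e3' is read as 1000.
general=$(printf '1e3\n20\n3\n' | LC_ALL=C sort -g)

printf 'sort -n:\n%s\nsort -g:\n%s\n' "$numeric" "$general"
```

With `-n' the line `1e3' sorts first, ahead of `3' and `20'; with `-g' it sorts last.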
Other options are:
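`-o output-file': Write output to output-file instead of standard output. If output-file is one of the input files, sort copies it to a temporary file before sorting and writing the output to output-file.
`-t separator': Use separator as the field separator when finding the sort keys in each line. By default, fields are separated by the empty string between a non-whitespace character and a whitespace character. That is, given the input line ` foo bar', sort breaks it into fields ` foo' and ` bar'. The field separator is not considered to be part of either the field preceding or the field following.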
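Because the default splitting leaves leading blanks inside each field, a key comparison can be decided by those blanks rather than by the visible text. The sketch below, on made-up input in the C locale, contrasts the default behavior with the `b' key modifier and with an explicit `-t' separator.

```shell
# Default splitting: field two is `  b' on the first line and ` a' on the
# second. A space sorts before `a' in byte order, so `x  b' comes first.
with_blanks=$(printf 'x  b\ny a\n' | LC_ALL=C sort -k 2,2)

# With the `b' modifier the leading blanks are skipped, so `a' < `b' decides.
skip_blanks=$(printf 'x  b\ny a\n' | LC_ALL=C sort -k 2b,2)

# With an explicit separator, the separator belongs to neither field.
by_colon=$(printf 'a:2\nb:1\n' | LC_ALL=C sort -t : -k 2,2)

printf '%s\n---\n%s\n---\n%s\n' "$with_blanks" "$skip_blanks" "$by_colon"
```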
In addition, when GNU sort is invoked with exactly one argument, options `--help' and `--version' are recognized. See section Common options.
Historical (BSD and System V) implementations of sort have differed in their interpretation of some options, particularly `-b', `-f', and `-n'. GNU sort follows the POSIX behavior, which is usually (but not always!) like the System V behavior. According to POSIX, `-n' no longer implies `-b'. For consistency, `-M' has been changed in the same way. This may affect the meaning of character positions in field specifications in obscure cases. The only fix is to add an explicit `-b'.
A position in a sort field specified with the `-k' or `+' option has the form `f.c', where f is the number of the field to use and c is the number of the first character from the beginning of the field (for `+pos') or from the end of the previous field (for `-pos'). If the `.c' is omitted, it is taken to be the first character in the field. If the `-b' option was specified, the `.c' part of a field specification is counted from the first nonblank character of the field (for `+pos') or from the first nonblank character following the previous field (for `-pos').
A sort key option may also have any of the option letters `Mbdfinr' appended to it, in which case the global ordering options are not used for that particular field. The `-b' option may be independently attached to either or both of the `+pos' and `-pos' parts of a field specification, and if it is inherited from the global options it will be attached to both. Keys may span multiple fields.
Here are some examples to illustrate various combinations of options. In them, the POSIX `-k' option is used to specify sort keys rather than the obsolete `+pos1-pos2' syntax.
Sort in descending (reverse) numeric order.
sort -nr
Sort alphabetically, omitting the first and second fields. This uses a single key composed of the characters beginning at the start of field three and extending to the end of each line.
sort -k3
Sort numerically on the second field and resolve ties by sorting alphabetically on the third and fourth characters of field five, using `:' as the field delimiter.
sort -t : -k 2,2n -k 5.3,5.4
Note that if you had written `-k 2' instead of `-k 2,2', sort would have used all characters beginning in the second field and extending to the end of the line as the primary numeric key. For the large majority of applications, treating keys spanning more than one field as numeric will not do what you expect. Also note that the `n' modifier was applied to the field-end specifier for the first key. It would have been equivalent to specify `-k 2n,2' or `-k 2n,2n'. All modifiers except `b' apply to the associated field, regardless of whether the modifier character is attached to the field-start and/or the field-end part of the key specifier.
Sort the password file on the fifth field, ignoring any leading white space, and sort lines with equal values in field five on the numeric user ID in field three.
sort -t : -k 5b,5 -k 3,3n /etc/passwd
An alternative is to use the global numeric modifier `-n'.
sort -t : -n -k 5b,5 -k 3,3 /etc/passwd
To ignore both leading and trailing white space, you could also apply the `b' modifier to the field-end specifier for the first key,
sort -t : -n -k 5b,5b -k 3,3 /etc/passwd
or use the global `-b' modifier instead of `-n' and an explicit `n' with the second key specifier.
sort -t : -b -k 5,5 -k 3,3n /etc/passwd
Generate a tags file in case-insensitive sorted order.
find src -type f -print0 | sort -t / -z -f | xargs -0 etags --append
The use of `-print0', `-z', and `-0' in this case means that pathnames that contain Line Feed characters will not get broken up by the sort operation.
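To see the two-key form `sort -t : -k 2,2n -k 5.3,5.4' at work, here is a small sketch on invented colon-separated records. The record contents are made up for illustration; only the key specification comes from the examples above.

```shell
# Field two is the numeric primary key; characters 3-4 of field five
# break ties between records whose field two is equal.
result=$(printf '%s\n' \
  'a:10:x:y:zzab' \
  'b:9:x:y:zzcd' \
  'c:10:x:y:zzaa' |
  LC_ALL=C sort -t : -k 2,2n -k 5.3,5.4)

printf '%s\n' "$result"
```

The record with 9 in field two comes first. The two records with 10 tie on the primary key and are ordered by `aa' before `ab', taken from the third and fourth characters of field five.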
uniq: Uniqify files
uniq writes the unique lines in the given `input', or standard input if nothing is given or for an input name of `-'. Synopsis:
uniq [option]... [input [output]]
By default, uniq prints the unique lines in a sorted file, i.e., discards all but one of identical successive lines. Optionally, it can instead show only lines that appear exactly once, or lines that appear more than once.
The input must be sorted. If your input is not sorted, perhaps you want to use sort -u.
If no output file is specified, uniq writes to standard output.
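A short sketch of the behaviors described above, on made-up input. It assumes the standard `-u' (only lines that appear once) and `-d' (only repeated lines) options, which are not named in the text above.

```shell
# Default: collapse each run of identical adjacent lines to one line.
unique=$(printf 'a\na\nb\nc\nc\n' | uniq)

# -u: only lines that appear exactly once; -d: only lines that repeat.
once=$(printf 'a\na\nb\nc\nc\n' | uniq -u)
repeated=$(printf 'a\na\nb\nc\nc\n' | uniq -d)

# uniq only compares adjacent lines, so unsorted input keeps duplicates
# that are not next to each other; sort -u handles that case.
not_adjacent=$(printf 'a\nb\na\n' | uniq)
sorted_unique=$(printf 'a\nb\na\n' | LC_ALL=C sort -u)

printf '%s\n---\n%s\n---\n%s\n---\n%s\n---\n%s\n' \
  "$unique" "$once" "$repeated" "$not_adjacent" "$sorted_unique"
```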
The program accepts the following options. Also see section Common options.
comm: Compare two sorted files line by line
comm writes to standard output lines that are common, and lines that are unique, to two input files; a file name of `-' means standard input. Synopsis:
comm [option]... file1 file2
The input files must be sorted before comm can be used.
With no options, comm produces three-column output. Column one contains lines unique to file1, column two contains lines unique to file2, and column three contains lines common to both files. Columns are separated by TAB.
The options `-1', `-2', and `-3' suppress printing of the corresponding columns. Also see section Common options.
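The column layout and the suppression options can be tried on two small sorted files. This is a sketch only; the file names under the temporary directory are made up for illustration.

```shell
export LC_ALL=C
dir=$(mktemp -d)
printf 'a\nb\nc\n' > "$dir/file1"
printf 'b\nc\nd\n' > "$dir/file2"

# Three-column output: unique to file1, unique to file2, common to both.
columns=$(comm "$dir/file1" "$dir/file2")

# Suppress columns one and two to keep only the common lines.
common=$(comm -12 "$dir/file1" "$dir/file2")

# Suppress columns two and three to keep only lines unique to file1.
only_first=$(comm -23 "$dir/file1" "$dir/file2")

printf '%s\n---\n%s\n---\n%s\n' "$columns" "$common" "$only_first"
rm -rf "$dir"
```

In the three-column output, a line in column two is preceded by one TAB and a line in column three by two TABs.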