This option causes all files to be put in the archive to be tested for
sparseness, and handled specially if they are. The --sparse (-S)
option is useful when many dbm files, for example, are being
backed up. Using this option dramatically decreases the amount of
space needed to store such a file.
In later versions, this option may be removed, and the testing and treatment of sparse files may be done automatically with any special GNU options. For now, it must be specified on the command line when creating or updating an archive.
Files in the filesystem occasionally have "holes." A hole in a file
is a section of the file's contents which was never written. The
contents of a hole read as all zeros. On many operating systems,
actual disk storage is not allocated for holes, but they are counted
in the length of the file. If you archive such a file, tar could
create an archive longer than the original. To have tar attempt to
recognize the holes in a file, use --sparse (-S). When you use the
--sparse (-S) option, then, for any file using less disk space than
would be expected from its length, tar searches the file for
consecutive stretches of zeros. It then records in the archive for
the file where the consecutive stretches of zeros are, and only
archives the "real contents" of the file. On extraction (--sparse
(-S) is not needed then), any such files have holes created wherever
the consecutive stretches of zeros were found. Thus, if you use
--sparse (-S), tar archives won't take more space than the original.
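The scan for stretches of zeros described above can be sketched in Python. This is only an illustration of the idea, not GNU tar's actual code: the function name `data_segments` and the 512-byte block size are assumptions made for the example.

```python
BLOCK = 512  # assumed block size, chosen for this illustration only


def data_segments(data: bytes, block: int = BLOCK):
    """Return (offset, length) pairs covering the blocks that are not all zeros."""
    segments = []
    start = None  # offset where the current run of non-zero blocks began
    for off in range(0, len(data), block):
        chunk = data[off:off + block]
        if not any(chunk):              # all-zero block: treat it as part of a hole
            if start is not None:
                segments.append((start, off - start))
                start = None
        elif start is None:
            start = off                 # first block of a new run of real data
    if start is not None:
        segments.append((start, len(data) - start))
    return segments


# Four blocks of zeros, one block of data, then three more blocks of zeros.
sample = bytes(4 * BLOCK) + b"x" * BLOCK + bytes(3 * BLOCK)
print(data_segments(sample))  # [(2048, 512)]
```

Only the listed segments would need to be stored, together with their offsets and the file's full length; everything else can be recreated as holes.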
A file is sparse if it contains blocks of zeros whose existence is
recorded, but that have no space allocated on disk. When you specify
the --sparse (-S) option in conjunction with the --create (-c)
operation, tar tests all files for sparseness while archiving.
If tar finds a file to be sparse, it uses a sparse representation of
the file in the archive. See section How to Create Archives, for more
information about creating archives.
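The definition above suggests a cheap first test: compare a file's length with the space actually allocated to it. The following is a minimal sketch, assuming a POSIX-like system where `os.stat` reports `st_blocks` in 512-byte units; `looks_sparse` is a hypothetical helper for illustration and is not part of tar.

```python
import os
import tempfile


def looks_sparse(size: int, blocks: int, unit: int = 512) -> bool:
    """True if fewer bytes are allocated than the file's length implies."""
    return blocks * unit < size


# Create a file with a 1 MiB hole followed by a little real data.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.seek(1024 * 1024)      # skip ahead without writing: this leaves a hole
    f.write(b"real data")
    path = f.name

st = os.stat(path)
blocks = getattr(st, "st_blocks", None)   # not provided on every platform
if blocks is not None:
    # Whether the hole is really left unallocated depends on the file system.
    print(st.st_size, blocks * 512, looks_sparse(st.st_size, blocks))
os.remove(path)
```

A check like this says only that a file probably has holes, not where they are.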
--sparse (-S) is useful when archiving files, such as dbm files, likely to contain many nulls. This option dramatically decreases the amount of space needed to store such an archive.
Please Note: Always use --sparse (-S) when performing file system backups, to avoid archiving the expanded forms of files stored sparsely in the system.
Even if your system has no sparse files currently, some may be created in the future. If you use --sparse (-S) while making file system backups as a matter of course, you can be assured the archive will never take more space on the media than the files take on disk (otherwise, archiving a disk filled with sparse files might take hundreds of tapes).
tar ignores the --sparse (-S) option when reading an archive.
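No option is needed on extraction because the archive itself records where the real data belongs. As a rough sketch of how holes can be recreated from such a record (an illustration under assumed names, not tar's implementation), the hypothetical `restore_sparse` below seeks over each gap instead of writing zeros:

```python
import os
import tempfile


def restore_sparse(path, total_size, segments):
    """Write only the stored data, seeking over the gaps so they can become holes."""
    with open(path, "wb") as out:
        for offset, payload in segments:
            out.seek(offset)        # jump over the gap instead of writing zeros
            out.write(payload)
        out.truncate(total_size)    # extend to full length if the file ends in a hole


# Hypothetical archive entry: an 8 KiB file whose only real data is at offset 2048.
fd, path = tempfile.mkstemp()
os.close(fd)
restore_sparse(path, 8192, [(2048, b"x" * 512)])
with open(path, "rb") as f:
    restored = f.read()
os.remove(path)
print(len(restored), restored[2048:2560] == b"x" * 512)  # 8192 True
```

Whether the skipped regions end up as true holes on disk depends on the file system; either way they read back as zeros.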
However, users should be well aware that at archive creation time, GNU
tar still has to read the whole disk file to locate the holes, and
so, even if sparse files use little space on disk and in the archive, they
may sometimes require an inordinate amount of time for reading and examining
the all-zero blocks of a file. Although it works, it's painfully slow for a
large (sparse) file, even though the resulting tar archive may be small.
(One user reports that dumping a `core' file of over 400 megabytes,
but with only about 3 megabytes of actual data, took about 9 minutes on
a Sun SPARCstation ELC, with full CPU utilisation.)
This reading is required in all cases, whether or not the --sparse (-S) option is used, so merely omitting the option does not save time.
Programs like dump do not have to read the entire file; by examining
the file system directly, they can determine in advance exactly where the
holes are and thus avoid reading through them. The only data they need to
read are the actual allocated data blocks. GNU tar uses a more portable
and straightforward archiving approach; it would be fairly difficult for
it to do otherwise. Elizabeth Zwicky wrote to `comp.unix.internals'
on 1990-12-10:
What I did say is that you cannot tell the difference between a hole and an equivalent number of nulls without reading raw blocks. st_blocks at best tells you how many holes there are; it doesn't tell you where. Just as programs may, conceivably, care what st_blocks is (care to name one that does?), they may also care where the holes are (I have no examples of this one either, but it's equally imaginable).
I conclude from this that good archivers are not portable. One can arguably conclude that if you want a portable program, you can in good conscience restore files with as many holes as possible, since you can't get it right.