The linker accesses object and archive files using the BFD libraries.
These libraries allow the linker to use the same routines to operate on
object files whatever the object file format. A different object file
format can be supported simply by creating a new BFD back end and adding
it to the library. To conserve runtime memory, however, the linker and
associated tools are usually configured to support only a subset of the
object file formats available. You can use objdump -i
(see section `objdump' in The GNU Binary Utilities) to
list all the formats available for your configuration.
As with most implementations, BFD is a compromise between several conflicting requirements. The major factor influencing BFD design was efficiency: any time used converting between formats is time which would not have been spent had BFD not been involved. This is partly offset by abstraction payback; since BFD simplifies applications and back ends, more time and care may be spent optimizing algorithms for a greater speed.
One minor artifact of the BFD solution which you should bear in mind is the potential for information loss. There are two places where useful information can be lost using the BFD mechanism: during conversion and during output. See section Information Loss.
When an object file is opened, BFD subroutines automatically determine the format of the input object file. They then build a descriptor in memory with pointers to routines that will be used to access elements of the object file's data structures.
As different information from the the object files is required, BFD reads from different sections of the file and processes them. For example, a very common operation for the linker is processing symbol tables. Each BFD back end provides a routine for converting between the object file's representation of symbols and an internal canonical format. When the linker asks for the symbol table of an object file, it calls through a memory pointer to the routine from the relevant BFD back end which reads and converts the table into a canonical form. The linker then operates upon the canonical form. When the link is finished and the linker writes the output file's symbol table, another BFD back end routine is called to take the newly created symbol table and convert it into the chosen output format.
Information can be lost during output. The output formats
supported by BFD do not provide identical facilities, and
information which can be described in one form has nowhere to go in
another format. One example of this is alignment information in
b.out
. There is nowhere in an a.out
format file to store
alignment information on the contained data, so when a file is linked
from b.out
and an a.out
image is produced, alignment
information will not propagate to the output file. (The linker will
still use the alignment information internally, so the link is performed
correctly).
Another example is COFF section names. COFF files may contain an
unlimited number of sections, each one with a textual section name. If
the target of the link is a format which does not have many sections (e.g.,
a.out
) or has sections without names (e.g., the Oasys format), the
link cannot be done simply. You can circumvent this problem by
describing the desired input-to-output section mapping with the linker command
language.
Information can be lost during canonicalization. The BFD internal canonical form of the external formats is not exhaustive; there are structures in input formats for which there is no direct representation internally. This means that the BFD back ends cannot maintain all possible data richness through the transformation between external to internal and back to external formats.
This limitation is only a problem when an application reads one
format and writes another. Each BFD back end is responsible for
maintaining as much data as possible, and the internal BFD
canonical form has structures which are opaque to the BFD core,
and exported only to the back ends. When a file is read in one format,
the canonical form is generated for BFD and the application. At the
same time, the back end saves away any information which may otherwise
be lost. If the data is then written back in the same format, the back
end routine will be able to use the canonical form provided by the
BFD core as well as the information it prepared earlier. Since
there is a great deal of commonality between back ends,
there is no information lost when
linking or copying big endian COFF to little endian COFF, or a.out
to
b.out
. When a mixture of formats is linked, the information is
only lost from the files whose format differs from the destination.
The greatest potential for loss of information occurs when there is the least overlap between the information provided by the source format, that stored by the canonical format, and that needed by the destination format. A brief description of the canonical form may help you understand which kinds of data you can count on preserving across conversions.
ZMAGIC
file would have both the demand pageable bit and the write protected
text bit set. The byte order of the target is stored on a per-file
basis, so that big- and little-endian object files may be used with one
another.
ld
can
operate on a collection of symbols of wildly different formats without
problems.
Normal global and simple local symbols are maintained on output, so an
output file (no matter its format) will retain symbols pointing to
functions and to global, static, and common variables. Some symbol
information is not worth retaining; in a.out
, type information is
stored in the symbol table as long symbol names. This information would
be useless to most COFF debuggers; the linker has command line switches
to allow users to throw it away.
There is one word of type information within the symbol, so if the
format supports symbol type information within symbols (for example, COFF,
IEEE, Oasys) and the type is simple enough to fit within one word
(nearly everything but aggregates), the information will be preserved.
Go to the first, previous, next, last section, table of contents.