iconv Implementations
This is not really the place to discuss the iconv implementation of other systems, but it is necessary to know a bit about them to write portable programs. The above-mentioned problems with the specification of the iconv functions can lead to portability issues.
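The first thing to notice is that, due to the large number of character sets in use, it is certainly not practical to encode the conversions directly in the C library. Therefore the conversion information must come from files outside the C library. This is usually done in one or both of the following ways:
* The C library contains a set of generic conversion functions that read the needed conversion tables and other information from data files, which are loaded when necessary.
* The C library contains only a framework that dynamically loads object files (modules) and executes the conversion functions contained therein.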
Some implementations in commercial Unices implement a mixture of these possibilities; the majority implement only the second solution. Using loadable modules moves the code out of the library itself and keeps the door open for extensions and improvements. But this design is also limiting on some platforms, since not many platforms support dynamic loading in statically linked programs. On platforms without this capability it is therefore not possible to use this interface in statically linked programs. On ELF platforms the GNU C library has no problems with dynamic loading in these situations, so there this point is moot. The danger is that one gets used to this and forgets about the restrictions on other systems.
A second thing to know about other iconv implementations is that the number of available conversions is often very limited. Some implementations provide at most 100 to 200 conversion possibilities in the standard release (not special international or developer releases). This does not mean 200 different character sets are supported. E.g., conversions from one character set to a set of, say, 10 others count as 10 conversions. Together with the other direction this already makes 20. One can imagine the thin coverage these platforms provide. Some Unix vendors even provide only a handful of conversions, which renders them useless for almost all purposes.
This directly leads to a third and probably the most problematic point. Because of the way the iconv conversion functions are implemented on all known Unix systems, the availability of a conversion from character set $\mathcal{A}$ to $\mathcal{B}$ and of a conversion from $\mathcal{B}$ to $\mathcal{C}$ does not imply that the conversion from $\mathcal{A}$ to $\mathcal{C}$ is available.
This might not seem unreasonable or problematic at first, but it is quite a big problem, as one notices shortly after running into it. To illustrate, assume we are writing a program that has to convert from $\mathcal{A}$ to $\mathcal{C}$. A call like
cd = iconv_open ("C", "A");
fails, according to the assumption above. But what should the program do now? The conversion is necessary, so simply giving up is not an option.
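Before looking for a workaround, the program has to recognize this failure reliably. POSIX specifies that iconv_open returns (iconv_t) -1 and sets errno to EINVAL when the implementation does not support the requested conversion; other errno values indicate resource problems instead. The following minimal sketch (open_direct is a hypothetical helper) distinguishes the two cases and preserves errno for the caller.

```c
#include <errno.h>
#include <iconv.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: try the direct FROM -> TO conversion and say why
   it failed.  POSIX specifies errno == EINVAL when the implementation
   does not support the requested conversion. */
static iconv_t open_direct(const char *to, const char *from)
{
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t) -1) {
        int saved = errno;               /* fprintf may clobber errno */
        if (saved == EINVAL)
            fprintf(stderr, "no direct conversion from %s to %s\n", from, to);
        else
            fprintf(stderr, "iconv_open(\"%s\", \"%s\"): %s\n",
                    to, from, strerror(saved));
        errno = saved;                   /* let the caller inspect it too */
    }
    return cd;
}
```

Only the EINVAL case is a candidate for the indirect strategies discussed next; EMFILE or ENOMEM will not be cured by trying another route.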
This is a nuisance. The iconv function should take care of this.
But how should the program proceed from here? If it tried to convert to character set $\mathcal{B}$ first, the two iconv_open calls
cd1 = iconv_open ("B", "A");
and
cd2 = iconv_open ("C", "B");
would succeed, but how is the program to find $\mathcal{B}$?
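Once a suitable intermediate is known, the two descriptors can simply be chained: convert the input with cd1 into a temporary buffer, then convert that buffer with cd2. The following is a minimal sketch using only the POSIX iconv functions; convert_all and convert_via are hypothetical helpers, and the intermediate name is supplied by the caller. After the input is consumed, convert_all calls iconv with a null input buffer, which POSIX defines as writing any sequence needed to return a stateful encoding to its initial shift state.

```c
#include <errno.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Run all INLEN bytes at IN through CD, then flush any pending shift
   state.  Returns a malloc'd buffer (length in *OUTLEN) or NULL on
   error; the buffer is doubled whenever iconv reports E2BIG. */
static char *convert_all(iconv_t cd, const char *in, size_t inlen,
                         size_t *outlen)
{
    size_t cap = 2 * inlen + 16, used = 0, inleft = inlen;
    char *inp = (char *) in;     /* iconv takes char **, not const char ** */
    int flushing = 0;
    char *out = malloc(cap);

    if (out == NULL)
        return NULL;
    for (;;) {
        char *outp = out + used;
        size_t outleft = cap - used;
        size_t r = flushing ? iconv(cd, NULL, NULL, &outp, &outleft)
                            : iconv(cd, &inp, &inleft, &outp, &outleft);
        used = cap - outleft;
        if (r != (size_t) -1) {
            if (flushing)
                break;               /* input converted and state reset */
            flushing = 1;
            continue;
        }
        if (errno != E2BIG) {        /* EILSEQ or EINVAL: give up */
            free(out);
            return NULL;
        }
        char *bigger = realloc(out, 2 * cap);
        if (bigger == NULL) {
            free(out);
            return NULL;
        }
        out = bigger;
        cap *= 2;
    }
    *outlen = used;
    return out;
}

/* Hypothetical helper: FROM -> VIA -> TO with two descriptors, exactly
   the cd1/cd2 pair from the text.  VIA must be known in advance. */
static char *convert_via(const char *to, const char *via, const char *from,
                         const char *in, size_t inlen, size_t *outlen)
{
    iconv_t cd1 = iconv_open(via, from);
    if (cd1 == (iconv_t) -1)
        return NULL;
    iconv_t cd2 = iconv_open(to, via);
    if (cd2 == (iconv_t) -1) {
        iconv_close(cd1);
        return NULL;
    }

    size_t midlen = 0;
    char *mid = convert_all(cd1, in, inlen, &midlen);
    char *out = mid != NULL ? convert_all(cd2, mid, midlen, outlen) : NULL;

    free(mid);
    iconv_close(cd1);
    iconv_close(cd2);
    return out;
}
```

The chain only works for text that the intermediate character set can represent. For other characters the first step either fails (glibc reports EILSEQ) or, on some implementations, substitutes a replacement character.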
Unfortunately, the answer is: there is no general solution. On some systems guessing might help. On those systems most character sets can convert to and from UTF-8 encoded ISO 10646 or Unicode text. Besides this, only some very system-specific methods can help. Since the conversion functions come from loadable modules and these modules must be stored somewhere in the filesystem, one could try to find them and determine from the available files which conversions are available and whether there is an indirect route from $\mathcal{A}$ to $\mathcal{C}$.
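The guessing strategy can be sketched as follows: try the direct conversion first, and if it is not available, try each name from a caller-supplied candidate list as the intermediate $\mathcal{B}$. The candidate list, and putting "UTF-8" first in it, are assumptions of this sketch based on the observation above; struct route, open_route and close_route are hypothetical names.

```c
#include <iconv.h>
#include <stddef.h>

/* Hypothetical result type: either one direct descriptor (cd2 unused)
   or two chained descriptors through VIA. */
struct route {
    iconv_t cd1;        /* FROM -> TO directly, or FROM -> VIA */
    iconv_t cd2;        /* VIA -> TO, or (iconv_t) -1 when direct */
    const char *via;    /* NULL when the conversion is direct */
};

/* Try FROM -> TO directly, then FROM -> CANDIDATES[i] -> TO for each
   guess in order.  Returns 0 on success, -1 if no guess works. */
static int open_route(struct route *r, const char *to, const char *from,
                      const char *const candidates[], size_t n)
{
    r->cd2 = (iconv_t) -1;
    r->via = NULL;
    r->cd1 = iconv_open(to, from);
    if (r->cd1 != (iconv_t) -1)
        return 0;                       /* direct conversion exists */

    for (size_t i = 0; i < n; i++) {
        iconv_t a = iconv_open(candidates[i], from);
        if (a == (iconv_t) -1)
            continue;
        iconv_t b = iconv_open(to, candidates[i]);
        if (b == (iconv_t) -1) {
            iconv_close(a);
            continue;
        }
        r->cd1 = a;
        r->cd2 = b;
        r->via = candidates[i];
        return 0;
    }
    return -1;                          /* no route among the guesses */
}

/* Release whatever open_route opened. */
static void close_route(struct route *r)
{
    if (r->cd1 != (iconv_t) -1)
        iconv_close(r->cd1);
    if (r->cd2 != (iconv_t) -1)
        iconv_close(r->cd2);
}
```

A caller might pass a list such as { "UTF-8" }; a failure of open_route only means that none of the guesses worked, not that no indirect route exists.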
This shows one of the design errors of iconv mentioned above. It should at least be possible to determine the list of available conversions programmatically, so that if iconv_open says there is no such conversion, one could verify that this is also true for indirect routes.
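Lacking such a list, a program can only probe a set of names it already knows about. The following minimal sketch runs a breadth-first search over a caller-supplied array of character set names, using iconv_open as the test for whether a direct edge exists, and returns the shortest chain of direct conversions. Where the names come from is an assumption (for example a fixed table in the program; on GNU systems, `iconv --list` prints the names the library knows). find_route, edge and MAX_SETS are hypothetical, and the worst case costs on the order of n² iconv_open calls.

```c
#include <iconv.h>
#include <stddef.h>

#define MAX_SETS 32   /* arbitrary limit chosen for this sketch */

/* Edge test: can iconv_open convert FROM -> TO directly? */
static int edge(const char *from, const char *to)
{
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t) -1)
        return 0;
    iconv_close(cd);
    return 1;
}

/* Breadth-first search over NAMES[0..n-1] for the shortest chain of
   direct conversions from NAMES[src] to NAMES[dst].  On success, stores
   the chain of indices (src first, dst last) in PATH and returns its
   length; returns 0 if no route exists within NAMES or n is too large. */
static size_t find_route(const char *const names[], size_t n,
                         size_t src, size_t dst, size_t path[MAX_SETS])
{
    size_t prev[MAX_SETS], queue[MAX_SETS], head = 0, tail = 0;
    int seen[MAX_SETS] = { 0 };

    if (n > MAX_SETS || src >= n || dst >= n)
        return 0;
    seen[src] = 1;
    queue[tail++] = src;
    while (head < tail) {
        size_t u = queue[head++];
        if (u == dst) {
            /* Walk the predecessor links back to src, then reverse. */
            size_t len = 0;
            for (size_t v = dst; ; v = prev[v]) {
                path[len++] = v;
                if (v == src)
                    break;
            }
            for (size_t i = 0; i < len / 2; i++) {
                size_t t = path[i];
                path[i] = path[len - 1 - i];
                path[len - 1 - i] = t;
            }
            return len;
        }
        for (size_t v = 0; v < n; v++)
            if (!seen[v] && edge(names[u], names[v])) {
                seen[v] = 1;            /* each name is queued at most once */
                prev[v] = u;
                queue[tail++] = v;
            }
    }
    return 0;
}
```

Even when find_route reports no route, that result only covers the names supplied; this is exactly the gap that a programmatic list of available conversions would close.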