The functions described in the previous section only convert a single character at a time. Most operations to be performed in real-world programs include strings and therefore the ISO C standard also defines conversions on entire strings. However, the defined set of functions is quite limited, thus the GNU C library contains a few extensions which can help in some important situations.
mbsrtowcs
function ("multibyte string restartable to wide
character string") converts an NUL terminated multibyte character
string at *src
into an equivalent wide character string,
including the NUL wide character at the end. The conversion is started
using the state information from the object pointed to by ps or
from an internal object of mbsrtowcs
if ps is a null
pointer. Before returning the state object to match the state after the
last converted character. The state is the initial state if the
terminating NUL byte is reached and converted.
If dst is not a null pointer the result is stored in the array pointed to by dst, otherwise the conversion result is not available since it is stored in an internal buffer.
If len wide characters are stored in the array dst before reaching the end of the input string the conversion stops and len is returned. If dst is a null pointer len is never checked.
Another reason for a premature return from the function call is if the
input string contains an invalid multibyte sequence. In this case the
global variable errno
is set to EILSEQ
and the function
returns (size_t) -1
.
In all other cases the function returns the number of wide characters
converted during this call. If dst is not null mbsrtowcs
stores in the pointer pointed to by src a null pointer (if the NUL
byte in the input string was reached) or the address of the byte
following the last converted multibyte character.
This function was introduced in Amendment 1 to ISO C90 and is declared in `wchar.h'.
The definition of this function has one limitation which has to be
understood. The requirement that dst has to be a NUL terminated
string provides problems if one wants to convert buffers with text. A
buffer is normally no collection of NUL terminated strings but instead a
continuous collection of lines, separated by newline characters. Now
assume a function to convert one line from a buffer is needed. Since
the line is not NUL terminated the source pointer cannot directly point
into the unmodified text buffer. This means, either one inserts the NUL
byte at the appropriate place for the time of the mbsrtowcs
function call (which is not doable for a read-only buffer or in a
multi-threaded application) or one copies the line in an extra buffer
where it can be terminated by a NUL byte. Note that it is not in
general possible to limit the number of characters to convert by setting
the parameter len to any specific value. Since it is not known
how many bytes each multibyte character sequence is in length one always
could do only a guess.
There is still a problem with the method of NUL-terminating a line right after the newline character which could lead to very strange results. As said in the description of the mbsrtowcs function above the conversion state is guaranteed to be in the initial shift state after processing the NUL byte at the end of the input string. But this NUL byte is not really part of the text. I.e., the conversion state after the newline in the original text could be something different than the initial shift state and therefore the first character of the next line is encoded using this state. But the state in question is never accessible to the user since the conversion stops after the NUL byte (which resets the state). Most stateful character sets in use today require that the shift state after a newline is the initial state--but this is not a strict guarantee. Therefore simply NUL terminating a piece of a running text is not always an adequate solution and therefore never should be used in generally used code.
The generic conversion interface (see section Generic Charset Conversion)
does not have this limitation (it simply works on buffers, not
strings), and the GNU C library contains a set of functions which take
additional parameters specifying the maximal number of bytes which are
consumed from the input string. This way the problem of
mbsrtowcs
's example above could be solved by determining the line
length and passing this length to the function.
wcsrtombs
function ("wide character string restartable to
multibyte string") converts the NUL terminated wide character string at
*src
into an equivalent multibyte character string and
stores the result in the array pointed to by dst. The NUL wide
character is also converted. The conversion starts in the state
described in the object pointed to by ps or by a state object
locally to wcsrtombs
in case ps is a null pointer. If
dst is a null pointer the conversion is performed as usual but the
result is not available. If all characters of the input string were
successfully converted and if dst is not a null pointer the
pointer pointed to by src gets assigned a null pointer.
If one of the wide characters in the input string has no valid multibyte
character equivalent the conversion stops early, sets the global
variable errno
to EILSEQ
, and returns (size_t) -1
.
Another reason for a premature stop is if dst is not a null pointer and the next converted character would require more than len bytes in total to the array dst. In this case (and if dest is not a null pointer) the pointer pointed to by src is assigned a value pointing to the wide character right after the last one successfully converted.
Except in the case of an encoding error the return value of the function is the number of bytes in all the multibyte character sequences stored in dst. Before returning the state in the object pointed to by ps (or the internal object in case ps is a null pointer) is updated to reflect the state after the last conversion. The state is the initial shift state in case the terminating NUL wide character was converted.
This function was introduced in Amendment 1 to ISO C90 and is declared in `wchar.h'.
The restriction mentions above for the mbsrtowcs
function applies
also here. There is no possibility to directly control the number of
input characters. One has to place the NUL wide character at the
correct place or control the consumed input indirectly via the available
output array size (the len parameter).
mbsnrtowcs
function is very similar to the mbsrtowcs
function. All the parameters are the same except for nmc which is
new. The return value is the same as for mbsrtowcs
.
This new parameter specifies how many bytes at most can be used from the
multibyte character string. I.e., the multibyte character string
*src
need not be NUL terminated. But if a NUL byte is
found within the nmc first bytes of the string the conversion
stops here.
This function is a GNU extensions. It is meant to work around the problems mentioned above. Now it is possible to convert buffer with multibyte character text piece for piece without having to care about inserting NUL bytes and the effect of NUL bytes on the conversion state.
A function to convert a multibyte string into a wide character string and display it could be written like this (this is not a really useful example):
void showmbs (const char *src, FILE *fp) { mbstate_t state; int cnt = 0; memset (&state, '\0', sizeof (state)); while (1) { wchar_t linebuf[100]; const char *endp = strchr (src, '\n'); size_t n; /* Exit if there is no more line. */ if (endp == NULL) break; n = mbsnrtowcs (linebuf, &src, endp - src, 99, &state); linebuf[n] = L'\0'; fprintf (fp, "line %d: \"%S\"\n", linebuf); } }
There is no problem with the state after a call to mbsnrtowcs
.
Since we don't insert characters in the strings which were not in there
right from the beginning and we use state only for the conversion
of the given buffer there is no problem with altering the state.
wcsnrtombs
function implements the conversion from wide
character strings to multibyte character strings. It is similar to
wcsrtombs
but it takes, just like mbsnrtowcs
, an extra
parameter which specifies the length of the input string.
No more than nwc wide characters from the input string
*src
are converted. If the input string contains a NUL
wide character in the first nwc character to conversion stops at
this place.
This function is a GNU extension and just like mbsnrtowcs
is
helps in situations where no NUL terminated input strings are available.
Go to the first, previous, next, last section, table of contents.