
2.3.8.2 Non-ASCII Characters in Strings

You can include a non-ASCII international character in a string constant by writing it literally. There are two text representations for non-ASCII characters in Emacs strings (and in buffers): unibyte and multibyte. If the string constant is read from a multibyte source, such as a multibyte buffer or string, or a file that would be visited as multibyte, then the character is read as a multibyte character, and that makes the string multibyte. If the string constant is read from a unibyte source, then the character is read as unibyte and that makes the string unibyte.

You can also represent a multibyte non-ASCII character with its character code: use a hex escape, `\xnnnnnnn', with as many digits as necessary. (Multibyte non-ASCII character codes are all greater than 256.) Any character that is not a valid hex digit terminates this construct. If the next character in the string could be interpreted as a hex digit, write `\ ' (backslash and space) to terminate the hex escape. For example, `\x8e0\ ' represents one character, `a' with grave accent. In a string constant, `\ ' is just like backslash-newline: it does not contribute any character to the string, but it does terminate the preceding hex escape.
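As a minimal sketch of how a hex escape is terminated, the forms below can be evaluated one at a time, for example with `M-:'. They reuse the code `8e0' from the example above and rely only on the standard function `length'. Which character a given code denotes depends on the internal character set of the Emacs version in use, so only the lengths are shown.

```elisp
;; `\ ' ends the hex escape and contributes no character of its own,
;; so this string holds exactly one character.
(length "\x8e0\ ")     ; => 1

;; With `\ ' as a terminator, the following `a' is a separate character
;; rather than one more hex digit of the escape.
(length "\x8e0\ a")    ; => 2

;; `z' is not a hex digit, so it terminates the escape by itself.
(length "\x8e0z")      ; => 2
```

Leaving out the `\ ' in the second form would make `a' part of the escape, which then names a different character code.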

Using a multibyte hex escape forces the string to be multibyte. You can likewise represent a unibyte non-ASCII character with its character code, which must be in the range from 128 (0200 octal) to 255 (0377 octal); doing so forces the string to be unibyte. See section 33.1 Text Representations, for more information about the two text representations.
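The effect of each kind of escape on the string's representation can be checked with the predicate `multibyte-string-p'. The following is a sketch rather than a complete test: the octal escape `\340' (decimal 224) is chosen here only as a sample code in the unibyte range and does not come from the text above.

```elisp
;; A hex escape with a code above 256 forces a multibyte string.
(multibyte-string-p "\x8e0\ ")   ; => t

;; A code in the range 128 to 255, written here as an octal escape,
;; forces a unibyte string.
(multibyte-string-p "\340")      ; => nil

;; Either way, the escape stands for a single character.
(length "\340")                  ; => 1
```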



This document was generated on May 2, 2002 using texi2html