Emacs has two text representations, that is, two ways to represent text in a string or buffer. These are called unibyte and multibyte. Each string, and each buffer, uses one of these two representations. For most purposes, you can ignore the issue of representations, because Emacs converts text between them as appropriate. Occasionally in Lisp programming you will need to pay attention to the difference.
In unibyte representation, each character occupies one byte and therefore the possible character codes range from 0 to 255. Codes 0 through 127 are ASCII characters; the codes from 128 through 255 are used for one non-ASCII character set (you can choose which character set by setting the variable nonascii-insert-offset).
In multibyte representation, a character may occupy more than one byte, and as a result, the full range of Emacs character codes can be stored. The first byte of a multibyte character is always in the range 128 through 159 (octal 0200 through 0237). These values are called leading codes. The second and subsequent bytes of a multibyte character are always in the range 160 through 255 (octal 0240 through 0377); these values are trailing codes.
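The byte ranges above can be sketched as a small classifier. This is only an illustration of the ranges stated in this section; the function name my-classify-byte is hypothetical and not part of Emacs, and other Emacs versions may use a different internal encoding.

```elisp
;; Hypothetical helper, not part of Emacs: classify a raw byte
;; according to the ranges given in the text above.
(defun my-classify-byte (byte)
  "Return a symbol describing the role of BYTE in multibyte text."
  (cond
   ;; 0 through 127: an ASCII character, stored as a single byte.
   ((and (>= byte 0) (<= byte 127)) 'ascii)
   ;; 128 through 159 (octal 0200 through 0237): a leading code.
   ((and (>= byte 128) (<= byte 159)) 'leading-code)
   ;; 160 through 255 (octal 0240 through 0377): a trailing code.
   ((and (>= byte 160) (<= byte 255)) 'trailing-code)
   (t (error "Not a byte value: %S" byte))))

(mapcar #'my-classify-byte '(65 #o200 #o237 #o240 #o377))
;; => (ascii leading-code leading-code trailing-code trailing-code)
```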
Some sequences of bytes are not valid in multibyte text: for example, a single isolated byte in the range 128 through 159 is not allowed. But character codes 128 through 159 can appear in multibyte text, represented as two-byte sequences. All the character codes 128 through 255 are possible (though slightly abnormal) in multibyte text; they appear in multibyte buffers and strings when you do explicit encoding and decoding (see section 33.10.7 Explicit Encoding and Decoding).
In a buffer, the buffer-local value of the variable enable-multibyte-characters specifies the representation used. The representation for a string is determined and recorded in the string when the string is constructed.
If the value of enable-multibyte-characters is non-nil, the buffer contains multibyte text; otherwise, it contains unibyte text. You cannot set this variable directly; instead, use the function set-buffer-multibyte to change a buffer's representation.
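As a minimal sketch of this rule, the following code reads enable-multibyte-characters and changes the representation only through set-buffer-multibyte. The helper names my-buffer-representation and my-representation-after are hypothetical, introduced here for illustration.

```elisp
;; Hypothetical helpers, not part of Emacs.
(defun my-buffer-representation ()
  "Return `multibyte' or `unibyte' for the current buffer."
  ;; Only read the variable; it must not be set with `setq'.
  (if enable-multibyte-characters 'multibyte 'unibyte))

(defun my-representation-after (flag)
  "Return the representation of a temporary buffer after applying FLAG."
  (with-temp-buffer
    ;; Change the representation through the supported function.
    (set-buffer-multibyte flag)
    (my-buffer-representation)))

(my-representation-after nil)  ; => unibyte
(my-representation-after t)    ; => multibyte
```

Working in a temporary buffer keeps the experiment from changing the representation of a buffer that holds real text.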
The value of the variable default-enable-multibyte-characters is entirely equivalent to (default-value 'enable-multibyte-characters), and setting this variable changes that default value. Setting the local binding of enable-multibyte-characters in a specific buffer is not allowed, but changing the default value is supported, and it is a reasonable thing to do, because it has no effect on existing buffers.
The `--unibyte' command line option does its job by setting the default value to nil early in startup.
The function multibyte-string-p returns t if its argument string is a multibyte string.
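A short sketch of testing a string's representation follows. The helper my-string-representation is hypothetical. The results in the comments were reasoned for Emacs 23 or later, where string-to-multibyte and the Unicode character code 233 are available; older versions may behave differently.

```elisp
;; Hypothetical helper, not part of Emacs.
(defun my-string-representation (string)
  "Return `multibyte' or `unibyte' for STRING."
  (if (multibyte-string-p string) 'multibyte 'unibyte))

;; Converting an ASCII string explicitly yields a multibyte string.
(my-string-representation (string-to-multibyte "abc"))
;; => multibyte

;; Explicit encoding produces raw bytes, that is, a unibyte string.
(my-string-representation (encode-coding-string (string 233) 'latin-1))
;; => unibyte
```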