GNU Emacs Lisp Reference Manual: String Basics

4.1 String and Character Basics

Characters are represented in Emacs Lisp as integers; whether an integer is a character or not is determined only by how it is used. Thus, strings really contain integers.
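
As a small illustration of this point, the following expressions treat the same values as both characters and integers. The results in the comments assume ASCII character codes.

```elisp
?a                          ; => 97, the read syntax for a character is just an integer
(aref "abc" 0)              ; => 97, a string element is that same integer
(integerp (aref "abc" 0))   ; => t
(+ ?a 1)                    ; => 98, ordinary arithmetic works on characters
(string (+ ?a 1))           ; => "b", using the integer as a character again
(string 97 98 99)           ; => "abc"
```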

The length of a string (like any array) is fixed, and cannot be altered once the string exists. Strings in Lisp are not terminated by a distinguished character code. (By contrast, strings in C are terminated by a character with ASCII code 0.)
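
A short sketch of both points: a character with code 0 is an ordinary string element, and functions such as concat return a new string rather than lengthening an existing one. The variable names `s` and `longer` are only illustrative.

```elisp
(length "hello")            ; => 5
(length "a\0b")             ; => 3, the null character does not end the string
(aref "a\0b" 1)             ; => 0

(let* ((s "abc")
       (longer (concat s "d")))      ; concat builds a new string
  (list (length s) (length longer))) ; => (3 4), s itself is unchanged
```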

Since strings are arrays, and therefore sequences as well, you can operate on them with the general array and sequence functions. (See section 6. Sequences, Arrays, and Vectors.) For example, you can access or change individual characters in a string using the functions aref and aset (see section 6.3 Functions that Operate on Arrays).
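
For instance, the general array and sequence functions can be applied to a string as sketched here. The example copies the string with copy-sequence before calling aset, on the assumption (true of recent Emacs versions) that modifying a string constant that appears in a program should be avoided.

```elisp
(arrayp "hello")            ; => t
(sequencep "hello")         ; => t
(aref "hello" 1)            ; => 101, the code for ?e
(elt "hello" 1)             ; => 101, the sequence accessor gives the same result
(append "abc" nil)          ; => (97 98 99), a list of the characters

(let ((s (copy-sequence "hello")))  ; work on a fresh copy, not the literal
  (aset s 0 ?j)                     ; replace the first character
  s)                                ; => "jello"
```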

There are two text representations for non-ASCII characters in Emacs strings (and in buffers): unibyte and multibyte (see section 33.1 Text Representations). An ASCII character always occupies one byte in a string; in fact, when a string is all ASCII, there is no real difference between the unibyte and multibyte representations. For most Lisp programming, you don't need to be concerned with these two representations.
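
The difference can be observed with multibyte-string-p and string-bytes, as in this sketch. It assumes a recent Emacs (version 23 or later) in which character codes are Unicode code points, so that 955 denotes Greek small letter lambda; the exact number of bytes used internally is deliberately not shown, since it depends on the Emacs version.

```elisp
(multibyte-string-p "abc")  ; => nil, an all-ASCII constant is read as unibyte
(string-bytes "abc")        ; => 3, one byte per ASCII character

(let ((s (string 955)))     ; a one-character non-ASCII string
  (list (multibyte-string-p s)
        (length s)                          ; length counts characters
        (> (string-bytes s) (length s))))   ; but it needs more than one byte
;; => (t 1 t)

;; For all-ASCII text the representation makes no difference to comparison.
(string= "abc" (string-to-multibyte "abc")) ; => t
```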

Sometimes key sequences are represented as strings. When a string is a key sequence, string elements in the range 128 to 255 represent meta characters (which are large integers) rather than character codes in the range 128 to 255.
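
The following sketch shows the two encodings of M-x side by side. It assumes the usual modifier layout, in which the meta modifier of a character is bit 2**27, and uses listify-key-sequence to convert the string form of a key sequence into a list of events.

```elisp
;; Inside a string, meta is encoded by adding 128 to the ASCII code.
(aref "\M-x" 0)                       ; => 248
(= (aref "\M-x" 0) (+ 128 ?x))        ; => t

;; As a character on its own, meta is a high modifier bit.
?\M-x                                 ; => 134217848
(= ?\M-x (+ (ash 1 27) ?x))           ; => t

;; Interpreting the string as a key sequence recovers the large integer.
(listify-key-sequence "\M-x")         ; => (134217848)
```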

Strings cannot hold characters that have the hyper, super or alt modifiers; they can hold ASCII control characters, but no other control characters. They do not distinguish case in ASCII control characters. If you want to store such characters in a sequence, such as a key sequence, you must use a vector instead of a string. See section 2.3.3 Character Type, for more information about the representation of meta and other modifiers for keyboard input characters.
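
As a brief illustration, the expressions below compare ASCII control characters in strings and then store a hyper-modified key in a vector. The name `keys` is only illustrative, and characterp is assumed to be available (Emacs 23 or later).

```elisp
(aref "\C-a" 0)             ; => 1, an ASCII control character fits in a string
(string= "\C-a" "\C-A")     ; => t, case is not distinguished

;; A key sequence containing a hyper modifier must be a vector.
(let ((keys (vector ?\C-x ?\H-a)))
  (list (vectorp keys)
        (length keys)
        (characterp (aref keys 1))))  ; the modified key is not a valid character
;; => (t 2 nil)
```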

Strings are useful for holding regular expressions. You can also match regular expressions against strings (see section 34.3 Regular Expression Searching). The functions match-string (see section 34.6.2 Simple Match Data Access) and replace-match (see section 34.6.1 Replacing the Text that Matched) are useful for decomposing and modifying strings based on regular expression matching.
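
A minimal sketch of this usage follows: a regular expression held in a string is matched against another string with string-match, and the pieces are then extracted or replaced. The names `re` and `s` and the sample text are only illustrative. Note that match-string and replace-match must be given the same string that was searched.

```elisp
(let ((re "\\([a-z]+\\)-\\([0-9]+\\)")   ; the regexp is itself a string
      (s "foo-123"))
  (when (string-match re s)              ; returns 0, the index of the match
    (list (match-string 1 s)             ; text of the first group
          (match-string 2 s)             ; text of the second group
          ;; Replace group 1 literally; this returns a new string.
          (replace-match "bar" t t s 1))))
;; => ("foo" "123" "bar-123")
```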

Like a buffer, a string can contain text properties for the characters in it, as well as the characters themselves. See section 32.19 Text Properties. All the Lisp primitives that copy text from strings to buffers or other strings also copy the properties of the characters being copied.
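
The copying behavior can be seen in a sketch like the following, which attaches a face property with propertize and then inspects the results of concat and substring. The variable names are only illustrative; substring-no-properties is shown as the way to obtain a copy without properties.

```elisp
(let* ((s (propertize "hello" 'face 'bold))
       (joined (concat s " world"))      ; properties of s are copied
       (part (substring s 1 3)))         ; and so are those of the substring
  (list (get-text-property 0 'face s)
        (get-text-property 0 'face joined)
        (get-text-property 5 'face joined)   ; the appended text has no face
        (get-text-property 0 'face part)
        (get-text-property 0 'face (substring-no-properties s))))
;; => (bold bold nil bold nil)
```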

See section 32. Text, for information about functions that display strings or copy them into buffers. See sections 2.3.3 Character Type and 2.3.8 String Type, for information about the syntax of characters and strings. See section 33. Non-ASCII Characters, for functions to convert between text representations and to encode and decode character codes.