[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
For the most part, `\' followed by any character matches only that character. However, there are several exceptions: certain two-character sequences starting with `\' that have special meanings. (The character after the `\' in such a sequence is always ordinary when used on its own.) Here is a table of the special `\' constructs.
Thus, `foo\|bar' matches either `foo' or `bar' but no other string.
`\|' applies to the largest possible surrounding expressions. Only a surrounding `\( ... \)' grouping can limit the grouping power of `\|'.
Full backtracking capability exists to handle multiple uses of `\|', if you use the POSIX regular expression functions (see section 34.4 POSIX Regular Expression Searching).
For example, `c[ad]\{1,2\}r' matches the strings `car',
`cdr', `caar', `cadr', `cdar', and `cddr', and
nothing else.
`\{0,1\}' or `\{,1\}' is equivalent to `?'.
`\{0,\}' or `\{,\}' is equivalent to `*'.
`\{1,\}' is equivalent to `+'.
This last application is not a consequence of the idea of a parenthetical grouping; it is a separate feature that was assigned as a second meaning to the same `\( ... \)' construct because, in pratice, there was usually no conflict between the two meanings. But occasionally there is a conflict, and that led to the introduction of shy groups.
Shy groups are particulary useful for mechanically-constructed regular expressions because they can be added automatically without altering the numbering of any ordinary, non-shy groups.
In other words, after the end of a group, the matcher remembers the beginning and end of the text matched by that group. Later on in the regular expression you can use `\' followed by digit to match that same text, whatever it may have been.
The strings matching the first nine grouping constructs appearing in the entire regular expression passed to a search or matching function are assigned numbers 1 through 9 in the order that the open parentheses appear in the regular expression. So you can use `\1' through `\9' to refer to the text matched by the corresponding grouping constructs.
For example, `\(.*\)\1' matches any newline-free string that is composed of two identical halves. The `\(.*\)' matches the first half, which may be anything, but the `\1' that follows must match the same exact text.
If a particular grouping construct in the regular expression was never matched--for instance, if it appears inside of an alternative that wasn't used, or inside of a repetition that repeated zero times--then the corresponding `\digit' construct never matches anything. To use an artificial example,, `\(foo\(b*\)\|lose\)\2' cannot match `lose': the second alternative inside the larger group matches it, but then `\2' is undefined and can't match anything. But it can match `foobb', because the first alternative matches `foob' and `\2' matches `b'.
The following regular expression constructs match the empty string--that is, they don't use up any characters--but whether they match depends on the context.
`\b' matches at the beginning or end of the buffer regardless of what text appears next to it.
Not every string is a valid regular expression. For example, a string
with unbalanced square brackets is invalid (with a few exceptions, such
as `[]]'), and so is a string that ends with a single `\'. If
an invalid regular expression is passed to any of the search functions,
an invalid-regexp
error is signaled.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |