Escaping special characters

We've already discussed the need to escape the special meaning of characters used as operators in a regular expression. However, whether a character needs escaping depends on where it is used. A character's special meaning is escaped with the backslash character, as in the expression "2\+3", which matches the string "2+3". If the + isn't escaped, the pattern matches one or more occurrences of the character 2 followed by the character 3. Another way to write this expression is to place the + in a character list, as in "2[+]3". Because + has no special meaning inside a list, it doesn't need to be escaped in that context. Using character lists in this way can improve readability. The following examples show how escaping is used and avoided:

// need to escape ( and )
$phone = "(03) 9429 5555";
$found = ereg("^\([0-9]{2,3}\)", $phone); // true
// No need to escape (*.+?)| within a character list [...]
$special = "Special Characters are (, ), *, +, ?, |";
$found = ereg("[(*.+?)|]", $special); // true
// The backslash is written as \\ so the string holds one literal backslash
$backSlash = 'The backslash \ character';
$found = ereg('^[a-zA-Z \\]*$', $backSlash); //true
// Don't need to escape the dot within a character list [...]
$domain = "www.ora.com";
$found = ereg("[.]com", $domain); //true
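
The difference between an escaped +, an unescaped +, and a + inside a character list can be checked side by side. The following is only a sketch: it assumes preg_match() as a stand-in for ereg(), which is not available from PHP 7 onward, and wraps each pattern in / delimiters as preg_match() requires. The variable names are illustrative.

```php
<?php
// Assumption: preg_match() stands in for ereg(), which PHP 7 removed.
// Single-quoted strings pass the backslash through to the regex engine.
$sum = "2+3";

$escaped   = preg_match('/2\+3/', $sum);   // + escaped with a backslash
$inList    = preg_match('/2[+]3/', $sum);  // + placed in a character list
$unescaped = preg_match('/2+3/', $sum);    // + means "one or more 2s" here
$repeated  = preg_match('/2+3/', "2223");  // so the pattern fits "2223" instead

// preg_match() returns 1 for a match and 0 for no match
var_dump($escaped, $inList, $unescaped, $repeated);
```

Only the unescaped pattern fails against "2+3", because it expects a run of 2s immediately followed by a 3.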

Another complication arises because a regular expression is passed to the regular expression functions as a string. Strings in PHP also use the backslash character to escape quotes and to encode tabs, newlines, and so on. Consider the following example, which matches a backslash character:

// single-quoted string containing a backslash
$backSlash = '\ backslash';
// Evaluates to true
$found = ereg("^\\\\ backslash\$", $backSlash);

The regular expression looks quite odd: to match a backslash, the regular expression itself must escape the backslash with a second one, and because the pattern is written as a double-quoted PHP string, each of those two backslashes must be escaped again, giving four. The last complication is that PHP interprets the $ character in a double-quoted string as the beginning of a variable name, so it must be escaped too. Using a single-quoted string can make regular expressions easier to read and write, because PHP then leaves $ and most backslash sequences untouched.
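
To see what the regular expression function actually receives, the same pattern can be written both ways and compared. This is a sketch rather than a definitive recipe: it assumes preg_match() in place of ereg() (removed in PHP 7), adds the / delimiters that preg_match() needs, and uses illustrative variable names. The quoting rules themselves belong to PHP strings and do not depend on which regular expression function is called.

```php
<?php
// Assumption: preg_match() stands in for ereg(), which PHP 7 removed.
$backSlash = '\ backslash';

// Double-quoted: PHP turns \\\\ into \\ and \$ into $ before the
// regular expression function sees the pattern.
$double = "/^\\\\ backslash\$/";

// Single-quoted: $ needs no escape, but \\ is still read as a single
// backslash, so four are written to hand two to the regex engine.
$single = '/^\\\\ backslash$/';

$samePattern = ($double === $single);           // both hold /^\\ backslash$/
$found       = preg_match($single, $backSlash); // 1 when the pattern matches

var_dump($samePattern, $found);
```

The single-quoted form removes the need to escape $, though a literal backslash still has to be doubled in the string.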

Metacharacters

Metacharacters can also be used in regular expressions. For example, the tab character is represented as \t and the newline character as \n. There are also shortcuts: \d means any digit, and \s means any whitespace. The following example returns true because the tab character, \t, is contained in the $source string:

$source = "fast\tfood";
$result = ereg('\s', $source); // true
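
The shortcuts can be tried out in the same way. The sketch below assumes preg_match(), PHP's Perl-compatible function, where \s, \d, and \t are documented escapes; it does not rely on how ereg() treats them. The variable names and the "route 66" string are illustrative.

```php
<?php
// Assumption: preg_match() (PCRE) is used here instead of ereg().
$source = "fast\tfood";   // double quotes: PHP turns \t into a real tab

$hasWhitespace = preg_match('/\s/', $source);     // \s matches the tab
$hasTab        = preg_match('/\t/', $source);     // the regex engine reads \t as a tab
$hasDigit      = preg_match('/\d/', $source);     // no digit in "fast<tab>food"
$routeDigit    = preg_match('/\d/', "route 66");  // a digit is present

// preg_match() returns 1 for a match and 0 for no match
var_dump($hasWhitespace, $hasTab, $hasDigit, $routeDigit);
```

Note that the tab in $source is produced by PHP's double-quoted string, while the \s and \t in the single-quoted patterns are interpreted by the regular expression engine.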
