[Previous] [Contents] [Next]


Escaping special characters

We've already discussed the need to escape the special meaning of characters used as operators in a regular expression. However, when to escape the meaning depends on how the character is used. Escaping the special meaning of a character is done with the backslash character as with the expression "2\+3, which matches the string "2+3". If the + isn't escaped, the pattern matches one or many occurrences of the character 2 followed by the character 3. Another way to write this expression is to express the + in the list of characters as "2[+]3". Because + doesn't have the same meaning in a list, it doesn't need to be escaped in that context. Using character lists in this way can improve readability. The following examples show how escaping is used and avoided:

// need to escape ( and )
$phone = "(03) 9429 5555";
$found = ereg("^\([0-9]{2,3}\)", $phone); // true

// No need to escape (*.+?)| within parentheses
$special = "Special Characters are (, ), *, +, ?, |";
$found = ereg("[(*.+?)|]", $special); // true

// The back-slash always needs to be quoted to match
$backSlash = 'The backslash \ character';
$found = ereg('^[a-zA-Z \\]*$', $backSlash); //true

// Don't need to escape the dot within parentheses
$domain = "www.ora.com";
$found = ereg("[.]com", $domain); //true

Another complication arises due to the fact that a regular expression is passed as a string to the regular expression functions. Strings in PHP can also use the backslash character to escape quotes and to encode tabs, newlines, etc. Consider the following example, which matches a backslash character:

// single-quoted string containing a backslash
$backSlash = '\ backslash';

// Evaluates to true
$found = ereg("^\\\\ backslash\$", $backSlash);

The regular expression looks quite odd: to match a backslash, the regular expression function needs to escape the meaning of backslash, but because we are using a double-quoted string, each of the two backslashes needs to be escaped. The last complication is that PHP interprets the $ character as the beginning of a variable name, so we need to escape that. Using a single-quoted string can help make regular expressions easier to read and write.


[Previous] [Contents] [Next]