MySQL

Regular Expressions REGEXP

Regular expressions are created using the regular expression language, a specialized language designed to:

  • extract phone numbers
  • to locate all files with digits in the start, middle or end of their names
  • to find all repeated words in a block of text
  • to replace all URLs in a page with actual HTML links to those same URLs
  • and much more

Like any language, regular expressions have a special syntax and instructions that you must learn.

So what does this have to do with MySQL? As already explained, all regular expressions do is match text, comparing a pattern (the regular expression) with a string of text. MySQL provides rudimentary support for regular expressions with WHERE clauses, allowing you to specify regular expressions that are used to filter data retrieved using SELECT.

This will all become much clearer with some examples.

Basic Character Matching

We'll start with a very simple example. The following statement retrieves all rows where column prod_name contains the text 1000:

SELECT prod_name
FROM products
WHERE prod_name REGEXP '1000'
ORDER BY prod_name;

The above statement displayed the following output:

+--------------+
| prod_name    |
+--------------+
| product 1000 |
+--------------+

This statement looks much like the ones that used LIKE except that the keyword LIKE has been replaced with REGEXP. This tells MySQL that what follows is to be treated as a regular expression (one that just matches the literal text 1000).

So, why bother using a regular expression? Well, in the example just used, regular expressions really add no value (and probably hurt performance), but consider this next example:

SELECT prod_name
FROM products
WHERE prod_name REGEXP '.000'
ORDER BY prod_name;

The above statement displayed the following output:

+--------------+
| prod_name    |
+--------------+
| product 1000 |
| product 2000 |
+--------------+

Here the regular expression .000 was used. . is a special character in the regular expression language. It means match any single character, and so both 1000 and 2000 matched and were returned.

Of course, this particular example could also have been accomplished using LIKE and wildcards.