PHP

Validating email addresses

Email addresses are another common data entry item whose format needs checking. The Internet Engineering Task Force (IETF) maintains a standard, RFC 2822 (since superseded by RFC 5322), that defines what a valid email address can be, and it's much more complex than might be expected. For example, an address such as the following is valid:

" <test> "@webdatabasebook.com

We use the following complex regular expression and network functions to validate an email address:

$validEmailExpr =
    "^[0-9a-z~!#$%&_-]([.]?[0-9a-z~!#$%&_-])*" .
    "@[0-9a-z~!#$%&_-]([.]?[0-9a-z~!#$%&_-])*$";
if (empty($formVars["email"]))
    // the user's email cannot be a null string
    $errorString .= "You must supply an email address.";
elseif (!eregi($validEmailExpr, $formVars["email"]))
    // The email must match the above regular expression
    $errorString .=
    "The email address must be in the name@domain format.";
elseif (strlen($formVars["email"]) > 50)
    // The length cannot exceed 50 characters
    $errorString .=
    "The email address can be no longer than 50 characters.";
elseif (!(getmxrr(substr(strstr($formVars["email"], '@'), 1), $temp) ||
  checkdnsrr(gethostbyname(substr(strstr($formVars["email"], '@'), 1)), "ANY")))
    // There must be a Domain Name Server (DNS) record
    // for the domain name
    $errorString .= "The domain does not exist.";

If any email test fails, an error message is appended to $errorString, and no further checks of the email value are made. A valid email passes all tests.
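
The "first failure wins" control flow may be easier to see when the checks are wrapped in a function that returns as soon as one test fails. The following is a minimal sketch rather than the original code: the function name firstEmailError( ) is hypothetical, the DNS step is left out so that the example runs without network access, and preg_match( ) with the i modifier stands in for eregi( ), which is not available in PHP 7 and later.

```php
<?php
// Hypothetical helper: returns the first error message for $email,
// or an empty string if the email passes every check.
// Assumption: the DNS lookup step is omitted so the sketch runs offline.
function firstEmailError(string $email): string
{
    // Same template as $validEmailExpr, written for preg_match():
    // "i" ignores case, "D" stops "$" from matching before a trailing newline.
    $word = '[0-9a-z~!#$%&_-]([.]?[0-9a-z~!#$%&_-])*';
    $pattern = '/^' . $word . '@' . $word . '$/iD';

    if (empty($email)) {
        return "You must supply an email address.";
    }
    if (!preg_match($pattern, $email)) {
        return "The email address must be in the name@domain format.";
    }
    if (strlen($email) > 50) {
        return "The email address can be no longer than 50 characters.";
    }
    return '';
}

// No @ sign, so the second check fails and the later checks never run.
echo firstEmailError('fred'), "\n";
```

Because each test returns immediately, at most one message is produced per email value, which mirrors the if/elseif chain above.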

The first check makes sure that an email address has been entered. If it's omitted, an error is generated. The second check uses a regular expression to test whether the email address matches a template. The expression isn't RFC 2822-compliant but works reasonably well for most email addresses:

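  • It uses eregi( ), which matches case-insensitively, so the range a-z matches both upper- and lowercase letters. (The eregi( ) function was deprecated in PHP 5.3 and removed in PHP 7; preg_match( ) with the i modifier is the current equivalent.)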

  • It expects the string to begin with a character from the set 0-9, a-z, and ~!#$%&_-. There has to be at least one character from this set at the beginning of the email address for it to be valid.

  • After the first character matches, there is an optional bracketed expression:

    ([.]?[0-9a-z~!#$%&_-])*
    

    This expression is optional because it's suffixed with the * operator, which matches zero or more repetitions. Each repetition matches one permitted character, optionally preceded by a single full stop ([.]?), so full stops can't be consecutive and the expression can't end with one. For example, the expression matches the string fred.williams.test% but not fred..williams.

  • After the initial part of the email address, an @ character is expected. The @ has to occur after the first word for the string to be valid; our regular expression rejects email addresses that have only the initial or local component such as fred.

  • Our validation expects there to be another word of at least length 1 after the @ symbol, and this can be followed by any combination of the permitted characters. Strings of permitted characters can be separated by a single full-stop.

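  • The expression is imperfect. It accepts some illegal email addresses, such as fred@exa!mple.com (an exclamation mark can't appear in a domain name), and rejects many that are legal but unusual, such as fred+news@example.com or the quoted address shown at the start of this section.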
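
The points in the list above can be tried out by assembling the same template from its parts and running a few sample strings through it. This is only an illustrative sketch: the variable names and the matchesTemplate( ) function are hypothetical, and preg_match( ) is used in place of eregi( ) so that the example runs on current PHP versions.

```php
<?php
// Illustrative only: the variable and function names are hypothetical.
$char = '[0-9a-z~!#$%&_-]';              // one permitted character
$word = $char . '([.]?' . $char . ')*';  // permitted characters, single full stops between them
// "i" gives eregi()-style case insensitivity; "D" makes "$" match only at the very end.
$template = '/^' . $word . '@' . $word . '$/iD';

function matchesTemplate(string $email, string $template): bool
{
    return preg_match($template, $email) === 1;
}

$samples = [
    'fred.williams@example.com',   // accepted
    'FRED@EXAMPLE.COM',            // accepted: case is ignored
    'fred..williams@example.com',  // rejected: consecutive full stops
    'fred',                        // rejected: no @ and no domain
    'fred@exa!mple.com',           // accepted, although ! is not allowed in a domain name
    'fred+news@example.com',       // rejected, although + is legal before the @
];
foreach ($samples as $sample) {
    echo str_pad($sample, 28),
        matchesTemplate($sample, $template) ? 'accepted' : 'rejected', "\n";
}
```

The last two samples show both kinds of imperfection: one illegal address that gets through and one legal address that doesn't.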

The third step is to check the length of the email address. If it exceeds 50 characters, an error is generated. The fourth and final step is to check whether the domain of the email address actually exists:

elseif (!(getmxrr(substr(strstr($formVars["email"], '@'), 1), $temp) ||
  checkdnsrr(gethostbyname(substr(strstr($formVars["email"], '@'), 1)), "ANY")))
    // There must be a Domain Name Server (DNS) record
    // for the domain name
    $errorString .= "The domain does not exist.";

The function getmxrr( ) queries the Domain Name System (DNS) to check whether a mail exchanger (MX) record exists for the email domain. If no MX record is found, the domain is checked using the checkdnsrr( ) function, after converting the domain name to a numeric IP address with the gethostbyname( ) function. The second parameter to checkdnsrr( ) is the type of record to look for; specifying ANY means that a record of any type is sufficient. If both tests fail, the domain of the email address isn't valid and we reject the email address.
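
The "either lookup may succeed" logic can be isolated in a small function. The sketch below is written under several assumptions: the name emailDomainExists( ) and its two optional callable parameters are hypothetical and exist only so the logic can be tried without network access, the domain name is passed straight to checkdnsrr( ) instead of being converted with gethostbyname( ) first, and the nullable parameter types need PHP 7.1 or later.

```php
<?php
// Hypothetical helper: true if the domain of $email has an MX record or,
// failing that, any DNS record at all.
// The lookups are injectable for testing; by default PHP's own functions are used.
function emailDomainExists(string $email, ?callable $hasMx = null, ?callable $hasAny = null): bool
{
    $hasMx = $hasMx ?? function (string $domain): bool {
        return getmxrr($domain, $mxHosts);   // true if an MX record was found
    };
    $hasAny = $hasAny ?? function (string $domain): bool {
        return checkdnsrr($domain, 'ANY');   // true if a record of any type was found
    };

    // Everything after the first @ is treated as the domain, as in the original code.
    $fromAt = strstr($email, '@');
    if ($fromAt === false || $fromAt === '@') {
        return false;                        // no @, or nothing after it
    }
    $domain = substr($fromAt, 1);

    // The domain is accepted if either lookup succeeds; the second lookup
    // runs only when the first one fails.
    return $hasMx($domain) || $hasAny($domain);
}

// Stub lookups so the example does not touch the network.
$never  = function (string $domain): bool { return false; };
$always = function (string $domain): bool { return true; };

var_dump(emailDomainExists('fred@example.com', $never, $always)); // bool(true)
var_dump(emailDomainExists('fred@example.com', $never, $never));  // bool(false)
```

Calling emailDomainExists( ) with only the email argument performs real DNS queries, so the result then depends on the network and on the resolver in use.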

by BrainBell