
URL encoding

The characters used in resource names, query strings, and parameters must not conflict with characters that have special meanings or are not allowed in a URL. For example, a question mark (?) identifies the beginning of a query, and an ampersand (&) separates multiple terms in a query. To use such a character literally, escape it with a hexadecimal encoding: the percent character (%) followed by the two hexadecimal digits of the character's ASCII code. For example, an ampersand (&) is encoded as %26.
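As a minimal sketch of this rule, the escape for a single ASCII character can be computed by hand. The helper name percentEncodeChar is hypothetical and is used here only for illustration; it assumes a single-byte character.

```php
<?php
// Hypothetical helper: percent-encode one single-byte character by hand.
function percentEncodeChar(string $char): string
{
    // ord() returns the byte value of the character;
    // %02X formats it as two uppercase hexadecimal digits,
    // and %% emits the literal percent sign in front.
    return sprintf('%%%02X', ord($char));
}

echo percentEncodeChar('&'), "\n"; // %26
echo percentEncodeChar('?'), "\n"; // %3F
```

In practice the built-in encoding functions are preferable to a hand-written helper, because they also decide which characters need escaping.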

The characters that need to be escape-encoded are the control, space, and reserved characters:

; / ? : @ & = + $ ,

Delimiter characters must also be encoded:

< > # % "

The following characters can cause problems with gateways and network agents, and should also be encoded:

{ } | \ ^ [ ] `
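To see the escape sequence for each character in the three lists, a short loop can print them. This sketch uses PHP's built-in rawurlencode( ) function, which is described in the next section; the variable names are illustrative only.

```php
<?php
// Characters taken from the three lists above, plus a space.
$chars = [
    ';', '/', '?', ':', '@', '&', '=', '+', '$', ',',  // reserved
    '<', '>', '#', '%', '"',                           // delimiters
    '{', '}', '|', '\\', '^', '[', ']', '`',           // problematic for gateways
    ' ',                                               // space
];

// Map each character to its percent-encoded form.
$encoded = [];
foreach ($chars as $c) {
    $encoded[$c] = rawurlencode($c);
}

// Print one "character => escape" pair per line.
foreach ($encoded as $c => $e) {
    printf("%s => %s\n", $c, $e);
}
```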

rawurlencode function

PHP provides the rawurlencode( ) function to escape these characters. It returns its argument URL-encoded according to RFC 3986. For example, rawurlencode( ) can build the href attribute of an embedded link:

echo '<a href="search.php?q=' .
      rawurlencode("100% + more") .
      '">';

The result is an <a> element with an embedded URL correctly encoded:

<a href="search.php?q=100%25%20%2B%20more">
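When a URL carries several parameters, each name and value is encoded separately and the pairs are then joined with = and &. The following is a minimal sketch, assuming two hypothetical parameters, q and page. It also shows that the built-in http_build_query( ) function, called with the PHP_QUERY_RFC3986 flag, should produce the same string.

```php
<?php
// Hypothetical parameters, for illustration only.
$params = ['q' => '100% + more', 'page' => '2'];

// Encode each name and value on its own, then join the pairs.
$pairs = [];
foreach ($params as $name => $value) {
    $pairs[] = rawurlencode($name) . '=' . rawurlencode($value);
}
$query = implode('&', $pairs);

// The built-in equivalent; PHP_QUERY_RFC3986 selects rawurlencode-style
// encoding, so spaces become %20 rather than +.
$builtin = http_build_query($params, '', '&', PHP_QUERY_RFC3986);

echo 'search.php?' . $query, "\n";
```

Encoding the pieces before joining them matters: encoding the whole query string at once would also escape the = and & separators.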

PHP also provides the urlencode( ) function, which differs from rawurlencode( ) in how it encodes spaces: urlencode( ) encodes a space as a + sign, whereas rawurlencode( ) encodes it as %20. Encoding a space as + is an older convention that survives in the application/x-www-form-urlencoded format used for HTML form data.
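The difference can be checked side by side on the same input. This sketch also shows why each encoder should be paired with its own decoder: rawurldecode( ) does not turn a + back into a space.

```php
<?php
$text = '100% + more';

$raw  = rawurlencode($text); // spaces become %20
$form = urlencode($text);    // spaces become +

echo $raw, "\n";  // 100%25%20%2B%20more
echo $form, "\n"; // 100%25+%2B+more

// Each decoder reverses its matching encoder.
$rawBack  = rawurldecode($raw);
$formBack = urldecode($form);

// Mixing them up leaves the + signs in place instead of restoring spaces.
$mismatch = rawurldecode($form);
echo $mismatch, "\n"; // 100%+++more
```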
