Escaping Strings for HTML
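<?php
  // User-supplied input containing markup and a non-ASCII character
  $input = '<script>alert("I have a bad Föhnwelle, ' .
    'therefore I crack websites.");</script>';
  echo htmlspecialchars($input) . '<br />';  // escapes markup characters
  echo htmlentities($input);                 // also converts ö to an entity
?>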
Malicious code can be injected not only with the <script> tag, but also through other HTML elements, such as <img onabort="badCode()" />. Therefore, in most cases, all HTML markup in user input must be removed or escaped.
The easiest way to do so is to call htmlspecialchars(); this escapes the characters that have a special meaning in HTML, for example by replacing each > character with &gt;. Another option is to call htmlentities(). This replaces every character that has a named HTML entity with that entity. The preceding code shows the difference between the two functions: the German ö (o umlaut) is not converted by htmlspecialchars(), whereas htmlentities() replaces it with its entity, &ouml;.
Both htmlspecialchars() and htmlentities() just output what the user entered in the browser. So if the user entered HTML markup, this very markup is shown as literal text. These functions therefore please the browser, but might not please the user.
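The difference between the two functions can be sketched as follows. This is a minimal illustration, not the listing from above: the input string is a made-up sample, the file is assumed to be saved as UTF-8, and the flags and encoding are passed explicitly so that the result does not depend on the defaults of a particular PHP version.

```php
<?php
// Hypothetical sample input (assumed to be UTF-8 encoded).
$input = '<b>Föhn</b> & "quotes"';

// Explicit flags and charset: ENT_QUOTES escapes both quote styles.
$special  = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');
$entities = htmlentities($input, ENT_QUOTES, 'UTF-8');

echo $special, "\n";   // &lt;b&gt;Föhn&lt;/b&gt; &amp; &quot;quotes&quot;
echo $entities, "\n";  // &lt;b&gt;F&ouml;hn&lt;/b&gt; &amp; &quot;quotes&quot;
```

In both cases the browser displays the markup as text instead of interpreting it; only the treatment of the umlaut differs.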
If, however, you want to prepare strings for use within URLs, you have to use urlencode() to properly encode special characters, such as the space character, that cannot appear literally in URLs.
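As a rough sketch of how this might look in practice, the following snippet encodes a value for a query string and then escapes the finished URL for use in an HTML attribute. The search term and the script name search.php are hypothetical and serve only as an illustration.

```php
<?php
// Hypothetical user-supplied search term.
$term = 'bad hair & more';

// urlencode() turns spaces into "+" and "&" into "%26".
$query = urlencode($term);

// search.php is an assumed target script, not part of the original text.
$href = 'search.php?q=' . $query;

// The URL is still escaped for HTML before it goes into an attribute.
$link = '<a href="' . htmlspecialchars($href, ENT_QUOTES, 'UTF-8') . '">Search</a>';

echo $query, "\n";  // bad+hair+%26+more
echo $link, "\n";   // <a href="search.php?q=bad+hair+%26+more">Search</a>
```

Encoding only the value, rather than the whole URL, keeps the ? and = separators of the query string intact.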
In contrast, the function strip_tags() gets rid of all HTML elements completely. If you want to keep some elements (for example, limited formatting with <b>, <i>, and <br /> tags), provide a list of allowed tags in the second parameter of strip_tags(). The following script shows this; the figure depicts its output. As you can see, all unwanted HTML tags have been removed; however, their contents are still there.
Removing All HTML Tags
<?php
  $input = 'My parents <i>hate</i> me, <br />' .
    'therefore I <b>crack</b> websites. ' .
    '<script>alert("Nice try!");</script>' .
    '<img src="explicit.jpg" />';
  // Keep only <b>, <br>, and <i>; every other tag is stripped
  echo strip_tags($input, '<b><br><i>');
?>
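One caveat is worth sketching here: strip_tags() leaves the attributes of allowed tags untouched, so an event handler on a permitted element survives. The input below is a made-up example, and the function name badCode() is purely illustrative.

```php
<?php
// Hypothetical input: an allowed <b> tag that carries an event handler.
$input = '<b onmouseover="badCode()">bold</b> <script>alert(1);</script>';

// <script> tags are removed, but their contents and the attribute remain.
$clean = strip_tags($input, '<b>');

echo $clean, "\n";  // <b onmouseover="badCode()">bold</b> alert(1);
```

If untrusted users may supply the input, a whitelist of tags alone is therefore probably not enough; escaping the output or using a dedicated HTML sanitizer may be the safer choice.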