What are the best safe1(), safe2(), safe3(), and safe4() functions to avoid XSS for UTF-8 encoded pages? Is it also safe in all browsers (specifically IE6)?
Many people say the absolute best that can be done is:
// safe1 & safe2
$s = htmlentities($s, ENT_QUOTES, "UTF-8");

// But how would you compare the above to:
// https://github.com/shadowhand/purifier
// OR http://kohanaframework.org/3.0/guide/api/Security#xss_clean
// OR is there an even better if not perfect solution?
// safe3
$s = mb_convert_encoding($s, "UTF-8", "UTF-8");
$s = htmlentities($s, ENT_QUOTES, "UTF-8");

// How would you compare this to using mysql_real_escape_string($s)?
// (Yes, I know this is a DB function.)
// Some other people also recommend calling json_encode() before passing to htmlentities().
// What's the best solution?
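For reference, here is the safe3() idea above as a runnable sketch. The function name sanitize_utf8_html and the sample input are mine, not from the question:

```php
<?php
// Sketch of safe3(): replace invalid UTF-8 byte sequences first,
// then HTML-encode. Re-encoding UTF-8 to UTF-8 substitutes malformed
// sequences (by default with '?'), so broken bytes can't slip past
// the encoder.
function sanitize_utf8_html(string $s): string
{
    $s = mb_convert_encoding($s, 'UTF-8', 'UTF-8');
    return htmlentities($s, ENT_QUOTES, 'UTF-8');
}

echo sanitize_utf8_html('<script>alert("x")</script>');
// &lt;script&gt;alert(&quot;x&quot;)&lt;/script&gt;
```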
There are a hell of a lot of posts about PHP and XSS. Most just say "use HTMLPurifier" or "use htmlspecialchars", or are simply wrong. Others say use OWASP's library, but it is extremely slow. Some of the good posts I came across are listed below:
safe2() is clearly fine as written. In place of safe1() you should really be using htmlspecialchars(strip_tags($src)), which would actually work fine.
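To illustrate what that combination does (the sample input is mine): strip_tags() removes the markup outright, then htmlspecialchars() escapes whatever special characters remain.

```php
<?php
// strip_tags() drops the tags but keeps their inner text
// (note: the body of a <script> tag survives as plain text),
// then htmlspecialchars() escapes quotes, ampersands, and angle
// brackets in what is left.
$src = '<b>Hi</b> <script>alert(1)</script> "quotes" & <i>more';
echo htmlspecialchars(strip_tags($src), ENT_QUOTES, 'UTF-8');
// Hi alert(1) &quot;quotes&quot; &amp; more
```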
safe3() screams regular expression. Here you can really only apply a whitelist of the characters you actually want:
var a = "<?php echo preg_replace('/[^-\w\d .,]/', "", $xss)?>";
You can of course use json_encode() here to get perfectly valid JS syntax and a valid variable. But then you've just delayed the exploitability of that string into your JS code, where you then have to babysit it.
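If you do go the json_encode() route, it is worth knowing the JSON_HEX_* flags (my addition, not part of the answer above): they hex-escape angle brackets, ampersands, and quotes, so the encoded string also cannot terminate a surrounding <script> block.

```php
<?php
// Emitting a PHP string into inline JS. The flags escape
// < > & ' " as \uXXXX sequences, so even '</script>' inside
// the data cannot break out of the script element.
$xss = '</script><script>alert(1)</script>';
$js  = json_encode($xss, JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT);
echo "var a = $js;";
```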
Is it also safe in all browsers (specifically IE6)?
If you specify the charset explicitly, then IE won't do its awful content-detection magic, so UTF-7 exploits can be ignored.
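A minimal sketch of declaring the charset explicitly, in both the HTTP header and a fallback <meta> tag (the belt-and-suspenders doubling is my suggestion, not from the answer):

```php
<?php
// Declare the encoding before any output so IE6 never falls back
// to sniffing the page (e.g. guessing UTF-7).
header('Content-Type: text/html; charset=UTF-8');

// Repeat it in the markup for cached or locally saved copies,
// where the original HTTP header is no longer present.
echo '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">';
```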
The content is written by members of the stackoverflow.com community.
It is licensed under cc-wiki