Part of the EllisLab Network
   
2 of 2
2
xss_clean adds semicolon to anything with an &
Posted: 05 September 2008 10:41 AM   [ # 16 ]
Administrator
Total Posts:  3103
Joined  01-07-2008

If you plan on allowing absolutely no tags, htmlentities will do the trick [EDIT: ... in most cases].
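A minimal sketch of the "no tags at all" case, assuming UTF-8 input (escape_all is a hypothetical helper name). ENT_QUOTES also encodes single and double quotes so the output is safe inside quoted attributes; it still does not make text safe inside script blocks or unquoted attributes, which is roughly what the "in most cases" caveat covers.

```php
<?php
// Hypothetical helper: encode every special character so no markup survives.
// ENT_QUOTES encodes both ' and "; the UTF-8 charset is an assumption.
function escape_all(string $input): string
{
    return htmlentities($input, ENT_QUOTES, 'UTF-8');
}

echo escape_all('<script>alert("x")</script> M&M\'s'), "\n";
// prints: &lt;script&gt;alert(&quot;x&quot;)&lt;/script&gt; M&amp;M&#039;s
```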

Believe it or not, almost all browsers out there today are “seriously flawed”.  They make a lot of assumptions in order to render sloppy code.
So once you start allowing some tags (or transforming input into tags, e.g. BBCode), you run into lax browser rendering engines.  Under the right conditions, browsers will accept entities without semicolons.  Also, many accept javascript: URLs in places where JavaScript simply does not belong.
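To illustrate why semicolon-less entities matter, here is a rough sketch (looks_like_javascript_url is a hypothetical name, and this is nowhere near a complete filter). It decodes numeric character references with or without a trailing ';', drops whitespace and control characters the way lax parsers tolerate, and only then checks for a javascript: scheme. A naive check on the raw string would miss all of these inputs.

```php
<?php
// Hypothetical helper, not part of CodeIgniter: a rough illustration only.
function looks_like_javascript_url(string $value): bool
{
    // Decode numeric character references, with OR without the trailing ';',
    // because lax browsers accept both forms ("&#106" as well as "&#106;").
    $decoded = preg_replace_callback(
        '/&#(x[0-9a-f]+|[0-9]+);?/i',
        function (array $m): string {
            $code = (strtolower($m[1][0]) === 'x')
                ? hexdec(substr($m[1], 1))
                : (int) $m[1];
            // Only ASCII matters for spotting "javascript:"; leave the rest as-is.
            return ($code > 0 && $code < 128) ? chr((int) $code) : $m[0];
        },
        $value
    );

    // Drop whitespace and control characters, which browsers may ignore
    // inside a scheme such as "java<TAB>script:".
    $compact = preg_replace('/[\x00-\x20]+/', '', $decoded);

    return stripos($compact, 'javascript:') === 0;
}
```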

A positive security policy (specifying what is allowed) is always the best approach.  Always strip as much as you can and validate wherever possible.  That means a URL field should be checked to make sure it actually contains a URL. (! and if someone tells you it doesn’t, for god’s sake fix it !)
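A whitelist-style sketch of checking that a URL field really holds a URL. The function name is_allowed_url and the default list of allowed schemes (http, https) are assumptions; FILTER_VALIDATE_URL on its own does not restrict the scheme, so the scheme is checked explicitly as well.

```php
<?php
// Hypothetical helper: positive policy, so only explicitly allowed schemes pass.
function is_allowed_url(string $url, array $schemes = ['http', 'https']): bool
{
    // Must at least parse as a URL.
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }
    // ...and use one of the whitelisted schemes.
    $scheme = parse_url($url, PHP_URL_SCHEME);
    return is_string($scheme) && in_array(strtolower($scheme), $schemes, true);
}
```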

The xss_clean function primarily uses a very strict negative security policy (recognize script elements and other attack vectors and remove them).  In some cases there is a little bit of whitelisting, such as validating image and anchor tags.  Since browser rendering is so lax and unstandardized, and blacklisting is such a tough thing to do, the only way for any of this to work effectively is to be very broad.  That does create some false positives we just have to live with.
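To make the false-positive point concrete, here is a deliberately broad, hypothetical blacklist rule (strip_event_handlers is an illustrative name, not CodeIgniter's actual code) that removes anything shaped like an inline event handler. It catches the attack, but it also eats harmless text that merely looks similar.

```php
<?php
// Hypothetical blacklist rule: strip anything shaped like an inline event
// handler (onclick=, onerror=, ...). Broad on purpose, so it over-matches.
function strip_event_handlers(string $str): string
{
    return preg_replace('/\bon[a-z]+\s*=/i', '', $str);
}
```

Applied to `<img src=x onerror=alert(1)>` it removes `onerror=`, but applied to the plain sentence "Set one=1 in the config" it also removes `one=`, which is exactly the kind of collateral damage a broad negative policy produces.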

As for those test cases, the SVN version allows a single character after the &, because no entity is that short.  I would agree, though, that the whitespace character should be reinserted after the tag (at least \09, \10, and \13; the other invisible characters are stripped anyway).
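The single-character rule can be seen with the entity pattern that the November 2011 post in this thread quotes from the core (its commented-out preg_replace). `{2,}` requires at least two characters after the &, and `\\2` reinserts one trailing whitespace character. The wrapper name add_entity_semicolons is hypothetical, and this is a sketch for experimenting, not necessarily the exact code in any given release.

```php
<?php
// Hypothetical wrapper around the quoted pattern: append ';' to every '&'
// followed by two or more letters/digits (optionally '#'), keeping one
// captured trailing whitespace character after the added ';'.
function add_entity_semicolons(string $str): string
{
    return preg_replace('#(&\#?[0-9a-z]{2,})([\x00-\x20])*;?#i', "\\1;\\2", $str);
}

echo add_entity_semicolons("M&M's"), "\n";               // M&M's (single letter, untouched)
echo add_entity_semicolons('Fish &amp Chips'), "\n";     // Fish &amp; Chips
echo add_entity_semicolons('item[]=22&item[]=18'), "\n"; // item[]=22&item;[]=18
```

The last line shows the downside: any "&word" sequence looks like an entity to this pattern, whether or not such an entity exists.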

Posted: 27 May 2010 05:30 PM   [ # 17 ]
Summer Student
Total Posts:  4
Joined  05-24-2010

Don’t mean to dig this one up, but I’m still having a problem with this:

As per the changelog, this was fixed after the last post in this thread (1.7 was released 10/08, the last post here was from 09/08):

Modified XSS sanitization to no longer add semicolons after &[single letter], such as in M&M’s, B&B, etc.

However, I have global_xss_filtering ON, I am now using 1.7.2, and I can confirm that this problem may still exist. Perhaps what I’m looking at is slightly different. My POSTed string is as follows:

item[]=22&item[]=18&item[]=19&item[]=20&item[]=21

However, when accessing it, I get:

item[]=22&item;[]=18&item;[]=19&item;[]=20&item;[]=21 

Unless, of course, this is correct behavior. If it is, could someone explain why?

Posted: 17 November 2011 07:06 PM   [ # 18 ]
Summer Student
Total Posts:  2
Joined  06-02-2011

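So here’s what I did in MY_Security.php:

// Original core line, disabled:
//$str = preg_replace('#(&\#?[0-9a-z]{2,})([\x00-\x20])*;?#i', "\\1;\\2", $str);

$matched = preg_match_all('#(&\#?[0-9a-z]{2,})([\x00-\x20])*;?#i', $str, $matches, PREG_OFFSET_CAPTURE);
if ($matched > 0)
{
    // Walk the matches from last to first so that inserting a ';' does not
    // shift the byte offsets of matches that have not been handled yet.
    foreach (array_reverse($matches[0]) as $match)
    {
        $test_str = strtolower($match[0].';');
        foreach (get_html_translation_table(HTML_ENTITIES) as $entity)
        {
            // Only add the ';' if the result is a real HTML entity.
            // Reuse the matched text so the user's original case is kept.
            if ($test_str == strtolower($entity))
            {
                $str = substr_replace($str, $match[0].';', $match[1], strlen($match[0]));
                break;
            }
        }
    }
}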

This will only add a semicolon if the result will be an HTML entity; otherwise it leaves the match alone. I haven’t done anything to optimize this, and I suspect there are efficiencies to be found, but in most cases this will at least not add bogus semicolons.
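On the efficiency point, one possible variant (a sketch, with the hypothetical name add_semicolons_to_known_entities) builds a lookup of known entity names once and uses preg_replace_callback, so each match costs a single isset() instead of a loop over the whole translation table, and no manual offset bookkeeping is needed.

```php
<?php
// Hypothetical optimisation of the approach above, not core CodeIgniter code.
function add_semicolons_to_known_entities(string $str): string
{
    // Build the lookup once per request: e.g. '&amp' => true (no ';', lower-cased).
    static $known = null;
    if ($known === null) {
        $known = [];
        foreach (get_html_translation_table(HTML_ENTITIES, ENT_QUOTES | ENT_HTML401) as $entity) {
            $known[strtolower(rtrim($entity, ';'))] = true;
        }
    }

    return preg_replace_callback(
        '#(&\#?[0-9a-z]{2,})([\x00-\x20])*;?#i',
        function (array $m) use ($known): string {
            // Real entity: add ';' and keep any captured whitespace after it.
            // Anything else (e.g. "&item"): return the matched text untouched.
            return isset($known[strtolower($m[1])])
                ? $m[1] . ';' . ($m[2] ?? '')
                : $m[0];
        },
        $str
    );
}
```

Unlike the loop version, this one also fixes entities that are followed by whitespace (for example "&amp Chips"), reinserting the whitespace the way the core pattern does.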
