Part of the EllisLab Network
   
1 of 2
1
xss_clean adds semicolon to anything with an &
Posted: 03 July 2008 01:29 PM   [ Ignore ]  
Research Assistant
RankRankRank
Total Posts:  325
Joined  09-09-2007

I’ve noticed that the xss_clean function seems to add a semi-colon to any string that contains an ampersand.  For example, if someone writes “AT&T;Park” in a textarea and I run it through the xss_clean function, it will return “AT&T;Park”.

Has anyone else seen this?  Any recommendations?  Thanks.

Edit: This is pretty funny…..they must use xss_clean for this forum, so both of the things above say AT&T;Park.  What it should say is:

  ATampersandT Park

 Signature 

Zenedy | Anyvite | Tweetvite

Profile
 
 
Posted: 03 July 2008 02:08 PM   [ Ignore ]   [ # 1 ]  
Administrator
Avatar
RankRankRankRankRank
Total Posts:  3097
Joined  01-07-2008

Encode it as an entity: AT&T

 Signature 
Profile
MSG
 
 
Posted: 03 July 2008 02:11 PM   [ Ignore ]   [ # 2 ]  
Research Assistant
RankRankRank
Total Posts:  325
Joined  09-09-2007

Thanks, that has been my fix for now.  It appears that the xss_clean provides additional security versus just html encoding the information though, so I have been running it through htmlspecialchars() first and then through xss_clean.  Does anyone know if this is necessary?

It seems like it’s an issue given that I couldn’t post the correct information to this forum…

 Signature 

Zenedy | Anyvite | Tweetvite

Profile
 
 
Posted: 16 July 2008 12:51 PM   [ Ignore ]   [ # 3 ]  
Summer Student
Total Posts:  14
Joined  04-16-2008

This is a problem. 

The ampersand character is a commonly used (VERY commonly used) string delimiter.  There are multiple scenarios where this “semicolon insertion” is unacceptable.

——
Scenario #1: $_GET data

Let’s say, for example, you want to pass a return_url in your querystring (assuming you’re running CI with uri_protocol set to PATH_INFO, and global_xss_filtering to TRUE). 

http://mysite.com/user/login?s=1&return_url=some_encoded_url 

Because its a good idea to cleanse GET data, let’s run xss_clean() on $_SERVER[‘QUERY_STRING’]. You’ll get:

s=1&return;_url;=some_encoded_url 

The “return_url” param becomes “return_url;”. 

——
Scenario #2. $_COOKIE data

Let’s say you’re reading a cookie with some delimited data (on a project I’m on now, we read top-level cookies [written by other apps] on subdomains).  Again, assume you’ve got global_xss_filtering set to TRUE. 

Let’s say our cookie name is “sessdata” and the cookie data is:

u=reconstrukt&fn=Matt&uid=234&ts=20080716120000 

Reading this through CI:

get_cookie('sessdata'); 

And you get back *altered key names*

u=reconstrukt&fn;=Matt&uid;=234&ts;=20080716120000 

——
Scenario #3. $_POST data

Let’s say you’ve got a form with a field called “homepage_url”, allowing a user to post their website (or any external URL) on their personal profile page.  So, the user copies and pastes their link:

http://www.myblog.com?page=something&sort=2&a=1 

Reading this through CI:

$this->input->post('homepage_url'

You get back a malformed URL

http://www.myblog.com?page=something&sort;=2&a;=1 

——

Now, I *love* CI.  I’ve built a buncha sites with it (a few fairly high-profile).  But this “add a semicolon so we can convert entities to ASCII later” is just plain wonky.

A recommendation?  Don’t make “global_xss_filtering” an all-or-nothing setting. 

Instead, if we made this setting an enumerated list, you could leave filtering on for $_POST data, but leave it off for $_COOKIE data.  From the config:

$config['global_xss_filtering'TRUE

Becomes

$config['global_xss_filtering'= array('POST''GET'); 

And a simple check in the Input class would leave some the $_COOKIE superglobal untouched.

(This doesn’t necessarily solve scenario #3 above, but is certainly an improvement.)

Matt

 Signature 

======================
Matthew Knight
http://reconstrukt.com

Profile
 
 
Posted: 16 July 2008 01:10 PM   [ Ignore ]   [ # 4 ]  
Administrator
Avatar
RankRankRankRankRank
Total Posts:  3097
Joined  01-07-2008

reconstrukt, did you even try this?

The function protects GET variables (line 541 and 563):

$str "http://mysite.com/user/login?s=1&return_url=some_encoded_url";
echo 
$this->input->xss_clean($str);

// Echos the url with no modifications 
 Signature 
Profile
MSG
 
 
Posted: 16 July 2008 01:31 PM   [ Ignore ]   [ # 5 ]  
Summer Student
Total Posts:  14
Joined  04-16-2008

What version CI are you on?  Not seeing that on my end.

Try the other scenarios, you’ll see what I mean.  Till this is fixed, I’m subclassing the Input lib and making the ‘global_xss_filtering’ config key an array

Cheers
M

 Signature 

======================
Matthew Knight
http://reconstrukt.com

Profile
 
 
Posted: 16 July 2008 01:33 PM   [ Ignore ]   [ # 6 ]  
Administrator
Avatar
RankRankRankRankRank
Total Posts:  3097
Joined  01-07-2008

I’m using CI 1.6.4

The regular expression it uses actually matches any &key=value&another=more pattern, so none of your examples should cause any problems at all.

EDIT: Note how the forum (which uses that same function) didn’t change anything in my post, because it looks like a query string.

 Signature 
Profile
MSG
 
 
Posted: 16 July 2008 01:41 PM   [ Ignore ]   [ # 7 ]  
Summer Student
Total Posts:  14
Joined  04-16-2008

We’re on 1.6.0

Was this fix recently added?  I’m familiar with xss_clean(), but not seeing a regex that’d match qs strings.

M

 Signature 

======================
Matthew Knight
http://reconstrukt.com

Profile
 
 
Posted: 16 July 2008 01:59 PM   [ Ignore ]   [ # 8 ]  
Administrator
Avatar
RankRankRankRankRank
Total Posts:  3097
Joined  01-07-2008

It got a pretty major overhaul for 1.6.3 (1.6.4 was EE, I’m not magically ahead smile ), if you have a test install of your current system, replace the input library with the newest version and see if you run into any problems.  Should work without any major issues.

 Signature 
Profile
MSG
 
 
Posted: 16 July 2008 02:27 PM   [ Ignore ]   [ # 9 ]  
Summer Student
Total Posts:  14
Joined  04-16-2008

We haven’t changed the core, subclassed libs if we needed to make mods.  So I will give it a shot starting next week and post here (if anyone cares.)

cheers inparo.
M

 Signature 

======================
Matthew Knight
http://reconstrukt.com

Profile
 
 
Posted: 03 September 2008 03:22 PM   [ Ignore ]   [ # 10 ]  
Summer Student
Total Posts:  7
Joined  06-12-2008

I do. smile I’m seeing the same problem. There doesn’t seem to be any improvement in 1.6.3. I’ve activated global XSS filtering in my config, the idea being to have all _POST data automatically secured all the time. But I’m having the same issue: stuff like “A&B” gets transformed into “A&B;”, basically there’s an extra semicolon added. And another case: “A&B C” becomes “A&B;C” (...so not only it appended a semicolon, it also ate up the space before C).

I’m surprised this issue hasn’t surfaced before, since it’s xss_clean() that does it. Don’t people enter stuff like this into forms?

BTW, I’m not seeing the issue with this forum, so if it’s built on CI I’d appreciate telling me what you did right to fix it. smile

PS: Ah, there it is. Not happening anymore using Input.php from SVN. Any idea about how safe it would be to use that file with CI 1.6.2?

Profile
 
 
Posted: 03 September 2008 03:46 PM   [ Ignore ]   [ # 11 ]  
Summer Student
Total Posts:  7
Joined  06-12-2008

Still no go, I’m afraid. I’ve upgraded to 1.6.3 and Input.php from SVN. A&B comes out fine now, but I’m still seeing this:

sales&marketing;, 

Notice the stubborn extra semicolon (I end the string with g and comma, I get a semicolon shoved between them). smile

Should I post a bug about this?

Profile
 
 
Posted: 04 September 2008 09:17 AM   [ Ignore ]   [ # 12 ]  
Summer Student
Total Posts:  14
Joined  04-16-2008

I’m afraid so. 

The scenarios outlined in my original post were essentially key value pairs delimited by ampersands (i.e. in URLs, etc.)  Upgrading to 1.6.3 solved these issues, however, I too have seen the “semicolon insertion” issue on simple text entry as you like you mention above.

The semicolon is inserted because of the regular expression that xss_clean() uses.  But the underlying problem is part of a larger design flaw - xss_clean looks for what it *thinks* to be mis-typed ASCII entities, and tries to automagically correct them for you.  If/when it finds such a “typo”, it inserts a semicolon.  I’m not sure why this auto-correction is done, but I’m sure there’s a reason - the CI guys wouldn’t bake that in for nothing. 

Maybe a better solution is to enumerate the ASCII characters to be converted via get_html_translation_table(), and using more precise regex’s based on this explicit list. ?

In any case, your “sales&marketing;” should be left untouched (as there’s no ASCII entity called “&marketing;”).  So yes, this is definitely a bug…

 Signature 

======================
Matthew Knight
http://reconstrukt.com

Profile
 
 
Posted: 04 September 2008 09:24 AM   [ Ignore ]   [ # 13 ]  
Summer Student
Total Posts:  14
Joined  04-16-2008

Note that this isn’t fixed on the boards…

See above in the last paragraph, a semicolon was inserted into the first instance of “sales&marketing;”

(there, it did it again)

Looks like we can get this to happen if we type some pattern like:

[alphanumeric][ampersand][alphanumeric]

OK, let’s try it. Any semicolons you may see below have been inserted by these boards, and were not typed by me.

1. sales&marketing;2. something&1something;3. a&b&c
4. 1&2&3
5. something& spacesomethingelse& anotherspacesomethingelse&
6. one1&two2;&three;&

...ok, clicking submit…

 Signature 

======================
Matthew Knight
http://reconstrukt.com

Profile
 
 
Posted: 04 September 2008 09:29 AM   [ Ignore ]   [ # 14 ]  
Summer Student
Total Posts:  14
Joined  04-16-2008

Results:

A semicolon was inserted on tests 1, 2, 6. 

3, 4, 5 were left untouched. 

Also note: the line break was gobbled up after #1 and #2.  So, another bug - when a semicolon is inserted, any trailing newline characters are also replaced.

 Signature 

======================
Matthew Knight
http://reconstrukt.com

Profile
 
 
Posted: 05 September 2008 04:52 AM   [ Ignore ]   [ # 15 ]  
Summer Student
Total Posts:  7
Joined  06-12-2008

Not just newline, also space and possibly other types of whitespace.

Doesn’t completing supposedly incomplete HTML entities go a bit beyond the call of duty, so to speak?

Speaking of which, why is a good old htmlentities() not enough to prevent XSS? XSS could be (simply) defined as remote-supplied JavaScript executed when it should not be. I would think that any browser that manages to somehow execute JavaScript in a piece of text that passed through htmlentities(), output on a document with Content-Type “text/html”, to be seriously flawed. Are there such browsers? Is that why just htmlentities() is not enough?

Profile
 
 
   
1 of 2
1