Making CodeIgniter faster, one line at a time
Posted: 04 May 2007 03:56 PM   [ # 61 ]

Geert De Deckere - 04 May 2007 03:18 PM

Well, seems like we’re done tweaking finally, right?

I like those changes. I just realized one problem with adding \n\r\t to the tokenizer: the string is rebuilt with a space character, so every \n, \r, and \t will be turned into a space. That could be a problem if you imagine this function running just before nl2br(). It might mess things up for people expecting their newlines, carriage returns, and tabs to still be there.

I suggest going back to the space-only tokenizer on the first pass, then checking whether there are any newlines, carriage returns, or tabs, and then, on the second, third, and fourth passes, incrementally replacing the other whitespace characters.

I can’t think of any other way of doing it, can you?

Posted: 04 May 2007 03:59 PM   [ # 62 ]

Next please. The _str_to_array() function from the Email library desperately cries for help.


system/libraries/Email.php - line 367


BEFORE: 10000 iterations take about 0.3634 seconds

function _str_to_array($email)
{
    if ( ! is_array($email))
    {
        if (ereg(',$', $email))
            $email = substr($email, 0, -1);

        if (ereg('^,', $email))
            $email = substr($email, 1);

        if (ereg(',', $email))
        {
            $x = explode(',', $email);
            $email = array();

            for ($i = 0; $i < count($x); $i ++)
                $email[] = trim($x[$i]);
        }
        else
        {
            $email = trim($email);
            settype($email, "array");
        }
    }

    return $email;
}

Note that if your email string contains duplicate commas (example: “email,,,”), the current function creates empty array elements. The updated function below resolves this as well.

AFTER: 10000 iterations take about 0.2024 seconds

function _str_to_array($email) {

    // If $email is an array already, we're done.
    if (is_array($email))
        return $email;

    // Initialize the email array...
    $email_array = array();

    // ...and fill it with clean emails.
    foreach (explode(',', $email) as $email) {
        $email = trim($email);
        if ($email != '')
            $email_array[] = $email;
    }

    return $email_array;
}
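
The timings quoted in this thread come from calling each function in a tight loop. The harness itself is not shown, so the following is only a minimal sketch of how such a measurement might look. `bench()` is a hypothetical helper name, and absolute numbers will differ by machine and PHP version.

```php
<?php
// Hypothetical helper: time $iterations calls of $fn with the given arguments.
function bench(callable $fn, array $args, $iterations = 10000)
{
    $start = microtime(true);              // wall-clock time in seconds, as a float

    for ($i = 0; $i < $iterations; $i++) {
        call_user_func_array($fn, $args);
    }

    return microtime(true) - $start;       // elapsed seconds
}

// Example: time the built-in trim() on a padded address.
printf(
    "%d iterations take about %.4f seconds\n",
    10000,
    bench('trim', array('  a@example.com  '))
);
```

Running each candidate several times and comparing the spread is a reasonable precaution before trusting differences as small as a few milliseconds.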

Posted: 04 May 2007 04:09 PM   [ # 63 ]

sophistry - 04 May 2007 03:56 PM

I like those changes. I just realized one problem with adding \n\r\t to the tokenizer: the string is rebuilt with a space character, so every \n, \r, and \t will be turned into a space. That could be a problem if you imagine this function running just before nl2br(). It might mess things up for people expecting their newlines, carriage returns, and tabs to still be there.

True. Good point. Again. The original word_limiter() function does the same thing. Could that be considered a bug? Also note that if $str is shorter than twice $limit (first lines of the function), the original string is returned with all of its original whitespace. It should at least return the same formatting as in the other case, just for the sake of consistency.

I suggest going back to the space-only tokenizer on the first pass, then checking whether there are any newlines, carriage returns, or tabs, and then, on the second, third, and fourth passes, incrementally replacing the other whitespace characters.

Could you translate that into some real code? I don’t clearly see where you’re going with that one.
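
For reference, here is one possible way to keep the original whitespace. Note that this is not the multi-pass approach described in the quote above: it swaps in a different technique, capturing the first N words with a single regex and returning the matched prefix unchanged. The function name `word_limiter_keep_ws` and its defaults are assumptions made for illustration.

```php
<?php
// Hypothetical sketch: limit $str to $limit words while leaving the
// original whitespace between those words untouched.
function word_limiter_keep_ws($str, $limit = 100, $end_char = '...')
{
    if (trim($str) === '') {
        return $str;
    }

    $limit = max(1, (int) $limit);

    // Optional leading whitespace, then 1..$limit runs of
    // "non-whitespace followed by optional whitespace".
    preg_match('/^\s*(?:\S+\s*){1,' . $limit . '}/', $str, $matches);

    // Nothing was cut off, so return the input as-is without an end marker.
    if (strlen($matches[0]) === strlen($str)) {
        return $str;
    }

    return rtrim($matches[0]) . $end_char;
}

// The newline and tab survive, so a later nl2br() still sees them.
echo nl2br(word_limiter_keep_ws("one\ntwo\tthree four", 3));
```

Because the words are never split and re-joined, newlines, carriage returns, and tabs inside the kept portion are passed through exactly as they were written.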

Posted: 04 May 2007 04:33 PM   [ # 64 ]

function _str_to_array($email) {

    // If $email is an array already, we're done.
    if (is_array($email))
        return $email;

    // Initialize the email array...
    $email_array = array();

    // ...and fill it with clean emails.
    foreach (explode(',', $email) as $email) {
        $email = trim($email);
        if ($email != '')
            $email_array[] = $email;
    }

    return $email_array;
}

Musing out loud, I wonder if you could shave a few more microseconds by using preg_split() with the PREG_SPLIT_NO_EMPTY flag instead of looping through the items and trim()ing them.

Posted: 04 May 2007 04:59 PM   [ # 65 ]

Derek Jones - 04 May 2007 04:33 PM

Musing out loud, I wonder if you could shave a few more microseconds by using preg_split() with the PREG_SPLIT_NO_EMPTY flag instead of looping through the items and trim()ing them.

Great advice! Another significant speed boost, and the function becomes even smaller. Regular expressions appear to be faster here! There is not even a need for PREG_SPLIT_NO_EMPTY.

UPDATE: 10000 iterations take about 0.1222 seconds

function _str_to_array($email) {

    if (is_array($email))
        return $email;

    $email = trim($email, " \r\n\t,");

    return preg_split('/[\s,]+/', $email);
}
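
Speed aside, the loop-based and regex-based versions do not behave identically on every input. The sketch below copies both under new, hypothetical names (`str_to_array_loop`, `str_to_array_regex`) so they can be compared side by side. Whether the Email library ever passes an entry containing a space is an assumption not settled in this thread.

```php
<?php
// The loop-based version from earlier in the thread (renamed for comparison).
function str_to_array_loop($email)
{
    if (is_array($email))
        return $email;

    $email_array = array();
    foreach (explode(',', $email) as $item) {
        $item = trim($item);
        if ($item != '')
            $email_array[] = $item;
    }

    return $email_array;
}

// The trim() + preg_split() version (renamed for comparison).
function str_to_array_regex($email)
{
    if (is_array($email))
        return $email;

    $email = trim($email, " \r\n\t,");

    return preg_split('/[\s,]+/', $email);
}

// Same result for a plain comma-separated list with stray commas...
var_dump(str_to_array_loop('a@example.com, b@example.com,,'));
var_dump(str_to_array_regex('a@example.com, b@example.com,,'));

// ...but different results for an empty string and for an entry with spaces.
var_dump(str_to_array_loop(''), str_to_array_regex(''));
var_dump(str_to_array_loop('John Doe <j@example.com>'));
var_dump(str_to_array_regex('John Doe <j@example.com>'));
```

The regex version treats any whitespace as a separator and returns an array holding one empty string for empty input, so it is only a drop-in replacement if those cases cannot occur.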

Posted: 04 May 2007 05:10 PM   [ # 66 ]

Ah yes, I guess since you’ll need to be using + to capture whitespace anyway, multiple commas will not create their own array items either.  If you use the PREG_SPLIT_NO_EMPTY flag, though, I believe you can eliminate the trim(), though I would not be entirely surprised if that turned out to be “slower”.

Posted: 04 May 2007 05:58 PM   [ # 67 ]

At this rate, with all these changes, CI will be only 100k in size and as fast as a static page :)

Keep up the good work.

Posted: 04 May 2007 07:21 PM   [ # 68 ]

Nice thread! :D

Are these getting updated in the SVN?

Posted: 05 May 2007 01:47 AM   [ # 69 ]

Derek Jones - 04 May 2007 05:10 PM

Ah yes, I guess since you’ll need to be using + to capture whitespace anyway, multiple commas will not create their own array items either.  If you use the PREG_SPLIT_NO_EMPTY flag, though, I believe you can eliminate the trim(), though I would not be entirely surprised if that turned out to be “slower”.

I tried.

function _str_to_array($email) {

    if (is_array($email))
        return $email;

    return preg_split('/[\s,]+/', $email, -1, PREG_SPLIT_NO_EMPTY);
}

I had to increase the number of runs to 100.000 to get a clear time difference in the benchmarks. Using PREG_SPLIT_NO_EMPTY does indeed appear to be slower than the previous version of _str_to_array().

About 1.2236 seconds with PREG_SPLIT_NO_EMPTY versus 1.2023 seconds for the previous version. It is a tiny but consistent difference.
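
Timing is not the only difference between the two variants: with PREG_SPLIT_NO_EMPTY, leading and trailing separators and an empty input produce no empty elements at all. A small sketch of that behavior follows; the function name `str_to_array_no_empty` is hypothetical and used only to keep the example self-contained.

```php
<?php
// PREG_SPLIT_NO_EMPTY variant, renamed for illustration.
function str_to_array_no_empty($email)
{
    if (is_array($email))
        return $email;

    // -1 means "no limit"; the flag drops empty pieces from the result.
    return preg_split('/[\s,]+/', $email, -1, PREG_SPLIT_NO_EMPTY);
}

// Stray commas and surrounding whitespace leave no empty elements behind.
var_dump(str_to_array_no_empty(" ,, a@example.com ,b@example.com, "));

// An empty string yields an empty array.
var_dump(str_to_array_no_empty(''));
```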

Posted: 05 May 2007 02:17 AM   [ # 70 ]

system/libraries/URI.php - line 213

BEFORE: 100.000 iterations take about 1.2795 seconds

function assoc_to_uri($array)
{
    $temp = array();
    foreach ((array) $array as $key => $val)
    {
        $temp[] = $key;
        $temp[] = $val;
    }

    return implode('/', $temp);
}

AFTER: 100.000 iterations take about 1.1822 seconds

function assoc_to_uri($array) {
    $uri = '';

    foreach ((array) $array as $key => $val)
        $uri .= $key .'/'. $val .'/';

    return rtrim($uri, '/');
}
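
One caveat with the rtrim() approach: it strips every trailing slash, so the two versions disagree when the last value is empty. The sketch below copies both functions under hypothetical names (`assoc_to_uri_implode`, `assoc_to_uri_concat`) to show the difference. Whether empty values ever reach this method in practice is an assumption left open here.

```php
<?php
// Original implode()-based version, renamed for comparison.
function assoc_to_uri_implode($array)
{
    $temp = array();
    foreach ((array) $array as $key => $val) {
        $temp[] = $key;
        $temp[] = $val;
    }

    return implode('/', $temp);
}

// Concatenation + rtrim() version, renamed for comparison.
function assoc_to_uri_concat($array)
{
    $uri = '';
    foreach ((array) $array as $key => $val)
        $uri .= $key . '/' . $val . '/';

    return rtrim($uri, '/');
}

$normal = array('sort' => 'name', 'page' => 2);
$edge   = array('sort' => 'name', 'page' => '');

// Identical output for ordinary input.
var_dump(assoc_to_uri_implode($normal), assoc_to_uri_concat($normal));

// The empty last value leaves a trailing slash only in the implode() version.
var_dump(assoc_to_uri_implode($edge), assoc_to_uri_concat($edge));
```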

Posted: 05 May 2007 05:12 AM   [ # 71 ]

sophistry - 04 May 2007 12:52 PM

I nearly doubled the speed by making some regex mods - one of which I just learned yesterday from Geert (using single quotes)!

function valid_ip_sissy_better($ip)
{
    if (!preg_match( '/^(?:\d{1,3}\.){3}\d{1,3}$/', $ip))
    {
        return FALSE;
    }

    $result = ip2long($ip);

    return !($result === -1 || $result === FALSE);
}

10000 reps.
original valid_ip code: 0.3537
old sissy: 0.1865
new sissy: 0.0998

I guess we’re not done with the valid_ip() function yet. Concerning PHP’s ip2long() function, I’ve never used it before but the manual makes it quite clear that this function is not ideal for validating purposes. I quote:

#1 - Because PHP’s integer type is signed, and many IP addresses will result in negative integers, you need to use the “%u” formatter of sprintf() or printf() to get the string representation of the unsigned IP address.

#2 - ip2long() should not be used as the sole form of IP validation. Combine it with long2ip().

#3 - ip2long() will also work with non-complete IP addresses (as Derek Jones mentioned already).

See: php.net/ip2long
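
Points #1 and #2 can be illustrated with a short sketch. It uses only the built-ins named above (ip2long(), long2ip(), sprintf()); the sample address is an arbitrary choice.

```php
<?php
$ip   = '192.168.0.1';
$long = ip2long($ip);

// On 32-bit builds $long is negative for this address; "%u" prints the
// unsigned representation on both 32-bit and 64-bit builds.
$unsigned = sprintf('%u', $long);
echo $unsigned, "\n";

// Round trip: converting back should reproduce the original dotted string.
var_dump(long2ip($long) === $ip);
```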

All three points mean extra processing overhead. Point #3 has already been worked around by the regex in valid_ip_sissy_better(). However, I’m going to give it a try without using ip2long() at all.

UPDATE: 10000 iterations take about 0.1420 seconds (with valid IP = passing all checks)

function valid_ip_geert($ip) {

    $ip_segments = explode('.', $ip);

    // Always 4 segments needed
    if (count($ip_segments) != 4)
        return FALSE;

    // Only digits allowed
    if (count(array_filter($ip_segments, 'ctype_digit')) != 4)
        return FALSE;

    // IP cannot start with a "0" segment
    if ($ip_segments[0] == 0)
        return FALSE;

    // IP segments cannot be greater than 255
    if (max($ip_segments) > 255)
        return FALSE;

    // All checks passed
    return TRUE;
}

In my opinion this is a very clearly structured function which really allows only valid IPs. It is fast too.

Posted: 05 May 2007 05:40 AM   [ # 72 ]

OK, Geert… I will try it with ip2long again… I think this one can be good… simple

function valid_ip($ip) {
    return $ip == long2ip(ip2long($ip));
}

checking IP 10.0.0.5

valid_ip_geert: 10000 iterations take about 0.1957 seconds

NEW valid_ip: 10000 iterations take about 0.0737 seconds

checking IP 10.0.0

valid_ip_geert: 10000 iterations take about 0.0785 seconds

NEW valid_ip: 10000 iterations take about 0.0706 seconds

checking IP 10.0.0.524

valid_ip_geert: 10000 iterations take about 0.2024 seconds

NEW valid_ip: 10000 iterations take about 0.0688 seconds

Posted: 05 May 2007 05:41 AM   [ # 73 ]

The contributors in here deserve their own special release: CI, the Fast version. You’re going to need a lightning flash for a logo instead of a flame.

Posted: 05 May 2007 06:05 AM   [ # 74 ]

But Seppo, using only ip2long and long2ip you will return a non-boolean value (which is not what validation expects) and partial IPs will be considered valid.  While in many networking style scripts converting a partial IP to an actual IP is probably a good thing, in validating whether a presented value is a valid IP I would think that is not a good thing.

Posted: 05 May 2007 06:06 AM   [ # 75 ]

Seppo - 05 May 2007 05:40 AM

OK, Geert… I will try it with ip2long again… I think this one can be good… simple

function valid_ip_seppo($ip) {
    return $ip == long2ip(ip2long($ip));
}

Your function is faster but is it accurate?

var_dump(valid_ip_geert('0.0.0.0')); // FALSE
var_dump(valid_ip_seppo('0.0.0.0')); // TRUE

var_dump(valid_ip_geert('0.1.1.1')); // FALSE
var_dump(valid_ip_seppo('0.1.1.1')); // TRUE

var_dump(valid_ip_geert('01.1.1.1')); // TRUE
var_dump(valid_ip_seppo('01.1.1.1')); // FALSE
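
As an outside reference point when two hand-written validators disagree like this, PHP's filter extension (available since PHP 5.2) offers a built-in IPv4 check. This was not proposed in the thread; the wrapper name `valid_ip_filter` is hypothetical, while filter_var(), FILTER_VALIDATE_IP, and FILTER_FLAG_IPV4 are PHP built-ins. A minimal sketch:

```php
<?php
// Hypothetical wrapper around PHP's built-in IPv4 validation.
function valid_ip_filter($ip)
{
    // filter_var() returns the input string on success and FALSE on failure.
    return filter_var($ip, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) !== FALSE;
}

// Run the addresses discussed in this thread through the built-in check.
$samples = array('10.0.0.5', '10.0.0', '10.0.0.524', '0.0.0.0', '01.1.1.1');

foreach ($samples as $ip) {
    printf("%-12s %s\n", $ip, var_export(valid_ip_filter($ip), true));
}
```

The built-in accepts 0.0.0.0 and rejects segments with leading zeros, so it sides with neither function completely. Which behavior is "correct" depends on what the validation is meant to allow.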