Switch From REGEX to Filter_* in php 5.2+, 38% faster
Posted: 18 July 2008 06:01 PM
Lab Assistant

PHP 5.2+ ships a wonderful new set of functions called filter.

Filters in PHP
Excellent tutorial on all functions and flags

It basically lets you validate input the way you would with a regular expression, but without the confusion, the general slowness, and the unreadable patterns that trip up regex beginners.

For example, in PHP 5.2+ I can validate an email address with preg_match:

if(!preg_match("/^([a-z0-9])(([-a-z0-9._])*([a-z0-9]))*\@([a-z0-9])([-a-z0-9_])+([a-z0-9])*(\.([a-z0-9])([-a-z0-9_-])([a-z0-9])+)*$/i", $email))
{
    return false;
}
return true;

or filter_var:

if(!$email = filter_var($email, FILTER_VALIDATE_EMAIL))
{
    return false;
}
return true;

Much easier to read. I'm doing speed testing now and will update this post with my results.

UPDATE: Benchmarked on localhost (1.86 GHz Core Duo, 1 GB RAM, 5200 RPM HDD)

                  time index             ex time    %
Start             1216422803.37500400    -          0.00%
Eregi Call        1216422803.37510800    0.000104   52.79%
filter_var Call   1216422803.37517300    0.000065   32.99%
Stop              1216422803.37520100    0.000028   14.21%
total             -                      0.000197   100.00%
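
For the single run shown in the table, the relative saving can be worked out directly from the two "ex time" values. This is only a sketch of the arithmetic; the variable names are mine, and one run says nothing about variance between runs.

```php
<?php
// Execution times (seconds) copied from the table above.
$regexTime  = 0.000104; // row labelled "Eregi Call"
$filterTime = 0.000065; // row labelled "filter_var Call"

// Share of the regex time that filter_var saved in this one run.
$reduction = ($regexTime - $filterTime) / $regexTime * 100;

// The same comparison expressed as a speed ratio.
$ratio = $regexTime / $filterTime;

printf("filter_var took %.1f%% less time (%.2fx as fast)\n", $reduction, $ratio);
```

A saving of about 37.5% in this run is in line with the roughly 38% average quoted in the post.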

On average I get a 38% performance increase, and that is only for email validation.
The extension also provides filters for the following:

Array
(
    [0] => int
    [1] => boolean
    [2] => float
    [3] => validate_regexp
    [4] => validate_url
    [5] => validate_email
    [6] => validate_ip
    [7] => string
    [8] => stripped
    [9] => encoded
    [10] => special_chars
    [11] => unsafe_raw
    [12] => email
    [13] => url
    [14] => number_int
    [15] => number_float
    [16] => magic_quotes
    [17] => callback
)
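
The list above has the shape of print_r(filter_list()) output, so presumably that is how it was produced. A minimal sketch of that call, plus a few of the other validate filters in use; the sample values are my own, and the exact list of names varies between PHP versions.

```php
<?php
// Names of the filters supported by the running PHP build.
$available = filter_list();
print_r($available);

// A few validate filters applied to sample values (the values are illustrative only).
// Each call returns the validated value on success and false on failure.
$int  = filter_var('42', FILTER_VALIDATE_INT);                 // integer 42
$bad  = filter_var('not an int', FILTER_VALIDATE_INT);         // false
$ip   = filter_var('192.168.0.1', FILTER_VALIDATE_IP);         // the same string
$url  = filter_var('http://example.com', FILTER_VALIDATE_URL); // the same string
$bool = filter_var('yes', FILTER_VALIDATE_BOOLEAN);            // true

var_dump($int, $bad, $ip, $url, $bool);
```

Because a valid value such as 0 can itself be falsy, comparing the result against false with !== is safer than a bare if (!...) check.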

I'm really starting to like PHP 5 now: 35%+ performance increases and much easier to read? Sign me up!
-Matt

Posted: 19 July 2008 03:41 AM   [ # 1 ]
Research Assistant

Should be interesting to look into. Maybe some more testing and I’d certainly vote to include it.

Posted: 19 July 2008 06:57 AM   [ # 2 ]
Lab Assistant

CI is PHP 4 compatible, and I was under the impression that the filter extension is not always installed.
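
Whether the extension is present can be checked at runtime rather than assumed. A small sketch using extension_loaded() and function_exists(); the helper name filter_extension_available() is hypothetical, not part of PHP or CodeIgniter.

```php
<?php
// Hypothetical helper: reports whether the filter extension can be used here.
function filter_extension_available()
{
    // extension_loaded() checks the extension by name; function_exists()
    // guards against builds where filter_var() is missing or disabled.
    return extension_loaded('filter') && function_exists('filter_var');
}

if (filter_extension_available()) {
    echo "filter extension is available\n";
} else {
    echo "filter extension is not installed, fall back to preg_match\n";
}
```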

Posted: 19 July 2008 07:07 AM   [ # 3 ]
Lab Assistant

No, that's why it says PHP 5.2 in the title.
For other PHP versions it's a PECL extension.
-Matt

Posted: 19 July 2008 07:08 AM   [ # 4 ]
Research Assistant

To keep both compatibility and performance, the script could detect which version of PHP is in use and then call the function that version supports.

PHP 4 servers would use: preg_match

PHP 5.2 servers would use: filter_var
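
The version switch described here could be sketched as follows. The function name validate_email_compat() is hypothetical, and the fallback pattern is a deliberately simplified stand-in, not the regex from the first post or the one CodeIgniter uses.

```php
<?php
// Hypothetical wrapper: use filter_var() where it exists, preg_match() otherwise.
function validate_email_compat($email)
{
    // version_compare() with '>=' returns true on PHP 5.2.0 and later;
    // function_exists() covers builds compiled without the filter extension.
    if (version_compare(PHP_VERSION, '5.2.0', '>=') && function_exists('filter_var')) {
        return filter_var($email, FILTER_VALIDATE_EMAIL) !== false;
    }

    // Simplified fallback pattern (an assumption for illustration only).
    return (bool) preg_match('/^[a-z0-9._%+-]+@[a-z0-9-]+(\.[a-z0-9-]+)+$/i', $email);
}

var_dump(validate_email_compat('user@example.com'));
var_dump(validate_email_compat('not-an-email'));
```

Note that the two branches will not agree on every edge case, since the filter extension and a hand-written pattern accept slightly different sets of addresses.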

Posted: 19 July 2008 10:34 AM   [ # 5 ]
Administrator

It’s a nice extension, but I think it’s a bit young and unproven to start using in the framework.  PHP has a nasty habit of treating new features like this as moving targets, with behavioral changes across many minor point releases.  One question, though: your code shows PCRE regex vs. Filter, but your results indicate that you used the POSIX regex functions, which are terribly slow by comparison.

Eregi Call 1216422803.37510800 0.000104 52.79%
filter_var Call 1216422803.37517300 0.000065 32.99%

Posted: 20 July 2008 06:11 AM   [ # 6 ]
Lab Assistant

That’s an old function I was using. I switched the call to preg_match for the benchmark to see the actual performance gains, so the benchmark did use preg_match.

If you want my code I can package it up and post it here.

It might be worth considering for PHP 5.3, which is scheduled for final release some time in September.
-Matt

Posted: 20 July 2008 08:01 AM   [ # 7 ]
Administrator

Sure. Any time you post benchmarks, particularly with a claim of an x% increase, you should show your code.  In this case it’s particularly deceptive because you’re not talking about a piece of code that is commonly executed, so your application’s performance increase will be negligible, unless of course you’re building an email validation application. ;)

Posted: 20 July 2008 08:37 AM   [ # 8 ]
Lab Assistant

The point was not only email validation, but that the same library could replace many functions in the validation/sanitization routines. This is very important for me, as testing shows that my CI application’s biggest bottleneck is data validation/sanitization. Regardless, here is the code: http://xtrafile.com/uploads/validation_benchmark.zip
-Matt

Posted: 20 July 2008 08:50 AM   [ # 9 ]
Administrator

If you’re suggesting that you’d be able to eliminate the input security of xss_clean(), I’m afraid I have to strongly disagree with you.  There aren’t going to be many real-world scenarios where you’ll see a perceivable difference over things like ctype_digit() or is_int(), or from replacing two or three IP / email validation routines.  So much of this is also process dependent.  3 out of the 10 times I ran your code, I got results like this:

Eregi Call        1216561665.02766700   0.00014996528625488  37.22%
--------------------------------------------------------------
filter_var Call   1216561665.02791300   0.00024604797363281  61.07%

It’s still an interesting topic, and I’m sure this extension will prove very handy in the future, so thank you for bringing it up.

Posted: 20 July 2008 08:58 AM   [ # 10 ]
Lab Assistant

It's more of a consistency thing: many of the filter_var (or other filter_*) calls are just an overlay on top of existing PHP functions, which do the actual validation/sanitization. It's mainly about consistent programming if you are using filter_* elsewhere.
-Matt

Posted: 20 July 2008 05:11 PM   [ # 11 ]
Administrator

Here’s an interesting twist.  When looped, PCRE is much faster.

PCRE Good Email      0.0336
Filter Good Email    0.0721
PCRE Bad Email       0.0210
Filter Bad Email     0.0606

And when you add consideration for PHP version compatibility to decide whether or not filter_var() can be used, which we would have to do in CodeIgniter, the disparity grows even further, even using an efficient means of doing a version comparison.

PCRE Good Email      0.0339
Filter Good Email    0.0953
PCRE Bad Email       0.0210
Filter Bad Email     0.0844

Similar results are found with the OP’s benchmark code if you loop it, and it should be mentioned that the regex in that benchmark is not the one used for validating email addresses in CI, which is shorter and more efficient to start with.  Even in the best-case scenario, filter_var() is nowhere near 38% faster than CI’s PCRE email validation.  A fun exercise either way.
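
A looped comparison of the kind described in this post might look roughly like the sketch below. This is not the attached time.zip code: the helper names, the iteration count, the sample addresses, and the simplified PCRE pattern are all assumptions, and the timings will differ from machine to machine.

```php
<?php
// Time $iterations calls of $callback($arg) and return the elapsed seconds.
function time_loop($callback, $arg, $iterations)
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        call_user_func($callback, $arg);
    }
    return microtime(true) - $start;
}

// Simplified PCRE check (illustrative pattern, not CodeIgniter's).
function check_pcre($email)
{
    return (bool) preg_match('/^[a-z0-9._%+-]+@[a-z0-9-]+(\.[a-z0-9-]+)+$/i', $email);
}

// The same check through the filter extension.
function check_filter($email)
{
    return filter_var($email, FILTER_VALIDATE_EMAIL) !== false;
}

$iterations = 10000;
$cases = array('Good Email' => 'user@example.com', 'Bad Email' => 'not an email');

$results = array();
foreach ($cases as $label => $email) {
    $results["PCRE $label"]   = time_loop('check_pcre', $email, $iterations);
    $results["Filter $label"] = time_loop('check_filter', $email, $iterations);
}

foreach ($results as $label => $seconds) {
    printf("%-20s %.4f\n", $label, $seconds);
}
```

Looping amortizes one-off costs and timer noise over many calls, which is why it gives a steadier picture than timing a single call of each function.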

No arguments about code legibility though; and I bet some other CI users who host exclusively on PHP 5.2+ environments would dig it if class extensions were made available that took advantage of filter_*, hint hint. ;)
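
One shape such a class extension could take is sketched below. It assumes the validation class is named CI_Validation, exposes valid_email() and valid_ip(), and can be overridden through a MY_-prefixed subclass, as I recall from CodeIgniter 1.x; check those names against your CI version. The stub parent class exists only so the sketch runs outside the framework.

```php
<?php
// Stand-in parent so this sketch runs on its own (an assumption, not CI's code).
// Inside a CodeIgniter install the framework's own class would already be loaded.
if (!class_exists('CI_Validation')) {
    class CI_Validation
    {
        function valid_email($str)
        {
            return (bool) preg_match('/^[a-z0-9._%+-]+@[a-z0-9-]+(\.[a-z0-9-]+)+$/i', $str);
        }
    }
}

// Hypothetical extension for PHP 5.2+ hosts: same method names, filter_* inside.
class MY_Validation extends CI_Validation
{
    function valid_email($str)
    {
        return filter_var($str, FILTER_VALIDATE_EMAIL) !== false;
    }

    function valid_ip($ip)
    {
        return filter_var($ip, FILTER_VALIDATE_IP) !== false;
    }
}

$validation = new MY_Validation();
var_dump($validation->valid_email('user@example.com'));
var_dump($validation->valid_ip('127.0.0.1'));
```

Keeping the method names unchanged means existing validation rules would not need to be rewritten; only the implementation behind them changes.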

File Attachments
time.zip  (File Size: 1KB - Downloads: 32)