Caching and headers
Posted: 12 November 2008 09:55 AM   [ # 76 ]
Lab Assistant
Total Posts:  114
Joined  09-09-2008
Arjen van Bochoven - 12 November 2008 11:28 AM

I still assume this extension stands a better chance of making it into the core when it is fully written out. What do you think?

I completely agree with you.
As soon as I start benchmarking things, I forget everything else ;)

Posted: 12 November 2008 12:44 PM   [ # 77 ]
Grad Student
Total Posts:  98
Joined  06-07-2008

The code on the wiki is currently the serialize() code. Was my older version better? I don’t know how intensive the serialize operation is.

Please add this functionality - I don’t care how - into the core.

Posted: 13 November 2008 05:23 AM   [ # 78 ]
Lab Assistant
Total Posts:  114
Joined  09-09-2008

Did anyone create a thread in the feature request section for this? ;)

We’re more likely to get some sort of response there.

Posted: 13 November 2008 06:48 AM   [ # 79 ]
Sr. Research Associate
Total Posts:  4777
Joined  03-23-2006
Arjen van Bochoven - 12 November 2008 02:22 PM

As I said, I am hoping this extension will make it into the core, so as much as I like your solution, I want to leave the code “as is” until we hear back from a CI dev.

-Derek?

Yes, this is something we’d want to add to core - great idea, and great work guys!

I’m sorry that I haven’t been more active… I’m swamped and, as you can imagine, getting pulled in a hundred different directions ;)

I’ll make an effort to take a closer look at what you all have done.  On quick inspection, let me say that a serialized array may be an approach you want to reconsider.  If the output got large, PHP could blow up in our faces serializing arrays with that data.

 Signature 

DerekAllard.com - CodeIgniter, ExpressionEngine, and the World of Web Design

Posted: 13 November 2008 07:17 AM   [ # 80 ]
Grad Student
Total Posts:  39
Joined  03-05-2007

Hey Allard,

Thank you for looking at this!

Derek Allard - 13 November 2008 11:48 AM

On quick inspection let me say that a serialized array may be an approach you want to re-consider.  If output got large, PHP could blow up in our faces serializing arrays with that data..

As far as I understand serialize() is not doing much with strings:

A string is encoded as
s:size:"value";

But I must admit I haven’t done any benchmarks, can someone confirm the serialize() function can handle large chunks of data?
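
For reference, here is a minimal sketch of what serialize() produces for a string and for an array wrapping a larger string. The 1 MB payload and the variable names are made up for illustration and are not taken from the wiki code. It suggests the overhead is a short length prefix rather than any per-character encoding, although it says nothing about timing.

```php
<?php
// A string is stored verbatim, prefixed with its byte length.
$small = serialize('hello');
echo $small, "\n";

// Hypothetical 1 MB payload standing in for cached page output.
$output = str_repeat('x', 1024 * 1024);
$packed = serialize(array('output' => $output));

// Difference in bytes between the serialized form and the raw payload.
$overhead = strlen($packed) - strlen($output);
echo $overhead, "\n";

// The round trip should return the original data unchanged.
$restored = unserialize($packed);
var_dump($restored['output'] === $output);
```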

Arjen

Posted: 13 November 2008 07:41 AM   [ # 81 ]
Grad Student
Total Posts:  98
Joined  06-07-2008

It looks like my original solution of just prepending the data with timestamp (like before) plus headers (unlike before) may be simpler and work better, even if less elegant.

I’m so glad this has finally gathered some support; I’ve been battling with this one on and off for months. I did originally post in the feature requests forum too, but I believe it was closed because it was considered a double post.

Please add this to the core, it’s sorely missing. The core code has a comment saying “we need to add header caching”!

Posted: 13 November 2008 02:17 PM   [ # 82 ]
Lab Assistant
Total Posts:  114
Joined  09-09-2008
Arjen van Bochoven - 13 November 2008 12:17 PM

But I must admit I haven’t done any benchmarks, can someone confirm the serialize() function can handle large chunks of data?

Define “large”.

I’ve got an application that uses serialize() and a database to cache data. The biggest serialized string in the DB is currently about 12 KB long, and it runs fast and stable :)

The JS file I used for benchmarking the code is about 31 KB.

I’m really impressed by the effectiveness of (un)serialize. I never expected it to be so damn fast ;)

Aquillyne - 13 November 2008 12:41 PM

It looks like my original solution of just prepending the data with timestamp (like before) plus headers (unlike before) may be simpler and work better, even if less elegant.

I’ll try to benchmark both versions soon; code chunks welcome :)

Posted: 13 November 2008 02:27 PM   [ # 83 ]
Grad Student
Total Posts:  39
Joined  03-05-2007

I would consider 10 MB+ large. If it can handle files of this size faster and in less memory than the original cache functions, it would surely prove that it does not “explode in our face”. :D

Posted: 13 November 2008 02:34 PM   [ # 84 ]
Lab Assistant
Total Posts:  114
Joined  09-09-2008

Okay, I’ll put that on my “to-test” list. LOL

Posted: 17 November 2008 05:12 AM   [ # 85 ]
Lab Assistant
Total Posts:  114
Joined  09-09-2008

Okay, here are the results for displaying cached data:

1. a simple “string-disassembling” variant: ~ 0.00016 s
2. serialize: ~ 0.00018 s
3. RegExp: ~ 0.00076 s

Memory consumption (measured using memory_get_peak_usage()) is exactly the same with all 3 methods.

The “winner” looks like this ;)

class MY_Output extends CI_Output {

    /**
     * Write a Cache File
     *
     * @access    public
     * @return    void
     */
    function _write_cache($output)
    {
        $CI =& get_instance();
        $path = $CI->config->item('cache_path');

        $cache_path = ($path == '') ? BASEPATH.'cache/' : $path;

        if ( ! is_dir($cache_path) OR ! is_really_writable($cache_path))
        {
            return;
        }

        $uri =  $CI->config->item('base_url').
                $CI->config->item('index_page').
                $CI->uri->uri_string();

        $cache_path .= md5($uri);

        if ( ! $fp = @fopen($cache_path, FOPEN_WRITE_CREATE_DESTRUCTIVE))
        {
            log_message('error', "Unable to write cache file: ".$cache_path);
            return;
        }

        // Prepare expiration time and headers
        $expire = time() + ($this->cache_expiration * 60);
        $headers = array();
        foreach ($this->headers as $header)
        {
            $headers[] = $header[0].(int)(boolean)$header[1];
        }
        $headers = implode("\n", $headers);

        if (flock($fp, LOCK_EX))
        {
            fwrite($fp, $expire.'TS--->'.$headers.'H--->'.$output);
            flock($fp, LOCK_UN);
        }
        else
        {
            log_message('error', "Unable to secure a file lock for file at: ".$cache_path);
            return;
        }
        fclose($fp);
        @chmod($cache_path, DIR_WRITE_MODE);

        log_message('debug', "Cache file written: ".$cache_path);
    }

    /**
     * Update/serve a cached file
     *
     * @access    public
     * @return    void
     */
    function _display_cache(&$CFG, &$URI)
    {
        $cache_path = ($CFG->item('cache_path') == '') ? BASEPATH.'cache/' : $CFG->item('cache_path');

        if ( ! is_dir($cache_path) OR ! is_really_writable($cache_path))
        {
            return FALSE;
        }

        // Build the file path.  The file name is an MD5 hash of the full URI
        $uri =  $CFG->item('base_url').
                $CFG->item('index_page').
                $URI->uri_string;

        $filepath = $cache_path.md5($uri);

        if ( ! @file_exists($filepath))
        {
            return FALSE;
        }

        if ( ! $fp = @fopen($filepath, FOPEN_READ))
        {
            return FALSE;
        }

        flock($fp, LOCK_SH);

        $cache = '';
        if (filesize($filepath) > 0)
        {
            $cache = fread($fp, filesize($filepath));
        }

        flock($fp, LOCK_UN);
        fclose($fp);

        // Strip out the embedded timestamp and headers
        $ts = strpos($cache, 'TS--->');
        $h = strpos($cache, 'H--->');
        if ( ! $ts || ! $h)
        {
            return FALSE;
        }
        $match = array();
        $match['1'] = substr($cache, 0, $ts);
        $match['2'] = substr($cache, $ts+6, $h-$ts-6);
        $match['0'] = $match['1'].'TS--->'.$match['2'].'H--->';

        // Has the file expired? If so we'll delete it.
        if (time() >= trim(str_replace('TS--->', '', $match['1'])))
        {
            @unlink($filepath);
            log_message('debug', "Cache file has expired. File deleted");
            return FALSE;
        }

        // Extract the headers
        $headers = explode("\n", $match['2']);
        foreach ($headers as $header)
        {
            $this->headers[] = array(substr($header, 0, -1), substr($header, -1));
        }

        // Display the cache
        $cache = str_replace($match['0'], '', $cache);
        $this->_display(&$cache);
        log_message('debug', "Cache file is current. Sending it to browser.");
        return TRUE;
    }
}

Posted: 17 November 2008 07:01 AM   [ # 86 ]
Grad Student
Total Posts:  39
Joined  03-05-2007

OK, I did some benchmarking with a ‘reasonably large’ file (6.2 MB).

As you can see, the memory results are interesting: both the CI built-in cache and narkaT’s strpos() method roughly triple memory usage, while my own proposed method roughly doubles it (which could be improved).

Not cached:
Page rendered in 1.1956 seconds
Memory usage: 6.94MB

CI 1.7.0 caching using preg_match():
Page rendered in 0.4045 seconds
Memory usage: 18.86MB

My solution, using serialize():
Page rendered in 0.3594 seconds
Memory usage: 12.74MB

narkaT’s solution using strpos():
Page rendered in 0.4231 seconds
Memory usage: 18.9MB

@narkaT: You assume the second parameter of set_header() is a single char, but that is nowhere enforced, people can send in an empty string if they want. You should also set the type of the second var (which should be a boolean).
edit: Make sure you use the {memory_usage} pseudo variable in your view to measure memory usage, otherwise you get the cached value.
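
To make the point about the second parameter concrete, here is a rough sketch of a delimiter-based round trip that always writes the flag as exactly one character and reads it back as a boolean. The helper names `pack_cache` and `unpack_cache` are hypothetical and not part of CodeIgniter or the wiki code. The sketch assumes each header is stored as array(header string, replace flag) and that header strings contain no newlines.

```php
<?php
// Hypothetical helpers for illustration only; not part of CodeIgniter.
function pack_cache($expire, array $headers, $output)
{
    $lines = array();
    foreach ($headers as $header)
    {
        // Cast the flag so exactly one character ("0" or "1") is appended,
        // whatever the caller passed as the second value.
        $lines[] = $header[0].(int)(bool)$header[1];
    }
    return $expire.'TS--->'.implode("\n", $lines).'H--->'.$output;
}

function unpack_cache($cache)
{
    $ts = strpos($cache, 'TS--->');
    if ($ts === FALSE || $ts === 0)
    {
        return FALSE; // marker missing, or no timestamp in front of it
    }
    // Only look for the header marker after the timestamp marker.
    $h = strpos($cache, 'H--->', $ts + 6);
    if ($h === FALSE)
    {
        return FALSE;
    }

    $headers = array();
    $block = (string) substr($cache, $ts + 6, $h - $ts - 6);
    if ($block !== '')
    {
        foreach (explode("\n", $block) as $line)
        {
            // Last character is the flag; everything before it is the header.
            $headers[] = array((string) substr($line, 0, -1), substr($line, -1) === '1');
        }
    }

    return array(
        'expire'  => (int) substr($cache, 0, $ts),
        'headers' => $headers,
        'output'  => (string) substr($cache, $h + 5),
    );
}

// Usage: the second header passes an empty string as its flag.
$packed = pack_cache(1226923200, array(
    array('Content-Type: text/css', TRUE),
    array('X-Demo: 1', ''),
), 'body { color: red }');
$parts = unpack_cache($packed);
var_dump($parts['headers']);
```

With an empty header list the header block is simply empty, so no bogus header entry is created on the way back.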

Posted: 17 November 2008 09:32 AM   [ # 87 ]
Lab Assistant
Total Posts:  114
Joined  09-09-2008
Arjen van Bochoven - 17 November 2008 12:01 PM

@narkaT: You assume the second parameter of set_header() is a single char, but that is nowhere enforced, people can send in an empty string if they want. You should also set the type of the second var (which should be a boolean).

That’s right, I should have done some “cleaning” before posting the code ;)
I’ve edited the above code.

Arjen van Bochoven - 17 November 2008 12:01 PM

edit: Make sure you use the {memory_usage} pseudo variable in your view to measure memory usage, otherwise you get the cached value.

I measured the memory usage directly in the extended output class using memory_get_peak_usage(), which is admittedly very “hacky”.

I was confused that there was no difference in memory consumption in my benchmark, even though I used the same PHP function as CI to get the memory usage.

So I used the {memory_usage} method you suggested.

I managed to find the problem and cut down the memory usage drastically :)
My previous benchmarks reported the same memory usage because I measured it before calling the _display() function.

So the call to _display() was the cause of the large memory usage. Passing the output by reference solved that issue for both my strpos() approach and your serialize() approach.


I’ve done some benchmarking after optimizing the scripts. Both runs used str_repeat() to generate the data (I’ll leave out the non-cached results): one with 7 MB and one with 50 KB.

7 MB
built-in preg_match():
21.38MB - 0.0360 s

serialize():
7.4MB - 0.0239 s

strpos():
7.41MB - 0.0352 s


50 KB
built-in preg_match():
0.53MB - 0.0041 s

serialize():
0.45MB - 0.0044 s

strpos():
0.46MB - 0.0045 s

serialize() is clearly the most memory-friendly solution, and when caching big chunks of data it’s also the fastest :)
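
The serialize() variant itself is not shown in this thread, so the following is only a sketch of the general idea under assumed names. `encode_cache_entry` and `decode_cache_entry` are hypothetical helpers, and the wiki code may well be structured differently. The expiry time, headers and output travel together in one array, so no markers have to be searched for on the way back.

```php
<?php
// Hypothetical helpers illustrating the approach; the wiki code may differ.
function encode_cache_entry($expire, array $headers, $output)
{
    return serialize(array(
        'expire'  => $expire,
        'headers' => $headers,
        'output'  => $output,
    ));
}

function decode_cache_entry($raw, $now)
{
    // @ suppresses the notice that unserialize() raises on malformed input.
    $entry = @unserialize($raw);
    if ( ! is_array($entry) || ! isset($entry['expire'], $entry['headers'], $entry['output']))
    {
        return FALSE; // not a cache entry we understand
    }
    if ($now >= $entry['expire'])
    {
        return FALSE; // expired
    }
    return $entry;
}

// Usage with made-up values: the entry expires at timestamp 2000.
$raw = encode_cache_entry(2000, array(array('Content-Type: text/css', TRUE)), 'body { color: red }');
var_dump(decode_cache_entry($raw, 1000) !== FALSE);
var_dump(decode_cache_entry($raw, 3000));
```

Since unserialize() should only be fed data the application wrote itself, this sketch assumes the cache directory is not writable by anyone else.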

Posted: 17 November 2008 09:57 AM   [ # 88 ]
Grad Student
Total Posts:  39
Joined  03-05-2007

Very nice optimization, passing the data as ref. I think we have a winner here!

I’ve updated the wiki page and added a link to this thread.


Arjen

Posted: 17 November 2008 05:26 PM   [ # 89 ]
Grad Student
Total Posts:  39
Joined  03-05-2007

I’ve spoken too soon, I forgot “Call-time pass-by-reference” is deprecated, so to make this work we have to change the function definition of _display():

function _display($output = '')

to

function _display(&$output = '')

which means we need to extend _display(), which is a rather large piece of code. This would surely be a candidate for a core update and I don’t know if adding the ref is breaking something (php4, anybody?)
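
To illustrate the difference, here is a small sketch using a made-up function, `append_footer`, rather than CI’s _display(). The reference is declared once in the function definition and callers pass the variable normally, so no call-time & is needed. The sketch deliberately leaves out the default value, since that is the part in question for PHP 4.

```php
<?php
// Hypothetical function; the & belongs in the definition, not at the call site.
function append_footer(&$output)
{
    // Modifies the caller's string in place instead of working on a copy.
    $output .= "\n<!-- served from cache -->";
}

$page = '<html>...</html>';
append_footer($page);   // plain call, no & here
echo $page, "\n";
```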

I’ll revert the wiki to the last version until we sort this one out.

Arjen

Posted: 17 November 2008 05:45 PM   [ # 90 ]
Lab Assistant
Total Posts:  114
Joined  09-09-2008
Arjen van Bochoven - 17 November 2008 10:26 PM

I’ve spoken too soon, I forgot “Call-time pass-by-reference” is deprecated, so to make this work we have to change the function definition of _display()

You’re right.

I wonder why my tests didn’t trigger any warnings; I used a “recent PHP version” for that.
The manual says that a warning message should have been generated.

Arjen van Bochoven - 17 November 2008 10:26 PM

I don’t know if adding the ref is breaking something (php4, anybody?)

I’ve found an interesting comment in the manual.

Taking that into account, the following would not work in PHP 4:

function _display(&$output = '')

Tricky… :rolleyes:
