
Bug Report

min_length & max_length utf-8 problem

Date: 06/29/2008 Severity: Minor
Status: Resolved Reporter: Zver1992
Version: 1.6.3
Keywords: Libraries, Validation Class

Description

The standard PHP function strlen() reports the wrong length for some UTF-8 strings (Russian and other languages written in Cyrillic, among others). It counts bytes rather than characters, and each Russian letter takes two bytes in UTF-8, so one Russian character is counted as 2. I found a correct function in the manual ( http://www.php.net/strlen ). Example:

<?php
  function strlen_utf8($str)
  {
      $i = 0;
      $count = 0;
      $len = strlen($str);
      while($i < $len) {
        $chr = ord($str[$i]);
        $count++;
        $i++;
        if($i >= $len) {
          break;
        }
        if($chr & 0x80) {
          $chr <<= 1;
          while($chr & 0x80) {
              $i++;
              $chr <<= 1;
          }
        }
      }
      return $count;
  }
  $string = 'авбгд';
  echo strlen($string); //10, wrong
  echo strlen_utf8($string); //5, right
?>

I replaced every strlen() call with strlen_utf8() in my Validation class, and my script now works correctly.

Code Sample

<?php
function min_length($str, $val)
{
    if (preg_match("/[^0-9]/", $val))
    {
        return FALSE;
    }

    return (strlen($str) < $val) ? FALSE : TRUE;
}

$check = min_length('авбгд', 6);
?>

Expected Result

FALSE

Actual Result

TRUE

Comment on Bug Report

Posted by: Sam Dark on 30 June 2008 10:29am

It’s not only strlen. As far as I know, the following functions have the same problem with multibyte strings:

trim, ltrim, rtrim
str_ireplace
ord
str_pad
str_split
strcasecmp
strcspn
stristr
strlen
strpos
strrev
strrpos
strspn
strtolower, strtoupper
substr
substr_replace
ucfirst, ucwords
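
To illustrate one entry from the list, here is a minimal sketch comparing substr with its multibyte counterpart on the string from the report. It assumes the mbstring extension is loaded and that the file is saved as UTF-8; the variable names are only for illustration.

```php
<?php
$string = 'авбгд';

// substr() works on bytes: 3 bytes is one and a half Cyrillic letters,
// so the result ends in the middle of a character.
$cut_bytes = substr($string, 0, 3);

// mb_substr() works on characters when told the encoding.
$cut_chars = mb_substr($string, 0, 3, 'UTF-8');

echo bin2hex($cut_bytes), "\n"; // d0b0d0 (the trailing d0 is half of the second letter)
echo $cut_chars, "\n";          // авб
```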

Posted by: inparo on 1 July 2008 8:46am

That is what the multibyte string functions are for.

So in your case it would be a simple mb_strlen($str, "UTF-8").

Remember that you can add your own validation rules by extending the class, so there’s no need to alter core files.
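
A minimal sketch of what such an extension might look like, assuming the mbstring extension is available. The subclass name follows CodeIgniter's MY_ prefix convention; the empty CI_Validation stand-in is there only so the snippet runs outside the framework and is not part of a real application.

```php
<?php
// Stand-in so this sketch runs on its own; inside CodeIgniter the real
// CI_Validation class is already loaded and this branch is skipped.
if ( ! class_exists('CI_Validation'))
{
    class CI_Validation {}
}

// Assumed location: application/libraries/MY_Validation.php
class MY_Validation extends CI_Validation
{
    // Same checks as the core rule, but counting characters instead of bytes.
    function min_length($str, $val)
    {
        if (preg_match("/[^0-9]/", $val))
        {
            return FALSE;
        }

        return (mb_strlen($str, 'UTF-8') < $val) ? FALSE : TRUE;
    }

    function max_length($str, $val)
    {
        if (preg_match("/[^0-9]/", $val))
        {
            return FALSE;
        }

        return (mb_strlen($str, 'UTF-8') > $val) ? FALSE : TRUE;
    }
}

$validation = new MY_Validation();
var_dump($validation->min_length('авбгд', 6)); // bool(false): 5 characters < 6
```

With this in place the core files stay untouched, and the example from the report gives the expected FALSE.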

Posted by: Sam Dark on 1 July 2008 9:05am

inparo, the multibyte string functions aren’t perfect either.

Posted by: inparo on 1 July 2008 9:55am

No, but they solve the ‘bug’ outlined above.

Posted by: Mark (Germany) on 10 July 2008 5:28am

Wouldn’t it be much nicer to let CodeIgniter handle it?
Then you could just call the strlen method and it would work out for itself which function is the right one.
This can be done, as in other CMSs or CakePHP, by simply setting a UTF-8 parameter in the config, then checking whether the mb_ functions are available and using them instead of the standard ones.

That’s what you would usually expect when using these functions.
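
A rough sketch of that idea, not an existing CodeIgniter feature: the config key and the function name below are hypothetical. It uses mb_strlen when the extension is present and otherwise counts every byte that is not a UTF-8 continuation byte, which gives the character count for valid UTF-8 input.

```php
<?php
// Hypothetical config switch; CodeIgniter does not define this key.
$config['utf8_enabled'] = TRUE;

// Hypothetical helper: picks the right length function at call time.
function utf8_safe_strlen($str, $utf8_enabled)
{
    if ( ! $utf8_enabled)
    {
        return strlen($str); // plain byte count
    }

    if (function_exists('mb_strlen'))
    {
        return mb_strlen($str, 'UTF-8');
    }

    // Fallback without mbstring: continuation bytes look like 10xxxxxx
    // (0x80-0xBF), so counting all other bytes counts the characters.
    return preg_match_all('/[^\x80-\xBF]/', $str, $matches);
}

echo utf8_safe_strlen('авбгд', $config['utf8_enabled']), "\n"; // 5
echo utf8_safe_strlen('авбгд', FALSE), "\n";                   // 10
```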

Posted by: Sam Dark on 10 July 2008 5:34am

This can be really useful.
