Issues

ZF-10891: Zend_Filter_StringTrim does not work correctly with a multibyte string (e.g. Fullwidth Space)

Description

Zend_Filter_StringTrim::_unicodeTrim has a problem with multibyte string.

This problem is fixed in Related issue ZF-7023, but incomplete.

When the multibyte character string of UTF8 is used, it is necessary to specify "u" qualifier in the preg_replace function of PHP. So it works correctly by adding "u" to 1st parameter of preg_replace.

diff --git a/library/Zend/Filter/StringTrim.php b/library/Zend/Filter/StringTrim.php
index f9a2bb7..b786f8b 100644
--- a/library/Zend/Filter/StringTrim.php
+++ b/library/Zend/Filter/StringTrim.php
@@ -119,6 +119,6 @@ class Zend_Filter_StringTrim implements Zend_Filter_Interface
         );  
 
         $pattern = '^[' . $chars . ']*|[' . $chars . ']*$';
-        return preg_replace("/$pattern/sSD", '', $value);
+        return preg_replace("/$pattern/sSDu", '', $value);
     }   
 }

Comments

This issue seems to happen only in Japanese for me.

Akihiro, do you confirm to happen on other languages? For example, Chinese, Korean.

( And Thai also might have same problem)

If Chinese and Korean have same solution, we could make a model of Zend_filter_Alpha for changing several expressions and using them.

Fixed with GH-107

Certainly, it happens in Japanese. In Korean and Chinese, the same problem was not able to be confirmed (Though the confirmation might be not enough).

However, I was embarrassed as a Japanese user, and I am glad of fix.

Thank you.

Zend Framework minimal version 1.11.11 Same error. Version of the StringTrim.php yonger than the message about fix - * @version $Id: StringTrim.php 23775 2011-03-01 17:25:24Z ralph $

Preg_replace - return preg_replace("/$pattern/sSD", '', $value); - without latter u

Reopen, because this fix is not applied to ZF1 trunk & 1.11.11. And "u" qualifier patch is not correct.Please see ZF2-170 http://framework.zend.com/issues/browse/ZF2-170

I do not think /u switch resolves this issue, because the "Fullwidth Space" is not based on international rule, but Japanese specific rule.

I think if we would make this filter fit for Japanese, It will be necessary to make conditional branching by locale as following, or to make new inherited class.


$this->_locale = new Zend_Locale('auto');

switch ($this->_locale->getLanguage()) {
    case 'ja':
        //Japanese specific rule
        $chars = preg_replace...;
        $pattern = ...;
        return preg_replace...;
    break;
/*
 if there would exist another character system, it would fall on here in future.
  case 'xx':
      ...;
    break;
*/
  default:
      //original statement
        $chars = preg_replace...;
        $pattern = ...;
        return preg_replace...;
    break;
}

Thanks.