ZF-3743: ShortWords token filter not working with utf-8 charset
Description
When the ShortWords token filter is used with the UTF-8 analyzer, it fails to skip short tokens that contain multibyte UTF-8 characters.
For example, with a length of 2, the token "à" (common in French) is not skipped, because strlen counts bytes rather than characters and returns 2 for it.
One solution would be a ShortWordsUtf8 filter that uses iconv_strlen, which counts characters in a given charset, instead of strlen.
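The difference between the two length checks can be sketched as follows. This is only a minimal illustration, not the attached patch: the function name isShortWordUtf8 and its parameters are hypothetical and are not part of Zend Framework. It assumes PHP 7 or later with the iconv extension enabled.

```php
<?php
/**
 * Hypothetical helper (not a Zend Framework API): decides whether a term
 * is shorter than $minLength characters and should therefore be skipped.
 */
function isShortWordUtf8(string $term, int $minLength = 2): bool
{
    // iconv_strlen counts characters in the given charset,
    // whereas strlen counts bytes.
    return iconv_strlen($term, 'UTF-8') < $minLength;
}

$term = "\u{00E0}";                        // "à", encoded as two bytes in UTF-8

$byteLength = strlen($term);               // 2: byte count, so a "< 2" check keeps the token
$charLength = iconv_strlen($term, 'UTF-8'); // 1: character count
$skipped    = isShortWordUtf8($term, 2);   // true: the token is treated as a short word
```

A real filter would presumably apply the same comparison inside the token filter's normalize step and return null for skipped tokens, as the existing ShortWords filter does with strlen.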
Comments
Posted by Hugues Lismonde (hlidotbe) on 2008-07-24T12:46:07.000+0000
Attached a working ShortWordsUtf8 that uses iconv_strlen instead of strlen (based on ShortWords.php from release-1.5.2).
Posted by Rob Allen (rob) on 2012-11-20T20:52:37.000+0000
Bulk change of all issues last updated before 1st January 2010 as "Won't Fix".
Feel free to re-open and provide a patch if you want to fix this issue.