ZF-3743: ShortWords token filter not working with UTF-8 charset

Description

When the ShortWords token filter is used with the UTF-8 Analyser, it fails to skip short tokens that contain multibyte UTF-8 characters.

For example, with a length of 2, the token "à" (common in French) is not skipped: strlen counts bytes, and "à" is encoded as two bytes in UTF-8, so strlen returns 2.
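
The mismatch can be reproduced outside the filter. The snippet below is only a sketch: it assumes the filter drops a token when its measured length is less than the configured length, and the variable names are made up for illustration.

```php
<?php
// "à" is U+00E0, which UTF-8 encodes as the two bytes 0xC3 0xA0.
$token = "\xC3\xA0";

$byteLength = strlen($token);                // counts bytes
$charLength = iconv_strlen($token, 'UTF-8'); // counts characters

// Assumed comparison: a token is skipped when its length is below the limit.
$minLength = 2;
$skippedByBytes = $byteLength < $minLength;
$skippedByChars = $charLength < $minLength;

var_dump($byteLength, $charLength, $skippedByBytes, $skippedByChars);
// int(2) int(1) bool(false) bool(true)
```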

The solution would be to add a ShortWordsUtf8 filter that uses iconv_strlen instead of strlen.
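
A rough sketch of the length check such a filter could use is shown here. It is not the code attached to this issue: isShortWordUtf8 is a hypothetical standalone helper, and the fallback to strlen on invalid input is an assumption. In an actual filter the same test would presumably sit in the normalize() method, returning null for tokens that should be dropped.

```php
<?php
/**
 * Hypothetical helper: returns true when $term has fewer than $minLength
 * characters (not bytes) and should therefore be skipped.
 */
function isShortWordUtf8($term, $minLength = 2)
{
    // iconv_strlen returns false (and raises a notice) on invalid UTF-8.
    $length = @iconv_strlen($term, 'UTF-8');
    if ($length === false) {
        // Assumption: fall back to the byte count for malformed input.
        $length = strlen($term);
    }
    return $length < $minLength;
}

// "à", "de", "la", "a", "été"
$tokens = array("\xC3\xA0", "de", "la", "a", "\xC3\xA9t\xC3\xA9");
$kept = array();
foreach ($tokens as $token) {
    if (!isShortWordUtf8($token, 2)) {
        $kept[] = $token;
    }
}
// $kept now holds "de", "la" and "été"; "à" and "a" are skipped.
```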

Comments

Working ShortWordsUtf8 using iconv_strlen instead of strlen (based on ShortWord.php from release-1.5.2)

Bulk change of all issues last updated before 1st January 2010 as "Won't Fix".

Feel free to re-open and provide a patch if you want to fix this issue.