Issues

ZF-11719: <br> Tag handled incorrectly by Zend Search Lucene

Description

An HTML fragment like "Foo<br>Bar" will result in "FooBar" (as one word) in the Lucene index. Usually this should be two words.
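
The effect can be reproduced outside of Zend Search Lucene. The following is a minimal sketch using only PHP's bundled DOM extension; it does not call any Zend code, and the variable names are made up for illustration. It shows that concatenating the text nodes around a <br> directly leaves no word boundary:

```php
<?php
// Plain DOM illustration; Zend_Search_Lucene itself is not involved here.
$doc = new DOMDocument();
$doc->loadHTML('<html><body><p>Foo<br>Bar</p></body></html>');

$p = $doc->getElementsByTagName('p')->item(0);

// Append the value of each text node directly, with no separator.
// The <br> element has no text of its own, so it contributes nothing.
$joined = '';
foreach ($p->childNodes as $child) {
    if ($child->nodeType === XML_TEXT_NODE) {
        $joined .= $child->nodeValue;
    }
}

echo $joined, "\n"; // FooBar
```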

Patching the method Zend_Search_Lucene_Document_Html::_retrieveNodeText() as follows solves the issue:


    private function _retrieveNodeText(DOMNode $node, &$text)
    {
        if ($node->nodeType == XML_TEXT_NODE) {
            $text .= $node->nodeValue;
            if(!in_array($node->parentNode->tagName, $this->_inlineTags)) {
                $text .= ' ';
            }
        } else if ($node->nodeType == XML_ELEMENT_NODE  &&  $node->nodeName != 'script') {
            foreach ($node->childNodes as $childNode) {
                $text .= ' '; // patch
                $this->_retrieveNodeText($childNode, $text);
            }
        }
    }
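
To see what the added separator changes, here is a standalone sketch of the same traversal. The helpers collectText() and wordsIn() are hypothetical and not part of Zend Framework, the $_inlineTags check is left out for brevity, and the type hints assume PHP 7.1 or later:

```php
<?php
// Hypothetical standalone helpers; not part of Zend_Search_Lucene.
function collectText(DOMNode $node, string &$text, bool $separateChildren): void
{
    if ($node->nodeType === XML_TEXT_NODE) {
        $text .= $node->nodeValue;
    } elseif ($node->nodeType === XML_ELEMENT_NODE && $node->nodeName !== 'script') {
        foreach ($node->childNodes as $childNode) {
            if ($separateChildren) {
                $text .= ' '; // same idea as the line marked "patch" above
            }
            collectText($childNode, $text, $separateChildren);
        }
    }
}

// Parse the HTML, collect its text, and split it on whitespace.
function wordsIn(string $html, bool $separateChildren): array
{
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    $text = '';
    collectText($doc->documentElement, $text, $separateChildren);
    return preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
}

$html = '<html><body><p>Foo<br>Bar</p></body></html>';
print_r(wordsIn($html, false)); // one token: FooBar
print_r(wordsIn($html, true));  // two tokens: Foo, Bar
```

One consequence worth noting: because the space is added before every child node, text split by inline elements is separated as well, so a fragment such as "Foo<b>Bar</b>" would also yield two words with this approach.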

Comments

Hello Bruno,

Thanks for the patch and for contributing to ZF. However, you don't appear to have a CLA on file. If you do have one, you should get in touch with Ralph Schindler and ask him to assign you the correct groups so that you can attach patches as attachments rather than inline. Otherwise, you should sign the CLA (http://framework.zend.com/wiki/display/…) and return it before contributing code; until then, your contributions may go unused!