ZF-7736: Redesign of highlight term gathering using a temporary in-memory index (patch)
Search result highlighting in Zend_Search_Lucene is a problem.
Lucene gathers the strings to be highlighted by cascading through the internal _highlightMatches() methods of all the different Zend_Search_Lucene_Search_Query... classes, often re-tokenizing the entire text to be highlighted. This is very slow and error-prone, and it usually duplicates code from the rewrite() methods in the same classes.
This approach also blocks some nifty features I implemented in other patches that deal with multi-token support (among other cool features related to stem and synonym searches, which are listed behind @todo: lines in the current code). Those features require calls to the rewrite() method of newly created sub-queries. That is no problem when coming from a rewrite() method, but it does not work within _highlightMatches(), because no index is available there.
No longer!
This patch redesigns how highlight terms are gathered: it simply uses rewrite()->getQueryTerms().
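To illustrate why a single rewrite-then-collect step can replace per-class highlighting code, here is a minimal, language-neutral sketch in Python (not the Zend PHP classes). The class names only loosely mirror Lucene's Term, Wildcard, MultiTerm and Boolean queries; the "index" is assumed to be nothing more than a set of (field, term) pairs, and wildcard matching via fnmatch is a simplification.

```python
import fnmatch
from dataclasses import dataclass
from typing import List, Set, Tuple

# A term is identified by (field, text), as in Lucene.
Term = Tuple[str, str]


@dataclass
class TermQuery:
    """Already primitive: rewriting returns the query itself."""
    field: str
    text: str

    def rewrite(self, index_terms: Set[Term]) -> "TermQuery":
        return self

    def get_query_terms(self) -> List[Term]:
        return [(self.field, self.text)]


@dataclass
class MultiTermQuery:
    """Result of expanding a pattern: a flat list of concrete terms."""
    terms: List[Term]

    def rewrite(self, index_terms: Set[Term]) -> "MultiTermQuery":
        return self

    def get_query_terms(self) -> List[Term]:
        return list(self.terms)


@dataclass
class WildcardQuery:
    """Needs the index: the pattern is expanded against the term dictionary."""
    field: str
    pattern: str

    def rewrite(self, index_terms: Set[Term]) -> MultiTermQuery:
        matches = sorted(
            (f, t) for (f, t) in index_terms
            if f == self.field and fnmatch.fnmatchcase(t, self.pattern)
        )
        return MultiTermQuery(matches)


@dataclass
class BooleanQuery:
    """Rewrites each clause; its terms are the concatenated clause terms."""
    clauses: list

    def rewrite(self, index_terms: Set[Term]) -> "BooleanQuery":
        return BooleanQuery([c.rewrite(index_terms) for c in self.clauses])

    def get_query_terms(self) -> List[Term]:
        terms: List[Term] = []
        for clause in self.clauses:
            terms.extend(clause.get_query_terms())
        return terms


def highlight_terms(query, index_terms: Set[Term]) -> List[Term]:
    # One generic path for every query type: no per-class highlighting code.
    return query.rewrite(index_terms).get_query_terms()
```

For example, a Boolean query combining TermQuery("contents", "index") with WildcardQuery("contents", "highlight*") yields the literal term plus every indexed "highlight..." term; the wildcard class never needs its own highlighting logic.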
To be able to rewrite in the first place, the text to be highlighted needs to be stored in an index. While in 99.99% of all cases that text would already be in the main search index, the class definition calls for the highlight function to be index-independent.
Rightfully so :)
Therefore I wrote a "dummy index class" that implements the whole Zend_Search_Lucene_Interface but, instead of writing to a real Lucene index on disk, stores a simulated index in memory using PHP associative arrays.
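The actual class is PHP and implements the full interface; the Python sketch below only shows the underlying data-structure idea, an inverted index built from nested dictionaries (the analogue of PHP associative arrays). The method names (add_document, terms, has_term, term_positions, doc_freq, num_docs) are hypothetical Python stand-ins for the kind of questions such an index must answer, and the regex tokenizer is a simplistic assumption, not Zend's analyzer.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple


class TempIndex:
    """Hypothetical in-memory stand-in for an on-disk index.

    Layout: field -> term -> doc_id -> [positions], i.e. nested
    dictionaries playing the role of PHP associative arrays.
    """

    def __init__(self) -> None:
        self._postings: Dict[str, Dict[str, Dict[int, List[int]]]] = defaultdict(
            lambda: defaultdict(dict)
        )
        self._docs: List[Dict[str, str]] = []

    @staticmethod
    def _tokenize(text: str) -> List[str]:
        # Simplistic analyzer (assumption): lowercased runs of word characters.
        return re.findall(r"\w+", text.lower())

    def add_document(self, fields: Dict[str, str]) -> int:
        # Store the document and record every token position per field.
        doc_id = len(self._docs)
        self._docs.append(dict(fields))
        for field, text in fields.items():
            for pos, token in enumerate(self._tokenize(text)):
                self._postings[field][token].setdefault(doc_id, []).append(pos)
        return doc_id

    def terms(self) -> List[Tuple[str, str]]:
        # Sorted (field, term) pairs, like a term dictionary.
        return sorted((f, t) for f, by_term in self._postings.items() for t in by_term)

    def has_term(self, field: str, term: str) -> bool:
        # .get() avoids creating empty entries in the defaultdicts.
        return term in self._postings.get(field, {})

    def term_positions(self, field: str, term: str) -> Dict[int, List[int]]:
        return dict(self._postings.get(field, {}).get(term, {}))

    def doc_freq(self, field: str, term: str) -> int:
        return len(self._postings.get(field, {}).get(term, {}))

    def num_docs(self) -> int:
        return len(self._docs)
```

Because nothing touches the disk, the index can be created, filled with the single document being highlighted, and thrown away in one call.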
I gave Query->highlightMatches() an optional 5th parameter for supplying a different index to be used by rewrite(). If it is omitted, a TempIndex is instantiated, the document to be highlighted is added to it with ->addDocument(), and that TempIndex is then used for rewrite().
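The fallback logic can be sketched in Python roughly as follows. This is an illustration under assumptions, not the patch itself: highlight_matches, TempIndex and WildcardQuery are hypothetical names, the optional index is modeled as a keyword argument rather than a 5th positional PHP parameter, and rewrite() here collapses rewrite()->getQueryTerms() into one call returning plain terms.

```python
import fnmatch
import re
from typing import Iterable, List, Set


def tokenize(text: str) -> List[str]:
    # Simplistic analyzer (assumption), shared by indexing and highlighting.
    return re.findall(r"\w+", text.lower())


class TempIndex:
    """Hypothetical throwaway index: just the set of terms it was fed."""

    def __init__(self) -> None:
        self._terms: Set[str] = set()

    def add_document(self, text: str) -> None:
        self._terms.update(tokenize(text))

    def terms(self) -> Iterable[str]:
        return self._terms


class WildcardQuery:
    def __init__(self, pattern: str) -> None:
        self.pattern = pattern.lower()

    def rewrite(self, index) -> List[str]:
        # Expand the pattern against whatever index we were given.
        return sorted(t for t in index.terms() if fnmatch.fnmatchcase(t, self.pattern))


def highlight_matches(query: WildcardQuery, text: str, index=None) -> str:
    """Wrap matching words in <b>...</b>.

    `index` models the optional extra parameter: pass the main search index
    if the text is already in it; otherwise a TempIndex is built on the fly.
    """
    if index is None:
        index = TempIndex()
        index.add_document(text)
    terms = set(query.rewrite(index))

    def wrap(m: "re.Match[str]") -> str:
        word = m.group(0)
        return f"<b>{word}</b>" if word.lower() in terms else word

    return re.sub(r"\w+", wrap, text)
```

Calling highlight_matches(WildcardQuery("high*"), "Highlight the highlighted text") builds a temporary index and bolds both "high..." words; passing index=some_main_index skips that step and expands the pattern against the supplied index instead.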
Then I removed the _highlightMatches() methods from all Query classes ;)