Zend Framework

skipData processing

Details

  • Type: Improvement
  • Status: Closed
  • Priority: Major
  • Resolution: Fixed
  • Affects Version/s: 1.0.0, 1.0.2
  • Fix Version/s: 1.7.0
  • Component/s: Zend_Search_Lucene
  • Labels:
    None
  • Fix Version Priority:
    Should Have

Description

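skipData information is stored within the Lucene index.
It is effectively a "sub-index" over each term's document list: periodic entries that let a reader jump ahead in the list instead of reading every entry.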

Using this information can improve the performance of certain query types.

When processing a phrase query, or a multi-term query with several required terms, where one term has very low selectivity (high cardinality), the other terms can be processed first to limit the result set. Processing skipData then avoids a full scan of the document list for these high-cardinality terms.
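The idea can be sketched in Python. This is a simplified model, not Zend_Search_Lucene's implementation: the names `PostingsList`, `advance` and `intersect` are hypothetical, the skip interval of 16 is chosen for illustration, and the flat skip table searched with `bisect` stands in for the skip data actually stored in the index. The selective term's short document list drives the loop, while the long list of the high-cardinality term is advanced via skip entries, so only a few entries are stepped over per lookup.

```python
from bisect import bisect_right
from typing import List

# Illustrative value; the real interval is a property of the index format.
SKIP_INTERVAL = 16


class PostingsList:
    """Sorted, unique document IDs for one term, plus a sparse skip table.

    Simplified stand-in for skipData: skip_docs[k] is the doc ID stored at
    index k * skip_interval, so a reader can jump close to a target document.
    """

    def __init__(self, doc_ids: List[int], skip_interval: int = SKIP_INTERVAL):
        self.doc_ids = doc_ids
        self.skip_interval = skip_interval
        self.skip_docs = doc_ids[::skip_interval]
        self.pos = 0
        self.scanned = 0  # number of list entries stepped over linearly

    def advance(self, target: int) -> int:
        """Move to the first doc ID >= target; return it, or -1 if exhausted."""
        # Last skip point whose doc ID is <= target (never move backwards).
        k = bisect_right(self.skip_docs, target) - 1
        if k >= 0:
            self.pos = max(self.pos, k * self.skip_interval)
        # Short linear scan (fewer than skip_interval steps) from the skip point.
        while self.pos < len(self.doc_ids) and self.doc_ids[self.pos] < target:
            self.pos += 1
            self.scanned += 1
        return self.doc_ids[self.pos] if self.pos < len(self.doc_ids) else -1


def intersect(rare: List[int], common: PostingsList) -> List[int]:
    """Intersect a short (selective) doc list with a long, skippable one."""
    result = []
    for doc in rare:
        found = common.advance(doc)
        if found == -1:
            break  # long list exhausted; no further matches possible
        if found == doc:
            result.append(doc)
    return result


common = PostingsList(list(range(0, 200_000, 2)))  # e.g. 'the': every even doc
print(intersect([10, 5_001, 150_000], common))  # [10, 150000]
print(common.scanned)  # 18
```

Here the long list has 100,000 entries, yet only 18 are stepped over; a plain merge-style scan up to document 150000 would step over 75,000.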

This matters for huge indices (hundreds of thousands of documents) and for queries containing terms with extremely low selectivity ('a', 'the', 'in', 'is', ...).
A StopWords analyzer can be used as a workaround for this problem.
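The workaround can be sketched in Python as a generic stop-word filter. This is not the Zend_Search_Lucene API: `STOP_WORDS`, `tokenize` and `remove_stop_words` are hypothetical names, and the whitespace tokenizer is a deliberate simplification. Applied at both indexing and query time, such a filter keeps the highest-cardinality terms out of the index, so their long document lists never have to be scanned.

```python
from typing import Iterable, List

# Hypothetical stop-word list; a real analyzer would use a configurable,
# language-specific list.
STOP_WORDS = frozenset({"a", "the", "in", "is"})


def tokenize(text: str) -> List[str]:
    """Lower-case, whitespace-split tokenizer (a deliberate simplification)."""
    return text.lower().split()


def remove_stop_words(tokens: Iterable[str], stop_words=STOP_WORDS) -> List[str]:
    """Drop tokens that would have very long, low-selectivity document lists."""
    return [t for t in tokens if t not in stop_words]


print(remove_stop_words(tokenize("The cat is in the hat")))  # ['cat', 'hat']
```

The trade-off is that filtered words can no longer be matched, so a phrase query such as "the hat" only constrains the remaining term.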

Activity

Wil Sinclair added a comment:

I believe this was implemented for 1.7. If not, please reopen, Alex.

Alexander Veremyev added a comment:

Yeah. That's closed. There were some additional ideas concerning skipData usage for performance improvement, so I didn't close these issues.

But I'll create another issue if these ideas become more concrete.


Dates

  • Created:
    Updated:
    Resolved:

Time Tracking

Estimated:
2w 3d
Remaining:
2w 3d
Logged:
Not Specified