Zend Framework

Undefined offset notice in Search/Lucene/Search/Query/MultiTerm.php

Details

  • Type: Bug
  • Status: Resolved
  • Priority: Critical
  • Resolution: Fixed
  • Affects Version/s: 1.7.1, 1.7.2, 1.9.2
  • Fix Version/s: 1.10.8
  • Component/s: Zend_Search_Lucene
  • Labels:
    None

Description

When doing a search using boolean operators and more than one search term:

+PHP +Zend

Or

PHP AND Zend

A PHP notice is thrown: PHP Notice: Undefined offset: 2511 in /path/to/Zend/Search/Lucene/Search/Query/MultiTerm.php on line 467

Please note that the number 2511 changes for each "hit". The first notice has the lowest number, and the last notice the highest number.

I've not been able to spot any problems relating to this notice, other than the fact that it's quite annoying to look at in the error log.

Activity

Carl Simons added a comment -

I just received the same error. It looks like it's a problem with my data.

Notice: Undefined offset: 39996 in Z:\Search\Lucene\Search\Query\MultiTerm.php on line 467

My table has 50 rows. Some rows' data (because of encoding and hyphenation) seem to contain characters that aren't representable as, or convertible to, Latin characters. I'm not sure whether that is the problem, but anyway:

Here is an example of my search. The # is actually a Danish character, an O with a slash through it, but because of encoding problems it shows up as >> :

Koffe S#rensen A

In Search/Query/Boolean.php I did a print_r($this->_subqueries), and here are some highlights:

-----------------------------------------------------
-----------------------------------------------------
[_terms:private] => Array
(
[0] => Zend_Search_Lucene_Index_Term Object
(
[field] => name
[text] => rensen
)

[1] => Zend_Search_Lucene_Index_Term Object
(
[field] => name
[text] => s
)

[2] => Zend_Search_Lucene_Index_Term Object
(
[field] => name
[text] => a
)

)

-----------------------------------------------------
-----------------------------------------------------

[_termInfoCache:private] => Array
(
[name�s] =>
[name�rensen] =>
[name�koffe] =>
[name�a] => Zend_Search_Lucene_Index_TermInfo Object
(
[docFreq] => 4
[freqPointer] => 0
[proxPointer] => 0
[skipOffset] => 0
[indexPointer] =>
)

)

-----------------------------------------------------
-----------------------------------------------------

I don't get this error when I do the same search with data from the same table that contains only normal Latin characters.

Garth Gillespie added a comment -

I'm seeing this too with 1.7.7, after upgrading from 1.6.2, so the indexes were created in 1.6.2. On a site where the index likely contains non-Latin UTF-8 content I see this notice; where the content is strictly Latin characters I don't. Does this mean the analyzer switched to UTF-8 and I have to compile the mbstring extension into PHP (which so far I have avoided)?

Garth Gillespie added a comment -

More information:

A development server where this works does have mbstring compiled into PHP. The error appears on the production server, which lacks mbstring. I also downloaded 1.7.0 and the same error appears there, so this was introduced between 1.6.2 and 1.7.0.

I looked at the index with Luke, and the undefined offset numbers correspond to the Doc. Id in Luke.

Each time the notice is thrown, the result corresponding to that Doc. Id is not returned.

Garth Gillespie added a comment -

A new PHP build, with or without mbstring, made no difference on the existing index.

Garth Gillespie added a comment -

One last thing I can think of for the day: this happens both when searching for multiple terms in the same field and when searching for single terms in two different fields.

For example:

content:foo AND content:bar = error

title:foo AND content:bar = error

content:bar = ok

Garth Gillespie added a comment -

Is this related to ZF-5554?

Gianluca Zumaglini added a comment -

A quick hack for this issue: check whether [$termId][$docId] is set in the _termsFreqs array.

Replace the code block starting at line 472 in MultiTerm.php with this block:

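// Use the term frequency only when an entry exists for this term/document pair.
if (isset($this->_termsFreqs[$termId][$docId])) {
    $score += $reader->getSimilarity()->tf($this->_termsFreqs[$termId][$docId]) *
              $this->_weights[$termId]->getValue() *
              $reader->norm($docId, $term->field);
} else {
    // No frequency recorded: fall back to weight * norm and avoid the notice.
    $score += $this->_weights[$termId]->getValue() *
              $reader->norm($docId, $term->field);
}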
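
The guard in the snippet above can be tried in isolation. The following is a minimal, self-contained sketch and not code from Zend_Search_Lucene: the array contents and the function name sumFreqs are made up for illustration. It only shows why reading a nested array entry that does not exist raises the notice and how isset() avoids it. Unlike the hack above, which falls back to weight * norm, this sketch simply skips the missing entry.

```php
<?php
// Hypothetical data: term frequencies keyed by term id, then by document id.
// Document 2511 has an entry for term 0 but none for term 1.
$termsFreqs = [
    0 => [2511 => 3, 2512 => 1],
    1 => [2512 => 4],
];

/**
 * Sum the frequencies recorded for one document across all terms,
 * skipping terms that have no entry for that document.
 */
function sumFreqs(array $termsFreqs, int $docId): int
{
    $total = 0;
    foreach ($termsFreqs as $freqsByDoc) {
        // Reading $freqsByDoc[$docId] without this check is what triggers
        // the "Undefined offset" notice when the key is missing.
        if (isset($freqsByDoc[$docId])) {
            $total += $freqsByDoc[$docId];
        }
    }
    return $total;
}

echo sumFreqs($termsFreqs, 2511), "\n";
echo sumFreqs($termsFreqs, 2512), "\n";
```

As with the hack itself, a guard like this hides the symptom only; it does not explain why the entry is missing in the first place.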
Jachim Coudenys added a comment -

The issue is still occurring in version 1.9.2.

Gianluca Zumaglini's snippet does indeed 'fix' the notice, and the results are displayed again, but I don't know why $this->_termsFreqs becomes empty.

Hying added a comment -

Hi,
up to version 1.10.2, the $docsFilter initialized in line 343 of MultiTerm.php is used a second time in line 352. This second usage reduces the set of objects held by that instance.
The same $docsFilter instance is later used in the termDocs function of the SegmentInfo class, where it reduces the results through a check like "if (isset($filter[$docId])) {".

Adding "$docsFilter = new Zend_Search_Lucene_Index_DocsFilter();" in line 351 of MultiTerm.php resolves the problem.
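
The effect described above, a filter object that is narrowed by one lookup and then reused for another, can be illustrated with a toy model. This is only a sketch under stated assumptions: ToyDocsFilter and toyTermFreqs are hypothetical stand-ins, not the real Zend_Search_Lucene_Index_DocsFilter or SegmentInfo code, and the postings are invented. It shows the general hazard, not the exact control flow in MultiTerm.php.

```php
<?php
// Toy stand-in for a stateful document filter; NOT the real
// Zend_Search_Lucene_Index_DocsFilter.
final class ToyDocsFilter
{
    /** @var array<int,bool>|null Allowed doc ids; null means no restriction yet. */
    public $allowed = null;
}

/**
 * Hypothetical lookup: returns [docId => freq] for the postings that pass
 * the filter, then narrows the filter to the ids it just returned.
 */
function toyTermFreqs(array $postings, ToyDocsFilter $filter): array
{
    $hits = [];
    foreach ($postings as $docId => $freq) {
        if ($filter->allowed === null || isset($filter->allowed[$docId])) {
            $hits[$docId] = $freq;
        }
    }
    // Side effect: the filter now only admits the ids returned above.
    $filter->allowed = array_fill_keys(array_keys($hits), true);
    return $hits;
}

$php  = [1 => 2, 2 => 1, 3 => 4]; // docId => frequency of "php" (invented)
$zend = [3 => 1, 4 => 5];         // docId => frequency of "zend" (invented)

// Reused filter: the "php" lookup narrows it, so the "zend" lookup loses doc 4.
$shared = new ToyDocsFilter();
toyTermFreqs($php, $shared);
$reused = toyTermFreqs($zend, $shared);

// Fresh filter: the "zend" lookup sees all of its postings.
$fresh = toyTermFreqs($zend, new ToyDocsFilter());

var_dump(isset($reused[4]));
var_dump(isset($fresh[4]));
```

In this toy, any later code that expects a frequency for document 4 would hit a missing key when the reused filter was involved, which is the same kind of gap that creating a new filter instance before the second pass is meant to prevent.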

Alexander Veremyev added a comment -

Fixed.


People

Vote (11)
Watch (11)
