Programmer's Reference Guide
Query Language
Java Lucene and Zend_Search_Lucene provide quite powerful query languages.
The two languages are mostly the same, with some differences, which are mentioned below.
Full documentation of the Java Lucene query language syntax can be found in the Apache Lucene documentation.
Terms
A query is broken up into terms and operators. There are three types of terms: Single Terms, Phrases, and Subqueries.
A Single Term is a single word such as "test" or "hello".
A Phrase is a group of words surrounded by double quotes such as "hello dolly".
A Subquery is a query surrounded by parentheses such as "(hello dolly)".
Multiple terms can be combined together with Boolean operators to form a more complex query (see below).
Fields
Lucene supports fielded data. When performing a search you can either specify a field or use the default field. The field names depend on the indexed data, and the default field is defined by the current settings.
The first and major difference from Java Lucene is that terms are searched through all fields by default.
The Zend_Search_Lucene class has two static methods for working with this setting:
<?php
$defaultSearchField = Zend_Search_Lucene::getDefaultSearchField();
...
Zend_Search_Lucene::setDefaultSearchField('contents');
A null value means that the search is performed across all fields. This is the default setting.
You can specify a field by typing the field name followed by a colon ":" and then the term you are looking for.
As an example, let's assume a Lucene index contains two fields, title and text, and text is the default field. If you want to find the document entitled "The Right Way" which contains the text "don't go this way", you can enter:
title:"The Right Way" AND text:go
or
title:"The Right Way" AND go
If "text" is the default field, the field indicator is not required.
Note: The field is only valid for the term, phrase or subquery that it directly precedes, so the query
title:Do it right
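will only find "Do" in the title field. It will look for "it" and "right" in the default field (if a default field is set) or in all indexed fields (if the default field is null).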
Wildcards
Lucene supports single and multiple character wildcard searches within single terms (not within phrase queries).
To perform a single character wildcard search use the "?" symbol.
To perform a multiple character wildcard search use the "*" symbol.
The single character wildcard search looks for terms that match the pattern with the "?" replaced by any single character. For example, to search for "text" or "test" you can use the search:
te?t
The multiple character wildcard search looks for 0 or more characters. For example, to search for "test", "tests" or "tester", you can use the search:
test*
You can use "?", "*" or both at any position in the term:
*wr?t*
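The wildcard semantics described above can be sketched in plain PHP by translating a pattern into a regular expression. This is only an illustration of which terms a pattern accepts; the helper names wildcardToRegex() and matchesWildcard() are hypothetical, and this is not how Zend_Search_Lucene evaluates wildcard queries internally.

```php
<?php
// Illustration only: hypothetical helpers, not part of Zend_Search_Lucene.

// Translate a wildcard pattern ("?" = one character, "*" = zero or more)
// into an anchored regular expression.
function wildcardToRegex($pattern)
{
    // Quote every regex metacharacter first, so "?" becomes "\?" and "*" becomes "\*".
    $quoted = preg_quote($pattern, '/');
    // Then turn the quoted wildcards into their regex equivalents.
    $regex = strtr($quoted, array('\?' => '.', '\*' => '.*'));
    return '/^' . $regex . '$/';
}

// Check whether a single term matches a wildcard pattern.
function matchesWildcard($pattern, $term)
{
    return preg_match(wildcardToRegex($pattern), $term) === 1;
}

var_dump(matchesWildcard('te?t', 'text'));      // bool(true)
var_dump(matchesWildcard('test*', 'tester'));   // bool(true)
var_dump(matchesWildcard('*wr?t*', 'rewrite')); // bool(true)
var_dump(matchesWildcard('te?t', 'tet'));       // bool(false)
```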
Term Modifiers
Lucene supports modifying query terms to provide a wide range of searching options.
Zend_Search_Lucene currently supports the "~" modifier only for phrases [1].
Range Searches
Range queries match documents whose field values lie between the lower and upper bounds specified by the range query. Range queries can be inclusive or exclusive of the upper and lower bounds. Sorting is done lexicographically.
mod_date:[20020101 TO 20030101]
title:{Aida TO Carmen}
Inclusive range queries are denoted by square brackets. Exclusive range queries are denoted by curly brackets.
If no field is specified, Zend_Search_Lucene searches for the specified interval in all fields.
{Aida TO Carmen}
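Because bounds are compared lexicographically, values are compared as strings rather than as numbers. The following plain PHP sketch illustrates that comparison for inclusive and exclusive ranges; inLexicographicRange() is a hypothetical helper for illustration and is not part of Zend_Search_Lucene.

```php
<?php
// Illustration only: a hypothetical helper, not part of Zend_Search_Lucene.

// Return true if $value lies between $lower and $upper when compared
// as strings. $inclusive selects [lower TO upper] versus {lower TO upper}.
function inLexicographicRange($value, $lower, $upper, $inclusive)
{
    $aboveLower = strcmp($value, $lower); // > 0 if $value sorts after $lower
    $belowUpper = strcmp($upper, $value); // > 0 if $value sorts before $upper
    if ($inclusive) {
        return $aboveLower >= 0 && $belowUpper >= 0;
    }
    return $aboveLower > 0 && $belowUpper > 0;
}

// mod_date:[20020101 TO 20030101]
var_dump(inLexicographicRange('20020615', '20020101', '20030101', true)); // bool(true)
// title:{Aida TO Carmen} excludes the bounds themselves
var_dump(inLexicographicRange('Aida', 'Aida', 'Carmen', false));          // bool(false)
// String order is not numeric order: "9" sorts after "10"
var_dump(inLexicographicRange('9', '1', '10', true));                     // bool(false)
```

The last call shows why fixed-width values such as 20020101 are convenient for range queries: their string order matches their numeric order.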
Proximity Searches
Lucene supports finding words that are within a specific distance of each other. To do a proximity search, use the tilde symbol "~" at the end of a phrase. For example, to search for "Zend" and "Framework" within 10 words of each other in a document, use the search:
"Zend Framework"~10
Boosting a Term
Java Lucene and Zend_Search_Lucene compute the relevance level of matching documents based on the terms found. To boost a term, use the caret symbol "^" with a boost factor (a number) at the end of the term you are searching for. The higher the boost factor, the more relevant the term will be.
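Boosting allows you to control the relevance of a document by boosting its terms. For example, if you are searching for
PHP framework
and want the term "PHP" to be more relevant, boost it with the "^" symbol and a boost factor next to the term:
PHP^4 framework
You can also boost phrases, as in this example:
"PHP framework"^4 "Zend Framework"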
Boolean Operators
Boolean operators allow terms to be combined through logic operators. Lucene supports AND, "+", OR, NOT and "-" as Boolean operators. Java Lucene requires boolean operators to be ALL CAPS. Zend_Search_Lucene does not.
The AND, OR, and NOT operators and the "+" and "-" operators define two styles of constructing boolean queries. Unlike Java Lucene, Zend_Search_Lucene does not allow these two styles to be mixed.
If the AND/OR/NOT style is used, then an AND or OR operator must be present between all query terms. Each term may also be preceded by the NOT operator. The AND operator has higher precedence than OR. This differs from Java Lucene behavior.
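The effect of AND binding more tightly than OR can be illustrated with plain PHP booleans standing in for query clauses. This is only an analogy: containsWord() is a hypothetical helper, the sample text is made up, and no Zend_Search_Lucene code is involved.

```php
<?php
// Illustration only: plain PHP booleans as stand-ins for query clauses.

// Hypothetical helper: case-insensitive whole-word test.
function containsWord($text, $word)
{
    $words = str_word_count(strtolower($text), 1);
    return in_array(strtolower($word), $words, true);
}

$doc = 'Zend Framework is a PHP library';

$zend    = containsWord($doc, 'zend');    // true
$symfony = containsWord($doc, 'symfony'); // false
$python  = containsWord($doc, 'python');  // false

// Query: zend OR symfony AND python
// With AND taking precedence it is read as: zend OR (symfony AND python)
$andFirst = $zend || ($symfony && $python);

// The other possible grouping: (zend OR symfony) AND python
$otherGrouping = ($zend || $symfony) && $python;

var_dump($andFirst);      // bool(true)
var_dump($otherGrouping); // bool(false)
```

When the intended grouping matters, parentheses in the query make it explicit.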
AND
The AND operator means that all terms in the "AND group" must match the document.
To search for documents that contain "PHP framework" and "Zend Framework" use the query:
"PHP framework" AND "Zend Framework"
OR
The OR operator divides the query into several optional parts.
To search for documents that contain "PHP framework" or "Zend Framework" use the query:
"PHP framework" OR "Zend Framework"
NOT
The NOT operator excludes documents that contain the term after NOT. However, an "AND group" that contains only terms with the NOT operator gives an empty result set instead of the full set of indexed documents.
To search for documents that contain "PHP framework" but not "Zend Framework" use the query:
"PHP framework" AND NOT "Zend Framework"
&&, ||, and ! operators
&&, ||, and ! may be used instead of AND, OR, and NOT operators.
+
The "+" (required) operator means that the term after the "+" symbol must match the document.
To search for documents that must contain "Zend" and may contain "Framework" use the query:
+Zend Framework
-
The "-" (prohibit) operator excludes documents that match the term after the "-" symbol.
To search for documents that contain "PHP framework" but not "Zend Framework" use the query:
"PHP framework" -"Zend Framework"
no operator
If no operator is used, the behavior is defined by the "default boolean operator".
It is OR by default.
This means that the term is optional. It may or may not be present in the document, but documents with this term will receive a higher score.
To search for documents that must contain "PHP framework" and may contain "Zend Framework" use the query:
+"PHP framework" "Zend Framework"
The default boolean operator may be set or retrieved with the
Zend_Search_Lucene_Search_QueryParser::setDefaultOperator($operator) and
Zend_Search_Lucene_Search_QueryParser::getDefaultOperator() methods.
These methods operate with the
Zend_Search_Lucene_Search_QueryParser::B_AND and
Zend_Search_Lucene_Search_QueryParser::B_OR constants.
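As a minimal sketch of these settings, the fragment below switches the default operator to AND, parses a query, and then restores OR. It assumes that Zend Framework 1 is available on the include path; the require_once paths and the query string are illustrative assumptions.

```php
<?php
// Assumption: Zend Framework 1 is on the include path.
require_once 'Zend/Search/Lucene.php';
require_once 'Zend/Search/Lucene/Search/QueryParser.php';

// Treat terms without an explicit operator as required instead of optional.
Zend_Search_Lucene_Search_QueryParser::setDefaultOperator(
    Zend_Search_Lucene_Search_QueryParser::B_AND
);

// With B_AND in effect, both terms of this query are required.
$query = Zend_Search_Lucene_Search_QueryParser::parse('PHP framework');

// Restore the default behavior (OR) for later queries.
Zend_Search_Lucene_Search_QueryParser::setDefaultOperator(
    Zend_Search_Lucene_Search_QueryParser::B_OR
);
```

Because the setting is static, it affects every query parsed afterwards, which is why the sketch restores the previous value.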
Grouping
Java Lucene and Zend_Search_Lucene support using parentheses to group clauses into subqueries. This can be useful if you want to control the boolean logic of a query or mix different boolean query styles:
+(framework OR library) +php
Field Grouping
Lucene supports using parentheses to group multiple clauses to a single field.
To search for a title that contains both the word "return" and the phrase "pink panther" use the query:
title:(+return +"pink panther")
Escaping Special Characters
Lucene supports escaping special characters that are part of the query syntax. The current list of special characters is:
+ - && || ! ( ) { } [ ] ^ " ~ * ? : \
+ and - inside single terms are treated as ordinary characters.
To escape one of these characters, put a \ before it. For example, to search for (1+1):2 use the query:
\(1\+1\)\:2
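When query text comes from user input, the escaping rule above can be applied programmatically. The following is a minimal sketch in plain PHP; escapeQueryText() is a hypothetical helper, not part of Zend_Search_Lucene, and it assumes that escaping each "&" and "|" individually is acceptable to the query parser.

```php
<?php
// Illustration only: a hypothetical helper, not part of Zend_Search_Lucene.

// Put a backslash in front of every character from the special character list.
// Assumption: "&&" and "||" are handled by escaping each "&" and "|" separately.
function escapeQueryText($text)
{
    // Character class listing the special characters: + - & | ! ( ) { } [ ] ^ " ~ * ? : \
    $pattern = '/[+\-&|!(){}\[\]^"~*?:\\\\]/';
    // '\\\\$0' yields a literal backslash followed by the matched character.
    return preg_replace($pattern, '\\\\$0', $text);
}

echo escapeQueryText('(1+1):2'), "\n"; // prints \(1\+1\)\:2
```

The helper also escapes "+" and "-" inside single terms, which is not strictly necessary but matches the escaped example above.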
