Class LuceneQueryUtil
- java.lang.Object
  - com.atlassian.confluence.internal.search.v2.lucene.LuceneQueryUtil
public class LuceneQueryUtil extends Object
Utility class that helps with creating queries. It is also useful for writing tests.
Since:
6.16.0
Constructor Summary
- LuceneQueryUtil()
Method Summary
- static String safeEscape(String query)
- static List<String> tokenize(org.apache.lucene.analysis.Analyzer analyzer, String field, String value)
  NOTE: The ordering of tokens in the returned list is guaranteed to match the order in which the text is processed.
- static List<String> tokenize(org.apache.lucene.analysis.TokenStream tokenStream)
  NOTE: The ordering of tokens in the returned list is guaranteed to match the order in which the text is processed.
- static List<List<String>> tokenizeWithPositions(org.apache.lucene.analysis.TokenStream tokenStream)
  Note before using: Tokens within the same inner list have the same position. The order of the inner lists in the outer list is guaranteed to reflect the order of positions. Leading positional holes are ignored. Gaps between positions are represented by empty lists. Duplicate tokens are allowed.
Method Detail
tokenize
public static List<String> tokenize(org.apache.lucene.analysis.Analyzer analyzer, String field, String value)
NOTE: The ordering of tokens in the returned list is guaranteed to match the order in which the text is processed. Duplicate tokens are allowed. For example, with whitespace analysis, "the quick brown fox the" is guaranteed to produce ["the", "quick", "brown", "fox", "the"], in that order.
For some use cases, it may be useful to convert the result into an ordered set implementation (for example, java.util.LinkedHashSet) to remove duplicates.
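The ordering and duplicate rules described above can be illustrated without Lucene. The following is a minimal sketch only: `TokenOrderSketch`, `whitespaceTokens`, and `distinctInOrder` are hypothetical names, and a plain whitespace split stands in for real whitespace analysis (an assumption, not how `LuceneQueryUtil.tokenize` is implemented). It also shows the suggested conversion to an ordered set, using `java.util.LinkedHashSet`, which preserves insertion order.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

public class TokenOrderSketch {

    // Hypothetical stand-in for whitespace analysis (not Lucene): splits on runs of
    // whitespace, keeping tokens in the order they appear, duplicates included.
    static List<String> whitespaceTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) { // splitting "" yields a single empty string
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Drops duplicates while keeping each token's first-occurrence order,
    // because LinkedHashSet preserves insertion order.
    static List<String> distinctInOrder(List<String> tokens) {
        return new ArrayList<>(new LinkedHashSet<>(tokens));
    }

    public static void main(String[] args) {
        List<String> tokens = whitespaceTokens("the quick brown fox the");
        System.out.println(tokens);                  // [the, quick, brown, fox, the]
        System.out.println(distinctInOrder(tokens)); // [the, quick, brown, fox]
    }
}
```

Keeping the result as a `List` rather than a set preserves duplicates by default, so callers who need term frequency or exact order lose nothing; deduplication stays an explicit, opt-in step.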
tokenize
public static List<String> tokenize(org.apache.lucene.analysis.TokenStream tokenStream)
NOTE: The ordering of tokens in the returned list is guaranteed to match the order in which the text is processed. Duplicate tokens are allowed. For example, with whitespace analysis, "the quick brown fox the" is guaranteed to produce ["the", "quick", "brown", "fox", "the"], in that order.
For some use cases, it may be useful to convert the result into an ordered set implementation (for example, java.util.LinkedHashSet) to remove duplicates.
tokenizeWithPositions
public static List<List<String>> tokenizeWithPositions(org.apache.lucene.analysis.TokenStream tokenStream)
Note before using:
- Tokens within the same inner list have the same position.
- The order of the inner lists in the outer list is guaranteed to reflect the order of positions.
- Leading positional holes are ignored.
- Gaps between positions are represented by empty lists.
- Duplicate tokens are allowed.
For example, given "filename.txt notfilename" with whitespace analysis, where all tokens derived from a filename are emitted at the same position, the result is [["filename.txt", "filename", "txt"], ["notfilename"]].
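The grouping rules listed above can be modelled in plain Java. This is a sketch under stated assumptions, not the actual implementation: the token stream is represented as a hypothetical list of `Tok` values, each holding a term and a position increment with the same meaning as Lucene's `PositionIncrementAttribute` (0 means "same position as the previous token", values above 1 mean positions were skipped). `PositionGroupingSketch`, `Tok`, and `groupByPosition` are illustrative names only.

```java
import java.util.ArrayList;
import java.util.List;

public class PositionGroupingSketch {

    // Hypothetical token: a term plus its position increment relative to the previous
    // token, mirroring the meaning of Lucene's PositionIncrementAttribute.
    static final class Tok {
        final String term;
        final int posInc;

        Tok(String term, int posInc) {
            this.term = term;
            this.posInc = posInc;
        }
    }

    // Groups terms by position following the documented rules:
    // same position -> same inner list; skipped positions -> empty lists;
    // any leading hole before the first token is ignored.
    static List<List<String>> groupByPosition(List<Tok> stream) {
        List<List<String>> positions = new ArrayList<>();
        for (Tok tok : stream) {
            if (positions.isEmpty()) {
                // Leading holes ignored: the first token always starts the first position.
                positions.add(new ArrayList<>());
            } else if (tok.posInc > 0) {
                // Each skipped position (increment beyond 1) becomes an empty list.
                for (int i = 1; i < tok.posInc; i++) {
                    positions.add(new ArrayList<>());
                }
                positions.add(new ArrayList<>());
            }
            // An increment of 0 keeps the token in the current (last) position.
            positions.get(positions.size() - 1).add(tok.term);
        }
        return positions;
    }

    public static void main(String[] args) {
        List<Tok> stream = List.of(
                new Tok("filename.txt", 1),
                new Tok("filename", 0),
                new Tok("txt", 0),
                new Tok("notfilename", 1));
        System.out.println(groupByPosition(stream));
        // [[filename.txt, filename, txt], [notfilename]]

        // A leading hole and a one-position gap (e.g. from removed stop words).
        List<Tok> withGaps = List.of(new Tok("quick", 2), new Tok("fox", 2));
        System.out.println(groupByPosition(withGaps)); // [[quick], [], [fox]]
    }
}
```

Representing gaps as empty lists keeps the outer list index equal to the (hole-adjusted) token position, which is convenient when building positional queries such as phrase queries.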