Class LuceneQueryUtil


  • public class LuceneQueryUtil
    extends Object
    Utility class to help with creating queries; also useful for writing tests.
    Since:
    6.16.0
    • Constructor Summary

      Constructors 
      Constructor Description
      LuceneQueryUtil()  
    • Method Summary

      Modifier and Type Method Description
      static String safeEscape(String query)
      static List<String> tokenize(org.apache.lucene.analysis.Analyzer analyzer, String field, String value)
      NOTE: The ordering of tokens in the collection is guaranteed to match the order in which the text is processed.
      static List<String> tokenize(org.apache.lucene.analysis.TokenStream tokenStream)
      NOTE: The ordering of tokens in the collection is guaranteed to match the order in which the text is processed.
      static List<List<String>> tokenizeWithPositions(org.apache.lucene.analysis.TokenStream tokenStream)
      Note before using: Tokens within the same inner list share the same position. The ordering of lists in the outer list is guaranteed to reflect the ordering of positions. Leading positional holes are ignored. Gaps between positions are represented by empty lists. Duplicate tokens are allowed.
    • Constructor Detail

      • LuceneQueryUtil

        public LuceneQueryUtil()
    • Method Detail

      • tokenize

        public static List<String> tokenize(org.apache.lucene.analysis.Analyzer analyzer,
                                            String field,
                                            String value)
        NOTE: The ordering of tokens in the collection is guaranteed to match the order in which the text is processed. Duplicate tokens are allowed.

        For example, with whitespace analysis, the input "the quick brown fox the" is guaranteed to produce ["the", "quick", "brown", "fox", "the"].

        For some use cases, it may be useful to convert the result into an ordered set implementation (such as LinkedHashSet) to remove duplicates while preserving order.
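
The ordering and duplicate guarantees can be illustrated without Lucene. The following is a minimal sketch, not this class's implementation: the hypothetical helper `whitespaceTokens` stands in for `tokenize(analyzer, field, value)` with whitespace analysis, and `LinkedHashSet` is one ordered set that removes duplicates while keeping first-occurrence order.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class TokenOrderExample {

    // Hypothetical stand-in for tokenize(analyzer, field, value) with whitespace
    // analysis: splits on runs of whitespace and keeps tokens in input order.
    static List<String> whitespaceTokens(String value) {
        List<String> tokens = new ArrayList<>();
        for (String t : value.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<String> tokens = whitespaceTokens("the quick brown fox the");
        System.out.println(tokens); // [the, quick, brown, fox, the]

        // LinkedHashSet drops the duplicate "the" but keeps first-occurrence order.
        Set<String> unique = new LinkedHashSet<>(tokens);
        System.out.println(unique); // [the, quick, brown, fox]
    }
}
```

A plain HashSet would also remove duplicates, but it does not preserve the processing order that the list guarantees.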

      • tokenize

        public static List<String> tokenize(org.apache.lucene.analysis.TokenStream tokenStream)
        NOTE: The ordering of tokens in the collection is guaranteed to match the order in which the text is processed. Duplicate tokens are allowed.

        For example, with whitespace analysis, the input "the quick brown fox the" is guaranteed to produce ["the", "quick", "brown", "fox", "the"].

        For some use cases, it may be useful to convert the result into an ordered set implementation (such as LinkedHashSet) to remove duplicates while preserving order.
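
This overload consumes a `TokenStream` the caller has already built. Lucene's documented consumer workflow is `reset()`, then `incrementToken()` until it returns false, then `end()` and `close()`. The sketch below models only that lifecycle with a hypothetical `MiniTokenStream` interface (not a Lucene type) so it runs without Lucene on the classpath; it is an assumption about how such a consumer could collect terms in order, not the actual implementation.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class TokenStreamConsumeExample {

    // Hypothetical minimal stand-in for org.apache.lucene.analysis.TokenStream.
    // It mirrors only the consumer lifecycle: reset, incrementToken, end, close.
    interface MiniTokenStream extends AutoCloseable {
        void reset();
        boolean incrementToken();
        String term(); // plays the role of CharTermAttribute
        void end();
        @Override
        void close();
    }

    // Hypothetical stream that emits a fixed list of terms.
    static MiniTokenStream fromTerms(List<String> terms) {
        return new MiniTokenStream() {
            private Iterator<String> it;
            private String current;

            public void reset() { it = terms.iterator(); }

            public boolean incrementToken() {
                if (it != null && it.hasNext()) {
                    current = it.next();
                    return true;
                }
                current = null;
                return false;
            }

            public String term() { return current; }
            public void end() { }
            public void close() { }
        };
    }

    // Collects terms in the order the stream emits them, duplicates included,
    // and always closes the stream afterwards.
    static List<String> collect(MiniTokenStream stream) {
        List<String> tokens = new ArrayList<>();
        try (stream) {
            stream.reset();
            while (stream.incrementToken()) {
                tokens.add(stream.term());
            }
            stream.end();
        }
        return tokens;
    }

    public static void main(String[] args) {
        MiniTokenStream stream = fromTerms(List.of("the", "quick", "brown", "fox", "the"));
        System.out.println(collect(stream)); // [the, quick, brown, fox, the]
    }
}
```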

      • tokenizeWithPositions

        public static List<List<String>> tokenizeWithPositions(org.apache.lucene.analysis.TokenStream tokenStream)
        Note before using:
        • Tokens within the same inner list share the same position.
        • The ordering of lists in the outer list is guaranteed to reflect the ordering of positions.
        • Leading positional holes are ignored.
        • Gaps between positions are represented by empty lists.
        • Duplicate tokens are allowed.

        For example, with whitespace analysis plus a filter that emits the parts of a filename at the same position as the whole filename, "filename.txt notfilename" produces [["filename.txt", "filename", "txt"], ["notfilename"]].
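
The position rules above can be expressed in terms of position increments, as exposed by Lucene's `PositionIncrementAttribute`: an increment of 0 stacks a token on the previous position, 1 advances to the next position, and n > 1 skips n - 1 positions. The following is a minimal sketch under that assumption, using a hypothetical `Token` pair class instead of a real `TokenStream`; it reproduces the documented contract but is not this class's implementation.

```java
import java.util.ArrayList;
import java.util.List;

public class PositionGroupingExample {

    // Hypothetical (term, positionIncrement) pair, mirroring what a consumer would read
    // from Lucene's CharTermAttribute and PositionIncrementAttribute.
    static final class Token {
        final String term;
        final int positionIncrement;

        Token(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }
    }

    // Groups tokens by position following the documented contract:
    // same position -> same inner list, leading holes ignored,
    // gaps between positions -> empty inner lists, duplicates kept.
    static List<List<String>> groupByPosition(List<Token> tokens) {
        List<List<String>> positions = new ArrayList<>();
        for (Token t : tokens) {
            if (positions.isEmpty()) {
                // The first token always opens the first list, so leading holes are dropped.
                positions.add(new ArrayList<>());
            } else if (t.positionIncrement > 0) {
                // An increment of n skips n - 1 positions; record each one as an empty list.
                for (int i = 1; i < t.positionIncrement; i++) {
                    positions.add(new ArrayList<>());
                }
                positions.add(new ArrayList<>());
            }
            // Increment 0 (stacked token) falls through and joins the current position.
            positions.get(positions.size() - 1).add(t.term);
        }
        return positions;
    }

    public static void main(String[] args) {
        List<Token> tokens = List.of(
                new Token("filename.txt", 1),
                new Token("filename", 0),
                new Token("txt", 0),
                new Token("notfilename", 1));
        System.out.println(groupByPosition(tokens));
        // [[filename.txt, filename, txt], [notfilename]]

        // A gap of one position (e.g. a removed stopword) shows up as an empty list.
        System.out.println(groupByPosition(List.of(new Token("quick", 1), new Token("fox", 2))));
        // [[quick], [], [fox]]
    }
}
```

Appending empty lists for skipped positions keeps each list's index equal to its position relative to the first token, which is what makes gaps visible to callers.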

      • safeEscape

        public static String safeEscape(String query)