com.atlassian.confluence.search.didyoumean.lucene.tokenizers
Class BodyNGramTokenizer

java.lang.Object
  extended by org.apache.lucene.util.AttributeSource
      extended by org.apache.lucene.analysis.TokenStream
          extended by org.apache.lucene.analysis.Tokenizer
              extended by com.atlassian.confluence.search.didyoumean.lucene.tokenizers.BodyNGramTokenizer

public class BodyNGramTokenizer
extends org.apache.lucene.analysis.Tokenizer

An adaptation of Lucene's NGramTokenizer that emits every n-gram of the input except the edge n-grams (those anchored at the start or end of the input).
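
To illustrate the description above, here is a minimal plain-Java sketch of "body" n-gram generation. It is not the Confluence implementation. It assumes that an edge n-gram is one that starts at the first character or ends at the last character of the input, and that grams are emitted by increasing size; neither detail is confirmed by this page. The class name BodyNGramSketch, the method bodyNGrams, and the argument check are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class BodyNGramSketch {

    /**
     * Returns every n-gram of text with a length in [minGram, maxGram] that
     * neither starts at the first character nor ends at the last one.
     * Assumption: this is what "excluding the edge n-grams" means.
     */
    static List<String> bodyNGrams(String text, int minGram, int maxGram) {
        if (minGram < 1 || minGram > maxGram) {
            throw new IllegalArgumentException("require 1 <= minGram <= maxGram");
        }
        List<String> grams = new ArrayList<>();
        for (int n = minGram; n <= maxGram; n++) {
            // Start at index 1 to skip the leading edge gram; the strict "<"
            // stops before the gram that would end on the last character.
            for (int start = 1; start + n < text.length(); start++) {
                grams.add(text.substring(start, start + n));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        // prints [uc, ce, en, uce, cen]
        System.out.println(bodyNGrams("lucene", 2, 3));
    }
}
```

For the input "lucene" with sizes 2 to 3, the sketch drops "lu", "ne", "luc" and "ene" because each touches an end of the input.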


Nested Class Summary
 
Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
org.apache.lucene.util.AttributeSource.AttributeFactory, org.apache.lucene.util.AttributeSource.State
 
Field Summary
static int DEFAULT_MAX_NGRAM_SIZE
           
static int DEFAULT_MIN_NGRAM_SIZE
           
 
Fields inherited from class org.apache.lucene.analysis.Tokenizer
input
 
Constructor Summary
BodyNGramTokenizer(Reader input)
          Creates a BodyNGramTokenizer with the default minimum and maximum n-gram sizes.
BodyNGramTokenizer(Reader input, int minGram, int maxGram)
          Creates a BodyNGramTokenizer with the given minimum and maximum n-gram sizes.
 
Method Summary
 org.apache.lucene.analysis.Token next()
          Returns the next token in the stream, or null at the end of the stream.
 
Methods inherited from class org.apache.lucene.analysis.Tokenizer
close, correctOffset, reset
 
Methods inherited from class org.apache.lucene.analysis.TokenStream
end, getOnlyUseNewAPI, incrementToken, next, reset, setOnlyUseNewAPI
 
Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, restoreState, toString
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Field Detail

DEFAULT_MIN_NGRAM_SIZE

public static final int DEFAULT_MIN_NGRAM_SIZE
See Also:
Constant Field Values

DEFAULT_MAX_NGRAM_SIZE

public static final int DEFAULT_MAX_NGRAM_SIZE
See Also:
Constant Field Values
Constructor Detail

BodyNGramTokenizer

public BodyNGramTokenizer(Reader input,
                          int minGram,
                          int maxGram)
Creates a BodyNGramTokenizer with the given minimum and maximum n-gram sizes.

Parameters:
input - Reader holding the input to be tokenized
minGram - the size, in characters, of the smallest n-gram to generate
maxGram - the size, in characters, of the largest n-gram to generate

BodyNGramTokenizer

public BodyNGramTokenizer(Reader input)
Creates a BodyNGramTokenizer with the default minimum and maximum n-gram sizes (DEFAULT_MIN_NGRAM_SIZE and DEFAULT_MAX_NGRAM_SIZE).

Parameters:
input - Reader holding the input to be tokenized
Method Detail

next

public final org.apache.lucene.analysis.Token next()
                                            throws IOException
Returns the next token in the stream, or null at the end of the stream.

Overrides:
next in class org.apache.lucene.analysis.TokenStream
Throws:
IOException
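
The null-at-end-of-stream contract described above is usually consumed with a loop that reads until null is returned. The sketch below shows that pattern in plain Java without a Lucene dependency. The TokenSource interface is a hypothetical stand-in for a stream whose next() returns null when exhausted; it is not a Lucene or Confluence type, and it yields plain strings rather than Token objects.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class NullTerminatedStreamSketch {

    /** Hypothetical stand-in: next() returns null once the stream is exhausted. */
    interface TokenSource {
        String next() throws IOException;
    }

    /** Builds a TokenSource over a fixed list, for demonstration only. */
    static TokenSource fromList(List<String> tokens) {
        Iterator<String> it = tokens.iterator();
        return () -> it.hasNext() ? it.next() : null;
    }

    /** Reads tokens until next() returns null, collecting them in order. */
    static List<String> drain(TokenSource source) {
        List<String> out = new ArrayList<>();
        try {
            String token;
            while ((token = source.next()) != null) {
                out.add(token);
            }
        } catch (IOException e) {
            // Wrapped so callers of this sketch need no checked-exception handling.
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [uc, ce, en]
        System.out.println(drain(fromList(Arrays.asList("uc", "ce", "en"))));
    }
}
```

With the real class, the same loop shape would presumably apply to the Token returned by next(), with the IOException handled or propagated by the caller.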


Copyright © 2003-2012 Atlassian. All Rights Reserved.