ASCIIFoldingTokenFilterDescriptor |
Folds non-ASCII characters in a token into their closest ASCII equivalents.
|
ClassicTokenFilterDescriptor |
This filter removes the English possessive ('s) from the end of words and removes dots from acronyms.
|
ClassicTokenizerDescriptor |
This tokenizer has heuristics for special treatment of acronyms, company names, email addresses, and internet host
names.
|
CommonGramsTokenFilterDescriptor |
Token filter that generates bigrams for frequently occurring terms.
|
CompoundWordTokenFilterDescriptor |
A token filter that decomposes compound words found in many Germanic languages based on a dictionary.
|
HtmlStripCharFilterDescriptor |
Strips HTML tags from the text.
|
KeepWordTokenFilterDescriptor |
A token filter that only keeps tokens with text contained in a predefined set of words.
|
KeywordTokenizerDescriptor |
The keyword tokenizer is a “noop” tokenizer that accepts whatever text it is given and outputs the exact same text
as a single term.
|
LetterTokenizerDescriptor |
The letter tokenizer breaks text into terms whenever it encounters a character which is not a letter.
|
LowerCaseTokenFilterDescriptor |
A token filter of type lowercase that normalizes token text to lower case.
|
MappingCharFilterDescriptor |
A char filter that maps one string to another.
|
NGramTokenizerDescriptor |
A tokenizer that produces a stream of n-grams.
|
PathHierarchyTokenizerDescriptor |
Tokenizer for path-like hierarchies.
|
PatternReplaceCharFilterDescriptor |
A char filter that replaces strings matching the given pattern with the specified replacement.
|
PatternTokenizerDescriptor |
The pattern tokenizer uses a regular expression to either split text into terms whenever it matches a word separator,
or to capture matching text as terms.
|
ShingleTokenFilterDescriptor |
A token filter that constructs shingles (token n-grams) from a token stream.
|
StandardTokenFilterDescriptor |
A token filter that normalizes tokens extracted with the standard tokenizer.
|
StandardTokenizerDescriptor |
A standard tokenizer based on the Unicode Text Segmentation standard (UAX #29).
|
UAXURLEmailTokenizerDescriptor |
A tokenizer like the standard tokenizer, except that it recognises URLs and email addresses as single tokens.
|
WhitespaceTokenizerDescriptor |
The whitespace tokenizer breaks text into terms whenever it encounters a whitespace character.
|
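To make the difference between the NGram tokenizer (character n-grams) and the Shingle token filter (token n-grams) concrete, here is a minimal Python sketch. This is an illustration of the two concepts only, not the actual tokenizer or filter implementations, and the function names and default parameters are chosen for the example.

```python
def char_ngrams(text, min_gram=2, max_gram=3):
    """Character n-grams over a single term, as an NGram tokenizer produces."""
    grams = []
    for n in range(min_gram, max_gram + 1):
        for i in range(len(text) - n + 1):
            grams.append(text[i:i + n])
    return grams


def shingles(tokens, size=2):
    """Token n-grams (shingles) over a token stream, as a shingle filter produces."""
    return [" ".join(tokens[i:i + size])
            for i in range(len(tokens) - size + 1)]


# char_ngrams("fox") yields ["fo", "ox", "fox"]: substrings of one term.
# shingles(["the", "quick", "fox"]) yields ["the quick", "quick fox"]:
# adjacent terms joined together.
```

The distinction matters when choosing a descriptor: character n-grams support partial-word matching (e.g. autocomplete), while shingles support phrase-like matching over whole words.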