Package org.apache.lucene.analysis.icu.segmentation

Tokenizer that breaks text into words with the Unicode Text Segmentation algorithm.
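
As a rough illustration of word segmentation by boundary rules, the sketch below uses the JDK's `java.text.BreakIterator` as a stand-in. It is not the ICU-based tokenizer from this package, and its rules may differ from UAX #29 in detail. The class name `WordBreakSketch` and the helper `words` are hypothetical names chosen for this example.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class WordBreakSketch {
    /** Returns the segments of text that contain at least one letter or digit. */
    static List<String> words(String text) {
        // JDK word-boundary iterator; an assumption standing in for ICU's rules.
        BreakIterator it = BreakIterator.getWordInstance(Locale.ROOT);
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String segment = text.substring(start, end);
            // Skip segments made only of whitespace or punctuation.
            if (segment.codePoints().anyMatch(Character::isLetterOrDigit)) {
                out.add(segment);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(words("Hello, world"));
    }
}
```

The iterator reports every boundary, including those around spaces and punctuation, so the caller decides which segments count as tokens. Here that decision is a simple letter-or-digit check.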

Classes

DefaultICUTokenizerConfig	Default `ICUTokenizerConfig` that is generally applicable to many languages.
ICUTokenizer	Breaks text into words according to UAX #29: Unicode Text Segmentation (http://www.unicode.org/reports/tr29/). Words are broken across script boundaries, then segmented according to the BreakIterator and typing provided by the `ICUTokenizerConfig`.
ICUTokenizerConfig	Class that allows for tailored Unicode Text Segmentation on a per-writing-system basis.
LaoBreakIterator	Syllable iterator for Lao text.