package org.apache.lucene.analysis.icu.segmentation

Tokenizer that breaks text into words with the Unicode Text Segmentation algorithm.

Classes

DefaultICUTokenizerConfig Default ICUTokenizerConfig that is generally applicable to many languages. 
ICUTokenizer Breaks text into words according to UAX #29: Unicode Text Segmentation (http://www.unicode.org/reports/tr29/). Words are broken across script boundaries, then segmented according to the BreakIterator and typing provided by the ICUTokenizerConfig.
ICUTokenizerConfig Class that allows for tailored Unicode Text Segmentation on a per-writing-system basis.
LaoBreakIterator Syllable iterator for Lao text.
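
The two-stage process described for ICUTokenizer (split at script boundaries, then run a word BreakIterator over each run) can be illustrated with a minimal sketch. It is not the Lucene implementation: it uses only the JDK's java.text.BreakIterator and Character.UnicodeScript as stand-ins for the ICU classes, the class name ScriptWordSketch and its methods are hypothetical, and the script-run rule (COMMON and INHERITED characters stay with the current run) is a simplifying assumption.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only; not part of Lucene or ICU4J.
public class ScriptWordSketch {

    // Stage 1: split the text into runs that each contain a single script.
    // Assumption: COMMON and INHERITED characters (spaces, digits,
    // punctuation, combining marks) never start a new run.
    static List<String> scriptRuns(String text) {
        List<String> runs = new ArrayList<>();
        Character.UnicodeScript current = null;
        int start = 0;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            Character.UnicodeScript script = Character.UnicodeScript.of(cp);
            boolean neutral = script == Character.UnicodeScript.COMMON
                    || script == Character.UnicodeScript.INHERITED;
            if (!neutral) {
                if (current != null && script != current) {
                    runs.add(text.substring(start, i));
                    start = i;
                }
                current = script;
            }
            i += Character.charCount(cp);
        }
        if (start < text.length()) {
            runs.add(text.substring(start));
        }
        return runs;
    }

    // Stage 2: apply a word BreakIterator to each run and keep only the
    // segments that contain at least one letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String run : scriptRuns(text)) {
            BreakIterator words = BreakIterator.getWordInstance(Locale.ROOT);
            words.setText(run);
            int from = words.first();
            for (int to = words.next(); to != BreakIterator.DONE;
                    from = to, to = words.next()) {
                String piece = run.substring(from, to);
                if (piece.codePoints().anyMatch(Character::isLetterOrDigit)) {
                    tokens.add(piece);
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Latin letters followed directly by Cyrillic letters.
        System.out.println(scriptRuns("abc\u0430\u0431\u0432"));
        System.out.println(tokenize("hello world"));
    }
}
```

In the real package, the per-script BreakIterator and the token types come from an ICUTokenizerConfig rather than from a single fixed iterator as in this sketch.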