DefaultICUTokenizerConfig | Default ICUTokenizerConfig that is generally applicable to many languages. |
ICUTokenizer | Breaks text into words according to UAX #29: Unicode Text Segmentation (http://www.unicode.org/reports/tr29/). Words are broken across script boundaries, then segmented according to the BreakIterator and typing provided by the ICUTokenizerConfig. |
ICUTokenizerConfig | Class that allows for tailored Unicode Text Segmentation on a per-writing-system basis. |
LaoBreakIterator | Syllable iterator for Lao text. |
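
The two-stage process described for ICUTokenizer (split at script boundaries first, then apply a word BreakIterator to each run) can be illustrated with a minimal sketch. This is not the Lucene implementation: it assumes the JDK's `java.text.BreakIterator` as a stand-in for the ICU iterator that an ICUTokenizerConfig would supply, and the class `ScriptAwareSegmenter` with its methods `scriptRuns` and `words` are hypothetical names introduced only for this example.

```java
import java.lang.Character.UnicodeScript;
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration class; not part of Lucene or ICU.
class ScriptAwareSegmenter {

    /** Splits text into maximal runs whose letters share one Unicode script. */
    public static List<String> scriptRuns(String text) {
        List<String> runs = new ArrayList<>();
        int start = 0;
        UnicodeScript current = null;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            UnicodeScript script = UnicodeScript.of(cp);
            // COMMON and INHERITED code points (spaces, punctuation, combining
            // marks) do not start a new run; they stay attached to the current one.
            boolean neutral = script == UnicodeScript.COMMON
                    || script == UnicodeScript.INHERITED;
            if (!neutral) {
                if (current != null && script != current) {
                    runs.add(text.substring(start, i));
                    start = i;
                }
                current = script;
            }
            i += Character.charCount(cp);
        }
        if (start < text.length()) {
            runs.add(text.substring(start));
        }
        return runs;
    }

    /** Word-segments each script run separately and keeps only word-like pieces. */
    public static List<String> words(String text) {
        List<String> out = new ArrayList<>();
        for (String run : scriptRuns(text)) {
            // Assumption: the JDK word iterator stands in for a per-script ICU iterator.
            BreakIterator it = BreakIterator.getWordInstance(Locale.ROOT);
            it.setText(run);
            int begin = it.first();
            for (int end = it.next(); end != BreakIterator.DONE; begin = end, end = it.next()) {
                String piece = run.substring(begin, end);
                // Drop segments made only of whitespace or punctuation.
                if (piece.codePoints().anyMatch(Character::isLetterOrDigit)) {
                    out.add(piece);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // "abc" followed directly by three Cyrillic letters, with no space between them.
        System.out.println(words("abc\u0430\u0431\u0432"));
    }
}
```

Splitting into script runs before word segmentation is what lets a different iterator be chosen per writing system, which is the role the table assigns to ICUTokenizerConfig (and, for Lao, to a syllable iterator such as LaoBreakIterator). In this sketch every run uses the same JDK iterator, so only the run-splitting step differs from plain word breaking.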