Package org.apache.lucene.analysis.cn

Analyzer for Chinese, which indexes unigrams (individual Chinese characters).

Three analyzers are provided for Chinese, each of which treats Chinese text in a different way.

StandardAnalyzer: Indexes unigrams (individual Chinese characters) as tokens.
CJKAnalyzer (in the analyzers/cjk package): Indexes bigrams (overlapping groups of two adjacent Chinese characters) as tokens.
SmartChineseAnalyzer (in the analyzers/smartcn package): Indexes words (attempts to segment Chinese text into words) as tokens.

Example phrase: "我是中国人"
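
To make the unigram and bigram schemes concrete, the following plain-Java sketch splits the example phrase into overlapping groups of one or two characters. It is an illustration only and does not use or reproduce Lucene's tokenizers: the class name `NGramSketch` and the method `ngrams` are hypothetical, and the real analyzers also handle non-Chinese text, offsets, and stop words, which are ignored here. Word segmentation as performed by SmartChineseAnalyzer depends on a dictionary and statistical model, so it is not sketched.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: plain-Java n-gram splitting, not Lucene's tokenizers.
class NGramSketch {

    // Returns every overlapping group of n adjacent characters (code points).
    static List<String> ngrams(String text, int n) {
        int[] codePoints = text.codePoints().toArray();
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i + n <= codePoints.length; i++) {
            tokens.add(new String(codePoints, i, n));
        }
        return tokens;
    }

    public static void main(String[] args) {
        String phrase = "我是中国人";
        // Unigrams, the scheme described for StandardAnalyzer.
        System.out.println(ngrams(phrase, 1)); // [我, 是, 中, 国, 人]
        // Bigrams, the scheme described for CJKAnalyzer.
        System.out.println(ngrams(phrase, 2)); // [我是, 是中, 中国, 国人]
    }
}
```

Bigrams let a query for 中国 match as a single token, at the cost of also indexing pairs such as 是中 that are not words; unigrams avoid those pairs but rely on phrase matching to keep characters adjacent.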

Classes

ChineseAnalyzer	This class is deprecated. Use `StandardAnalyzer` instead, which has the same functionality. This analyzer will be removed in Lucene 5.0.
ChineseFilter	This class is deprecated. Use `StopFilter` instead, which has the same functionality. This filter will be removed in Lucene 5.0.
ChineseTokenizer	This class is deprecated. Use `StandardTokenizer` instead, which has the same functionality. This tokenizer will be removed in Lucene 5.0.