com.atlassian.jira.issue.index.analyzer
Class CJKTokenizer

java.lang.Object
  extended by org.apache.lucene.analysis.TokenStream
      extended by org.apache.lucene.analysis.Tokenizer
          extended by com.atlassian.jira.issue.index.analyzer.CJKTokenizer

public final class CJKTokenizer
extends org.apache.lucene.analysis.Tokenizer

CJKTokenizer was modified from StopTokenizer, which does a decent job for most European languages. It tokenizes double-byte characters differently: a token is returned for every two characters, and adjacent tokens overlap by one character.
Example: "java C1C2C3C4" is segmented into "java" "C1C2" "C2C3" "C3C4". Zero-length tokens ("") also need to be filtered out.
Digits: a digit, '+' or '#' is tokenized as a letter.
For more information on Asian-language (Chinese, Japanese, Korean) text segmentation, please search Google.
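
The overlapping two-character segmentation described above can be illustrated with a small standalone sketch. This is not the CJKTokenizer source and does not use the Lucene API. The class name CjkBigramSketch and its helper methods are hypothetical, and the character classification is an assumption: ASCII letters and digits plus '+' and '#' form ordinary tokens, any letter outside the Basic Latin block is treated as a double-byte character, and a double-byte character standing alone is emitted as a single-character token.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of JIRA or Lucene.
class CjkBigramSketch {

    // Assumption: single-byte token characters are ASCII letters and digits plus '+' and '#'.
    static boolean isSingleByteTokenChar(char c) {
        return c < 128 && (Character.isLetterOrDigit(c) || c == '+' || c == '#');
    }

    // Assumption: any letter outside the Basic Latin block counts as a double-byte character.
    static boolean isDoubleByteChar(char c) {
        return Character.isLetter(c)
                && Character.UnicodeBlock.of(c) != Character.UnicodeBlock.BASIC_LATIN;
    }

    // Splits text into single-byte word tokens and overlapping double-byte pairs.
    static List<String> segment(String text) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (isSingleByteTokenChar(c)) {
                // Consume a whole run of single-byte characters as one token.
                int start = i;
                while (i < text.length() && isSingleByteTokenChar(text.charAt(i))) {
                    i++;
                }
                tokens.add(text.substring(start, i));
            } else if (isDoubleByteChar(c)) {
                // Consume a run of double-byte characters, then emit overlapping pairs.
                int start = i;
                while (i < text.length() && isDoubleByteChar(text.charAt(i))) {
                    i++;
                }
                if (i - start == 1) {
                    // Assumption: a lone double-byte character becomes its own token.
                    tokens.add(text.substring(start, i));
                } else {
                    for (int j = start; j + 2 <= i; j++) {
                        tokens.add(text.substring(j, j + 2));
                    }
                }
            } else {
                // Whitespace and punctuation separate tokens and are never emitted,
                // so no zero-length token is produced.
                i++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // "java" followed by four CJK characters C1C2C3C4, written as Unicode escapes.
        System.out.println(segment("java \u4e2d\u6587\u5206\u8bcd"));
        System.out.println(segment("c++ c# 2007"));
    }
}
```

In this sketch the first call returns "java" followed by the three overlapping pairs C1C2, C2C3 and C3C4, and the second call keeps "c++", "c#" and "2007" as whole tokens because '+', '#' and digits are treated like letters.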

Author:
Che, Dong

Field Summary
 
Fields inherited from class org.apache.lucene.analysis.Tokenizer
input
 
Constructor Summary
CJKTokenizer(Reader in)
          Construct a token stream processing the given input.
 
Method Summary
 org.apache.lucene.analysis.Token next()
          Returns the next token in the stream, or null at EOS.
 
Methods inherited from class org.apache.lucene.analysis.Tokenizer
close
 
Methods inherited from class org.apache.lucene.analysis.TokenStream
reset
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

CJKTokenizer

public CJKTokenizer(Reader in)
Construct a token stream processing the given input.

Parameters:
in - I/O reader
Method Detail

next

public final org.apache.lucene.analysis.Token next()
                                            throws IOException
Returns the next token in the stream, or null at EOS (end of stream). See http://java.sun.com/j2se/1.3/docs/api/java/lang/Character.UnicodeBlock.html for details on Unicode blocks.

Specified by:
next in class org.apache.lucene.analysis.TokenStream
Returns:
Token
Throws:
IOException - thrown when a read error occurs in the input stream


Copyright © 2002-2007 Atlassian. All Rights Reserved.