Class PatternTokenizerDescriptor

  • All Implemented Interfaces:
    TokenizerDescriptor

    public class PatternTokenizerDescriptor
    extends Object
    implements TokenizerDescriptor
    The pattern tokenizer uses a regular expression to either split text into terms whenever it matches a word separator, or to capture matching text as terms.
    Since:
    7.0
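    As a rough illustration of the two modes described above, the following sketch reproduces them with plain java.util.regex. It does not use PatternTokenizerDescriptor. The PatternTokenizerSketch class and its tokenize helper are hypothetical, and dropping empty tokens is an assumption of this sketch, not documented behavior of the tokenizer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PatternTokenizerSketch {

    /**
     * Hypothetical helper illustrating the two tokenizer modes.
     * group == -1: the pattern matches separators; the text between matches becomes tokens.
     * group >= 0: every match is a token, taken from the given capture group.
     * Empty tokens are dropped (an assumption of this sketch).
     */
    static List<String> tokenize(String text, Pattern pattern, int group) {
        if (group < -1) {
            throw new IllegalArgumentException("group must be -1 (split) or a capture group index");
        }
        List<String> tokens = new ArrayList<>();
        Matcher matcher = pattern.matcher(text);
        if (group == -1) {
            // Split mode: emit the text before each separator match, then the trailing remainder.
            int last = 0;
            while (matcher.find()) {
                addIfNonEmpty(tokens, text.substring(last, matcher.start()));
                last = matcher.end();
            }
            addIfNonEmpty(tokens, text.substring(last));
        } else {
            // Capture mode: matcher.group throws IndexOutOfBoundsException if the group does not exist.
            while (matcher.find()) {
                String captured = matcher.group(group);
                if (captured != null) {
                    addIfNonEmpty(tokens, captured);
                }
            }
        }
        return tokens;
    }

    private static void addIfNonEmpty(List<String> tokens, String token) {
        if (!token.isEmpty()) {
            tokens.add(token);
        }
    }

    public static void main(String[] args) {
        // Split mode: runs of non-word characters act as separators.
        System.out.println(tokenize("The foo_bar, size!", Pattern.compile("\\W+"), -1));
        // Capture mode: each run of word characters is a token (group 0 = whole match).
        System.out.println(tokenize("The foo_bar, size!", Pattern.compile("\\w+"), 0));
    }
}
```

    Both calls print [The, foo_bar, size]: split mode describes what separates the terms, while capture mode describes the terms themselves.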
    • Constructor Detail

      • PatternTokenizerDescriptor

        public PatternTokenizerDescriptor(Pattern pattern,
                                          int group)
        Parameters:
        pattern - A Java regular expression.
        group - Which capture group to extract as tokens (0 is the whole match). A value of -1, the default, splits the text on each match instead of extracting groups.
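        To show how the group parameter selects what becomes a token, here is a hypothetical sketch in plain java.util.regex. The CaptureGroupSketch class and extract helper are assumptions for illustration, not part of this API, and skipping empty or non-participating groups is likewise this sketch's own choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CaptureGroupSketch {

    // Hypothetical helper: collects the text of capture group `group` from every match (group >= 0).
    static List<String> extract(String text, Pattern pattern, int group) {
        Matcher matcher = pattern.matcher(text);
        if (group < 0 || group > matcher.groupCount()) {
            throw new IllegalArgumentException("group must be between 0 and " + matcher.groupCount());
        }
        List<String> tokens = new ArrayList<>();
        while (matcher.find()) {
            String captured = matcher.group(group);
            // A group that did not take part in the match yields null; empty captures are skipped too.
            if (captured != null && !captured.isEmpty()) {
                tokens.add(captured);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        Pattern keyValue = Pattern.compile("(\\w+)=(\\w+)");
        String text = "color=red size=10";
        System.out.println(extract(text, keyValue, 0)); // [color=red, size=10]
        System.out.println(extract(text, keyValue, 1)); // [color, size]
        System.out.println(extract(text, keyValue, 2)); // [red, 10]
    }
}
```

        The same pattern yields whole key=value pairs, only keys, or only values depending on the group index alone.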
    • Method Detail

      • getPattern

        public Pattern getPattern()
      • getGroup

        public int getGroup()