public class LuceneTokenizer extends Object
Modifier and Type | Class and Description |
---|---|
static class | LuceneTokenizer.TokenizerType |
Constructor and Description |
---|
LuceneTokenizer(String content, LuceneTokenizer.TokenizerType tokenizer, boolean useStopFilter, LuceneAnalyzerUtil.StemFilterType stemFilterType): Creates a tokenizer that can apply the default Lucene stop filter and the given stemming. |
LuceneTokenizer(String content, LuceneTokenizer.TokenizerType tokenizer, List<String> stopWords, boolean addToDefault, LuceneAnalyzerUtil.StemFilterType stemFilterType): Creates a tokenizer that uses a user-defined stop set, optionally merged with the Lucene default stop set, and the given stemming. |
LuceneTokenizer(String content, LuceneTokenizer.TokenizerType tokenizer, LuceneAnalyzerUtil.StemFilterType stemFilterType, int mingram, int maxgram): Creates a tokenizer for the n-gram model, with gram sizes bounded by mingram and maxgram. |
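The first two constructors differ in how the stop set is chosen: useStopFilter switches on the default Lucene stop set, while stopWords together with addToDefault either extends that default set or replaces it. The following is a minimal, stdlib-only Java sketch of those documented semantics. It does not call Lucene, and the names SAMPLE_DEFAULT_STOP_SET, buildStopSet and removeStopWords are hypothetical. The sample stop set is a small stand-in rather than Lucene's actual default list, and lower-casing tokens before the lookup is an assumption about the real pipeline.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

class StopSetSketch {

    // Hypothetical stand-in for the Lucene default stop set; the real set
    // used by LuceneTokenizer is larger and is not reproduced here.
    static final Set<String> SAMPLE_DEFAULT_STOP_SET =
            new HashSet<>(Arrays.asList("a", "an", "and", "of", "the", "to"));

    // Mirrors the documented addToDefault semantics: true adds the user words
    // to the default set, false uses only the user words as the stop set.
    static Set<String> buildStopSet(List<String> stopWords, boolean addToDefault) {
        Set<String> result = addToDefault
                ? new HashSet<>(SAMPLE_DEFAULT_STOP_SET)
                : new HashSet<>();
        result.addAll(stopWords);
        return result;
    }

    // Drops every token whose lower-cased form is in the stop set
    // (assumes the stop set itself holds lower-case entries).
    static List<String> removeStopWords(List<String> tokens, Set<String> stopSet) {
        return tokens.stream()
                .filter(t -> !stopSet.contains(t.toLowerCase(Locale.ROOT)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("the", "crawler", "fetches", "pages", "of", "the", "site");
        List<String> custom = Arrays.asList("pages");

        // prints [crawler, fetches, site]
        System.out.println(removeStopWords(tokens, buildStopSet(custom, true)));
        // prints [the, crawler, fetches, of, the, site]
        System.out.println(removeStopWords(tokens, buildStopSet(custom, false)));
    }
}
```

With addToDefault set to true, both the sample defaults and the user word are removed; with false, only the user word is.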
Modifier and Type | Method and Description |
---|---|
TokenStream | getTokenStream(): Returns the TokenStream created by the tokenizer. |
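For a tokenizer built with the n-gram constructor, the stream returned by getTokenStream() carries grams whose sizes are bounded by mingram and maxgram. The sketch below assumes word-level n-grams (shingles) built from already-split, lower-cased tokens and emitted by start position and then by size. The real class builds its stream from Lucene components, so its gram type, ordering, and handling of single-token grams may differ; wordNGrams is a hypothetical helper, not part of the LuceneTokenizer API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class NGramSketch {

    // Emits every run of consecutive tokens whose length n satisfies
    // minGram <= n <= maxGram, ordered by start position, then by length.
    static List<String> wordNGrams(List<String> tokens, int minGram, int maxGram) {
        if (minGram < 1 || maxGram < minGram) {
            throw new IllegalArgumentException("require 1 <= minGram <= maxGram");
        }
        List<String> grams = new ArrayList<>();
        for (int start = 0; start < tokens.size(); start++) {
            // Stop growing the gram once it would run past the last token.
            for (int n = minGram; n <= maxGram && start + n <= tokens.size(); n++) {
                grams.add(String.join(" ", tokens.subList(start, start + n)));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("open", "source", "web", "crawler");
        // prints [open source, open source web, source web, source web crawler, web crawler]
        System.out.println(wordNGrams(tokens, 2, 3));
    }
}
```

The sketch rejects minGram < 1 or maxGram < minGram up front rather than silently returning an empty list.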
public LuceneTokenizer(String content, LuceneTokenizer.TokenizerType tokenizer, boolean useStopFilter, LuceneAnalyzerUtil.StemFilterType stemFilterType)
Parameters:
- content: the text to tokenize
- tokenizer: the type of tokenizer to use, CLASSIC or DEFAULT
- useStopFilter: if true, the token stream is filtered using the default Lucene stop set
- stemFilterType: the type of stemming to perform

public LuceneTokenizer(String content, LuceneTokenizer.TokenizerType tokenizer, List<String> stopWords, boolean addToDefault, LuceneAnalyzerUtil.StemFilterType stemFilterType)
Parameters:
- content: the text to tokenize
- tokenizer: the type of tokenizer to use, CLASSIC or DEFAULT
- stopWords: a list of user-defined stop words
- addToDefault: if true, the words in stopWords are added to the Lucene default stop set; if false, only the user-provided words are used as the stop set
- stemFilterType: the type of stemming to perform

public LuceneTokenizer(String content, LuceneTokenizer.TokenizerType tokenizer, LuceneAnalyzerUtil.StemFilterType stemFilterType, int mingram, int maxgram)
Parameters:
- content: the text to tokenize
- tokenizer: the type of tokenizer to use, CLASSIC or DEFAULT
- stemFilterType: the type of stemming to perform
- mingram: the minimum n-gram size used when tokenizing
- maxgram: the maximum n-gram size used when tokenizing

Copyright © 2021 The Apache Software Foundation