Packages that use Tokenizer
org.apache.lucene.analysis | API and code to convert text into indexable tokens. |
org.apache.lucene.analysis.cjk | Analyzer for Chinese, Japanese and Korean. |
org.apache.lucene.analysis.cn | Analyzer for Chinese. |
org.apache.lucene.analysis.ru | Analyzer for Russian. |
org.apache.lucene.analysis.standard | A grammar-based tokenizer constructed with JavaCC. |
Uses of Tokenizer in org.apache.lucene.analysis

Subclasses of Tokenizer in org.apache.lucene.analysis
class CharTokenizer | An abstract base class for simple, character-oriented tokenizers.
class KeywordTokenizer | Emits the entire input as a single token.
class LetterTokenizer | A tokenizer that divides text at non-letters.
class LowerCaseTokenizer | Performs the function of LetterTokenizer and LowerCaseFilter together.
class WhitespaceTokenizer | A tokenizer that divides text at whitespace.
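The subclasses above mostly follow the CharTokenizer pattern: a subclass decides which characters belong to a token and how to normalize each one. The following is a minimal, self-contained sketch of that pattern (not the Lucene source); the class and method names are illustrative, chosen to mirror CharTokenizer's isTokenChar/normalize hooks.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the CharTokenizer pattern behind LetterTokenizer and
// LowerCaseTokenizer: accept letters, split on everything else,
// and lower-case each accepted character in the same pass.
public class CharTokenizerSketch {

    // LetterTokenizer's rule: only letters belong to a token.
    static boolean isTokenChar(char c) {
        return Character.isLetter(c);
    }

    // LowerCaseTokenizer's extra step, folding in what LowerCaseFilter
    // would otherwise do as a separate pass.
    static char normalize(char c) {
        return Character.toLowerCase(c);
    }

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : input.toCharArray()) {
            if (isTokenChar(c)) {
                current.append(normalize(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());   // non-letter ends the token
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());       // flush the final token
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The apostrophe is not a letter, so "Don't" splits into two tokens.
        System.out.println(tokenize("Don't panic, Arthur!"));
        // -> [don, t, panic, arthur]
    }
}
```

Note how splitting purely on character class breaks "Don't" at the apostrophe; the grammar-based StandardTokenizer in org.apache.lucene.analysis.standard avoids exactly this.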
Uses of Tokenizer in org.apache.lucene.analysis.cjk

Subclasses of Tokenizer in org.apache.lucene.analysis.cjk
class CJKTokenizer | CJKTokenizer was adapted from StopTokenizer, which does a decent job for most European languages.
Uses of Tokenizer in org.apache.lucene.analysis.cn

Subclasses of Tokenizer in org.apache.lucene.analysis.cn
class ChineseTokenizer | Extracts tokens from the stream using Character.getType(), treating each Chinese character as a single token. ChineseTokenizer and CJKTokenizer differ in their token parsing logic.
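The parsing difference between the two CJK-oriented tokenizers can be sketched in plain Java (this is an illustration of the two strategies, not the Lucene source): ChineseTokenizer emits each CJK character as its own token (unigrams), while CJKTokenizer emits overlapping two-character tokens (bigrams) over a CJK run.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: unigram vs bigram splitting of a CJK run,
// the core difference between ChineseTokenizer and CJKTokenizer.
public class CjkSketch {

    // ChineseTokenizer-style: one token per character.
    static List<String> unigrams(String s) {
        List<String> out = new ArrayList<>();
        for (char c : s.toCharArray()) {
            out.add(String.valueOf(c));
        }
        return out;
    }

    // CJKTokenizer-style: overlapping two-character tokens.
    static List<String> bigrams(String s) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < s.length(); i++) {
            out.add(s.substring(i, i + 2));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(unigrams("中文分析")); // -> [中, 文, 分, 析]
        System.out.println(bigrams("中文分析"));  // -> [中文, 文分, 分析]
    }
}
```

Bigrams trade a larger index for better phrase matching, since adjacent characters that form a word stay together in at least one token.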
Uses of Tokenizer in org.apache.lucene.analysis.ru

Subclasses of Tokenizer in org.apache.lucene.analysis.ru
class RussianLetterTokenizer | Extends LetterTokenizer by additionally looking up letters in a given "Russian charset".
Uses of Tokenizer in org.apache.lucene.analysis.standard

Subclasses of Tokenizer in org.apache.lucene.analysis.standard
class StandardTokenizer | A grammar-based tokenizer constructed with JavaCC.
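A grammar-based tokenizer recognizes whole lexical classes (words with internal apostrophes, numbers with decimal points, and so on) instead of splitting on single character classes. The following regex-based sketch only approximates that idea; the pattern is an assumption for illustration, not StandardTokenizer's JavaCC grammar.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Approximation of grammar-based tokenizing: a token is a run of
// alphanumerics, optionally continued across internal apostrophes or
// dots, so "O'Reilly" and "3.14" each survive as one token.
public class GrammarTokenizerSketch {

    private static final Pattern TOKEN =
        Pattern.compile("[A-Za-z0-9]+(?:['.][A-Za-z0-9]+)*");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            tokens.add(m.group().toLowerCase());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // A character-class tokenizer would split both "O'Reilly" and "3.14".
        System.out.println(tokenize("O'Reilly wrote version 3.14"));
        // -> [o'reilly, wrote, version, 3.14]
    }
}
```

Contrast this with LetterTokenizer above, which would break the same input into [o, reilly, wrote, version] and drop the number entirely.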