new Tokenizer(dic)
Tokenizer
Parameters:
Name | Type | Description |
---|---|---|
dic | DynamicDictionaries | Dictionaries used by this tokenizer |
Methods
-
<static> splitByPunctuation(input)
-
Split text into sentences at punctuation marks
Parameters:
Name | Type | Description |
---|---|---|
input | string | Input text |
Returns:
Sentences that end with punctuation
- Type
- Array.<string>
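The splitting behavior described above can be sketched as follows. This is a minimal illustration, not the library's code: the function name `splitByPunctuationSketch` and the punctuation set (Japanese comma and full stop) are assumptions, since this reference does not say which characters count as punctuation.

```javascript
// Assumed punctuation set; the reference above does not list the
// characters the real method treats as sentence boundaries.
const ASSUMED_PUNCTUATION = /[、。]/;

// Split `input` into sentences, keeping each punctuation mark at the
// end of the sentence it closes.
function splitByPunctuationSketch(input) {
  const sentences = [];
  let tail = input;
  while (tail !== "") {
    const index = tail.search(ASSUMED_PUNCTUATION);
    if (index < 0) {
      // No punctuation left: the remainder becomes the last element.
      sentences.push(tail);
      break;
    }
    sentences.push(tail.substring(0, index + 1));
    tail = tail.substring(index + 1);
  }
  return sentences;
}

console.log(splitByPunctuationSketch("すもももももも、もものうち。"));
```

In this sketch, trailing text without punctuation is returned as a final element rather than dropped; whether the real method does the same is not stated here.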
-
getLattice(text)
-
Build word lattice
Parameters:
Name | Type | Description |
---|---|---|
text | string | Input text to analyze |
Returns:
Word lattice
- Type
- ViterbiLattice
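To make the idea of a word lattice concrete, here is a toy sketch. It does not reproduce `ViterbiLattice`, whose structure is not described in this reference; the function `buildToyLattice`, the word list, and the node shape (`start`, `surface`) are all hypothetical.

```javascript
// Toy word list standing in for real dictionaries (hypothetical).
const TOY_WORDS = ["す", "すもも", "も", "もも"];

// Build a toy lattice: for every start offset, record each word that
// matches there. nodesEndAt[i] holds the candidates that end at offset i.
function buildToyLattice(text, words) {
  const nodesEndAt = Array.from({ length: text.length + 1 }, () => []);
  for (let start = 0; start < text.length; start++) {
    for (const word of words) {
      if (text.startsWith(word, start)) {
        nodesEndAt[start + word.length].push({ start, surface: word });
      }
    }
  }
  return nodesEndAt;
}

const toyLattice = buildToyLattice("すもも", TOY_WORDS);
toyLattice.forEach((nodes, end) => {
  console.log(end, nodes.map((node) => node.surface));
});
```

Each path through such a lattice from offset 0 to the end of the text is one candidate segmentation; a tokenizer then has to pick one path, typically by assigning costs to nodes and transitions.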
-
tokenize(text)
-
Tokenize text
Parameters:
Name | Type | Description |
---|---|---|
text | string | Input text to analyze |
Returns:
Tokens
- Type
- Array
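A short sketch of consuming the result of `tokenize(text)`. The reference only says that an Array is returned, so the `surface_form` field on each token is an assumption, and `surfaceForms` and `stubTokenizer` are hypothetical helpers introduced so the example runs without any dictionaries loaded.

```javascript
// Collect the surface strings from whatever tokenize(text) returns.
// Assumption: each token is an object with a `surface_form` string field.
function surfaceForms(tokenizer, text) {
  return tokenizer.tokenize(text).map((token) => token.surface_form);
}

// Hypothetical stand-in for a real Tokenizer instance. It splits on
// whitespace, which is NOT how a dictionary-based tokenizer works.
const stubTokenizer = {
  tokenize(text) {
    return text
      .split(/\s+/)
      .filter((word) => word !== "")
      .map((word) => ({ surface_form: word }));
  },
};

console.log(surfaceForms(stubTokenizer, "build word lattice"));
```

With a real instance created by `new Tokenizer(dic)`, the same helper could be called as `surfaceForms(tokenizer, text)`, provided the tokens do expose that field.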