Class: Tokenizer


new Tokenizer(dic)

Creates a tokenizer backed by the given dictionaries.

Parameters:
Name   Type                  Description
dic    DynamicDictionaries   Dictionaries used by this tokenizer

Methods


<static> splitByPunctuation(input)

Split the input text into sentences at punctuation marks.

Parameters:
Name    Type     Description
input   string   Input text

Returns:

Sentences, each ending with punctuation

Type
Array.<string>
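
The splitting behavior described above can be sketched as follows. This is a minimal illustration, not the library's source: the function name splitByPunctuationSketch is hypothetical, and the choice of the Japanese comma and full stop as delimiters is an assumption, since this page does not list which punctuation marks are recognized.

```javascript
// Assumption: sentences are delimited by the Japanese comma (、) and full stop (。).
const PUNCTUATION_SKETCH = /[、。]/;

// Hypothetical stand-in for the documented static method.
function splitByPunctuationSketch(input) {
  const sentences = [];
  let tail = input;
  while (tail !== "") {
    const index = tail.search(PUNCTUATION_SKETCH);
    if (index < 0) {
      // No punctuation left: the remainder is the final sentence.
      sentences.push(tail);
      break;
    }
    // Keep the punctuation mark at the end of the sentence it closes.
    sentences.push(tail.substring(0, index + 1));
    tail = tail.substring(index + 1);
  }
  return sentences;
}

console.log(splitByPunctuationSketch("今日は晴れ。明日は雨、かもしれない"));
// → [ '今日は晴れ。', '明日は雨、', 'かもしれない' ]
```

Under these assumptions, a trailing fragment with no closing punctuation is still returned as its own element, and an empty input yields an empty array.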

getLattice(text)

Build a word lattice for the input text.

Parameters:
Name   Type     Description
text   string   Input text to analyze

Returns:

Word lattice

Type
ViterbiLattice
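
To make the idea of a word lattice concrete, here is a conceptual toy sketch. It is not the ViterbiLattice implementation, whose structure this page does not describe: the function buildToyLattice, the word list TOY_WORDS, and the node shape { start, surface } are all hypothetical. The sketch records, for every end position in the text, each dictionary word that ends there.

```javascript
// Hypothetical toy dictionary; a real tokenizer looks words up in its loaded dictionaries.
const TOY_WORDS = ["す", "もも", "も", "すもも"];

// Build a lattice indexed by end position:
// lattice[end] lists every dictionary word that ends at that position.
function buildToyLattice(text, words) {
  const lattice = Array.from({ length: text.length + 1 }, () => []);
  for (let start = 0; start < text.length; start++) {
    for (const word of words) {
      if (text.startsWith(word, start)) {
        lattice[start + word.length].push({ start, surface: word });
      }
    }
  }
  return lattice;
}

const toyLattice = buildToyLattice("すもも", TOY_WORDS);
console.log(toyLattice[3].map((node) => node.surface));
// → [ 'すもも', 'もも', 'も' ]
```

Each path through such a lattice from position 0 to the end of the text is one candidate segmentation. A Viterbi-style search then picks the path with the lowest total cost; the cost model is outside the scope of this sketch.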

tokenize(text)

Tokenize the input text.

Parameters:
Name   Type     Description
text   string   Input text to analyze

Returns:

Tokens

Type
Array
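
Since tokenize(text) is documented as returning a plain Array, ordinary array operations apply to its result. The sketch below shows one way a caller might use it. The helper tokenCounts and the stand-in object fakeTokenizer are hypothetical and exist only so the example runs without dictionaries; the shape of individual tokens is not documented on this page, so the sketch relies only on the array length.

```javascript
// Works with any object exposing tokenize(text) -> Array, as documented above.
function tokenCounts(tokenizer, texts) {
  return texts.map((text) => {
    const tokens = tokenizer.tokenize(text);
    if (!Array.isArray(tokens)) {
      throw new TypeError("tokenize() is expected to return an Array");
    }
    return { text, count: tokens.length };
  });
}

// Hypothetical stand-in, NOT a real Tokenizer: it emits one "token" per character.
const fakeTokenizer = { tokenize: (text) => Array.from(text) };

console.log(tokenCounts(fakeTokenizer, ["abc", ""]));
// → [ { text: 'abc', count: 3 }, { text: '', count: 0 } ]
```

With a real Tokenizer instance in place of fakeTokenizer, the counts would reflect dictionary-based segmentation rather than characters.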