Stanford NLP: training the DocumentPreprocessor
Does Stanford NLP provide a train method for the DocumentPreprocessor, so that I can train it on my own corpora and create my own models for sentence splitting?
I am working with German sentences and need to create my own German model for sentence splitting tasks. Therefore, I need to train the sentence splitter, DocumentPreprocessor.
Is there a way I can do it?
No. At present, tokenization of European languages is done by a (hand-written) finite automaton. Machine learning-based tokenization is used for Chinese and Arabic. At present, sentence splitting for all languages is done by rule, exploiting the decisions of the tokenizer. (Of course, that's just how things are now, not how they have to be.)
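To make "sentence splitting by rule, exploiting the decisions of the tokenizer" more concrete, here is a minimal Python sketch. It is not Stanford NLP code and does not use its API. The function names, the tiny German abbreviation list, and the whitespace-based tokenizer are all assumptions chosen for illustration. The point is only that the tokenizer decides whether a period belongs to a token (as in "z.B.") or stands alone, and the splitter then simply breaks after stand-alone sentence-final tokens.

```python
import re

# Hypothetical, deliberately tiny abbreviation list (an assumption for this
# sketch); a real hand-written tokenizer covers far more cases.
GERMAN_ABBREVIATIONS = {"z.B.", "Dr.", "Nr.", "usw.", "bzw.", "ca."}

# Tokens after which the splitter ends a sentence.
SENTENCE_FINAL = {".", "!", "?"}


def tokenize(text, abbreviations=GERMAN_ABBREVIATIONS):
    """Whitespace-split, then detach trailing punctuation from each chunk,
    unless the whole chunk is a known abbreviation."""
    tokens = []
    for chunk in text.split():
        if chunk in abbreviations:
            # Tokenizer decision: the period is part of the token.
            tokens.append(chunk)
            continue
        match = re.fullmatch(r"(.*?)([.!?,;:]+)", chunk)
        if match and match.group(1):
            tokens.append(match.group(1))
            # Each trailing punctuation mark becomes its own token.
            tokens.extend(match.group(2))
        else:
            tokens.append(chunk)
    return tokens


def split_sentences(tokens):
    """Rule-based splitter: close a sentence after every token that is
    exactly a sentence-final punctuation mark."""
    sentences, current = [], []
    for token in tokens:
        current.append(token)
        if token in SENTENCE_FINAL:
            sentences.append(current)
            current = []
    if current:
        # Keep trailing tokens that were not closed by punctuation.
        sentences.append(current)
    return sentences


text = "Dr. Müller kommt z.B. morgen. Wir warten!"
for sentence in split_sentences(tokenize(text)):
    print(" ".join(sentence))
```

With the abbreviation list in place, the text yields two sentences. Passing `abbreviations=set()` makes the tokenizer detach the periods of "Dr." and "z.B.", and the same splitter then breaks there too. This is why adapting such a pipeline to German would mean writing better tokenizer rules rather than training a model. The sketch also shows an obvious limitation: an abbreviation at the very end of a sentence (e.g. "usw.") would not trigger a split.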
At present we have no separate German tokenizer/sentence splitter. The current properties file just re-uses the English ones. This is sub-optimal. If someone wanted to produce one for German, that would be great to have. (We may do so at some point, but German development is not at the top of our list of priorities.)