Stanford NLP: training the DocumentPreprocessor


Does Stanford NLP provide a train method for the DocumentPreprocessor, so that I can train it on my own corpora and create my own models for sentence splitting?

I am working with German sentences and need to create my own German model for sentence splitting tasks. Therefore, I need to train the sentence splitter, DocumentPreprocessor.

Is there a way I can do it?

No. At present, tokenization of European languages is done by a (hand-written) finite automaton. Machine learning-based tokenization is used for Chinese and Arabic. At present, sentence splitting for all languages is done by rule, exploiting the decisions of the tokenizer. (Of course, that's just how things are now, not how they have to be.)
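To make "sentence splitting by rule, exploiting the decisions of the tokenizer" concrete, here is a minimal sketch in plain Java. It is not Stanford's implementation and does not use the Stanford NLP API. The class name `RuleBasedSentenceSplitter`, the `split` method, and the set of sentence-final tokens are all assumptions made up for illustration. The point it shows is that the splitter never inspects raw characters: it trusts the tokenizer to have already decided whether a period belongs to an abbreviation such as "Dr." or stands alone as its own token.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only: NOT Stanford's DocumentPreprocessor.
class RuleBasedSentenceSplitter {

    // Hypothetical rule: a token that is exactly one of these ends a sentence.
    private static final Set<String> SENTENCE_FINAL =
            new HashSet<>(Arrays.asList(".", "!", "?"));

    // Groups an already-tokenized text into sentences.
    static List<List<String>> split(List<String> tokens) {
        List<List<String>> sentences = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String token : tokens) {
            current.add(token);
            if (SENTENCE_FINAL.contains(token)) {
                sentences.add(current);
                current = new ArrayList<>();
            }
        }
        if (!current.isEmpty()) {
            // Keep trailing tokens that lack final punctuation.
            sentences.add(current);
        }
        return sentences;
    }

    public static void main(String[] args) {
        // The tokenizer is assumed to have kept "Dr." as a single token,
        // so its period is never treated as a sentence boundary.
        List<String> tokens = Arrays.asList(
                "Dr.", "Meier", "kommt", "morgen", ".",
                "Kommst", "du", "auch", "?");
        for (List<String> sentence : split(tokens)) {
            System.out.println(String.join(" ", sentence));
        }
    }
}
```

With a design like this, the quality of German sentence splitting depends almost entirely on how well the tokenizer handles German abbreviations and ordinals, which is why re-using English tokenization rules is a weak point.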

At present we have no separate German tokenizer/sentence splitter. The current properties file just re-uses the English ones. This is clearly sub-optimal. If someone wanted to produce one for German, that would be great to have. (We may do it at some point, but German development is not at the top of our list of priorities.)

