Stanford NLP: training the DocumentPreprocessor

Does Stanford NLP provide a train method for the DocumentPreprocessor, so that I can train it on my own corpora and create my own models for sentence splitting?

I am working with German sentences and need to create my own German model for sentence splitting tasks. Therefore, I need to train the sentence splitter, DocumentPreprocessor.

Is there a way I can do it?

No. At present, tokenization of European languages is done by a (hand-written) finite automaton. Machine learning-based tokenization is used for Chinese and Arabic. At present, sentence splitting for all languages is done by rule, exploiting the decisions of the tokenizer. (Of course, that's just how things are now, not how they have to be.)

At present we have no separate German tokenizer/sentence splitter. The current properties file re-uses the English ones. This is sub-optimal. If someone wanted to produce one for German, it would be great to have. (We may do so at some point, but German development is not at the top of our list of priorities.)
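To make the idea of rule-based splitting that exploits the tokenizer's decisions more concrete, here is a minimal, self-contained sketch in Python. It is not Stanford NLP code and does not use its API. The abbreviation list, the toy tokenizer, and the function names `tokenize` and `split_sentences` are all assumptions made up for illustration. The point is only that once the tokenizer has decided whether a period belongs to an abbreviation or stands alone, the splitter itself needs nothing more than a rule.

```python
import re

# Hypothetical abbreviation list, for illustration only. A real German
# splitter would need a much larger, curated list.
GERMAN_ABBREVIATIONS = {"z.B.", "d.h.", "Dr.", "Prof.", "Nr.", "bzw.", "usw.", "ca."}

# Standalone tokens that end a sentence.
SENTENCE_FINAL = {".", "!", "?"}


def tokenize(text):
    """Toy whitespace tokenizer: keeps known abbreviations whole and
    splits other trailing punctuation into separate one-character tokens."""
    tokens = []
    for chunk in text.split():
        if chunk in GERMAN_ABBREVIATIONS:
            # The tokenizer decides here that this period is not sentence-final.
            tokens.append(chunk)
            continue
        match = re.fullmatch(r"(.*?)([.!?,;:]+)", chunk)
        if match and match.group(1):
            tokens.append(match.group(1))
            tokens.extend(match.group(2))  # one token per punctuation mark
        else:
            tokens.append(chunk)
    return tokens


def split_sentences(tokens):
    """Rule-based splitter: close a sentence after every standalone
    sentence-final token; no model or training is involved."""
    sentences, current = [], []
    for token in tokens:
        current.append(token)
        if token in SENTENCE_FINAL:
            sentences.append(current)
            current = []
    if current:  # keep trailing tokens that lack final punctuation
        sentences.append(current)
    return sentences


if __name__ == "__main__":
    text = "Dr. Müller kommt heute. Er bringt z.B. Kuchen mit! Kommst du auch?"
    for sentence in split_sentences(tokenize(text)):
        print(" ".join(sentence))
```

In a design like this, adapting the splitter to German mostly means improving the tokenizer's handling of abbreviations, ordinals, and similar cases, rather than training a model.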
