nlp - Wapiti/CRF++ dataset format


To be on the safe side, and to see the fruits of my hard work: do the features in the data need to be correctly formatted?

I have a dataset, a template, and manually tagged NER data.

As far as training CRF++ is concerned, testing the generated model yields 0% correct results. The results are the same using Wapiti.

My question: should the template file be modified? Or is it imperative to add POS tags to the training dataset as well?

Additionally, if the model should discount word casing while labeling, should the training dataset reflect that in its entirety, with lowercasing enforced? Would that not affect sentences that derive meaning from uppercasing?
I am a bit unclear in this respect.

PS - I am targeting a model like http://cliff.mediameter.org/, where NER is labeled irrespective of casing. I can't use that model.

The training data is small (hardly 47 sentences), and the format is incorrect: sentences should end with empty lines, but yours end with space-tab-space, which might make CRF++ learn the whole file as a single sentence.
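A minimal sketch of how the separator lines could be repaired, assuming tab-separated columns and that any line containing only spaces or tabs was meant as a sentence break. The helper names `normalize_sentence_breaks` and `count_sentences` are hypothetical, and `count_sentences` applies a strict reading in which only a truly empty line ends a sentence; it is not taken from CRF++ itself.

```python
def normalize_sentence_breaks(text: str) -> str:
    """Turn whitespace-only lines into truly empty lines."""
    fixed = []
    for line in text.splitlines():
        # Assumption: a line holding only spaces/tabs was meant as a sentence break.
        fixed.append("" if line.strip() == "" else line)
    return "\n".join(fixed) + "\n"


def count_sentences(text: str) -> int:
    """Count blocks of token lines separated by truly empty lines."""
    count, inside = 0, False
    for line in text.splitlines():
        if line == "":
            inside = False
        elif not inside:
            count, inside = count + 1, True
    return count


# Two sentences separated by a space-tab-space line, as described above.
sample = "John\tB-PER\nsmiled\tO\n \t \nParis\tB-LOC\nsleeps\tO\n"
print(count_sentences(sample))                             # before the fix
print(count_sentences(normalize_sentence_breaks(sample)))  # after the fix
```

Running the count before and after on the real training file is a quick way to check whether the number of sentences matches the expected 47 or so.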

Try http://paste.ubuntu.com/24537692/

Also, can you share the test data?
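On the casing question, one option (a sketch, not a tested recommendation) is to lowercase the token column of the training file and apply the same step to whatever is labeled later, so both sides see the same surface forms. The snippet below assumes tab-separated columns with the token in the first column; `lowercase_tokens` is a hypothetical helper name.

```python
def lowercase_tokens(text: str, sep: str = "\t") -> str:
    """Lowercase the first column of every token line; keep other columns."""
    out = []
    for line in text.splitlines():
        if line.strip() == "":
            out.append("")  # sentence break stays an empty line
            continue
        columns = line.split(sep)
        columns[0] = columns[0].lower()  # assumption: the token is column 0
        out.append(sep.join(columns))
    return "\n".join(out) + "\n"


sample = "Paris\tB-LOC\nsleeps\tO\n\nJOHN\tB-PER\n"
print(lowercase_tokens(sample), end="")
```

The labels are left untouched, so only the token text changes; whether discarding case helps or hurts depends on the data.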

