nlp - Wapiti/CRF++ dataset format -

To be on the safe side, and to see the fruits of my hard work: do the features of the data need to be correctly formatted?

I have a dataset, a template, and manually tagged NER data.

As far as training CRF++ is concerned, testing the generated model yields 0% correct results. The results are the same using Wapiti.

My question: should the template file be modified? Or is it imperative to add POS tags to the training dataset as well?

Additionally, if the model should discount word casing while labeling, should the training dataset reflect that in its entirety, with lower casing enforced? Would that not affect sentences that derive meaning from uppercasing?
I am a bit unclear in this respect.

PS - I am targeting a model like http://cliff.mediameter.org/, where NER is labeled irrespective of casing. I can't use that model.

The training data is small (hardly 47 sentences), and the format is incorrect: sentences should end with empty lines, but yours end with space-tab-space, which might make CRF++ learn the whole file as a single sentence.
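A minimal sketch of how the separator lines could be repaired, assuming one token per line with whitespace-separated columns. The function names `normalize_sentence_breaks` and `count_sentences` are hypothetical helpers written for illustration and are not part of CRF++ or Wapiti.

```python
def normalize_sentence_breaks(lines):
    """Turn whitespace-only lines into truly empty lines.

    Assumption: each non-blank line is one token with its columns,
    and any line holding only spaces/tabs was meant as a sentence break.
    """
    out = []
    for line in lines:
        stripped = line.rstrip("\r\n")
        if stripped.strip() == "":
            # Sentence separator: emit a single empty line,
            # skipping leading and repeated separators.
            if out and out[-1] != "":
                out.append("")
        else:
            # Token line: drop trailing whitespace only.
            out.append(stripped.rstrip())
    # Make sure the last sentence is terminated as well.
    if out and out[-1] != "":
        out.append("")
    return out


def count_sentences(lines):
    """Count sentences in normalized output (one empty line per sentence)."""
    return sum(1 for line in lines if line == "")
```

To apply it, read the training file into a list of lines, pass the list through `normalize_sentence_breaks`, and write the result back joined with newlines plus one final newline. If `count_sentences` then reports roughly 47, the sentence boundaries are being recognized.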

Try http://paste.ubuntu.com/24537692/

Also, could you share your test data?
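On the casing question, one common option (not something confirmed in this thread) is to keep the original token and add a lowercased copy plus a capitalization flag as extra columns, instead of lowercasing the whole dataset. The sketch below assumes the token is the first column and the label is the last; `add_case_features` and the flag names are hypothetical.

```python
def add_case_features(line):
    """Insert a lowercased token and a capitalization flag after the token.

    Assumption: column 0 is the token and the last column is the label.
    Blank (or whitespace-only) lines are returned as empty separators.
    """
    if not line.strip():
        return ""
    cols = line.split()
    token, rest = cols[0], cols[1:]
    if token.isupper():
        shape = "ALLCAP"
    elif token[0].isupper():
        shape = "INITCAP"
    else:
        shape = "NOCAP"
    # Output columns: token, lowercased token, shape flag, then the rest.
    return "\t".join([token, token.lower(), shape] + rest)
```

The template then has to reference the new columns for them to have any effect, for example a unigram rule over column 1 such as `U01:%x[0,1]`. Rules that use only the lowercased column make those features case-insensitive, while the flag column lets the model still use capitalization where it helps.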
