nlp - Wapiti/CRF++ dataset format


To be on the safe side and see the fruits of my hard work: do the features of the data need to be correctly formatted?

I have a dataset, a template, and manually tagged NER data.

As far as training CRF++ is concerned, testing the generated model yields 0% correct results. The results are the same using Wapiti.

My question: should the template file be modified? Or is it imperative to add POS tags to the training dataset as well?

Additionally, if the model should discount word casing while labeling, should the training dataset reflect that in its entirety, with lowercasing enforced? Would that not affect sentences that derive meaning from uppercasing?
I am a bit unclear in this respect.
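
To make the lowercasing question concrete, here is a minimal Python sketch of one way to enforce lowercase on the token column only. It assumes the usual CRF++/Wapiti layout: one token per line, whitespace-separated columns, the token in the first column and the label in the last. The function name `lowercase_tokens` is hypothetical and not part of either tool.

```python
def lowercase_tokens(lines):
    """Lowercase only the first column (the token) of each data line.

    Assumption: columns are whitespace-separated, the token is the first
    column, and the label is the last. Labels and any other feature
    columns (for example POS tags) are left untouched.
    """
    out = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            # Keep sentence boundaries as truly empty lines.
            out.append("")
            continue
        columns = line.split()
        columns[0] = columns[0].lower()
        out.append("\t".join(columns))
    return out


if __name__ == "__main__":
    sample = ["John\tNNP\tB-PER\n", "\n", "LOVES\tVBZ\tO\n"]
    for row in lowercase_tokens(sample):
        print(repr(row))
```

If the training data is lowercased this way, the test data and any text labeled later would presumably need the same treatment, since the model only ever sees lowercase tokens during training.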

PS: I am targeting a model like http://cliff.mediameter.org/, where NER labels are assigned irrespective of casing. I can't use that model.

The training data is small (hardly 47 sentences), and the format is incorrect: sentences should end with empty lines, but yours end with space-tab-space, which might make CRF++ learn the whole file as a single sentence.

Try http://paste.ubuntu.com/24537692/
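
The separator problem described above can also be fixed with a short script. This is a minimal Python sketch, not the method used to produce the paste: it turns whitespace-only lines (such as space-tab-space) into truly empty lines and counts the resulting sentences as a sanity check. The function names `normalize_sentence_breaks` and `count_sentences` are hypothetical.

```python
def normalize_sentence_breaks(lines):
    """Replace whitespace-only lines with empty lines and collapse repeats.

    Input lines may still carry their trailing newline; output lines do not.
    """
    cleaned = []
    for line in lines:
        stripped = line.rstrip("\r\n")
        if stripped.strip() == "":
            # A line such as " \t " becomes a real sentence break.
            # Leading and consecutive breaks are dropped.
            if cleaned and cleaned[-1] != "":
                cleaned.append("")
        else:
            cleaned.append(stripped)
    return cleaned


def count_sentences(lines):
    """Count blocks of non-empty lines separated by truly empty lines.

    Expects lines without trailing newlines.
    """
    count = 0
    in_sentence = False
    for line in lines:
        if line == "":
            in_sentence = False
        elif not in_sentence:
            count += 1
            in_sentence = True
    return count


if __name__ == "__main__":
    raw = ["John\tB-PER\n", "runs\tO\n", " \t \n", "Paris\tB-LOC\n"]
    before = [line.rstrip("\r\n") for line in raw]
    after = normalize_sentence_breaks(raw)
    print(count_sentences(before), count_sentences(after))
```

On the four-line sample, the count goes from one sentence before normalization to two afterwards, which is the kind of check worth running on the full 47-sentence file before retraining.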

Also, can you share the test data?

