Stanford NLP - Identifying dates of the form \d\d-\d\d-\d\d using RegexNER
I am using Stanford RegexNER along with NER in a pipeline. I want to identify strings of the form [0-9][0-9]-[0-9][0-9]-[0-9][0-9] (e.g., 27-02-16) as DATE, but NER identifies them as NUMBER. So I defined a regex in a mapping file and gave it to RegexNER. However, RegexNER is not able to identify such strings as dates; the NER tag for these tokens is still NUMBER. The following is my mapping file:
[0-9]{2}-[0-9]{2}-[0-9]{2}	DATE	NUMBER
I have ensured that the columns are tab-separated. I tried several versions of the regex, such as \d\d-\d\d-\d\d and [0-9][0-9]-[0-9][0-9]-[0-9][0-9], but none of them worked. Any pointers on what could be wrong? I am using Stanford CoreNLP 3.7. Here is the Java code I am running:
Properties props = new Properties();
props.put("annotators", "tokenize, ssplit, pos, lemma, ner, regexner");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
pipeline.addAnnotator(
    new RegexNERAnnotator("/home/jyoti/workspace-jee/qa_rest/src/main/resources/gazetter.txt"));
I investigated further and found that the regex does not match a string if it consists wholly of numbers. I tried prefixing a letter and it worked (i.e., a\d\d-\d\d-\d\d matched a14-07-12).
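One way to narrow this down is to check the pattern on its own, outside CoreNLP. The sketch below uses plain java.util.regex and assumes that RegexNER applies each pattern to a single whole token; the class name DatePatternCheck and the sample tokens other than 27-02-16 and a14-07-12 are made up for illustration. If the pattern matches here, the regex itself is probably not the problem, and the tokenization or the pipeline setup would be the next thing to look at.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of CoreNLP: checks the mapping-file
// pattern against candidate tokens using whole-string matching.
class DatePatternCheck {
    // Pattern copied from the mapping file in the question.
    static final Pattern DATE_TOKEN =
            Pattern.compile("[0-9]{2}-[0-9]{2}-[0-9]{2}");

    // Returns true only if the entire token matches the pattern
    // (assumption: this mirrors per-token matching in RegexNER).
    static boolean matchesWholeToken(String token) {
        return DATE_TOKEN.matcher(token).matches();
    }

    public static void main(String[] args) {
        // "27" and "-" stand in for what the tokens would look like
        // if the tokenizer split the date into pieces.
        String[] tokens = {"27-02-16", "a14-07-12", "27", "-"};
        for (String token : tokens) {
            System.out.println(token + " -> " + matchesWholeToken(token));
        }
    }
}
```

With this pattern, 27-02-16 matches as a whole token, while a fragment such as 27 does not, so it may be worth printing the tokens the pipeline actually produces.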
How are you running this? Your original rule works fine for me.
I issued this command:
java -Xmx8g edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,lemma,ner,regexner -regexner.mapping date-rules.txt -file date-example.txt -outputFormat text
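For comparison with the Java code in the question, here is a rough sketch of the same settings expressed as properties, so that the mapping file is passed through the regexner.mapping property rather than through a separately added annotator. The class name RegexNerProps is made up for illustration, and the block only builds and prints the properties so that it runs without CoreNLP on the classpath. Whether this resolves the original problem is an assumption, not something confirmed here.

```java
import java.util.Properties;

// Hypothetical helper: builds properties mirroring the command-line
// flags above (-annotators and -regexner.mapping).
class RegexNerProps {
    static Properties build(String mappingPath) {
        Properties props = new Properties();
        props.setProperty("annotators",
                "tokenize,ssplit,pos,lemma,ner,regexner");
        // Same key as the -regexner.mapping flag on the command line.
        props.setProperty("regexner.mapping", mappingPath);
        return props;
    }

    public static void main(String[] args) {
        Properties props = build("date-rules.txt");
        // With CoreNLP on the classpath, these would be passed to
        // new StanfordCoreNLP(props) instead of being printed.
        props.list(System.out);
    }
}
```

Configuring it this way means the pipeline contains a single regexner annotator that knows about the mapping file, which is closer to what the command line does.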