Stanford NLP - Identifying dates of the form \d\d-\d\d-\d\d using RegexNER


I am using Stanford RegexNER along with NER in a pipeline. I want to identify strings of the form [0-9][0-9]-[0-9][0-9]-[0-9][0-9] (e.g., 27-02-16) as DATE, but NER identifies them as NUMBER. So I defined a regex in a mapping file and gave it to RegexNER. However, RegexNER is not able to identify such strings as dates; the NER tag for these tokens is still NUMBER. The following is my mapping file:

[0-9]{2}-[0-9]{2}-[0-9]{2}	DATE	NUMBER

I ensured that the columns are tab-separated. I tried several versions of the regex, such as \d\d-\d\d-\d\d and [0-9][0-9]-[0-9][0-9]-[0-9][0-9], but none of them worked. Any pointers on what could be wrong? I am using Stanford CoreNLP 3.7. Here is the Java code I am running.

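import java.util.Properties;
import edu.stanford.nlp.pipeline.RegexNERAnnotator;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;

Properties props = new Properties();
props.put("annotators", "tokenize, ssplit, pos, lemma, ner, regexner");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
// Add a RegexNER annotator that reads the mapping file shown above.
pipeline.addAnnotator(
        new RegexNERAnnotator("/home/jyoti/workspace-jee/qa_rest/src/main/resources/gazetter.txt"));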

I investigated further and found that the regex does not match a string if it consists wholly of numbers. I tried prefixing a letter and it worked (i.e., a\d\d-\d\d-\d\d matched a14-07-12).
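One way to rule out the regex itself is to test it outside the pipeline. The sketch below uses only java.util.regex and assumes that RegexNER compares each pattern against a whole token (a full match rather than a substring search). The class and method names are made up for illustration. It says nothing about how the tokenizer splits the input or which tags RegexNER is allowed to overwrite.

```java
import java.util.regex.Pattern;

public class DatePatternCheck {
    // Returns true when the entire token matches the regex
    // (full match, not a substring search).
    static boolean matchesToken(String regex, String token) {
        return Pattern.matches(regex, token);
    }

    public static void main(String[] args) {
        // The three variants of the rule tried in the question.
        String[] patterns = {
            "[0-9]{2}-[0-9]{2}-[0-9]{2}",
            "\\d\\d-\\d\\d-\\d\\d",
            "[0-9][0-9]-[0-9][0-9]-[0-9][0-9]"
        };
        for (String p : patterns) {
            System.out.println(p + " matches 27-02-16: " + matchesToken(p, "27-02-16"));
        }
    }
}
```

If every variant matches here, the pattern is probably not the culprit, and the way the mapping file reaches the annotator is the next thing to check.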

How are you running this? I ask because your original rule works fine for me.

I issued this command:

java -Xmx8g edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,lemma,ner,regexner -regexner.mapping date-rules.txt -file date-example.txt -outputFormat text
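For anyone driving the pipeline from Java rather than the command line, the -annotators and -regexner.mapping flags above correspond to properties of the same names. Below is a minimal sketch that builds those properties. The class name RegexNerProps and its build method are hypothetical, and the pipeline construction is left as a comment so the snippet compiles without CoreNLP on the classpath. Whether this resolves the problem in the question is an assumption, not something confirmed here.

```java
import java.util.Properties;

public class RegexNerProps {
    // Builds properties mirroring the -annotators and -regexner.mapping flags.
    static Properties build(String mappingPath) {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,regexner");
        props.setProperty("regexner.mapping", mappingPath);
        return props;
    }

    public static void main(String[] args) {
        Properties props = build("date-rules.txt");
        // With CoreNLP on the classpath, the properties would be handed to the pipeline:
        // StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        System.out.println("regexner.mapping = " + props.getProperty("regexner.mapping"));
    }
}
```

With this setup the regexner annotator named in the annotators list reads the mapping file itself, so no separate addAnnotator call is involved.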
