migration - Why is parse failing after upgrading from Antlr 3 to Antlr 4?
I have recently been trying to upgrade my project from ANTLR 3 to ANTLR 4. After making changes in the grammar file, it seems that equations which worked before are no longer working. I am new to ANTLR 4, so I am unable to tell whether my change broke it or not.
Here is my original grammar file:
grammar equation;

options {
    language=CSharp2;
    output=AST;
    ASTLabelType=CommonTree;
}

tokens {
    VARIABLE;
    CONSTANT;
    EXPR;
    PAREXPR;
    EQUATION;
    UNARYEXPR;
    FUNCTION;
    BINARYOP;
    LIST;
}

equationset: equation* EOF!;
equation: variable ASSIGN expression -> ^(EQUATION variable expression) ;
parexpression : LPAREN expression RPAREN -> ^(PAREXPR expression) ;
expression : conditionalexpression -> ^(EXPR conditionalexpression) ;
conditionalexpression : orexpression ;
orexpression : andexpression ( OR^ andexpression )* ;
andexpression : comparisonexpression ( AND^ comparisonexpression )*;
comparisonexpression: additiveexpression ((EQ^ | NE^ | LTE^ | GTE^ | LT^ | GT^) additiveexpression)*;
additiveexpression : multiplicativeexpression ( (PLUS^ | MINUS^) multiplicativeexpression )* ;
multiplicativeexpression : unaryexpression ( ( TIMES^ | DIVIDE^) unaryexpression )* ;
unaryexpression
    : NOT unaryexpression -> ^(UNARYEXPR NOT unaryexpression)
    | MINUS unaryexpression -> ^(UNARYEXPR MINUS unaryexpression)
    | exponentexpression;
exponentexpression : primary (CARET^ primary)*;
primary : parexpression | constant | booleantok | variable | function;
numeric: INTEGER | REAL;
constant: STRING -> ^(CONSTANT STRING) | numeric -> ^(CONSTANT numeric);
booleantok : BOOLEAN -> ^(BOOLEAN);
scopedidentifier : (IDENTIFIER DOT)* IDENTIFIER -> IDENTIFIER+;
function : scopedidentifier LPAREN argumentlist RPAREN -> ^(FUNCTION scopedidentifier argumentlist);
variable: scopedidentifier -> ^(VARIABLE scopedidentifier);
argumentlist: (expression) ? (COMMA! expression)*;

WS : (' '|'\r'|'\n'|'\t')+ {$channel=HIDDEN;};
COMMENT : '/*' .* '*/' {$channel=HIDDEN;};
LINE_COMMENT : '//' ~('\n'|'\r')* '\r'? '\n' {$channel=HIDDEN;};
STRING: (('\"') ( (~('\"')) )* ('\"'))+;
fragment ALPHA: 'a'..'z'|'_';
fragment DIGIT: '0'..'9';
fragment ALNUM: ALPHA|DIGIT;
EQ : '==';
ASSIGN : '=';
NE : '!=' | '<>';
OR : 'or' | '||';
AND : 'and' | '&&';
NOT : '!'|'not';
LTE : '<=';
GTE : '>=';
LT : '<';
GT : '>';
TIMES : '*';
DIVIDE : '/';
BOOLEAN : 'true' | 'false';
IDENTIFIER: ALPHA (ALNUM)* | ('[' (~(']'))+ ']') ;
REAL: DIGIT* DOT DIGIT+ ('e' (PLUS | MINUS)? DIGIT+)?;
INTEGER: DIGIT+;
PLUS : '+';
MINUS : '-';
COMMA : ',';
RPAREN : ')';
LPAREN : '(';
DOT : '.';
CARET : '^';
And here is what I have after my changes:
grammar equation;

options { }

tokens {
    VARIABLE;
    CONSTANT;
    EXPR;
    PAREXPR;
    EQUATION;
    UNARYEXPR;
    FUNCTION;
    BINARYOP;
    LIST;
}

equationset: equation* EOF;
equation: variable ASSIGN expression ;
parexpression : LPAREN expression RPAREN ;
expression : conditionalexpression ;
conditionalexpression : orexpression ;
orexpression : andexpression ( OR andexpression )* ;
andexpression : comparisonexpression ( AND comparisonexpression )*;
comparisonexpression: additiveexpression ((EQ | NE | LTE | GTE | LT | GT) additiveexpression)*;
additiveexpression : multiplicativeexpression ( (PLUS | MINUS) multiplicativeexpression )* ;
multiplicativeexpression : unaryexpression ( ( TIMES | DIVIDE) unaryexpression )* ;
unaryexpression : NOT unaryexpression | MINUS unaryexpression | exponentexpression;
exponentexpression : primary (CARET primary)*;
primary : parexpression | constant | booleantok | variable | function;
numeric: INTEGER | REAL;
constant: STRING | numeric;
booleantok : BOOLEAN;
scopedidentifier : (IDENTIFIER DOT)* IDENTIFIER;
function : scopedidentifier LPAREN argumentlist RPAREN;
variable: scopedidentifier;
argumentlist: (expression) ? (COMMA expression)*;

WS : (' '|'\r'|'\n'|'\t')+ -> channel(HIDDEN);
COMMENT : '/*' .* '*/' -> channel(HIDDEN);
LINE_COMMENT : '//' ~('\n'|'\r')* '\r'? '\n' -> channel(HIDDEN);
STRING: (('\"') ( (~('\"')) )* ('\"'))+;
fragment ALPHA: 'a'..'z'|'_';
fragment DIGIT: '0'..'9';
fragment ALNUM: ALPHA|DIGIT;
EQ : '==';
ASSIGN : '=';
NE : '!=' | '<>';
OR : 'or' | '||';
AND : 'and' | '&&';
NOT : '!'|'not';
LTE : '<=';
GTE : '>=';
LT : '<';
GT : '>';
TIMES : '*';
DIVIDE : '/';
BOOLEAN : 'true' | 'false';
IDENTIFIER: ALPHA (ALNUM)* | ('[' (~(']'))+ ']') ;
REAL: DIGIT* DOT DIGIT+ ('e' (PLUS | MINUS)? DIGIT+)?;
INTEGER: DIGIT+;
PLUS : '+';
MINUS : '-';
COMMA : ',';
RPAREN : ')';
LPAREN : '(';
DOT : '.';
CARET : '^';
A sample equation that I am trying to parse (which was working OK before) is:
[a].[b] = 1.76 * [product_dc].[pdc_inbound_pallets] * if(product_dc.[pdc_dc] =="us84",1,0)
Thanks in advance.
- Tokens should be listed with a comma (,), not a semicolon (;). See the tokens section paragraph in the official doc.
- Since ANTLR 4.7, a backslash is not required to escape a double quote inside a single-quoted literal, so
STRING: (('\"') ( (~('\"')) )* ('\"'))+;
should be rewritten as
STRING: ('"' ~'"'* '"')+;
- You missed the question mark that makes matching non-greedy in the multiline comment token:
'/*' .* '*/'
->
'/*' .*? '*/'
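The effect of the missing question mark can be illustrated outside ANTLR. The following is a minimal Python sketch that uses the standard re module purely as an analogy for the two versions of the COMMENT rule; ANTLR's lexer is not a regex engine, and the input string is a made-up example, not taken from the question.

```python
import re

# Regex stand-ins for the two versions of the COMMENT rule.
# Assumption: Python's re is used only as an analogy for the ANTLR 4 lexer.
GREEDY = re.compile(r"/\*.*\*/", re.DOTALL)       # like '/*' .* '*/'
NON_GREEDY = re.compile(r"/\*.*?\*/", re.DOTALL)  # like '/*' .*? '*/'

# Hypothetical input: two comments with an equation between them.
text = "/* first */ [a] = 1 /* second */"

greedy_match = GREEDY.match(text).group(0)
non_greedy_match = NON_GREEDY.match(text).group(0)

print(repr(greedy_match))      # runs to the last '*/', swallowing the equation
print(repr(non_greedy_match))  # stops at the first '*/'
```

With a single comment in the input both forms match the same text, so the difference only shows up once a file contains two or more block comments.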
So, the fixed grammar looks like this:
grammar equation;

options { }

tokens {
    VARIABLE,
    CONSTANT,
    EXPR,
    PAREXPR,
    EQUATION,
    UNARYEXPR,
    FUNCTION,
    BINARYOP,
    LIST
}

equationset: equation* EOF;
equation: variable ASSIGN expression ;
parexpression : LPAREN expression RPAREN ;
expression : conditionalexpression ;
conditionalexpression : orexpression ;
orexpression : andexpression ( OR andexpression )* ;
andexpression : comparisonexpression ( AND comparisonexpression )*;
comparisonexpression: additiveexpression ((EQ | NE | LTE | GTE | LT | GT) additiveexpression)*;
additiveexpression : multiplicativeexpression ( (PLUS | MINUS) multiplicativeexpression )* ;
multiplicativeexpression : unaryexpression ( ( TIMES | DIVIDE) unaryexpression )* ;
unaryexpression : NOT unaryexpression | MINUS unaryexpression | exponentexpression;
exponentexpression : primary (CARET primary)*;
primary : parexpression | constant | booleantok | variable | function;
numeric: INTEGER | REAL;
constant: STRING | numeric;
booleantok : BOOLEAN;
scopedidentifier : (IDENTIFIER DOT)* IDENTIFIER;
function : scopedidentifier LPAREN argumentlist RPAREN;
variable: scopedidentifier;
argumentlist: (expression) ? (COMMA expression)*;

WS : (' '|'\r'|'\n'|'\t')+ -> channel(HIDDEN);
COMMENT : '/*' .*? '*/' -> channel(HIDDEN);
LINE_COMMENT : '//' ~('\n'|'\r')* '\r'? '\n' -> channel(HIDDEN);
STRING: ('"' ~'"'* '"')+;
fragment ALPHA: 'a'..'z'|'_';
fragment DIGIT: '0'..'9';
fragment ALNUM: ALPHA|DIGIT;
EQ : '==';
ASSIGN : '=';
NE : '!=' | '<>';
OR : 'or' | '||';
AND : 'and' | '&&';
NOT : '!'|'not';
LTE : '<=';
GTE : '>=';
LT : '<';
GT : '>';
TIMES : '*';
DIVIDE : '/';
BOOLEAN : 'true' | 'false';
IDENTIFIER: ALPHA (ALNUM)* | ('[' (~(']'))+ ']') ;
REAL: DIGIT* DOT DIGIT+ ('e' (PLUS | MINUS)? DIGIT+)?;
INTEGER: DIGIT+;
PLUS : '+';
MINUS : '-';
COMMA : ',';
RPAREN : ')';
LPAREN : '(';
DOT : '.';
CARET : '^';