C# - RegEx - Match on XML declarations when the version # is not "1.0"


I want to use a regex for this.

I need to find errant XML declarations, or ones that are not version 1.0.

The following are all valid matches:

bad declaration

<? xml ver="1.0" encoding="utf-8"?> 

bad declaration

<?xml version="1.0' encoding=utf-8> 

bad declaration

<?xml ?> 

bad declaration (doesn't start on first line)

 .....    <? xml ver="1.0" encoding="utf-8"?> 

version 1.1 (single quotes)

<?xml version='1.1' encoding='utf-8'?> 

version 1.1 (double quotes)

<?xml version="1.1" encoding="utf-8"?> 

erroneous version #

<?xml version='999999' encoding='utf-8'?> 

version 1.1 (multi-line) - I'm not sure if multi-line formatting is allowed, but I've seen it done and need to check for it.

<?xml
 version="1.1"
 encoding="utf-8"
 standalone="no" ?>


I want matches on invalid XML declarations or XML declarations with a version other than 1.0.

The following are valid XML 1.0 declarations. These should never return a match:

<?xml version="1.0" encoding="utf-8" standalone="no" ?>

<?xml version= "1.0" encoding= 'utf-8' standalone= "no" ?>

<?xml
 version="1.0"
 encoding="utf-8"
 standalone="no" ?>

XML 1.0's XML declaration grammar is:

XMLDecl      ::=    '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'
VersionInfo  ::=    S 'version' Eq ("'" VersionNum "'" | '"' VersionNum '"')
Eq           ::=    S? '=' S?
VersionNum   ::=    '1.0'
EncodingDecl ::=    S 'encoding' Eq ('"' EncName '"' | "'" EncName "'" )
EncName      ::=    [A-Za-z] ([A-Za-z0-9._] | '-')*
SDDecl       ::=    S 'standalone' Eq (("'" ('yes' | 'no') "'") | ('"' ('yes' | 'no') '"'))
S            ::=    (#x20 | #x9 | #xD | #xA)+

This can be trivially converted to C#'s regex notation, to write a regex that matches a valid declaration:

new Regex(@"
    \A<\?xml
    [ \t\n\r]+version[ \t\n\r]*=[ \t\n\r]*([""'])1\.0\1
    (?:[ \t\n\r]+encoding[ \t\n\r]*=[ \t\n\r]*([""'])[A-Za-z][A-Za-z0-9._-]*\2)?
    (?:[ \t\n\r]+standalone[ \t\n\r]*=[ \t\n\r]*([""'])(?:yes|no)\3)?
    [ \t\n\r]*
    \?>
", RegexOptions.Compiled | RegexOptions.IgnorePatternWhitespace)
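As a rough check, the regex can be run against a few of the declarations from the question. This is only a sketch: the class name XmlDeclValidator and the helper IsValid are made up for illustration, and the pattern is the one given in this answer, written with \A and [A-Za-z] in their case-sensitive form.

```csharp
using System;
using System.Text.RegularExpressions;

public static class XmlDeclValidator
{
    // Pattern from the answer. IgnorePatternWhitespace drops the layout
    // whitespace, but whitespace inside [...] is still taken literally.
    private static readonly Regex ValidDecl = new Regex(@"
        \A<\?xml
        [ \t\n\r]+version[ \t\n\r]*=[ \t\n\r]*([""'])1\.0\1
        (?:[ \t\n\r]+encoding[ \t\n\r]*=[ \t\n\r]*([""'])[A-Za-z][A-Za-z0-9._-]*\2)?
        (?:[ \t\n\r]+standalone[ \t\n\r]*=[ \t\n\r]*([""'])(?:yes|no)\3)?
        [ \t\n\r]*
        \?>
    ", RegexOptions.Compiled | RegexOptions.IgnorePatternWhitespace);

    // Hypothetical helper: true when the text starts with a valid XML 1.0 declaration.
    public static bool IsValid(string xml) => ValidDecl.IsMatch(xml);

    public static void Main()
    {
        string[] samples =
        {
            "<?xml version=\"1.0\" encoding=\"utf-8\" standalone=\"no\" ?>",
            "<?xml version= \"1.0\" encoding= 'utf-8' standalone= \"no\" ?>",
            "<?xml version='1.1' encoding='utf-8'?>",
            "<? xml ver=\"1.0\" encoding=\"utf-8\"?>",
            "<?xml ?>",
        };

        foreach (string sample in samples)
        {
            Console.WriteLine(IsValid(sample) + "  " + sample);
        }
    }
}
```

The first two samples should be reported as valid and the remaining three as not valid.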

And it can be inverted using a negative look-ahead, to make it match if the valid declaration is missing.

new Regex(@"
    \A(?!<\?xml
    [ \t\n\r]+version[ \t\n\r]*=[ \t\n\r]*([""'])1\.0\1
    (?:[ \t\n\r]+encoding[ \t\n\r]*=[ \t\n\r]*([""'])[A-Za-z][A-Za-z0-9._-]*\2)?
    (?:[ \t\n\r]+standalone[ \t\n\r]*=[ \t\n\r]*([""'])(?:yes|no)\3)?
    [ \t\n\r]*
    \?>)
", RegexOptions.Compiled | RegexOptions.IgnorePatternWhitespace)

(I've used back-references to simplify the regex, but they are not necessary.)
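To illustrate the point about back-references, here is one possible way to write the valid-declaration regex without them, spelling out both quote styles as the grammar does. It is a sketch only: the class XmlDeclNoBackrefs, the fragment names S, OptS and Eq, and the helper Quoted are all illustrative and not part of any library.

```csharp
using System;
using System.Text.RegularExpressions;

public static class XmlDeclNoBackrefs
{
    // Fragments loosely named after the grammar productions (illustrative names).
    private const string S = @"[ \t\n\r]+";
    private const string OptS = @"[ \t\n\r]*";
    private const string Eq = OptS + "=" + OptS;

    // Writes out the double-quoted and single-quoted forms explicitly,
    // which is what the back-reference was standing in for.
    private static string Quoted(string inner) => $@"(?:""{inner}""|'{inner}')";

    private static readonly Regex ValidDecl = new Regex(
        @"\A<\?xml"
        + S + "version" + Eq + Quoted(@"1\.0")
        + "(?:" + S + "encoding" + Eq + Quoted("[A-Za-z][A-Za-z0-9._-]*") + ")?"
        + "(?:" + S + "standalone" + Eq + Quoted("(?:yes|no)") + ")?"
        + OptS + @"\?>",
        RegexOptions.Compiled);

    // Hypothetical helper: true when the text starts with a valid XML 1.0 declaration.
    public static bool IsValid(string xml) => ValidDecl.IsMatch(xml);

    public static void Main()
    {
        Console.WriteLine(IsValid("<?xml version='1.0' encoding=\"utf-8\"?>"));
        Console.WriteLine(IsValid("<?xml version=\"1.0' encoding='utf-8'?>"));
    }
}
```

The pattern gets longer this way, which is presumably why the back-reference form was preferred, but the two should accept the same declarations.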

Note that when this matches, it only matches an empty string at the beginning of the input; it won't match the invalid declaration itself for you. Add (<[^>]*>) after the lookahead if you need a non-empty match.
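A minimal sketch of that last suggestion, with (<[^>]*>) appended after the lookahead so the match carries the offending text. The class ErrantXmlDeclFinder and the method FindErrant are hypothetical names. Because the lookahead is zero-width, the whole match value is the text consumed by (<[^>]*>), so the code reads Match.Value rather than a group number.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ErrantXmlDeclFinder
{
    // Negative lookahead for a valid XML 1.0 declaration, followed by a
    // group that consumes the first tag-like chunk at the start of the input.
    private static readonly Regex ErrantDecl = new Regex(@"
        \A(?!<\?xml
            [ \t\n\r]+version[ \t\n\r]*=[ \t\n\r]*([""'])1\.0\1
            (?:[ \t\n\r]+encoding[ \t\n\r]*=[ \t\n\r]*([""'])[A-Za-z][A-Za-z0-9._-]*\2)?
            (?:[ \t\n\r]+standalone[ \t\n\r]*=[ \t\n\r]*([""'])(?:yes|no)\3)?
            [ \t\n\r]*
        \?>)
        (<[^>]*>)
    ", RegexOptions.Compiled | RegexOptions.IgnorePatternWhitespace);

    // Hypothetical helper: returns the errant leading chunk, or null when the
    // input starts with a valid declaration or does not start with '<'.
    public static string FindErrant(string xml)
    {
        Match m = ErrantDecl.Match(xml);
        return m.Success ? m.Value : null;
    }

    public static void Main()
    {
        Console.WriteLine(FindErrant("<?xml version='1.1' encoding='utf-8'?><root/>"));
        Console.WriteLine(FindErrant("<?xml version=\"1.0\" encoding=\"utf-8\"?><root/>") ?? "(no match)");
    }
}
```

Two limits of this form are worth keeping in mind: the input must begin with '<' for anything to be captured, so a declaration preceded by other text is not reported, and a document with no declaration at all will have its first tag reported instead.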

