Regex package:hxt-regex-xmlschema
csh style Glob Pattern Parser for Regular Expressions
W3C XML Schema Regular Expression Parser
This parser supports the full W3C standard. The complete grammar can
be found at
http://www.w3.org/TR/xmlschema11-2/#regexs.
It also adds extensions for the missing set operations: intersection,
difference, exclusive or, interleave, and complement.
A regular expression library for W3C XML Schema regular expressions
This library supports full W3C XML Schema regular expressions,
including all Unicode character sets and blocks. The complete grammar
can be found at
http://www.w3.org/TR/xmlschema11-2/#regexs.
It is implemented with the technique of derivatives of regular
expressions.
The W3C syntax is extended to support not only union of regular sets,
but also intersection, set difference, and exclusive or (exor).
Matching of subexpressions is also supported.
The library can be used for constructing lightweight scanners and
tokenizers. It is a standalone library; no external regex libraries
are used.
Extensions in 9.2: The library supports not only String, but also
ByteString and Text, each in strict and lazy variants.
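The derivative technique and the extra set operations mentioned above can be illustrated with a minimal sketch. It uses only base and a toy Regex type whose constructor and function names (Isec, Diff, Exor, delta, matches) are assumptions made for this example; it is not the library's internal representation and it omits the simplification rules a practical implementation would need to keep expressions small.

```haskell
-- Toy regular expression type (hypothetical, for illustration only).
data Regex
  = Zero              -- the empty set of words
  | Unit              -- the set containing only the empty word
  | Sym Char          -- a single given character
  | Dot               -- any single character
  | Seq Regex Regex   -- concatenation
  | Star Regex        -- repetition
  | Alt Regex Regex   -- union
  | Isec Regex Regex  -- intersection
  | Diff Regex Regex  -- set difference
  | Exor Regex Regex  -- exclusive or
  deriving (Eq, Show)

-- Does the language of the expression contain the empty word?
nullable :: Regex -> Bool
nullable Zero       = False
nullable Unit       = True
nullable (Sym _)    = False
nullable Dot        = False
nullable (Seq a b)  = nullable a && nullable b
nullable (Star _)   = True
nullable (Alt a b)  = nullable a || nullable b
nullable (Isec a b) = nullable a && nullable b
nullable (Diff a b) = nullable a && not (nullable b)
nullable (Exor a b) = nullable a /= nullable b

-- Derivative of an expression with respect to one input character:
-- the set of remainders of all words that start with that character.
delta :: Char -> Regex -> Regex
delta _ Zero    = Zero
delta _ Unit    = Zero
delta c (Sym x)
  | c == x      = Unit
  | otherwise   = Zero
delta _ Dot     = Unit
delta c (Seq a b)
  | nullable a  = Alt (Seq (delta c a) b) (delta c b)
  | otherwise   = Seq (delta c a) b
delta c (Star a)   = Seq (delta c a) (Star a)
-- The Boolean set operations are derived component-wise.
delta c (Alt a b)  = Alt  (delta c a) (delta c b)
delta c (Isec a b) = Isec (delta c a) (delta c b)
delta c (Diff a b) = Diff (delta c a) (delta c b)
delta c (Exor a b) = Exor (delta c a) (delta c b)

-- A word matches if the expression left after deriving by every
-- character of the word accepts the empty word.
matches :: Regex -> String -> Bool
matches re = nullable . foldl (flip delta) re

main :: IO ()
main = do
  let as         = Star (Sym 'a')   -- a*
      anyWord    = Star Dot         -- .*
      nonEmptyAs = Diff as Unit     -- a* without the empty word
  print (matches as "aaa")
  print (matches nonEmptyAs "")
  print (matches (Isec anyWord as) "ab")
```

Because derivatives distribute over union, intersection, difference, and exclusive or, these operations need no extra machinery beyond one case each in nullable and delta.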
parse a regular expression surrounded by a context spec
A leading ^ denotes start of text, a trailing $ denotes end of text,
a leading \< denotes word start, and a trailing \> denotes word end.
The first parameter is the regex parser (parseRegex or
parseRegexExt).
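As a rough illustration of the context markers described above, the following sketch separates a pattern string into its start context, its core regex, and its end context. The types StartCtx and EndCtx and the function splitContext are hypothetical names invented for this example and are not part of the library; the sketch also does not handle a trailing escaped dollar sign.

```haskell
import Data.List (isSuffixOf, stripPrefix)

-- Hypothetical descriptions of where a match may begin and end.
data StartCtx = AtTextStart | AtWordStart | AnywhereStart
  deriving (Eq, Show)

data EndCtx = AtTextEnd | AtWordEnd | AnywhereEnd
  deriving (Eq, Show)

-- Strip a leading ^ or \< and a trailing $ or \> from a pattern and
-- report which markers were found together with the remaining core.
splitContext :: String -> (StartCtx, String, EndCtx)
splitContext re0 = (start, re2, end)
  where
    (start, re1)
      | Just r <- stripPrefix "^"   re0 = (AtTextStart,   r)
      | Just r <- stripPrefix "\\<" re0 = (AtWordStart,   r)
      | otherwise                       = (AnywhereStart, re0)
    (end, re2)
      | "\\>" `isSuffixOf` re1 = (AtWordEnd,   dropEnd 2 re1)
      | "$"   `isSuffixOf` re1 = (AtTextEnd,   dropEnd 1 re1)
      | otherwise              = (AnywhereEnd, re1)
    -- Remove the last n characters of a list.
    dropEnd n xs = take (length xs - n) xs

main :: IO ()
main = do
  print (splitContext "^abc$")
  print (splitContext "\\<word\\>")
  print (splitContext "abc")
```

The core string returned here is what would then be handed to the regex parser passed as the first parameter.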
parse a standard W3C XML Schema regular expression
parse an extended syntax W3C XML Schema regular expression
The syntax of the W3C XML Schema spec is extended by further useful
set operations such as intersection, difference, and exclusive or
(exor). Subexpression matching becomes possible with "named" pairs of
parentheses. The multi-char escape sequence \a represents any Unicode
char; the multi-char escape sequence \A represents any Unicode word
(\A = \a*). All syntactically wrong inputs are mapped to the Zero
expression, which represents the empty set of words. Zero carries a
string with an error message as its data field, so errors can be
detected after parsing by testing the result with the isZero
predicate.
This function wraps the whole regex in a subexpression before starting
the parse. This is done to get access to the whole parsed string.
Therefore one special label is needed: this label is the Nothing
value, while all explicit labels are Just labels.
The main scanner function
speed-up version of splitWithRegex'
This function checks whether the input starts with a char from FIRST
re. If this is not the case, the split fails. The FIRST set can be
computed once for a whole tokenizer and reused by every call of split.