Regular expression support builtin to Cobra
Posted: Tue Jan 04, 2011 1:56 am
I've added a patch to ticket:174 that adds regular expression support to Cobra as a built-in language feature.
Basically it adds a new type for a regular expression literal (regexp), which holds either a bare RE pattern or an RE pattern with flags.
A literal is written like a raw string, single or double quoted, with an 're' prefix. Patterns with flags must be '/' delimited; patterns without flags may be '/' delimited but don't need to be.
re = re'\s([a-zA-Z]+)' # simple regexp literal
re = re'/\s([a-z]+)/i' # regexp with flags - case insensitive
The supported flags are:
i - IgnoreCase = case-insensitive matching.
c - Compile = compile the RE. This yields faster execution but increases startup time.
s - Singleline = single-line mode. Changes '.' so that it matches every character instead of every character except \n ('[^\n]').
m - Multiline = multiline mode. Changes '^' and '$' so that they match at the start and end of each line instead of the start and end of the entire string.
x - ExplicitCaptures = the only valid captures are explicitly named or numbered groups of the form (?<name>…).
W - ignores unescaped whitespace in the pattern and enables comments marked with '#'. (Not very useful without multiline string support.)
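The flag descriptions read much like the .NET RegexOptions values. Assuming i, s and m map onto IgnoreCase, Singleline and Multiline (my guess, not something stated in the patch description), their effect can be sketched with the plain library API. The class name and sample text are made up for illustration.

```cobra
use System.Text.RegularExpressions

class FlagSketch

    def main
        text = 'one\ntwo'

        # i: case-insensitive matching
        assert not Regex(r'ONE').isMatch(text)
        assert Regex(r'ONE', RegexOptions.IgnoreCase).isMatch(text)

        # s: '.' also matches the newline
        assert not Regex(r'one.two').isMatch(text)
        assert Regex(r'one.two', RegexOptions.Singleline).isMatch(text)

        # m: '^' and '$' match at line boundaries, not just string boundaries
        assert not Regex(r'^two$').isMatch(text)
        assert Regex(r'^two$', RegexOptions.Multiline).isMatch(text)
```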
The patch also adds 3 new binary operators:
~
- operator name is 'RE_match'
- generally equivalent to .NET Match (when enumerated, e.g. in a for loop, it is overloaded to .NET Matches)
- yields nil (failure) or a Match (or, when overloaded, a MatchCollection)
~=
- operator name is 'RE_hasMatch'
- tests whether an RE matches a string (.NET IsMatch)
- yields a bool
~|
- operator name is 'RE_splits'
- splits a string on the RE pattern
- yields a List<of String>
These are supported for both typed and dynamic operands.
There's a longer description on the ticket.
Here's an example:
# Contrived example
str = '@param fare param1\n @param fare param 2\n@param fare last param'
re = re'/^\s*@param\s+(.*)$/m'
reX = re'/^no.match.evah$/m'
assert 'Regex' in re.typeOf.toString
# isMatch
if re ~= str, assert true
else, assert false, 'str match re - ismatch FAIL'
assert re ~= str
assert not reX ~= str
# Match
if re ~ str, assert true
else, assert false, 'str match re - match FAIL'
m = re ~ str
assert 'Match' in m.typeOf.toString
assert m and m.success
#print m
m = reX ~ str
assert not m
assert not reX ~ str
#Matches/MatchCollection
for m in re ~ str
    assert m.groups[1].value.startsWith('fare')
# split
reSplit = re'\n?\s?@param '
#split = reSplit.split(str)
split = reSplit ~| str
assert split.count == 4
assert split[0] == ''
for i in 1 : split.count
    assert split[i].startsWith('fare')
I'm not sure it's any clearer than using the lib functions, but it is less wordy.
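For comparison, here is a rough sketch of the same checks written against the library functions, using System.Text.RegularExpressions directly. The class name is invented for illustration. It assumes the m flag corresponds to RegexOptions.Multiline, and that the patch's operators behave like the calls named in the comments.

```cobra
use System.Text.RegularExpressions

class LibSketch

    def main
        str = '@param fare param1\n @param fare param 2\n@param fare last param'
        re = Regex(r'^\s*@param\s+(.*)$', RegexOptions.Multiline)

        # library form of: re ~= str
        assert re.isMatch(str)

        # library form of: re ~ str
        # (the library returns a Match whose success is false on failure,
        # where the operator is described as yielding nil)
        m = re.match(str)
        assert m.success
        assert m.groups[1].value.startsWith('fare')

        # library form of: reSplit ~| str
        # (the library returns an array, not a List<of String>)
        parts = Regex(r'\n?\s?@param ').split(str)
        assert parts.length == 4
        assert parts[0] == ''
```

The library version needs an explicit Regex construction and a named method for each operation, which is where the operator syntax saves words.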