Ticket #174 (assigned enhancement)

Last modified 14 years ago

Support for Regular Expression literals and matching

Reported by:	webnov8	Owned by:	Chuck
Priority:	major	Milestone:
Component:	Cobra Compiler	Version:
Keywords:		Cc:

Description

Given the importance of regular expressions in everyday programming, a literal for the Regex type would be a great convenience.

if /^\bcobra\b/i in 'cobra programming'
  print 'found cobra'

for /\b(\w+)\b/ in 'the quick brown fox'
  print $1 # support for groups/captures, maybe?

exp = /\bcobra\b/c # compile this regex
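For comparison, the first two proposed uses map fairly directly onto an existing regex API. The following is a minimal sketch that uses Python's `re` module purely as a stand-in for .NET's `Regex`. This is an assumption for illustration, not how Cobra would implement the feature. The `in` test becomes a search anywhere in the string, with the `i` flag as `re.IGNORECASE`, and `$1` becomes `m.group(1)` for each successive match.

```python
import re

# Proposed: if /^\bcobra\b/i in 'cobra programming'
# Stand-in: a case-insensitive search anywhere in the string.
found = re.search(r'^\bcobra\b', 'cobra programming', re.IGNORECASE) is not None
if found:
    print('found cobra')

# Proposed: for /\b(\w+)\b/ in 'the quick brown fox' ... print $1
# Stand-in: iterate over every match; group(1) plays the role of $1.
words = []
for m in re.finditer(r'\b(\w+)\b', 'the quick brown fox'):
    print(m.group(1))
    words.append(m.group(1))
```

The `c` (compile) flag has no close Python counterpart, because Python compiles and caches patterns automatically, so it is left out of this sketch.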

Attachments

regexp-lit-ops3.patch (35.6 KB) - added by hopscc 14 years ago.

Change History

Changed 14 years ago by hopscc

Step One. Support regexp literals.

Instead of

# using .NET library classes explicitly
re = Regex('<pattern>')
# or 
re = Regex('<pattern>', regexOpt | regexOpt...)

allow

re = re'<pattern>'  # same as re = re"<pattern>"
# or
re = re'/<pattern>/<flags>' # same as re = re"/<pattern>/<flags>"

Specifically

class RegexpLit
    is partial
    inherits AtomicLiteral
    """
    A Regular Expression Literal.

    The form is similar to that of a string literal: single- or double-quote delimited, but with an 're' prefix.
    It may be either just a pattern (delimited or not), re'<Ptn>' or re'/<Ptn>/',
        or a delimited pattern with appended option flags, re'/<Ptn>/<flags>'.

    If flags are desired, the pattern must be delimited (with matching /) and the flags follow the trailing /.

    Flags can be one or more of:
        i - IgnoreCase = case-insensitive matching.
        c - Compiled = compile the RE. This yields faster execution but increases startup time.
        s - Singleline = single-line mode. Change so '.' matches every character instead of [^\n] (every character except \n).
        m - Multiline = multiline mode. Change so '^' and '$' match the start and end of each line instead of the start and end of the entire string.
        x - ExplicitCapture = the only valid captures are explicitly named or numbered groups of the form (?<name>…).
        W - IgnorePatternWhitespace = ignore unescaped white space in the pattern and enable comments marked with '#'. (Not very useful without multiline string support.)
    """

Examples

re = re'f.o.'
if re.isMatch('Feeling like a fool when your bike wont go')
    print 'Matched'

str = '@param farethee well\n @param fare thee well\n@param ftw my darling maid'
re = re'/^\s*@param\s+(.*)$/m'

assert re.isMatch(str)
m = re.match(str)
assert m.success 

reA = re'/^\s*@param\s+(.*)\n/' # not multiline, so match the end-of-line \n explicitly
assert reA.isMatch(str)

reB = re'^\s*@param\s+(.*)'     # No flags at all
assert reB.isMatch(str)

assert not re'^\s*@PARAM\s+(.*)'.isMatch(str)  # case sensitive
assert  re'/^\s*@PARAM\s+(.*)/i'.isMatch(str)   # case insensitive
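.NET `IsMatch` looks for a match anywhere in the string, so Python's `re.search` is the closest analogue. As a cross-check of what the assertions above expect, here they are replayed with Python's `re`. This is a stand-in only: flag `m` corresponds to `re.MULTILINE` and `i` to `re.IGNORECASE`. The replay also shows why the first pattern needs `m`. Without it, `$` cannot match just before the first newline.

```python
import re

text = '@param farethee well\n @param fare thee well\n@param ftw my darling maid'

# re'f.o.' : "fool" satisfies f.o. ('.' is any character except \n)
simple = re.search(r'f.o.', 'Feeling like a fool when your bike wont go')
assert simple is not None and simple.group(0) == 'fool'

# re'/^\s*@param\s+(.*)$/m' : with MULTILINE, '$' can match just before each \n
multi = re.search(r'^\s*@param\s+(.*)$', text, re.MULTILINE)
assert multi is not None and multi.group(1) == 'farethee well'

# Same pattern without the m flag: '$' only matches at the end of the string
# (or before a final newline), so it fails here
assert re.search(r'^\s*@param\s+(.*)$', text) is None

# re'/^\s*@param\s+(.*)\n/' : not multiline, so the newline is matched explicitly
assert re.search(r'^\s*@param\s+(.*)\n', text) is not None

# re'^\s*@param\s+(.*)' : no flags at all
assert re.search(r'^\s*@param\s+(.*)', text) is not None

# re'^\s*@PARAM\s+(.*)' : case sensitive by default, so no match
assert re.search(r'^\s*@PARAM\s+(.*)', text) is None
# re'/^\s*@PARAM\s+(.*)/i' : the i flag makes it match
assert re.search(r'^\s*@PARAM\s+(.*)', text, re.IGNORECASE) is not None
```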

Changed 14 years ago by hopscc

Step Two.
Support regex matching as binary operations whose operands can only be a regexp and a string.
In general, '~' denotes a regexp operator (with a possible modifier).

Regexp test: same as .NET IsMatch

returns bool
op is '~=' (cf. ==)
re ~= str

Regexp match/exec: same as .NET Match

returns RegularExpressions.Match
op is '~'
re ~ str

Regexp all matches: same as .NET Matches

returns RegularExpressions.MatchCollection
op is also '~', overloaded when used in for ... in
for m in re ~ str
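The three operators amount to thin wrappers over the three `Regex` methods. The following is a minimal sketch with hypothetical function names, again using Python's `re` in place of .NET. There is one difference to keep in mind. Python's `search` returns `None` on failure, which corresponds to the first of the two options discussed next (nil on failure). .NET's `Match`, by contrast, returns an object whose `Success` is false.

```python
import re
from typing import Iterator, Optional


def regex_test(rx: re.Pattern, s: str) -> bool:
    """re ~= str: like .NET Regex.IsMatch, just a bool."""
    return rx.search(s) is not None


def regex_match(rx: re.Pattern, s: str) -> Optional[re.Match]:
    """re ~ str: like .NET Regex.Match, the first match (None on failure here)."""
    return rx.search(s)


def regex_matches(rx: re.Pattern, s: str) -> Iterator[re.Match]:
    """for m in re ~ str: like .NET Regex.Matches, each successive match."""
    return rx.finditer(s)


text = '@param farethee well\n @param fare thee well\n@param fare tw my darling maid'
rx = re.compile(r'^\s*@param\s+(.*)$', re.MULTILINE)

print(regex_test(rx, text))                            # True
print(regex_match(rx, text).group(1))                  # farethee well
print([m.group(1) for m in regex_matches(rx, text)])
# ['farethee well', 'fare thee well', 'fare tw my darling maid']
```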

Either make the ~ operator return nil on failure, OR augment the truth test to recognise a Match and use Match.Success as its truth value,
so that one can say

if re ~ str, print 'Match'
# or
m = re ~ str
if not m, print 'No Match'

(making ~= somewhat redundant except as a speed optimisation)
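The second option keeps a result object even when nothing matched and gives that object a truth value. The sketch below shows the idea in Python, where `__bool__` plays the role of the augmented truth test. `MatchResult` and `tilde` are hypothetical names, not part of the patch.

```python
import re
from typing import Optional


class MatchResult:
    """Hypothetical wrapper mimicking a .NET Match whose truth value is Success."""

    def __init__(self, m: Optional[re.Match]) -> None:
        self._m = m

    @property
    def success(self) -> bool:
        return self._m is not None

    def __bool__(self) -> bool:
        # The "augmented truth test": a failed result is falsy but is not None.
        return self.success

    def group(self, n: int = 0) -> str:
        # Only meaningful on success; a failed result has no groups here.
        if self._m is None:
            raise ValueError('no match')
        return self._m.group(n)


def tilde(rx: re.Pattern, s: str) -> MatchResult:
    """rx ~ s under the second option: always an object, never None."""
    return MatchResult(rx.search(s))


text = '@param farethee well\n @param fare thee well\n@param fare tw my darling maid'
reX = re.compile(r'^no.match.evah$', re.MULTILINE)

m = tilde(reX, text)
if not m:
    print('No Match')
```

Returning nil is simpler and works with any existing truth test. The wrapper, on the other hand, lets code inspect a failed result (here through `success`) without first checking for nil.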

Examples

    str = '@param farethee well\n @param fare thee well\n@param fare tw my darling maid'
        
    re = re'/^\s*@param\s+(.*)$/m'
    reX = re'/^no.match.evah$/m'
    assert  'Regex' in re.typeOf.toString

    # isMatch
    if re ~= str, assert true 
    else, assert false, 'str match re - ismatch FAIL'
    assert re ~= str
    assert not reX ~= str
    
    # Match
    if re ~ str,  assert true
    else, assert false, 'str match re - match FAIL'
    m = re ~ str
    assert 'Match' in m.typeOf.toString
    assert m and m.success 
    #print m

    m = reX ~ str
    assert not m
    assert not reX ~ str
        
    #Matches/MatchCollection
    for m in re ~ str
        assert m.groups[1].value.startsWith('fare')


Change History

Changed 16 years ago by hopscc

Changed 16 years ago by Chuck

Changed 14 years ago by hopscc

Changed 14 years ago by hopscc

Changed 14 years ago by hopscc

Changed 14 years ago by hopscc

Changed 14 years ago by hopscc

Changed 14 years ago by hopscc
