Ticket #174 (assigned enhancement)

Opened 16 years ago

Last modified 14 years ago

Support for Regular Expression literals and matching

Reported by: webnov8 Owned by: Chuck
Priority: major Milestone:
Component: Cobra Compiler Version:
Keywords: Cc:

Description

Given the importance of regular expressions in everyday programming, a literal for Regex types would be a great convenience.

if /^\bcobra\b/i in 'cobra programming'
  print 'found cobra'
for /\b(\w+)\b/ in 'the quick brown fox'
  print $1 # support for groups/captures, maybe?
exp = /\bcobra\b/c # compile this regex

Attachments

regexp-lit-ops3.patch Download (35.6 KB) - added by hopscc 14 years ago.

Change History

Changed 16 years ago by hopscc

  • summary changed from Support for Regular Expression literals to Support for Regular Expression literals and matching

As the example code implies, this should also include built-in language support for
regexp matching and associated features (groups, captures, substitution, ...).
The exact syntax is yet to be determined.

Changed 16 years ago by Chuck

  • milestone Cobra 0.9 deleted
re'foo'
re"foo"

Changed 14 years ago by hopscc

  • owner set to hopscc
  • status changed from new to accepted

Changed 14 years ago by hopscc

Step One. Support regexp Literals.

Instead of

# using .Net Library classes explicitly
re = Regex('<pattern>')
# or 
re = Regex('<pattern>', regexOpt | regexOpt...)

allow

re = re'<pattern>'  # same as re = re"<pattern>"
# or
re = re'/<pattern>/<flags>' # same as re = re"/<pattern>/<flags>"

Specifically

class RegexpLit
    is partial
    inherits AtomicLiteral
    """
    A Regular Expression Literal.

    Form is similar to that of a string: single or double quote delimited, but with a 're' prefix.
    It may be either just a pattern (delimited or not)  re'<Ptn>' or re'/<Ptn>/'
        or a delimited pattern with appended option flags           re'/<Ptn>/<flags>'.

    If flags are desired, the pattern must be delimited (with matching /) and any flags follow the trailing /.

    Flags can be one or more of:
        i - IgnoreCase = case-insensitive matching.
        c - Compiled = compile the RE. This yields faster execution but increases startup time.
        s - Singleline = single-line mode. '.' matches every character instead of every character except \n ('[^\n]').
        m - Multiline = multiline mode. '^' and '$' match the start and end of each line instead of the start and end of the entire string.
        x - ExplicitCapture = the only valid captures are explicitly named or numbered groups of the form (?<name>…).
        W - IgnorePatternWhitespace = ignores unescaped white space in the pattern and enables comments marked with '#'. (Not very useful without multiline (string) support.)
    """

Examples

re = re'f.o.'
if re.isMatch('Feeling like a fool when your bike wont go')
    print 'Matched'

str = '@param farethee well\n @param fare thee well\n@param ftw my darling maid'
re = re'/^\s*@param\s+(.*)$/m'

assert re.isMatch(str)
m = re.match(str)
assert m.success 

reA = re'/^\s*@param\s+(.*)\n/' # non Multiline, match EOL
assert reA.isMatch(str)

reB = re'^\s*@param\s+(.*)'     # No flags at all
assert reB.isMatch(str)

assert not re'^\s*@PARAM\s+(.*)'.isMatch(str)  # case sensitive
assert  re'/^\s*@PARAM\s+(.*)/i'.isMatch(str)   # case insensitive

Changed 14 years ago by hopscc

Step Two
Support regex matching as binary operations whose operands can only be a regexp and a string.
Generally '~' denotes a regexp operator (with a possible modifier).

Regexp Test same as .Net IsMatch

  • returns bool
  • op is '~=' ( cf == )
  • re ~= str

Regexp match/exec same as .Net Match

  • returns RegularExpressions.Match
  • op is '~'
  • re ~ str

Regexp all matches same as .Net Matches

  • returns RegularExpressions.MatchCollection
  • op is also '~' in for ... in (overload ~ )
  • for m in re ~ str
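
For readers who want to check the intended semantics of the three operations, here is a rough analogue using Python's `re` module as a stand-in for the .NET calls. It does not use the proposed operators; the variable names are illustrative only.

```python
import re

text = "@param farethee well\n @param fare thee well\n@param fare tw my darling maid"
pattern = re.compile(r"^\s*@param\s+(.*)$", re.MULTILINE)

# Test (re ~= str): a plain bool, comparable to .NET IsMatch.
is_match = pattern.search(text) is not None

# Match (re ~ str): the first match, comparable to .NET Match.
first = pattern.search(text)

# All matches (for m in re ~ str): every match, comparable to .NET Matches.
captures = [m.group(1) for m in pattern.finditer(text)]
```

Here `captures` holds the text after each `@param`, one entry per line of `text`.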

Either make the ~ operator return nil on failure, or augment the truth test to recognise a Match and use Match.Success as its truth value,
so that one can say

if re ~ str
# or
m = re ~ str
if not m, print 'No Match'

(making ~= somewhat redundant except as a speed optimisation)
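
The two options can be compared side by side in a small Python sketch. This is an illustration of the design choice, not the patch's implementation: the `Match` wrapper class and both function names are hypothetical, and Python's `None` plays the role of Cobra's nil.

```python
import re

class Match:
    """Hypothetical stand-in for a .NET-style Match that always exists."""

    def __init__(self, inner):
        self._inner = inner               # an re.Match, or None on failure
        self.success = inner is not None  # mirrors Match.Success

    def __bool__(self):
        # Option two: the truth test defers to the success flag.
        return self.success

def match_or_nil(pattern, text):
    # Option one: a failed match is represented by None (nil).
    return re.search(pattern, text)

def match_object(pattern, text):
    # Option two: always return an object; its truthiness reports success.
    return Match(re.search(pattern, text))
```

Either way, `if not m` reads the same at the call site. The difference is whether `m.success` can still be queried after a failed match (option two) or whether `m` is simply absent (option one).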

Examples

    str = '@param farethee well\n @param fare thee well\n@param fare tw my darling maid'
        
    re = re'/^\s*@param\s+(.*)$/m'
    reX = re'/^no.match.evah$/m'
    assert  'Regex' in re.typeOf.toString

    # isMatch
    if re ~= str, assert true 
    else, assert false, 'str match re - ismatch FAIL'
    assert re ~= str
    assert not reX ~= str
    
    # Match
    if re ~ str,  assert true
    else, assert false, 'str match re - match FAIL'
    m = re ~ str
    assert 'Match' in m.typeOf.toString
    assert m and m.success 
    #print m

    m = reX ~ str
    assert not m
    assert not reX ~ str
        
    #Matches/MatchCollection
    for m in re ~ str
        assert m.groups[1].value.startsWith('fare')

Changed 14 years ago by hopscc

Step Three
Support regex split as a binary op 'splits'.
Operands can only be a regexp and a string.

The regexp split op does the same as .Net Regex.Split

  • returns List<of String>
  • op is '~|'
  • re ~| str

Example:

str = '@param farethee well\n @param fare thee well\n@param fare tw my darling maid'
reSplit = re'\n?\s?@param '
#split = reSplit.split(str)
split = reSplit ~| str
assert split.count == 4
assert split[0] == ''
for i in 1 : split.count
    assert split[i].startsWith('fare')
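
The empty first element in the example comes from the input starting with a separator. The same behaviour can be reproduced with Python's `re.split`, used here only as a stand-in for .NET `Regex.Split`:

```python
import re

text = "@param farethee well\n @param fare thee well\n@param fare tw my darling maid"

# The text begins with a separator, so the first piece is the empty string.
parts = re.split(r"\n?\s?@param ", text)
```

This gives four pieces: an empty string followed by the three parameter descriptions.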

Changed 14 years ago by hopscc

  • owner changed from hopscc to Chuck
  • status changed from accepted to assigned

Patch for all of above including directory of additional tests.
