Cobra Forums

Posted: **Tue Jan 04, 2011 1:56 am**

I've added a patch to ticket:174 that makes Cobra support regular expressions as built-ins of the language.

Basically, it adds a new type for a regular expression literal (regexp), supporting either just an RE pattern or an RE pattern with flags. The literal is written like a string (like a raw string), single- or double-quoted, with an 're' prefix. Patterns with flags are '/'-delimited; patterns without flags may be '/'-delimited but don't need to be.

```
re = re'\s([a-zA-Z]+)'    # simple regexp literal
re = re'/\s([a-z]+)/i'    # regexp with flags - case-insensitive
```
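A minimal sketch of how a literal body could be split into pattern and flags, assuming the rule described above: a body that starts with '/' and contains a later '/' is `/pattern/flags`, and anything else is a bare pattern. The helper name `split_regexp_literal` is hypothetical, and Python stands in for whatever the patch actually does.

```python
from typing import Tuple


def split_regexp_literal(body: str) -> Tuple[str, str]:
    """Split the text between re'...' quotes into (pattern, flags).

    Hypothetical rule: '/pattern/flags' when the body starts with '/' and
    contains another '/'; otherwise the whole body is the pattern.
    """
    if body.startswith('/'):
        end = body.rfind('/')  # the last '/' closes the pattern
        if end > 0:
            return body[1:end], body[end + 1:]
    return body, ''


simple = split_regexp_literal(r'\s([a-zA-Z]+)')
flagged = split_regexp_literal(r'/\s([a-z]+)/i')
print(simple)
print(flagged)
```

Under this rule a body like '/a/b' reads as pattern 'a' with flags 'b'. A real implementation would therefore need to check the flag letters against the supported set.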

Supported flags are:

- `i` - IgnoreCase: case-insensitive matching.
- `c` - Compile: compile the RE. This yields faster execution but increases startup time.
- `s` - Singleline: single-line mode. '.' matches every character, instead of `[^\n]` (every character except \n).
- `m` - Multiline: multiline mode. '^' and '$' match the start and end of each line instead of the start and end of the entire string.
- `x` - ExplicitCapture: the only valid captures are explicitly named or numbered groups of the form (?<name>…).
- `W` - IgnorePatternWhitespace: ignores unescaped white space in the pattern and enables comments marked with '#' (not very useful without multiline string support).
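The flag descriptions match .NET's `RegexOptions` values (IgnoreCase, Compiled, Singleline, Multiline, ExplicitCapture, IgnorePatternWhitespace). As a rough illustration, the sketch below maps the letters onto Python's `re` flags and shows what `s` and `m` change. `FLAG_MAP` and `compile_with_flags` are hypothetical names. `x` is rejected because Python has no equivalent option.

```python
import re

# Hypothetical mapping from the patch's flag letters to Python's re flags.
FLAG_MAP = {
    'i': re.IGNORECASE,
    'c': 0,             # Python compiles (and caches) every pattern anyway
    's': re.DOTALL,     # .NET Singleline: '.' also matches '\n'
    'm': re.MULTILINE,  # '^' and '$' match at every line boundary
    'W': re.VERBOSE,    # ignore unescaped whitespace, allow '#' comments
    # 'x' (ExplicitCapture) is deliberately absent: Python's re has no option
    # that makes unnamed groups non-capturing, so it is rejected below.
}


def compile_with_flags(pattern, flags=''):
    options = 0
    for letter in flags:
        if letter not in FLAG_MAP:
            raise ValueError('unsupported regexp flag: %r' % letter)
        options |= FLAG_MAP[letter]
    return re.compile(pattern, options)


text = 'first line\nsecond line'
print(compile_with_flags(r'line.second').search(text))        # None: '.' stops at '\n'
print(compile_with_flags(r'line.second', 's').search(text))   # matches across the newline
print(compile_with_flags(r'^second').search(text))            # None: '^' is start of string only
print(compile_with_flags(r'^second', 'm').search(text))       # matches at the start of line 2
```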

plus support for three new binary operators:

- `~` : match (and matches, when used as the iterable of a `for` loop)
- `~=` : isMatch
- `~|` : split

In all three cases the operators expect the LHS operand to be a regular expression (Regexp) and the RHS operand to be a string.

These are supported both for typed and dynamic operands.
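These operators are shorthand for the library calls named later in the thread (isMatch, match and split on the .NET Regex class). The sketch below writes the same operations as plain functions with hypothetical names, using Python's `re` in place of .NET.

Note one detail when comparing the two libraries. .NET's `Regex.Match` finds the first match anywhere in the input, which corresponds to Python's `search`, not `re.match`. Python's `re.match` is anchored at the start of the string.

```python
import re


def is_match(rx, text):
    """~=  : like .NET Regex.IsMatch, true if rx matches anywhere in text."""
    return rx.search(text) is not None


def first_match(rx, text):
    """~   : like .NET Regex.Match (Python returns None when nothing matches)."""
    return rx.search(text)


def all_matches(rx, text):
    """~ used as a for-loop iterable: like .NET Regex.Matches."""
    return list(rx.finditer(text))


def split(rx, text):
    """~|  : like .NET Regex.Split."""
    return rx.split(text)


rx = re.compile(r'\d+')
print(is_match(rx, 'abc 42'))                               # True
print(first_match(rx, 'a1b22').group())                     # 1
print([m.group() for m in all_matches(rx, 'a1b22c333')])    # ['1', '22', '333']
print(split(rx, 'a1b22c'))                                  # ['a', 'b', 'c']
```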

There's a longer description on the ticket.

Here's an example:

```
# Contrived example
str = '@param fare param1\n @param fare param 2\n@param fare last param'

re = re'/^\s*@param\s+(.*)$/m'
reX = re'/^no.match.evah$/m'
assert 'Regex' in re.typeOf.toString

# isMatch
if re ~= str, assert true
else, assert false, 'str match re - isMatch FAIL'
assert re ~= str
assert not reX ~= str

# Match
if re ~ str, assert true
else, assert false, 'str match re - match FAIL'
m = re ~ str
assert 'Match' in m.typeOf.toString
assert m and m.success
#print m

m = reX ~ str
assert not m
assert not reX ~ str

# Matches/MatchCollection
for m in re ~ str
    assert m.groups[1].value.startsWith('fare')

# split
reSplit = re'\n?\s?@param '
#split = reSplit.split(str)
split = reSplit ~| str
assert split.count == 4
assert split[0] == ''
for i in 1 : split.count
    assert split[i].startsWith('fare')
```

I'm not sure it's any clearer than using the library functions, but it is less wordy.
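For comparison with the library-function style, here is the contrived example written against Python's `re` module. This is a stand-in, not Cobra's API. It also confirms the expected results: three `@param` matches, and a four-element split whose first element is empty.

There is one behavioural difference to keep in mind. Python's `search` returns `None` when nothing matches, whereas .NET's `Regex.Match` returns a `Match` whose `Success` is false. The example's `assert not m` therefore relies on however the patch converts a `Match` to a boolean.

```python
import re

text = '@param fare param1\n @param fare param 2\n@param fare last param'

rx = re.compile(r'^\s*@param\s+(.*)$', re.MULTILINE)    # re'/^\s*@param\s+(.*)$/m'
rx_none = re.compile(r'^no.match.evah$', re.MULTILINE)  # re'/^no.match.evah$/m'

# isMatch (~=)
assert rx.search(text) is not None
assert rx_none.search(text) is None

# Match (~): the first match and its first group
m = rx.search(text)
assert m is not None and m.group(1) == 'fare param1'

# Matches / MatchCollection (~ as a for-loop iterable)
params = [match.group(1) for match in rx.finditer(text)]
assert all(p.startswith('fare') for p in params)

# split (~|)
parts = re.split(r'\n?\s?@param ', text)
assert len(parts) == 4 and parts[0] == ''
assert all(p.startswith('fare') for p in parts[1:])

print(params)  # ['fare param1', 'fare param 2', 'fare last param']
```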

Posted: **Tue Jan 04, 2011 4:39 am**

Eh, I've had some thoughts on regexes for a while, but I really didn't want to get into a new major feature right now because it detracts from bug fixes and refinements, which I think we need more than regexes. Plus, major features often introduce new bugs and new sorely needed refinements.

In any case, given that Cobra is keyword- and method-oriented, why would we use ~| instead of a method name or an existing keyword operator?

```
# "in" works for strings:
what = 'fox'
text = 'The quick brown fox jumps over the lazy dog.'
assert what in text

# why not for regexes?
assert someRE in text
```

The cryptic ~| could be done with a .split method which also opens up the possibility of an overload that takes options (max splits, etc.).
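Here is a sketch of the method-based alternative described above, again with Python as a stand-in. It shows a containment test done through a search method, and a split method that takes an option, here Python's `maxsplit`, which limits the number of splits.

```python
import re

text = 'The quick brown fox jumps over the lazy dog.'

# Method-based containment test, the counterpart of "assert someRE in text".
fox = re.compile(r'f[aeiou]x')
assert fox.search(text) is not None

# A split method can grow options; Python's maxsplit limits the number of splits.
spaces = re.compile(r'\s+')
all_words = spaces.split(text)
first_two = spaces.split(text, maxsplit=2)
print(all_words)   # ['The', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog.']
print(first_two)   # ['The', 'quick', 'brown fox jumps over the lazy dog.']
```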

Posted: **Tue Jan 04, 2011 8:24 am**

> Charles wrote:
>
> In any case, given that Cobra is keyword and method oriented, why would we use ~| instead of a method name or existing keyword operator? [...]
>
> The cryptic ~| could be done with a .split method which also opens up the possibility of an overload that takes options (max splits, etc.).

Of the three operators added, as you said, ~| can simply be .split, and "in" covers ~= (i.e. the boolean hasMatch), but what about ~ (i.e. the matches operation)? Are you open to ~ as a new operator? If a keyword is preferred, are there any that would be suitable and reusable for such a purpose?

Posted: **Wed Jan 05, 2011 3:45 am**

Whoops, I thought I'd posted this last night.

Interesting... I have many questions.

What bug fixes (tickets) and refinements (enhancements) do we need (more)?
A specific listing, annotation and/or augmentation of any of the tickets might be useful.

Are you saying we can't add major features because they may cause bugs or need further refinement?
Are "bug fixes and refinements" somehow immune from this possibility? Why are we not equally paralysed from making changes for them, too?

If Cobra is keyword- and method-oriented, why do we use any operators at all rather than method names and keywords throughout?

Take re ~| (re_splits): it is already available as a method on the .NET class (regex.split), as are all the other regex capabilities (match ~, isMatch ~=). These of course have different names on other platforms and in other languages, though the use of '~' (with modifiers) seems a reasonably common choice.
The ticket mentions the convenience of having them built in. Making them built-ins hoists that support out of the library implementation and is more succinct (and arguably more readable).

What existing keyword would be an intelligible, natural substitute for ~| (re_splits)?

How would overloading existing keywords be any clearer than a small, related set of additional operators (using the same prefix ~)?

I can't say that I see an RE as being "in" a string; it may or may not match some, part or all of the string, though.
Leaving that aside, perhaps that's marginally OK for a boolean match/no-match test, but what happens to the rest of the capabilities (groups and captures)?
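To make the groups-and-captures point concrete, the sketch below uses Python as a stand-in. A boolean test keeps only a yes/no answer, while a match object keeps the groups. The example uses Python's `(?P<name>…)` syntax for named groups. Python also remembers only the last capture of a repeated group, whereas .NET exposes every capture through `Group.Captures`.

```python
import re

rx = re.compile(r'(?P<key>\w+)=(?P<value>\w+)')
text = 'width=80 height=24'

# A boolean test (the "in" / ~= style) answers only yes or no ...
found = rx.search(text) is not None

# ... whereas a match object keeps the groups.
m = rx.search(text)
print(m.group('key'), m.group('value'))   # width 80

# Iterating over every match exposes the groups of each one.
pairs = {x.group('key'): x.group('value') for x in rx.finditer(text)}
print(pairs)                              # {'width': '80', 'height': '24'}

# A repeated group: Python keeps only its last capture
# (.NET would also list '1' and '2' in Group.Captures).
rep = re.match(r'(?:(\d),)+', '1,2,3,')
print(rep.group(1))                       # 3
```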

All operators start off being cryptic (but succinct); it's only familiarity from common usage or experience that makes them less so
(<> vs !=, ?=, >>, ...).

I think I'll defer mentioning the use of augmented RE pattern literals and a ~: operator for RE substitution/replacement.

Posted: **Thu Jan 06, 2011 6:33 pm**

Great addition, hops.

I can see "in" for isMatch, and even in the for statement (provided it returns a MatchCollection in that context), but, as mentioned before, I don't see how matches fits into the picture so that groups and captures can be sucked out. The matches keyword would probably be the better fit for the for statement anyway. I'm not really a fan of the operators either; they tend to be forgotten unless you use them daily, and I can't say they're intuitive from the point of view of traditional operator use (neither bitwise nor logical).

Another common operation would be replace. It seems a fitting overload for methods in the String class (where applicable).
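Here is a sketch of what a replace operation typically offers, using Python's `re.sub` as a stand-in. It shows a template that refers to captured groups, and a callback combined with a limit on the number of replacements. In .NET's `Regex.Replace`, group references in a replacement string are written `$1` or `${name}` rather than Python's `\1` or `\g<name>`.

```python
import re

text = 'The quick brown fox jumps over the lazy dog.'

# Template replacement: \2 and \1 refer to captured groups
# (.NET's Regex.Replace would spell these $2 and $1).
swapped = re.sub(r'(\w+) (fox|dog)', r'\2 \1', text)
print(swapped)       # The quick fox brown jumps over the dog lazy.

# Callback replacement with a limit on the number of replacements.
capitalized = re.sub(r'\b\w', lambda m: m.group().upper(), text, count=3)
print(capitalized)   # The Quick Brown fox jumps over the lazy dog.
```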

What about a syntax that doesn't require re'...', just /.../ like JavaScript? I'm not pushing for additional parsing headaches. It would also be nice for the compiler to provide errors/warnings for the expression (dunno if this is already done) so that we don't have to wait till run-time.
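A minimal sketch of such a check, assuming the compiler simply tries to build each literal's pattern and reports any failure as a diagnostic. `check_pattern` is a hypothetical name, and Python's `re.error` is used only for illustration. A Cobra compiler would have to validate with the .NET engine itself, because the two dialects differ; constructing a .NET `Regex` with an invalid pattern throws an `ArgumentException`.

```python
import re


def check_pattern(pattern):
    """Return None if pattern compiles, otherwise a diagnostic string.

    Hypothetical helper: a compiler could run the equivalent check on each
    regexp literal at build time instead of failing at run-time.
    """
    try:
        re.compile(pattern)
    except re.error as exc:
        return 'invalid regexp %r: %s (at position %s)' % (pattern, exc.msg, exc.pos)
    return None


for candidate in [r'\s([a-z]+)', r'\s([a-z]+', '[a-']:
    print(check_pattern(candidate))
```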

Anyway, it's a great addition, and it's nice seeing regexes make their way into the language as first-class citizens.

Regular expression support builtin to Cobra