Escaping substitution in strings

Postby hopscc » Tue Apr 08, 2008 7:04 am

I've been fiddling with Cobra; it's coming along great.

One thing that has been bugging me: the [] syntax for expression substitution in strings is very clean when you do want it,
but when you don't (and want a literal '[' instead) the string starts to become unintelligible.
The text of the displayed string gets obscured by string concatenation with ns'' or r'' pieces, or by various forms of
additional formatting or substitution.

e.g.
Code: Select all
        a=99
        print String.format('a={0}[a]]', c'[')
        print 'a='+ ns'[' +'[a]]'
        a1=String.format('{0}[a]]', ns'[')
        print 'a=[a1]'
        print 'a=[c'['][a]]'
        # any of above to get 'a=[99]'


I want a cleaner syntax like
Code: Select all
print 'a=\[[a]]'


I fiddled a bit with the tokeniser, trying to get some regexes and supporting code to do what I wanted, but unfortunately I couldn't get
anything working for all valid cases (multiple mixed substitutions and escaped non-substitutions in one string),
whether by modifying existing tokens or by adding a whole lot of new tokens and munging the tokenising
behaviour to fit the existing expression handling. Redoing the string and substitution tokenising and the expression parsing seemed to be the only way within
the existing structure, and that is a bit excessive (for me, for the moment anyway). So I resorted to a nasty little hack which seems to work fine.
If nothing else it's a starting point: it gives the desired effect in a perhaps non-optimal way (:-)
It hasn't affected any of the existing tests, and I have a new test file (063-esc-string-subst) exercising the new escaped-substitution behaviour.

i.e. all of the following compile and run:
Code: Select all
 
            a=99
            s0="a=[a]"
            assert s0=="a=99"
            s1="1:a=\[..[a]..]"
            assert s1==r"1:a=[..99..]"
            s2="2:a=\[[a]]"
            assert s2==r"2:a=[99]"
            s2a="2:a=\\[a]"
            assert s2a==r"2:a=\99"
            s3="3:a=\[xx: [a]]"
            assert s3==r"3:a=[xx: 99]"
            s4="4:a=\[33xx: [a]]"
            assert s4==r"4:a=[33xx: 99]"
            s4a='4:a=\[33xx: [a]]'
            assert s4a==r'4:a=[33xx: 99]'
            s5="4a:a=\[33xx] \[yy]: [a]]"
            assert s5==r"4a:a=[33xx] [yy]: 99]"
            s6="5:a=\[valueof(a)] \[b]"
            assert s6==r"5:a=[valueof(a)] [b]"
            s7="6:a=[a] a=\[..] [a] "
            assert s7==r"6:a=99 a=[..] 99 "
            s7a="7: \[a=][a]] bigjobs \[[a]] crivens"
            assert s7a == r'7: [a=]99] bigjobs [99] crivens'
            s8=ns"7:a=\[a][a]"
            assert s8==r"7:a=[a][a]"
            s9= r"8:a=\[[a]]"
            assert s9==r"8:a=\[[a]]"
            # xtn: \{ gets subst to '[' in \ expanded strings
            s13='SUBST: a=\{[a]]'
            assert s13==r'SUBST: a=[99]'
            s13a=ns'a=\{[a]]'
            assert s13a == r'a=[[a]]'
            # except when it doesnt
            s13b=r'a=\{[a]]'
            assert s13b == r'a=\{[a]]'
            b=ns"\\["
            assert b==r'\['
            assert b == r"\["
            assert b == ns"\\\{" 
            assert b <> r'\\{'
            b2="123 \[ 456"
            assert b2==r"123 [ 456"
            b2='123 \[ 456'
            assert b2==r"123 [ 456"
            b3=r'123 \[ 456'
            assert b3==r"123 \[ 456"


Will you take the patches for the changes (for examination at least) and the new test case?
If so, how would you like them?
As a file attachment here? (I thought there was a file upload option somewhere on here.)
By email to the Cobra mail address?
Via an FTP drop somewhere?
hopscc
 
Posts: 632
Location: New Plymouth, Taranaki, New Zealand

Re: Escaping substitution in strings

Postby Charles » Tue Apr 08, 2008 8:50 am

Glad to hear Cobra is working out for you.

I too would like to be able to escape left brackets, so I will definitely take a look at your patch. Use "svn diff > mypatch.patch" and attach here. When you post you should see an "Upload attachment" tab towards the bottom of the page.

Thanks.
Charles
 
Posts: 2515
Location: Los Angeles, CA

Re: Escaping substitution in strings

Postby hopscc » Wed Apr 09, 2008 2:23 am

Upload gives 'the extension patch is not allowed'.

Here's the patch inline:
Code: Select all
Index: Source/Tokenizer.cobra
===================================================================
--- Source/Tokenizer.cobra   (revision 1454)
+++ Source/Tokenizer.cobra   (working copy)
@@ -563,11 +563,52 @@
                assert _sourceLine.endsWith('\n')
             else
                assert false, 'Expecting readLine to return one line instead of many.'
+               
+         # hack for escaped [ '\[' in strings
+         if _sourceLine.contains(ns'\\[')
+            _sourceLine = _substEsc(_sourceLine to !)
+            #print _sourceLine
+
          _sourceLineIndex = 0
          _lineNum += 1
          _colNum = 1
          return true
-
+       
+   #convert escaped subst ('\[') in strings to something that the subst tokeniser wont pickup ('\{')
+   # corresponding reversion in CobraTokenizer.tokValueForString       
+   def _substEsc( line as String) as String
+      chars = StringBuilder()  # TODO: make an array of chars
+      last = c'\0'
+      inDString = false
+      inSString = false
+      raw = false
+      next as char?
+      for c in line
+         next = c #nil
+         if last <> c'\\'
+            if c == c'"'
+               if not inSString
+                  inDString = not inDString
+                  raw = if(raw, false, last =='r')
+            else if c == c'\''
+               if not inDString
+                  inSString = not inSString
+                  raw = if(raw, false, last =='r')
+         if last==c'\\' and (inSString or inDString) and not raw
+            branch c
+               on c'['
+                  next = c'{'
+               on c'\\'
+                  chars.append(c'\\')
+                  last = c'\0'
+                  continue
+               else, next = c
+         #if next is nil
+         #   next = c
+         chars.append(next)
+         last = c
+      return chars.toString
+   
    var _narrowTokenDefs = true
    var _minNumTokenDefsToNarrow = 4
 
Index: Source/CobraTokenizer.cobra
===================================================================
--- Source/CobraTokenizer.cobra   (revision 1454)
+++ Source/CobraTokenizer.cobra   (working copy)
@@ -475,6 +475,7 @@
                   on c'r', next = c'\r'
                   on c't', next = c'\t'
                   on c'0', next = c'\0'
+                  on c'{', next = c'['
                   on c'\\'
                      chars.append(c'\\')
                      # cannot have `last` being a backslash anymore--it's considered consumed now
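
For readers who don't read Cobra, here is a minimal Python sketch of what the _substEsc pre-pass in the patch above does. The name subst_esc and the sample lines are illustrative only; this is a transliteration of the patch logic as posted, not the compiler's own code, and it inherits the same simplifications (for instance, any opening quote directly preceded by the letter r is taken to start a raw string).

```python
def subst_esc(line: str) -> str:
    """Rewrite backslash-[ as backslash-{ inside non-raw quoted strings."""
    out = []
    last = "\0"
    in_dstring = False   # inside a "..." string
    in_sstring = False   # inside a '...' string
    raw = False          # current string was opened as r"..." or r'...'
    for c in line:
        nxt = c
        if last != "\\":
            # An unescaped quote opens or closes a string of its own kind.
            if c == '"' and not in_sstring:
                in_dstring = not in_dstring
                raw = False if raw else last == "r"
            elif c == "'" and not in_dstring:
                in_sstring = not in_sstring
                raw = False if raw else last == "r"
        if last == "\\" and (in_sstring or in_dstring) and not raw:
            if c == "[":
                nxt = "{"            # hide the bracket from the subst tokens
            elif c == "\\":
                out.append("\\")
                last = "\0"          # the backslash pair is consumed
                continue
        out.append(nxt)
        last = c
    return "".join(out)


# Sample source lines taken from the test cases in this thread.
for src in (r's2="2:a=\[[a]]"', r's9= r"8:a=\[[a]]"', r's2a="2:a=\\[a]"'):
    print(src, "->", subst_esc(src))
```

The first sample comes out with \{ in place of \[, which the one-line change in CobraTokenizer.cobra later maps back to a literal '['. The raw string and the doubled backslash both pass through unchanged.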

hopscc
 
Posts: 632
Location: New Plymouth, Taranaki, New Zealand

Re: Escaping substitution in strings

Postby hopscc » Wed Apr 09, 2008 2:28 am

And here's the test file:
Tests/100-basics/063-esc-string-subst.cobra
Code: Select all

namespace Test
    # Tests for escaping of substitution syntax in strings
    class Test
        def main
            is shared
           
            a=99
            s0="a=[a]"
            assert s0=="a=99"
           
            s1="1:a=\[..[a]..]"
            assert s1==r"1:a=[..99..]"
           
            s2="2:a=\[[a]]"
            assert s2==r"2:a=[99]"
           
            s2a="2:a=\\[a]"
            assert s2a==r"2:a=\99"
           
            s3="3:a=\[xx: [a]]"
            assert s3==r"3:a=[xx: 99]"
           
            s4="4:a=\[33xx: [a]]"
            assert s4==r"4:a=[33xx: 99]"
            s4a='4:a=\[33xx: [a]]'
            assert s4a==r'4:a=[33xx: 99]'
           
            s5="4a:a=\[33xx] \[yy]: [a]]"
            assert s5==r"4a:a=[33xx] [yy]: 99]"
           
            s6="5:a=\[valueof(a)] \[b]"
            assert s6==r"5:a=[valueof(a)] [b]"
           
            s7="6:a=[a] a=\[..] [a] "
            assert s7==r"6:a=99 a=[..] 99 "
       
            s7a="7: \[a=][a]] bigjobs \[[a]] crivens"
            assert s7a == r'7: [a=]99] bigjobs [99] crivens'
           
            s8=ns"7:a=\[a][a]"
            assert s8==r"7:a=[a][a]"
           
            s9= r"8:a=\[[a]]"
            assert s9==r"8:a=\[[a]]"
       
       
            # test some the old way without the escape syntax.
            t=String.format('a={0}[a]]', c'[')
            assert t==r'a=[99]'
            s10='FMT:a='+ ns'[' +'[a]]'
            assert s10==r'FMT:a=[99]'
       
            a1=String.format('{0}[a]]', ns'[')
            s11='a=[a1]'
            assert s11==r'a=[99]'
           
            s12='a=[c'['][a]]'
            assert s12==r'a=[99]'
           
            # xtn: \{ gets subst to '[' in non raw strings
            s13='SUBST: a=\{[a]]'
            assert s13==r'SUBST: a=[99]'
            s13a=ns'a=\{[a]]'
            assert s13a == r'a=[[a]]'
            # except when it doesnt
            s13b=r'a=\{[a]]'
            assert s13b == r'a=\{[a]]'
           
            b=ns"\\["
            assert b==r'\['
            assert b == r"\["
            assert b == ns"\\\{" 
            assert b <> r'\\{'
       
            b2="123 \[ 456"
            assert b2==r"123 [ 456"
            b2='123 \[ 456'
            assert b2==r"123 [ 456"
           
            b3=r'123 \[ 456'
            assert b3==r"123 \[ 456"

hopscc
 
Posts: 632
Location: New Plymouth, Taranaki, New Zealand

Re: Escaping substitution in strings

Postby Charles » Wed Apr 09, 2008 9:56 am

Thanks. I've added .patch to the list of allowed extensions. But I can pull this one from your post. I'll look at it tonight or tomorrow.
Charles
 
Posts: 2515
Location: Los Angeles, CA

Re: Escaping substitution in strings

Postby hopscc » Tue Apr 15, 2008 3:55 am

Did anyone get a chance to look at this and try it out?
Any feedback on whether this is worth keeping or not?
hopscc
 
Posts: 632
Location: New Plymouth, Taranaki, New Zealand

Re: Escaping substitution in strings

Postby Charles » Tue Apr 15, 2008 9:29 am

I'll look at it within the next 5 days.
Charles
 
Posts: 2515
Location: Los Angeles, CA

Re: Escaping substitution in strings

Postby Charles » Sat Apr 19, 2008 6:36 pm

It seems a tad expensive. Also, won't this cause problems if someone uses "\{" in their string, say for a literal { in a regex? c'\0' would be a safer character to use, but I'm wondering if there is a different approach to this.
Charles
 
Posts: 2515
Location: Los Angeles, CA

Re: Escaping substitution in strings

Postby hopscc » Sun Apr 20, 2008 12:03 am

It's one additional string-contents lookup per input line for the cases that don't have an embedded \[ (misses), which
I expect to be the 90% case. I haven't measured it, but I doubt it's noticeable. There are possibly faster ways of checking than the one I used, but this seems adequate.

Yes, it will cause problems if there is an explicit \{ in the string and the user expects to see \{ in the output.
However, the conversion isn't done for raw strings (or ns strings(?)), which I'd expect regexes usually to be (otherwise you will have more problems than this).

I chose \{ simply because it looks similar to '[', with the idea that it just gets treated like any other escape sequence (like \n, \r etc.), just one that maps onto a literal '['.
So you can also get '[' in your output string using \{ as well as \[, if you would prefer to see only expanded expressions enclosed in [] (i.e. it's a feature :) ).
Any other character would work as well
(conversion to a weird high-value Unicode character, perhaps; I tend to stay away from using c'\0' for anything other than a terminator).

There are different (better, cleaner) approaches (that's why I called this a hack), but so far as I can tell they require a whole lot more cleverness in the string-matching token patterns
than I could get by augmenting the existing string token regexes and minimally changing the code.

A cleaner solution (for this at least) would be switching the tokeniser to recognise whole strings and pass them to a stream-based (rather than regex-based)
sub-tokeniser, but I didn't really feel up to that much of a rewrite of the tokeniser code just on the pretext of supporting this (at this point anyway).
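
A rough sketch of what such a stream-based sub-tokeniser could look like, written in Python for brevity rather than Cobra. The names split_string_body and render are hypothetical. The scanner assumes it is handed the body of a non-raw string with the quotes already stripped, and it does not handle brackets nested inside a substitution expression (such as [c'[']). It only shows that a character-at-a-time scan makes the \[ escape easy to honour.

```python
def split_string_body(body: str):
    """Split a string body into ('text', s) and ('expr', s) parts."""
    parts = []
    text = []
    i = 0
    n = len(body)
    while i < n:
        c = body[i]
        if c == "\\" and i + 1 < n:
            nxt = body[i + 1]
            if nxt == "[":
                text.append("[")        # escaped bracket: literal '['
            else:
                text.append(c + nxt)    # other escapes are left for a later pass
            i += 2
        elif c == "[":
            end = body.find("]", i + 1)
            if end == -1:
                raise ValueError("unterminated substitution")
            if text:
                parts.append(("text", "".join(text)))
                text = []
            parts.append(("expr", body[i + 1:end]))
            i = end + 1
        else:
            text.append(c)
            i += 1
    if text:
        parts.append(("text", "".join(text)))
    return parts


def render(parts, env):
    """Join the parts, looking each expression up as a plain name in env."""
    return "".join(s if kind == "text" else str(env[s]) for kind, s in parts)


print(split_string_body(r"blah \[blah] blah [foo]"))
print(render(split_string_body(r"2:a=\[[a]]"), {"a": 99}))
```

Because the scan consumes a backslash together with the character after it, a doubled backslash followed by [a] is still seen as a substitution, which matches the s2a case in the test file.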

It's a hack, it's a placeholder, and it gives (mostly) the desired effect.
hopscc
 
Posts: 632
Location: New Plymouth, Taranaki, New Zealand

Re: Escaping substitution in strings

Postby Charles » Thu Apr 24, 2008 12:54 am

I did a little research and found that regexes support "negative lookbehind":
r'STRING_START_DOUBLE   "[^"\n\[]*(?<!\\)\[',
r'STRING_PART_DOUBLE \][^"\n\[]*(?<!\\)\[',

The new part is the (?<!\\) which says that if the last char was a backslash then don't match.

This works fairly well. print "blah \[foo]" works.

However, this breaks down on a case like "blah \[blah] blah [foo]", so it isn't ready for prime time yet.

But we're getting close. Maybe an OR and another regex will solve it.
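
To make the lookbehind behaviour concrete, here is a small Python sketch (Python's re module accepts the same (?<!...) syntax). LOOKBEHIND is the STRING_START_DOUBLE pattern with the (?<!\\) lookbehind described above. ALTERNATION is only one guess at what an "OR" variant might look like; it is an assumption for illustration and has not been tried in the Cobra tokenizer.

```python
import re

# STRING_START_DOUBLE with the negative lookbehind: a '[' only opens a
# substitution if the character before it is not a backslash.
LOOKBEHIND = re.compile(r'"[^"\n\[]*(?<!\\)\[')

# Assumed alternative (not from the Cobra source): consume either an ordinary
# character or a whole backslash escape pair, then require an unescaped '['.
ALTERNATION = re.compile(r'"(?:[^"\n\[\\]|\\.)*\[')


def start_of(pattern, source):
    """Return the matched string-start token text, or None if no match."""
    m = pattern.match(source)
    return m.group() if m else None


simple = '"a=[a]"'
escaped_only = r'"blah \[foo]"'
mixed = r'"blah \[blah] blah [foo]"'

for name, pattern in (("lookbehind", LOOKBEHIND), ("alternation", ALTERNATION)):
    for source in (simple, escaped_only, mixed):
        print(name, source, "->", start_of(pattern, source))
```

With the lookbehind pattern the mixed case finds no string start at all, because the character class cannot step over the escaped '[' to reach the real one. The alternation form consumes \[ as a unit and keeps scanning, so it stops at the '[' of [foo].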

Another entirely different approach is to make a subclass of TokenDef and implement .match by hand. That hasn't been done before, so a little work would be required to get the custom token def into the list, but it's doable.

-Chuck
Charles
 
Posts: 2515
Location: Los Angeles, CA
