Multiline strings

Re: Multiline strings

Postby Charles » Sun Aug 30, 2009 8:51 pm

Regarding the syntax design:

-- I think I like your idea that if the string contents are not indented then no indentation is expected or required. If they are indented then the indentation is stripped. I suppose someone might want the indentation in the resulting string though, and it's not clear how Cobra would know when. This really came up in my contract work where I wrote a Python program to generate C# code. Some of the strings were indented enough that under this rule, Cobra would assume that the indentation was to be stripped, when in fact, that was not desired at all.
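For comparison, Python leaves this choice to the programmer: a triple-quoted literal keeps its indentation, and `textwrap.dedent` strips the common leading whitespace only on request. The sketch below is a Python analogue, not Cobra behaviour, and the C#-like template text is made up for illustration.

```python
import textwrap

# A code generator's template, indented to match the surrounding source.
template = """
        class Foo {
            int x;
        }
"""

# Default: the literal keeps every leading space exactly as written.
kept = template

# Opt-in: remove the whitespace common to all non-blank lines.
stripped = textwrap.dedent(template)

print(repr(kept))
print(repr(stripped))
```

Making the stripping explicit avoids the compiler having to guess whether the indentation is wanted, at the cost of an extra call.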

-- I still like my idea that the string starts on the following line, not the same line as the opening quote. Otherwise the contents get skewed.

-- It's not clear why rawness is being conflated with multilineness; they are separate issues. From the Python interactive prompt:
>>> """
... foo
... \t
... bar
... """
'\nfoo\n\t\nbar\n'
>>> r"""
... foo
... \t
... bar"""
'\nfoo\n\\t\nbar'
>>>

I can't see any advantage in combining the two. C# combines them, but in using both C# and Python, I have not found any advantage to the C# approach.

-- I prefer the use of triple quotes for multiline strings.

-- Sorry, but I dislike the R'+| syntax. Not only is it a bit more arcane and symbolic than Cobra normally is, but I don't see an advantage over this proposal:
a = ' here is line 1 ' _
' here is line 2 ' _
' line3'
# or
b = _
' here is line 1 ' _
' here is line 2 ' _
' line3'

If I'm missing something, let me know.

-- One further thought on indented contents. Perhaps if non-whitespace string contents start on the same line as the opening """ or ''' then no indentation is expected, otherwise it is:
a = """this does
not
require indentation"""
b = """
    this does
    require indentation
    """

Those two strings would be the same except for the extra line of "not" in the first.
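One possible reading of this rule, sketched in Python rather than Cobra. The function name `cook_multiline` is hypothetical, and two details are assumptions the post does not settle: a final whitespace-only line before the closing quotes is dropped, and a line indented less than the first content line is simply left-stripped.

```python
def cook_multiline(body: str) -> str:
    """Apply the sketched rule to the raw text between the triple quotes."""
    first, sep, rest = body.partition("\n")
    if first.strip() or not sep:
        # Content starts on the opening line: leave everything untouched.
        return body
    lines = rest.split("\n")
    # Assumption: a final whitespace-only line only positions the closing quotes.
    if lines and not lines[-1].strip():
        lines.pop()
    if not lines:
        return ""
    # The first content line sets the indentation for the remaining lines.
    first_line = lines[0]
    indent = first_line[: len(first_line) - len(first_line.lstrip())]
    return "\n".join(
        line[len(indent):] if line.startswith(indent) else line.lstrip()
        for line in lines
    )


a = cook_multiline("this does\nnot\nrequire indentation")
b = cook_multiline("\n    this does\n    require indentation\n    ")
print(repr(a))
print(repr(b))
```

Under these assumptions `a` and `b` differ only by the extra "not" line, which matches the claim above.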

-- We haven't dealt with newlines varying between platforms. Windows is \r\n while Mac, Linux and friends are just \n. Do we use the newline of the compiler's platform? Or do we fetch the newline at run-time so that moving a .NET/CLR program from one platform to the other will respect the local convention? This question also comes up for the \N proposal.
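The run-time option can be sketched in Python, where `os.linesep` plays the role of the platform newline. The helper name `localize_newlines` is hypothetical; the sketch assumes the literal was stored with plain `\n` and is converted when the program runs.

```python
import os


def localize_newlines(text: str, newline: str = os.linesep) -> str:
    """Rewrite every line ending in `text` to `newline`."""
    # Normalize first so that input mixing "\r\n", "\r" and "\n" is handled.
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    return normalized.replace("\n", newline)


literal = "foo\nbar\n"  # what the compiler stored
print(repr(localize_newlines(literal)))          # depends on the current platform
print(repr(localize_newlines(literal, "\r\n")))  # 'foo\r\nbar\r\n'
```

Using the compiler's platform instead would amount to baking one of these results into the binary at build time.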
Charles
 
Posts: 2515
Location: Los Angeles, CA

Re: Multiline strings

Postby hopscc » Mon Aug 31, 2009 1:11 am

Right - covering your points:
idea that if the string contents are not indented then no indentation is expected or required

That's not quite it. The trigger for indentation being stripped is that the line after the opening line is indented exactly one level more than the opening line.
This serves as an implicit trigger that indentation removal (on all lines) to that level is to be done. As I noted, it's a convenience for what I envisage as the most common formatting choice.
You can suppress the automatic indentation removal by merely adjusting the first line's indentation (back to flush left, or indented the same as the opening line, or indented further to the right).
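A rough Python sketch of that trigger, as one reading of the rule and not the code in the patch. The name `strip_one_level` is hypothetical, and one indent level is assumed to be a single tab.

```python
from typing import List


def indent_of(line: str) -> str:
    """Return the leading whitespace of a line."""
    return line[: len(line) - len(line.lstrip())]


def strip_one_level(opening_line: str, content: List[str], unit: str = "\t") -> List[str]:
    """Strip indentation only if the first content line sits exactly one level deeper."""
    trigger = indent_of(opening_line) + unit
    if not content or indent_of(content[0]) != trigger:
        # Flush left, same level, or deeper than one level: keep lines as written.
        return list(content)
    # Remove the trigger indentation from every line that carries it.
    return [line[len(trigger):] if line.startswith(trigger) else line for line in content]


print(strip_one_level("\ts = R'", ["\t\tfoo", "\t\t\tbar"]))    # ['foo', '\tbar']
print(strip_one_level("\ts = R'", ["\t\t\tfoo", "\t\t\tbar"]))  # unchanged
```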

If it turns out that you actually want the indentation given by the natural one level in indentation you can get that with the patch using the alternative (+|) processing if not use
the '|' edge bars on edges within content
e.g.
Code: Select all
assert  R'+|
    indented
    indented' == r'\tindented\n\tindented'
..
Actually, since explicit is better, it could be set up to support another option to explicitly enable/disable the auto indent removal (see below...)

start string on following line

Yes, but for short multiline strings it adds an extra unused code line which isn't part of the resulting formatting (2 lines in the result from 3 lines in code).
... I made up a page of examples to see what it looked like. It seemed to me the skewedness is only really apparent for a smallish number of following lines (2-6): for 2 (opening line + 1 following) it's not really apparent, and after about 5-6 lines after the opening line the formatting of the following lines is what the eye is drawn to (YMMV).
The other point is that that's what the '+|' option setting syntax on the first line gives you anyway.

rawness and multilineness

I'm conflating these because this isn't an implementation/proposal for multiline support in all the existing varieties of single-line string; it's specifically for the equivalent of a multiline raw string.
Given that, for (at least) a raw string the only difference between single-line and multiline is capturing the embedded newline (and some parsing issues (:-)),
i.e. the raw string should have supported multiline/embedded newlines from the get-go.
(As a comparison, consider some imaginary string literal variants: one that supports \t for tab character insertion but doesn't allow an embedded tab character, and one that doesn't support either. Seems an artificial and unnecessary distinction, right?)

triple quotes for multiline strings

Yeah well as already noted my preference would be to fold the
Code: Select all
R'...'
into the existing raw string syntax
Code: Select all
r'...'
.
The caps R is just to indicate that its still a raw string

Hmm are you considering that supporting the existing string literals syntax in multiline versions is a reasonable way to go ?
(in both
Code: Select all
" and '
delimited variants )
Code: Select all
r""" ..."""   # multiline raw  vs r""
"""..."""      # multiline string (with and without subs) vs "" single line
ns"""..."""  # non substitutable multiline vs ns""
sharp""" ..."""     # multiple csharp codelines vs sharp''
c"""....."""         # character stream ((:-)  vs c'.'

that seems a little .... multitudinal

R'+| syntax

its actually
R'
syntax - like
r'', ns'', sharp""
exactly as arcane and symbolic as the existing string literals
- the 'R' is cos its like r'' ( raw) but bigger ( and r'' is already used which brings us back to the conflation thing above)
The '+|\n' part is an option switch that sez we want to do some different/additional processing from the default (simpler) processing
= '+' turnOn '|' bar formatting processing
I dont really have much of a preference on how internal option changes could be indicated ( see patches for trace expr and position suppression) - thats as simple and
obvious as any
e.g for explicitly specifying indentation auto removal any of : '+ir', -i, -indentation, .... on opening line.
and I think its preferable to a entirely new syntax that does the same thing ( as multiline raw) with some additional post processing

advantage VS the single quoted line continued proposal

- explicit embedded newline ( within the terminators) - raw - whats there is what you get
- No line continuation hackery
- '|' bars are optional either side
- easy to explain, obvious to see result (less visual clutter)
- easy to implement

--Re triggering off use of opening line vs stepping to after newline for content for indent auto prune
yeeees maybe I'll have to think about that a bit more....
OTTOMH tho' it precludes any sort of option settings and
for a raw string version it requires a bit of explanation as to why a raw string is having some ( the first) newline
suppressed (like the explanation around '|' processing)

Lets take the varying newline thing to a new topic.
hopscc
 
Posts: 632
Location: New Plymouth, Taranaki, New Zealand

Re: Multiline strings

Postby Charles » Thu Sep 03, 2009 1:04 am

hopscc wrote: yes but for short multiline strings it adds an extra unused code line
-- My multiline strings haven't typically been short because ... well ... they're multiline! I'm not sure we should optimize for the case of short ones.

-- Forget about sharp'foo'. The current form is sharp('foo') and the old form is just there so code doesn't break. Plus the use of sharp('foo') has become increasingly rare over time.

-- Forget about c"""...""". No multiline characters.

-- That leaves 'r' and 'ns' as the prefixes for strings. I say that R'+| ... is "a bit more arcane" because of the "+|" infix code which is new.

hopscc wrote:
- explicit embedded newline ( within the terminators) - raw - whats there is what you get
- No line continuation hackery
- '|' bars are optional either side
- easy to explain, obvious to see result (less visual clutter)
- easy to implement

How is line continuation more hackery than +| |...| ?? Line continuation is already there so you'll end up learning it anyway. Ditto for string literals. So I don't see anything hard to explain or implement about the use of contiguous string literals.

hopscc wrote: for a raw string version it requires a bit of explanation as to why a raw string is having some ( the first) newline
suppressed (like the explanation around '|' processing)

Okay, so the tutorial will point out that when a multiline string is kicked off, you can start the contents right away or you can start them on the next line which sets the indentation level for remaining string contents. And this is true regardless of prefix (r, ns or none).

You're still fixed on "raw" taking first precedence and gobbling up everything. I'm still fixed on (a) there being single-line strings and multi-line strings that (b) may have a prefix of 'r', 'ns' or none. If 'r', then \ and [ are not treated special. If 'ns' then [ is not treated special.

Btw I wouldn't mind a single character instead of "ns", which stands for "no substitution", but I never came up with a good one.
Charles
 
Posts: 2515
Location: Los Angeles, CA

Re: Multiline strings

Postby hopscc » Thu Sep 03, 2009 5:26 am

optimize for the case of short ones (multiline strings)

It's not an optimization though - it's dropping (or not) a standard whitespace character in one (set of) variants for a string literal - where else do we do that?

I've already mentioned the "+|" setting as a behaviour modifying switch and the reasoning for it ( would '-with-bar-formatting' be more palatable?)
Code: Select all
R'+with-bar-formatting
    | content     ...x|'


Re the string prefixes:
fine - we only have 3 variants ( rather than 4 ( or 5) ) doubled for single/multiline dichotomy, doubled again for ' and " delimiters - such an improvement
I guess the point was that the different forms for single vs multiline is artificial and unnecessary (particularly for raw strings).

How is line continuation more hackery than +| |...| ??

a) it uses line continuation rather than a single multi line string
b) It artificially breaks up the content ( or appears to) by lines rather than being in one 'chunk'
c) It's close in appearance (1 token difference) to multiline string concatenation, which does something entirely different
d) It invisibly inserts whitespace (\n chars) in what's purported to be a "multiline" literal.
e) It requires an extra character (line continuation) to code it, which is more detritus obscuring what the actual line output will be

( e) probably isn't a hackery point - it's just another reason why it's not a good idea (:-) )


Code: Select all
# re c) above
a='                  dshkjhdkhdj'+_
   'dskjhjksd jhdsjkhsjhjsdh'+ _
    '  jhsdkjhskjdhjsh sdjhs+'+_
   ' ksdlksdlsdksjdksdk jsd_'_
  +'kldjlksdk ksdjkjsdkds'

a='                  dshkjhdkhdj'_
   'dskjhjksd jhdsjkhsjhjsdh'_
   '   jhsdkjhskjdhjsh sdjhs+'_
   ' ksdlksdlsdksjdksdk jsd_ '_
  ' kldjlksdk ksdjkjsdkds'


Line continuation is already there so you'll end up learning it anyway. Ditto for string literals.

yes and you'll learn that nowhere else (currently at least) does it silently concat (strings) together or insert additional characters into the
character stream.
So this superficially looks like both (familiar) line continuation and string literals but does in fact something else...... ( a nice newbie gotcha)

So I don't see anything hard to explain or implement about the use of contiguous string literals

This doesn't look like a contiguous string literal though - it looks like multiple contiguous literals (which on a single line would (currently) be a syntax error),
except that it's not.
It would be preferable if it didn't need explanation and looked like what it was, or as close as possible, with minimal extraneous characters.

I can't think of a good single replacement char for 'ns' either
hopscc
 
Posts: 632
Location: New Plymouth, Taranaki, New Zealand

Re: Multiline strings

Postby agustech » Thu Sep 03, 2009 11:35 pm

bar formatting does not add anything useful for multiline strings as we already have:

Code: Select all
str  = "asdasdasdas"
str += "asdasdasdas"
str += "asdasdasdas"
str += "asdasdasdas"
str += "asdasdasdas"
str += "asdasdasdas"
agustech
 
Posts: 37

Re: Multiline strings

Postby hopscc » Fri Sep 04, 2009 1:11 am

agustech, re your example and assertion:

a) Your example doesn't do the same as the bigraw (barred or not) multiline string (it's missing the embedded newlines),
obviously an easy mistake to make when writing (and even easier when reading).
b) By that assertion there's no need for any multiline strings at all
(in reality there probably isn't, but they would sure be convenient, obvious and more intelligible).
c) Which expresses intent more clearly?
Code: Select all
str  =  "asdasdasdas\n"
str += "asdasdasdas\n"
str += "asdasdasdas\n"
str += "asdasdasdas\n"
str += "asdasdasdas\n"
str += "asdasdasdas\n"

# or
str = R"asdasdasdas
    asdasdasdas
    asdasdasdas
    asdasdasdas
    asdasdasdas"
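
Point a) can be checked in Python, with a triple-quoted string standing in for the proposed R"..." form (an analogy only; the Cobra syntax above is still a proposal):

```python
# Repeated appends without "\n": the pieces run together on one line.
joined = "asdasdasdas"
joined += "asdasdasdas"
joined += "asdasdasdas"

# A multiline literal: each line break in the source becomes "\n".
multiline = """asdasdasdas
asdasdasdas
asdasdasdas"""

print(joined.count("\n"))     # 0
print(multiline.count("\n"))  # 2
```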


note that the barred form probably isn't useful unless you want a consistently formatted text literal aligned within whitespace on each line
i.e for your example
Code: Select all
str =  "   asdasdasdas \n"
str += "   asdasdasdas \n"
str += "   asdasdasdas \n"
str += "   asdasdasdas \n"
str += "   asdasdasdas \n"
str += "   asdasdasdas \n"
hopscc
 
Posts: 632
Location: New Plymouth, Taranaki, New Zealand

Re: Multiline strings

Postby agustech » Sat Sep 05, 2009 7:44 am

I think I did not express myself correctly. I see mainly 2 uses for multiline strings:

1. code formatting for long lines, e.g.:
Code: Select all
  #this is just one line of code
  str = "this is a veeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeery long liiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiine\nwith a new line character."

perhaps better as

Code: Select all
  str  = "this is a veeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeery long"
  str += " liiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiine\n"
  str += "with a new line character."


2. real multiline string literals eg:

Code: Select all
  str  = "literal\n"
  str += "indented line\n"
  str += "that skips indented part\n"
  str += "<with>\n"
  str += "  <some>\n"
  str += "    formatted code inside\n"
  str += "  </some>\n"
  str += "</with>\n"


better as:

Code: Select all
  str = """literal
    indented line
    that skips indented part
    <with>
      <some>
        formatted code inside
      </some>
    </with>
    """


or as you suggest

Code: Select all
  str = R"literal
    indented line
    that skips indented part
    <with>
      <some>
        formatted code inside
      </some>
    </with>
    "



What I do not see useful or clear is this:

Code: Select all
  #really ugly
  str ="aaaaaaaaaaaaaaaaaaaaaaaaaaaaa" + _
       "bbbbbbbbbbbbbbbbbbbbbbbbbbbbb" + _
       "ccccccccccccccccccccccccccccc"


or

Code: Select all
  str = R'+|
      | here is line 1 |
      | here is line 2 |
      | line3'


I think this is also ugly (uglier as indent increases):

Code: Select all
  str = """literal
indented line
that skips indented part
<with>
  <some>
    formatted code inside
  </some>
</with>
"""
agustech
 
Posts: 37

Re: Multiline strings

Postby helium » Sun Sep 06, 2009 2:05 am

One very important aspect of multi-line strings is that you often copy & paste them from somewhere else. You can easily mark and indent the pasted block to whatever level you want, but to automatically put a |...| or '...' or str += "..." around each line you'd have to come up with some little regex (or do it manually, which would be a total waste of time for a longer string). That would be a totally unnecessary annoyance. Make strings user friendly.
helium
 
Posts: 14

Re: Multiline strings

Postby hopscc » Sun Sep 06, 2009 5:28 am

Use of bar processing is optional
Maybe I should have highlighted this
The '+|\n' part is an option switch that sez we want to do some different/additional processing from the default (simpler) processing
= '+' turnOn '|' bar formatting processing


agustech
We're in agreement about the multiple '....' _ form at least

Your example looks OK as given, since the code line is 1 indent level in and the desire is for each content line to have a single indent level.
If that wasn't the case, or the expression was already well indented, you'd lose the code indentation as the
string literal was slammed left to get the desired indent,
which I think, on rereading it, was your last point re 'ugliness'.

Code: Select all
                # code indented 4 indent levels
                str = """literal
    indented line
    that skips indented part
    <with>
      <some>
        formatted code inside
      </some>
    </with>"""

Here's the same thing using the bar processing
Code: Select all
                str=R'+|
                    |    literal
                    |    indented line
                    |    that skips indented part
                    |    <with>
                    |      <some>
                    |           formatted code inside
                    |       </some>
                    |    </with>'

Not a particularly good example since you can get the same effect more clearly using the non barred form with the content indented one level from the 'str=' line
so that it auto removes the leadin-line indentation....
It does explicitly show where the desired whitespace starts from though..
even more explicit where you want trailing whitespace on a line - can see precisely where the whitespace ends
Code: Select all
                # All the lines below are same length (trailing whitespace).
                str=R'+|
                    |    literal                     |
                    |    indented line               |
                    |    that skips indented part    |
                    |    <with>                      |
                    |      <some>                    |
                    |           formatted code inside|
                    |       </some>                  |
                    |    </with>                      '   
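
For readers who want to experiment, the bar processing can be approximated in Python. This is a guess at the semantics from the examples above, not the code in the patch: the name `bar_format` is hypothetical, everything up to and including a leading '|' is dropped, and a trailing '|' (plus any whitespace after it) is dropped if present. A content line that genuinely ends in '|' would need an explicit trailing bar under this reading.

```python
def bar_format(lines):
    """Trim each line to the text between its optional edge bars."""
    out = []
    for line in lines:
        stripped = line.lstrip()
        if stripped.startswith("|"):
            # Leading bar: content starts right after it.
            line = stripped[1:]
        if line.rstrip().endswith("|"):
            # Trailing bar: content ends right before it.
            line = line.rstrip()[:-1]
        out.append(line)
    return "\n".join(out)


text = bar_format([
    "        |  literal   |",
    "        |  <with>    |",
    "        |  </with>   ",
])
print(repr(text))
```

Each resulting line here is 12 characters wide; the bars only mark where the leading and trailing whitespace begins and ends.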
hopscc
 
Posts: 632
Location: New Plymouth, Taranaki, New Zealand

Re: Multiline strings

Postby Charles » Wed Sep 09, 2009 2:16 am

Actually I didn't mean to suggest that putting the string literals adjacent to each other would automatically add \n (or the infamous \N), so really I should have written:
s = 'foo\n' _
'bar'

And the advantage over saying "s +=" for subsequent lines is performance. "s +=" happens at run-time with extra allocations for each intermediate string.

And I didn't mean to imply that the concatenation would only come if the literals were on separate lines. This feature is so common, I just thought more people already knew about it:
Code: Select all
# Python
>>> s = 'a' 'b'
>>> s
'ab'
>>>

# Ruby
irb> s = 'a' 'b'
=> "ab"
irb>

# C, C++, Objective-C
const char *text =
  "This text is pretty long, but will be "
  "concatenated into just a single string. "
  "The disadvantage is that you have to quote "
  "each part, and newlines must be literal as "
  "usual.";

Some Cobra newbies may require no explanation at all, having already gained familiarity with one of the above languages.
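
The compile-time nature of adjacent literals can be observed in Python through the standard `ast` module (Python 3.8 or later; this shows Python's behaviour, not Cobra's):

```python
import ast

# Adjacent literals are merged by the parser into a single constant.
adjacent = ast.parse("s = 'foo\\n' 'bar'").body[0].value

# "s +=" on a following line is a separate statement executed at run-time.
appended = ast.parse("s = 'foo\\n'\ns += 'bar'").body[1]

print(type(adjacent).__name__, repr(adjacent.value))  # Constant 'foo\nbar'
print(type(appended).__name__)                        # AugAssign
```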
Charles
 
Posts: 2515
Location: Los Angeles, CA
