before,after and token

General discussion about Cobra. Releases and general news will also be posted here.
Feel free to ask questions or just say "Hello".

Post by hopscc » Sun Jul 31, 2011 11:56 pm

I see you've rolled before and after (String Extensions) into the Cobra library String Extensions.
This is great, but there's another one that goes with them and is built on those two, called token.

Basically, it pulls a token (up to a separator string) off the front of a string and returns the token (if any) and the remainder of the string,
allowing such simplified inanities as:

s = 'command:=p1,p2,p3'  # a line pulled from a config file, say
tag = s.token(':=', out s)
assert tag == 'command' and s.startsWith('p1')
arg = s.token(',', out s)
assert arg == 'p1'
arg = s.token(',', out s)
assert arg == 'p2'
arg = s.token(',', out s)
assert arg == 'p3'
assert not s.length
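For comparison, Python's built-in str.partition already has these semantics: it splits at the first occurrence of the separator only and, when the separator is absent, returns the whole string followed by two empty strings. The sketch below uses a hypothetical Python helper named token (the name is borrowed from the post for illustration; it is not a Cobra or Python API) to replay the walk-through above:

```python
from typing import Tuple


def token(s: str, sep: str) -> Tuple[str, str]:
    """Return (text before the first sep, text after it).

    Hypothetical helper mirroring the post's token/tokenl semantics:
    if sep does not occur, the whole string is the token and the rest is ''.
    """
    head, _found, rest = s.partition(sep)
    return head, rest


# Replays the Cobra walk-through above.
s = 'command:=p1,p2,p3'
tag, s = token(s, ':=')
assert tag == 'command' and s.startswith('p1')
arg, s = token(s, ',')
assert arg == 'p1'
arg, s = token(s, ',')
assert arg == 'p2'
arg, s = token(s, ',')
assert arg == 'p3'
assert not s  # nothing left once the last token is taken
```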


Here's the implementation and tests.

My original version had just token, returning a single value and setting an out variable,
but I thought it might be a better direction to use Cobra's multi-return-value capability, so
there's one doing that as well (tokenl). Your call as to which, if any, is preferable.

namespace Cobra.Lang

    extend String
        # ... before and after

        def token(sep as String, rest as out String) as String
            """
            Return the substring before 'sep' and set the remainder of the
            string after the separator into 'rest'.
            If there is no separator in the string, return the entire string
            and set 'rest' to an empty string.
            """
            test
                s = 'cmd1:p1,p2'
                c = s.token(':', out s)
                assert c == 'cmd1' and s.startsWith('p1')
                arg = s.token(',', out s)
                assert arg == 'p1'
                arg = s.token(',', out s)
                assert arg == 'p2'
                assert not s.length

                s = 'cmd1: p1, p2'
                c = s.token(': ', out s)
                assert c == 'cmd1' and s.startsWith('p1')
                arg = s.token(', ', out s)
                assert arg == 'p1'
                arg = s.token(', ', out s)
                assert arg == 'p2'
                assert not s.length

                s0 = 'pp,qq'
                ss, s = s0.tokenl('=')
                assert ss == 'pp,qq'
                assert ss == s0
                assert not s.length
            body
                t = .before(sep)
                rest = .after(sep)
                return t

        def tokenl(sep as String) as List<of String>
            """
            Return a 2-element list: the first element is the first token of
            the string, up to the separator, and the second element is the
            remainder of the string after the separator.
            If there is no separator in the string, return the entire string
            as the first element and an empty string as the second element.
            """
            test
                s = 'cmd1: p1, p2, p3'
                c, s = s.tokenl(':')
                assert c == 'cmd1'
                assert s.startsWith(' p1')
                s1, s = s.tokenl(',')
                assert s1 == ' p1'
                s2, s = s.tokenl(',')
                assert s2 == ' p2'
                s3, s = s.tokenl(',')
                assert s3 == ' p3'
                assert not s.length
                s3, s = s.tokenl(',')
                assert not s3.length
                assert not s.length

                s = 'pp,qq'
                s3, s = s.tokenl('=')
                assert s3 == 'pp,qq'
                assert not s.length
            body
                t = .before(sep)
                s = .after(sep)
                return [t, s]


# tests and examples
class Entry

    def main is shared
        Entry.testTkn

    def testTkn is shared, private
        s = 'filename.exe'
        assert s.before('.') == 'filename'
        assert s.after('.') == 'exe'
        s = 'filename1'
        assert s.before('.') == 'filename1'
        assert s.after('.') == ''
        s = '.ext'
        assert s.before('.') == ''
        assert s.after('.') == 'ext'
        s = 'wookie'
        assert s.before('.') == 'wookie'
        assert s.after('.') == ''
        assert 'exeter.exe'.before('.') + '.pag' == 'exeter.pag'
        sw = 'sent1.sent2'
        assert sw.after('.') + '.' + sw.before('.') == 'sent2.sent1'

        s = 'command:=p1,p2,p3'
        c = s.token(':=', out s)
        assert c == 'command' and s.startsWith('p1')
        arg = s.token(',', out s)
        assert arg == 'p1'
        arg = s.token(',', out s)
        assert arg == 'p2'
        arg = s.token(',', out s)
        assert arg == 'p3'
        assert not s.length

        s = 'this is a set of word tokens'  # 1 space separated
        res = ['this', 'is', 'a', 'set', 'of', 'word', 'tokens']
        i = 0
        while true
            w = s.token(' ', out s)
            if not w.length, break
            assert w == res[i]
            i += 1
        assert not s.length

        s = 'simple,comma,sep words,and, phrases'
        while (w = s.token(',', out s)).length
            assert w in {'simple', 'comma', 'sep words', 'and', ' phrases'}
        assert not s.length
hopscc
 
Posts: 632
Location: New Plymouth, Taranaki, New Zealand
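One caveat with the two loops at the end of testTkn: they stop at the first empty token, so they rely on exactly one separator between words (hence the "1 space separated" comment). With a doubled separator, token returns an empty string midway and the loop exits with text still left in s. The Python sketch below illustrates this with str.partition standing in for token; both helper names are hypothetical. Looping while input remains, rather than while the last token was non-empty, keeps the empty token and consumes the whole string:

```python
from typing import List, Tuple


def tokens_until_empty(s: str, sep: str) -> Tuple[List[str], str]:
    """Mimic the post's `while true ... if not w.length, break` loop.

    Returns the tokens collected and whatever input was left unconsumed.
    """
    out = []
    while True:
        w, _found, s = s.partition(sep)
        if not w:
            break
        out.append(w)
    return out, s


def tokens(s: str, sep: str) -> List[str]:
    """Break tokens off the front while any input remains.

    An empty token between doubled separators is kept instead of
    ending the loop early.
    """
    out = []
    while s:
        head, _found, s = s.partition(sep)
        out.append(head)
    return out


words = 'this is a set of word tokens'
assert tokens(words, ' ') == ['this', 'is', 'a', 'set', 'of', 'word', 'tokens']
assert tokens_until_empty(words, ' ') == (tokens(words, ' '), '')

# Doubled separator: the early-exit loop strands 'b'; the other keeps going.
assert tokens_until_empty('a  b', ' ') == (['a'], 'b')
assert tokens('a  b', ' ') == ['a', '', 'b']
```

Note that the while-input-remains loop still drops a single trailing empty field: for 'a,' it yields ['a'], whereas Python's unlimited 'a,'.split(',') yields ['a', ''].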

Re: before,after and token

Post by Charles » Mon Aug 01, 2011 9:08 am

When I see a method with "token" in the name, I expect it to take a regex (or token class id) to match against. The method above is just doing a split. Also, coming from Python, I have to say that I would prefer:
tag, s = s.token(':=') 
# which you pointed out was possible

# but we already have that:
tag, s := s.split(':=', 2)

I think using .split is more clear.

The only thing we're missing is a bugfix since Cobra binds to the wrong .split in the above case! It's there in the std lib as an extension method, but Cobra picks a different one.
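One behavioural difference between the limited split and tokenl shows up when the separator is missing. Sketched in Python (for illustration only; this says nothing about how Cobra's split extension behaves), a limited split then returns a single element, so unpacking it into two names fails, whereas tokenl returns the whole string plus an empty remainder. The hypothetical split_pair helper pads the result to match tokenl:

```python
from typing import Tuple


def split_pair(s: str, sep: str) -> Tuple[str, str]:
    """Break s at the first sep using split with a limit.

    maxsplit=1 gives at most two parts; when sep is absent only one part
    comes back, so pad with '' to match tokenl's behaviour.
    """
    parts = s.split(sep, 1)
    if len(parts) == 1:
        parts.append('')
    head, rest = parts
    return head, rest


# Unpacking a limited split directly fails when the separator is absent.
try:
    tag, rest = 'pp,qq'.split('=', 1)
    unpack_failed = False
except ValueError:
    unpack_failed = True

assert unpack_failed
assert split_pair('command:=p1,p2,p3', ':=') == ('command', 'p1,p2,p3')
assert split_pair('pp,qq', '=') == ('pp,qq', '')
```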
Charles
 
Posts: 2515
Location: Los Angeles, CA

Re: before,after and token

Post by hopscc » Mon Aug 01, 2011 10:40 am

I disagree that it's just doing a split and that split is clearer.
It (token) breaks a substring (a delimited token) off the front of a string. You can contort split into doing that, but split notionally breaks up a string on all occurrences of the separator, not just the first one.

It's a moot comparison anyway if the implementation of split isn't working...

Casting back (to C), perhaps it's not so much a token method as a complemented break (strpbrk vs strtok).
These came from an old C-language implementation, hence the 'token' naming.
# instead of token / tokenl
tag, s = s.break(':=')

Re: before,after and token

Post by torial » Mon Aug 01, 2011 4:10 pm

hopscc wrote:
Casting back (to C), perhaps it's not so much a token method as a complemented break (strpbrk vs strtok).
These came from an old C-language implementation, hence the 'token' naming.
# instead of token / tokenl
tag, s = s.break(':=')


I would also suggest nvp (i.e. name-value pair):
name, value = s.nvp(':=')

It is a paradigm I use commonly enough that, whatever it is called, it will be useful. Break or split is better than token (IMO).
torial
 
Posts: 229
Location: IA

Re: before,after and token

Post by Charles » Mon Aug 01, 2011 4:49 pm

hopscc wrote: split notionally breaks up a string on all occurrences of the separator, not just the first one.

That's not correct. C#, Java, Python and Ruby all provide an optional 2nd argument to limit the splitting and have for some time.
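The limit argument is not counted the same way everywhere. Python's maxsplit is the number of splits to perform, whereas the count/limit argument of .NET's String.Split, Java's String.split and Ruby's String#split is the maximum number of pieces returned (Java's split also takes a regex rather than a plain string). So the two-piece split(':=', 2) in Charles's snippet (two names on the left) corresponds to maxsplit=1 in Python, as this short sketch shows:

```python
s = 'cmd1:p1,p2,p3'

# No limit: split on every occurrence of the separator.
all_parts = s.split(',')          # ['cmd1:p1', 'p2', 'p3']

# maxsplit=1: one split, so at most two pieces. .NET's count or Java's and
# Ruby's limit would be 2 for the same result.
first, rest = s.split(',', 1)     # 'cmd1:p1', 'p2,p3'

# maxsplit=2: two splits, three pieces; the tail keeps its separators.
three = 'a,b,c,d'.split(',', 2)   # ['a', 'b', 'c,d']
```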

hopscc wrote: It's a moot comparison anyway if the implementation of split isn't working...

Well you can guess what I'm working on right now. :)

The plot thickens... If I call a different string extension method, like .repeat, before the .split, then the correct .split overload is chosen and the program works fine. Obviously I want something like this fixed before it causes further distractions.

Re: before,after and token

Post by hopscc » Tue Aug 02, 2011 3:51 am

Charles wrote: That's not correct. C#, Java, Python and Ruby all provide an optional 2nd argument to limit the splitting and have for some time.

That is correct. The notional (and simplest, shortest) form of all of these splits on all occurrences. The presence of variants (additional parameters, more complex forms) that modify, adjust or limit this doesn't change the fact that it is the base notion.

Re: before,after and token

Post by Charles » Tue Aug 02, 2011 8:38 am

???

If there are overloads available that do what you want by simply adding a "2" then you don't need a new method with new name to do the same thing.

I think you just need to sleep on it some more. :D

Re: before,after and token

Post by Charles » Fri Aug 05, 2011 11:00 pm

Charles wrote:When I see a method with "token" in the name, I expect it to take a regex (or token class id) to match against. The method above is just doing a split. Also, coming from Python, I have to say that I would prefer:
tag, s = s.token(':=') 
# which you pointed out was possible

# but we already have that:
tag, s := s.split(':=', 2)

I think using .split is more clear.

The only thing we're missing is a bugfix since Cobra binds to the wrong .split in the above case! It's there in the std lib as an extension method, but Cobra picks a different one.

This is fixed now.

