before, after and token

by **hopscc** » Sun Jul 31, 2011 11:56 pm

I see you've rolled before and after (String Extensions) into the Cobra library's String extensions.
This is great, but there's another method that goes with them and is built on those two, called token.

Basically, it pulls a token (everything up to a separator string) off the front of a string, and returns the token (if any) and the remainder of the string,
allowing such simplified inanities as:

s = 'command:=p1,p2,p3' # a line pulled from a config file, say
tag = s.token(':=', out s) 
assert tag == 'command' and s.startsWith('p1')
arg = s.token(',', out s)
assert arg == 'p1'
arg = s.token(',', out s)
assert arg == 'p2'
arg = s.token(',', out s)
assert arg == 'p3'
assert not s.length
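For anyone following along from another language, the behaviour described above can be sketched in Python, where `str.partition` already splits at the first occurrence of a separator. This is a minimal sketch and not the Cobra implementation; the function name `token` and the tuple return (instead of an `out` parameter) are assumptions for illustration.

```python
def token(s: str, sep: str) -> tuple[str, str]:
    """Return (head, rest): text before the first `sep`, and text after it.

    If `sep` does not occur, head is the whole string and rest is ''.
    Hypothetical Python analogue of the Cobra String.token extension.
    """
    head, _found, rest = s.partition(sep)  # partition returns (s, '', '') when sep is absent
    return head, rest


# Walk the config-style line from the example above.
s = 'command:=p1,p2,p3'
tag, s = token(s, ':=')
assert tag == 'command' and s.startswith('p1')

args = []
for _ in range(3):
    arg, s = token(s, ',')
    args.append(arg)
assert args == ['p1', 'p2', 'p3']
assert s == ''
```

One edge case a port has to decide on: `partition` raises `ValueError` for an empty separator, so `token(s, '')` would need explicit handling.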

Here's the implementation and tests:

My original version had token returning just a single value and passing the remainder back through an out variable,
but I thought it might be a better direction to use Cobra's multiple-return-value capability, so
there's a version doing that as well (tokenl). Your call as to which, if either, is preferable.

namespace Cobra.Lang
    extend String
                # ... before and after 
    
        def token(sep as String, rest as out String) as String
            """
            Return the substring before 'sep' and set 'rest' to the remainder
            of the string after the separator.
            If the separator is not in the string, return the entire string and
            set 'rest' to an empty string.
            """
            test
                s = 'cmd1:p1,p2'
                c = s.token(':', out s) 
                assert c == 'cmd1' and  s.startsWith('p1')
                arg = s.token(',', out s)
                assert arg == 'p1'
                arg = s.token(',', out s)
                assert arg == 'p2'
                assert not s.length

                s = 'cmd1: p1, p2'
                c = s.token(': ', out s) 
                assert c == 'cmd1' and s.startsWith('p1')
                arg = s.token(', ', out s)
                assert arg == 'p1'
                arg = s.token(', ', out s)
                assert arg == 'p2'
                assert not s.length

                s0 = 'pp,qq'
                ss = s0.token('=', out s)
                assert ss == 'pp,qq'
                assert ss == s0
                assert not s.length
            
                
            body
                t = .before(sep)
                rest = .after(sep)
                return t
        
        def tokenl(sep as String) as List<of String>
            """
            Return a 2-element list: the first element is the first token, from
            the start of the string up to the separator; the second element is
            the remainder of the string after the separator.
            If the separator is not in the string, return the entire string as
            the first element and an empty string as the second element.
            """
            test
                s = 'cmd1: p1, p2, p3'
                c,s = s.tokenl(':') 
                assert c == 'cmd1'
                assert s.startsWith(' p1')
                s1,s = s.tokenl(',')
                assert s1 == ' p1'
                s2,s = s.tokenl(',')
                assert s2 == ' p2'
                s3,s = s.tokenl(',')
                assert s3 == ' p3'
                assert not s.length 
                s3,s = s.tokenl(',')
                assert not s3.length
                assert not s.length
        
                s = 'pp,qq'
                s3,s = s.tokenl('=')
                assert s3 == 'pp,qq'
                assert not s.length
            body
                t = .before(sep)
                s = .after(sep)
                return [t,s]
            

# tests and examples
class Entry
    def main is shared
        Entry.testTkn
        
    def testTkn is shared, private
        s = 'filename.exe'
        assert s.before('.') == 'filename'
        assert s.after('.') == 'exe'
        s = 'filename1'
        assert s.before('.') == 'filename1'
        assert s.after('.') == ''
        s = '.ext'
        assert s.before('.') == ''
        assert s.after('.') == 'ext'
        s = 'wookie'
        assert s.before('.') == 'wookie'
        assert s.after('.') == ''
        assert 'exeter.exe'.before('.') + '.pag' == 'exeter.pag'
        sw='sent1.sent2'
        assert sw.after('.') +'.' + sw.before('.') == 'sent2.sent1'
    
        s = 'command:=p1,p2,p3'
        c = s.token(':=', out s) 
        assert c == 'command' and s.startsWith('p1')
        arg = s.token(',', out s)
        assert arg == 'p1'
        arg = s.token(',', out s)
        assert arg == 'p2'
        arg = s.token(',', out s)
        assert arg == 'p3'
        assert not s.length
        
        s = 'this is a set of word tokens' # single-space separated
        res = ['this', 'is', 'a', 'set', 'of', 'word', 'tokens']
        i = 0
        while true
            w = s.token(' ', out s)
            if not w.length, break
            assert w == res[i]
            i += 1
        assert not s.length
        
        s = 'simple,comma,sep words,and, phrases'
        while (w = s.token(',', out s)).length
            assert w in {'simple', 'comma', 'sep words', 'and', ' phrases'}
        assert not s.length
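The same layering, with token built directly on before and after, can be sketched in Python. The before/after semantics here (whole string, or empty string, when the separator is missing) are inferred from the testTkn assertions above; the rest is a hypothetical port, not Cobra library code.

```python
def before(s: str, sep: str) -> str:
    """Text before the first `sep`; the whole string if `sep` is absent."""
    i = s.find(sep)
    return s if i < 0 else s[:i]


def after(s: str, sep: str) -> str:
    """Text after the first `sep`; '' if `sep` is absent."""
    i = s.find(sep)
    return '' if i < 0 else s[i + len(sep):]


def token(s: str, sep: str) -> tuple[str, str]:
    """(before, after) pair: the tuple-returning analogue of tokenl."""
    return before(s, sep), after(s, sep)


# Same cases as testTkn's before/after assertions.
assert before('filename.exe', '.') == 'filename' and after('filename.exe', '.') == 'exe'
assert before('.ext', '.') == '' and after('.ext', '.') == 'ext'
assert before('wookie', '.') == 'wookie' and after('wookie', '.') == ''

# Word loop: stop when an empty token comes back.
s = 'this is a set of word tokens'
words = []
while True:
    w, s = token(s, ' ')
    if not w:
        break
    words.append(w)
assert words == ['this', 'is', 'a', 'set', 'of', 'word', 'tokens']

# Assignment expression (Python 3.8+), mirroring `while (w = s.token(...)).length`.
s = 'simple,comma,sep words,and, phrases'
found = []
while (pair := token(s, ','))[0]:
    w, s = pair
    found.append(w)
assert found == ['simple', 'comma', 'sep words', 'and', ' phrases']
```

A caveat that applies to the empty-token stop condition in both the Cobra loop and this sketch: a doubled separator such as 'a  b' produces an empty token mid-string, so the loop ends early with 'b' still left in s.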

by **Charles** » Mon Aug 01, 2011 9:08 am

When I see a method with "token" in the name, I expect it to take a regex (or token class id) to match against. The method above is just doing a split. Also, coming from Python, I have to say that I would prefer:

tag, s = s.token(':=') 
# which you pointed out was possible

# but we already have that:
tag, s = s.split(':=', 2)

I think using .split is more clear.

The only thing we're missing is a bugfix since Cobra binds to the wrong .split in the above case! It's there in the std lib as an extension method, but Cobra picks a different one.
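One practical difference between a limited split and a token-style method shows up when the separator is missing. Sketched with Python's own `str.split` and `str.partition` (not Cobra's extension method, whose exact behaviour isn't shown in this thread): `split` then returns a single element, so two-name unpacking raises, whereas `partition` always yields three parts.

```python
line = 'command:=p1,p2,p3'

# With the separator present, a limited split and partition agree.
tag, rest = line.split(':=', 1)  # Python's maxsplit counts splits: 1 gives at most 2 pieces
assert (tag, rest) == ('command', 'p1,p2,p3')
head, _sep, tail = line.partition(':=')
assert (head, tail) == (tag, rest)

# With the separator absent, split returns one element and unpacking fails...
try:
    tag, rest = 'pp,qq'.split('=', 1)
    unpack_failed = False
except ValueError:
    unpack_failed = True
assert unpack_failed

# ...while partition still yields (whole string, '', '').
head, _sep, tail = 'pp,qq'.partition('=')
assert (head, tail) == ('pp,qq', '')
```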

by **hopscc** » Mon Aug 01, 2011 10:40 am

I disagree that it's just doing a split, and that split is clearer.
It (token) breaks a substring (a delimited token) off the front of a string. You can contort split into doing that, but split notionally breaks up a string on all occurrences of the separator, not just the first one.

It's a moot comparison anyway if the implementation of split isn't working...

Casting back (to C), perhaps it's not so much a token method as a complemented break (strpbrk vs. strtok).
These came from an old C language implementation, hence the 'token' naming.

# instead of token / tokenl
tag, s = s.break(':=')
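The naming point is easier to see side by side. C's `strtok` treats its second argument as a set of single delimiter characters (skipping leading delimiters and never returning empty tokens), whereas token treats sep as one multi-character string. Below is a hypothetical Python emulation of strtok's matching rule only (its hidden saved-pointer state is ignored), next to a partition-based token.

```python
def strtok_like(s: str, delims: str) -> tuple[str, str]:
    """Hypothetical emulation of C strtok's matching rule (not its saved state).

    Leading delimiter characters are skipped; the token runs up to the next
    character found in `delims`; the rest starts just after that character.
    """
    i = 0
    while i < len(s) and s[i] in delims:  # skip leading delimiter characters
        i += 1
    j = i
    while j < len(s) and s[j] not in delims:  # scan to the next delimiter character
        j += 1
    return s[i:j], s[j + 1:]


def token(s: str, sep: str) -> tuple[str, str]:
    """Split at the first occurrence of the whole substring `sep`."""
    head, _sep, rest = s.partition(sep)
    return head, rest


# ':=' is one separator for token, but two independent delimiters for strtok.
assert token('tag:=value', ':=') == ('tag', 'value')
assert strtok_like('tag:=value', ':=') == ('tag', '=value')

# Doubled separators: strtok skips the empty field, token reports it.
assert strtok_like('a,,b', ',') == ('a', ',b')
assert strtok_like(',b', ',') == ('b', '')  # leading ',' skipped
assert token(',b', ',') == ('', 'b')        # empty field returned
```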

by **torial** » Mon Aug 01, 2011 4:10 pm

hopscc wrote:
Casting back (to C), perhaps it's not so much a token method as a complemented break (strpbrk vs. strtok).
These came from an old C language implementation, hence the 'token' naming.
# instead of token / tokenl
tag, s = s.break(':=')

I would also suggest nvp (i.e. name-value pair):

name, value = s.nvp(':=')

It's a common enough paradigm for me that, whatever it's called, it will be useful; break or split is better than token (IMO).
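As a sketch of the name/value use case, here is a hypothetical nvp helper in Python used to load simple config lines into a dict. The name comes from the post; the behaviour when the separator is missing (an empty value rather than an error) is an assumption borrowed from token.

```python
def nvp(s: str, sep: str) -> tuple[str, str]:
    """Hypothetical name/value split at the first `sep`.

    Assumption: a line without `sep` yields (line, '') rather than raising.
    """
    name, _sep, value = s.partition(sep)
    return name, value


lines = ['command:=p1,p2,p3', 'mode:=fast', 'flag']
config = dict(nvp(line, ':=') for line in lines)
assert config == {'command': 'p1,p2,p3', 'mode': 'fast', 'flag': ''}

# Only the first separator splits, so a value may itself contain ':='.
assert nvp('expr:=a:=b', ':=') == ('expr', 'a:=b')
```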

by **Charles** » Mon Aug 01, 2011 4:49 pm

hopscc wrote: split notionally breaks up a string on all occurrences of the separator not just the first one.

That's not correct. C#, Java, Python and Ruby all provide an optional 2nd argument to limit the splitting and have for some time.
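The limit arguments are not interchangeable across those languages, though. In Java (`String.split(regex, limit)`), C# (`String.Split` with a `count` argument) and Ruby (`String#split(pattern, limit)`) the number caps how many pieces come back, while Python's `maxsplit` caps how many splits are made. A short Python check of the off-by-one this implies:

```python
s = 'a,b,c,d'

# Python: maxsplit is the number of splits, so 2 yields up to 3 pieces.
assert s.split(',', 2) == ['a', 'b', 'c,d']

# The equivalent of a Java/C#/Ruby limit of 2 (at most 2 pieces) is maxsplit=1.
assert s.split(',', 1) == ['a', 'b,c,d']

# The plain form with no limit splits on every occurrence.
assert s.split(',') == ['a', 'b', 'c', 'd']
```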

hopscc wrote: It's a moot comparison anyway if the implementation of split isn't working...

Well you can guess what I'm working on right now.

The plot thickens... If I call a different string extension method, like .repeat, before the .split, then the correct .split overload is chosen and the program works fine. Obviously I want something like this fixed before it causes further distractions.

by **hopscc** » Tue Aug 02, 2011 3:51 am

Charles wrote: That's not correct. C#, Java, Python and Ruby all provide an optional 2nd argument to limit the splitting and have for some time.

That is correct. The notional (simplest and shortest) form of each of these splits on all occurrences. The presence of variants (additional parameters, more complex forms) that modify, adjust or limit this doesn't change the fact that this is the base notion.

by **Charles** » Tue Aug 02, 2011 8:38 am

???

If there are overloads available that do what you want by simply adding a "2" then you don't need a new method with new name to do the same thing.

I think you just need to sleep on it some more.

by **Charles** » Fri Aug 05, 2011 11:00 pm

Charles wrote: When I see a method with "token" in the name, I expect it to take a regex (or token class id) to match against. The method above is just doing a split. Also, coming from Python, I have to say that I would prefer:
tag, s = s.token(':=') 
# which you pointed out was possible

# but we already have that:
tag, s = s.split(':=', 2)
I think using .split is more clear.

The only thing we're missing is a bugfix since Cobra binds to the wrong .split in the above case! It's there in the std lib as an extension method, but Cobra picks a different one.

This is fixed now.
