Ticket #145 (accepted enhancement)

Opened 9 years ago

Last modified 9 years ago

complete the collection of string.split extension methods

Reported by: jonathandavid Owned by: jonathandavid
Priority: minor Milestone:
Component: Cobra Compiler Version: 0.8.0
Keywords: string, split Cc:


.NET's core String class has Split overloads that take the separators as a char[].

Cobra's January 09 update adds extension methods that take a List<of char> separator instead, which is more convenient.

Currently missing:

* All the versions that take the max. number of tokens in addition to the separators.

* Versions that take the separator as a string (this is the easiest to use).

* Versions that take the separator as char* (this is the most generic way; the more specific ones would be left mainly for efficiency). The char* version would probably lead us to remove the one that currently takes an IList<of T>.

See forum thread:

Change History

Changed 9 years ago by jonathandavid

I'm working on this, but the work on Ticket 146 has distracted me.

My current opinion is we should only add one new version of split, taking the separators as a char*. This way, there will be two versions:

* The one in the BCL, in which the separators are "vari char"

* The one added by Cobra, which will accommodate any collection of char that is used to pass the separators (String, Set<of char>, List<of char>).

This would imply removing the current versions added by Cobra that take a List and an IList (this will not break any existing code, because char* can take Lists and ILists).

Another possible approach is to define specialized versions for the different types of containers (one for List, another for IList, another for String, etc.), but I don't think the performance benefits justify the burden of having so many overloads (remember that for each version, we have to write several "subversions", e.g. one that takes the max. number of tokens, one that takes SplitOptions, etc.). Note that the inefficiency is small because the separators are usually very few (i.e. we are talking about collections of 1, 2, 3... elements, rarely more).

Examples of use in case we go the way I suggest:

str.split(c':') # vari version
str.split(@[c':']) # vari version
str.split([c':']) # char* version
str.split(':') # char* version
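
To make the "one generic overload" idea concrete, here is a minimal sketch in Python rather than Cobra, since Python strings are themselves iterables of characters in the same way that char* would accept a String. The name split_on_chars and the max_tokens parameter are hypothetical and only illustrate the proposal; this is not the Cobra implementation.

```python
from typing import Iterable, List, Optional


def split_on_chars(text: str,
                   separators: Iterable[str],
                   max_tokens: Optional[int] = None) -> List[str]:
    """Split text at any character found in separators (hypothetical helper).

    separators may be a str, list, set, tuple or generator of one-character
    strings, mirroring the idea of a single char* overload.
    """
    seps = set(separators)  # one pass over whatever collection was given
    tokens: List[str] = []
    current: List[str] = []
    for ch in text:
        # Keep splitting only while another token is still allowed;
        # afterwards the remainder is kept whole as the last token.
        can_split = max_tokens is None or len(tokens) < max_tokens - 1
        if ch in seps and can_split:
            tokens.append("".join(current))
            current = []
        else:
            current.append(ch)
    tokens.append("".join(current))
    return tokens


print(split_on_chars("a:b;c", [":", ";"]))        # list of separators
print(split_on_chars("a:b;c", ":;"))              # string used as a char collection
print(split_on_chars("a:b:c", {":"}, max_tokens=2))
```

Copying the separators into a set costs one pass over a collection that usually holds only a handful of elements, which is the small inefficiency mentioned above.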

Changed 9 years ago by jonathandavid

Another example of use:

str.split() # vari version

Changed 9 years ago by jonathandavid

It seems that the BCL offers another overload of Split that takes a String[], and which allows the use of string (as opposed to char) separators. This version is the true equivalent of Python's split, which uses a string separator.

Anyway, since there is a String[] version, I no longer think that we should add a split(String) version in which the chars forming the string are treated as independent separators. The reason is that it would be confusing if split(["bar"]) used "bar" as the separator, whereas split("bar") used "b", "a" or "r".

Therefore, I think the best option is to go ahead with adding split(char*), but add another overload that takes a String and that is functionally equivalent to passing a list with one element, that string:

str.split("bar") # one separator, "bar"
str.split(["bar"]) # one separator, "bar"
str.split("bar".toCharArray) # three char separators, b a and r.
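
The difference between the two interpretations can be seen in Python, whose str.split already treats a string argument as a single separator. The following is only an illustrative sketch: the helper names are hypothetical, and the character-based variant is built on re.split, which is not how Cobra or the BCL implement it. It assumes a non-empty separator string.

```python
import re
from typing import List


def split_on_string(text: str, separator: str) -> List[str]:
    """Treat the whole separator string as one separator."""
    return text.split(separator)


def split_on_any_char(text: str, separators: str) -> List[str]:
    """Treat each character of separators as an independent separator."""
    # re.escape keeps characters such as ']' or '-' literal inside the class.
    return re.split("[" + re.escape(separators) + "]", text)


sample = "xbaryrz"
print(split_on_string(sample, "bar"))    # ['x', 'yrz']
print(split_on_any_char(sample, "bar"))  # ['x', '', '', 'y', 'z']
```

The same input and the same argument produce different token lists, which is why only one of the two meanings should be attached to split(String).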

Changed 9 years ago by Chuck

I agree with the final conclusion in your last comment, except that I may still want the List and IList overloads for performance. I know it's extra maintenance, but it's fairly stable code once written and I would like Cobra to be as fast as reasonably possible at such string processing.

Changed 9 years ago by jonathandavid

  • status changed from new to accepted

Changed 9 years ago by jonathandavid

OK, we'll have all the different versions. They're for the most part one-liners, so no big deal.

One concern I have is that the BCL versions will return String[] whereas ours will return List<of String>. Won't that be confusing? Shouldn't our added versions return String[] as well? I wish .NET wasn't so array-biased... I wonder how they even manage to make split return a String[]: what do they do to know the size in advance? I hope they don't build a list internally and then convert it to an array!

Changed 9 years ago by Chuck

If the arguments are generic then the return result should be generic as well, even if that is somewhat inconsistent. The idea is that users of generics should not be punished with arrays.

Regarding their internal implementation, I don't know. They probably resize the array when its capacity limit is hit, which is what the list class in various OO libraries usually does internally anyway. If you grow the capacity by a constant factor like 1.5 or 2.0 of the current size, then the overhead of resizing is negligible.
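
The claim about multiplicative resizing can be checked with a small simulation. This is a sketch in Python under the assumption that each resize copies every existing element once; the function name copies_when_appending is hypothetical and the code says nothing about what the BCL actually does internally.

```python
def copies_when_appending(n: int, growth: float = 2.0) -> int:
    """Count element copies made while appending n items to a growable array."""
    capacity = 1
    size = 0
    copies = 0
    for _ in range(n):
        if size == capacity:
            # Buffer is full: allocate a larger one and copy everything over.
            capacity = max(capacity + 1, int(capacity * growth))
            copies += size
        size += 1
    return copies


for factor in (1.5, 2.0):
    n = 1000
    total = copies_when_appending(n, factor)
    print(f"growth {factor}: {total} copies for {n} appends, "
          f"{total / n:.2f} per append")
```

With either factor the number of copies stays within a small constant multiple of the number of appends, so the cost per append does not grow with the size of the result.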

Changed 9 years ago by jonathandavid

"If the arguments are generic then the return result should be generic as well."

I'm afraid I don't follow you. Please remember that I was talking about String.split, which returns a collection with all the tokens (strings). For me String[] is just as generic as List<of String>, so your argument does not seem to apply here. Or am I missing your point?

Anyway, I understand from your answer that you want all our new overloads to return List<of String>, even if the BCL original method returns String[]. Is that correct?

Changed 9 years ago by Chuck

Correct. What kind of thing goes in should come back out.

When I use the term "generic" I mean in the technical .NET sense. So IList<of T> is generic, and T[] is array. IList<of T> is not an array, and T[] is not generic.
