MultiList

Re: MultiList

Postby Charles » Fri Jul 27, 2012 8:45 pm

Re: staging, if you think it's not ready for a patch, you could put it up on bitbucket or github.

Re: .enumerate, thanks for the clarification. Btw I wonder if there should be a public .generateIndices. A user of multi-list may want the indices for assignment:

for index in ml.indices
    ml[index] = .foo(ml[index])
# or
for index in ml.indices
    ml[index] = ml[index].bar
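
For comparison, here is a rough Python sketch of the same pattern: enumerate every index tuple for a shape, then assign through each index. The `indices` helper and the dict-backed storage are illustrative assumptions, not how MultiList is implemented.

```python
from itertools import product

def indices(shape):
    """Yield every index tuple for the given shape, last dimension varying fastest."""
    return product(*(range(n) for n in shape))

# Hypothetical stand-in for a 2x3 multi-list: a dict keyed by index tuple.
shape = (2, 3)
ml = {index: sum(index) for index in indices(shape)}

# Update every element in place through its index.
for index in indices(shape):
    ml[index] = ml[index] * 10
```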

Re: numpy speed, it probably gets an initial boost from being compiled C code, but then gets a drag from operating with the Python interpreter. My guess is that micro-benchmarks on multi-lists/arrays will be faster in numpy, but that in practice, the Cobra applications would be faster.

Re: LINQ, it is not yet available, though we have some similar extension methods on IEnumerable and IList, as well as list comprehensions, so the demand has not been very high. I am aware of the need to support LINQ in the long term and it's fairly high in my priorities.

Re: LINQ syntax, I'm really glad to hear someone else say that they don't like the syntax mixing in C# as I've always felt this way as well. Re: Cobra we'll start with things like picking up C# extension methods (Cobra currently only picks up its own), lambda arg type inference, etc. and see how we like the result. If Cobra does add syntactic sugar for LINQ, I would prefer something that can work with the extension methods such that they receive uniform treatment.

Re: zip with a selector function, I also noticed that .NET/LINQ has one, but Python does not. Did the .NET guys think it was valuable for performance or something? It doesn't seem strictly necessary.

Re: arrays vs. lists, they both share a common ancestor: IList<of T>. However, the Cobra compiler does not currently recognize this for arrays because when I enable that, problems with method overload resolution create false compilation errors. This is my top priority after the 0.9 release.

For now, I suggest using IList<of T> everywhere and living with the list comprehensions. When I fix the method overload resolution, I will fix the array+IList relationship and we can revisit comprehensions.

Re: Python's treatment of:
Code:
ml[a:b][c:d][e:f]
ml[a:b, c:d, e:f]

You said that it treats them both the same. I was guessing that maybe the first line is 3 calls and the 2nd is one call. But you're saying they are both one call? And do these call __slice__ or what? I forget.

Re: operator overloading, I'm sad to report that this does not work so well with generic classes. This appears to be a fundamental limitation of .NET generics. See https://www.google.com/search?q=C%23+op ... g+generics

Consider in your example "_data[i] + other._data[i]" where each operand is of type T. At compile-time, it is unknown if T is an int, decimal, float64, another MultiList, a vector, etc. So the compiler cannot know what code to generate.

We might be able to overcome this problem with some compile-time extra code gen and some warnings about cons that result, but that's a whole other project. Believe me, I've thought about this issue and it's not trivial.

Re: your "oper []" examples, they don't show a difference between getting and setting.
Charles
 
Posts: 2515
Location: Los Angeles, CA

Re: MultiList

Postby jaegs » Sat Jul 28, 2012 10:05 am

MultiList should be in a stable state now - kudos to the language for not making this a pain to implement. See if you have any more comments or suggestions.

OK, I see what you mean about Python. This should be what you are looking for:
Code:
>>> class Foo:
...   def __getitem__(self, *args):
...     print args
...
>>> x = Foo()
>>> x[1]
(1,)
>>> x[1,2,3]
((1, 2, 3),)
>>> x[1][2][3]
(1,)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'NoneType' object is unsubscriptable
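
The same experiment can be repeated with slices. The sketch below uses Python 3 and a hypothetical `Foo` that records each key and returns itself so that chained subscripts do not fail. It shows that `x[a:b, c:d, e:f]` is a single `__getitem__` call receiving a tuple of slice objects, while `x[a:b][c:d][e:f]` is three separate calls. In Python 3 there is no separate slice hook; slicing goes through `__getitem__`.

```python
class Foo:
    """Records the key passed to each __getitem__ call; returns self so calls can chain."""

    def __init__(self):
        self.calls = []

    def __getitem__(self, key):
        self.calls.append(key)
        return self

x = Foo()
x[1:2, 3:4, 5:6]       # one call: the key is a tuple of three slice objects
one_call = list(x.calls)

y = Foo()
y[1:2][3:4][5:6]       # three calls: each key is a single slice object
three_calls = list(y.calls)
```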


I made a property called "indices" that returns _generateIndices. The ML code still calls _generateIndices directly because there are too many other variables named index and indices.

I actually reworked the ML equals method to no longer require zip. Also, in the equals function, is there a way to call == on primitive types and .equals on objects?


I can see how the .NET selector function in Zip could be useful. For example:
Code:
 lst.Zip(lst2, (e1, e2) => e1.CompareTo(e2));


On the other hand, Python's zip is useful for transposing 2-dimensional arrays (reverse the rows first to get a 90-degree rotation). Ex.
Code:
>>> mat = [(1,2,3),(4,5,6),(7,8,9)]
>>> zip(*mat)
[(1, 4, 7), (2, 5, 8), (3, 6, 9)]
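
For reference, a short Python 3 sketch of both uses of zip discussed here. The variable names are illustrative. Python gets the effect of a selector by mapping a two-argument function over both sequences, and in Python 3 `zip` returns an iterator, so it is wrapped in `list()`.

```python
lst = [3, 1, 4]
lst2 = [2, 7, 1]

# Zip with a "selector": apply a two-argument function pairwise.
maxima = list(map(max, lst, lst2))
maxima_alt = [max(a, b) for a, b in zip(lst, lst2)]

# Transpose a 2-D structure by unpacking its rows into zip.
mat = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
transposed = list(zip(*mat))

# Reversing the rows first turns the transpose into a clockwise rotation.
rotated = list(zip(*mat[::-1]))
```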
Attachments
MultiList.cobra
(14.44 KiB) Downloaded 440 times
jaegs
 
Posts: 58

Re: MultiList

Postby Charles » Sat Jul 28, 2012 10:43 am

Kudos accepted. :)

Btw I get a compilation error:
Code:
MultiList.cobra(497): error: The left and right sides of the "==" expression cannot be equated because of their types ("MultiList<of int>" and "String").

for:
assert ml.transpose.reshape([4,3,2]) == _
r'[[[3,11],[19,5],[13,21]],[[7,15],[23,9],[17,25]],[[4,12],[20,6],[14,22]],[[8,16],[24,10],[18,26]]]'

Easy enough to fix (add .toString).

I'll make some tweaks and check in. Then if you do further dev, you can just work out of a Cobra workspace like so:

-- Check out a Cobra workspace
-- cd CobraWorkspace/Source/Cobra.Core
-- Edit MultiList.cobra
-- Change "namespace Cobra.Core" to "namespace Cobra.CoreX"
-- At the command line, use: cobra -test MultiList.cobra
-- When ready to submit a patch, revert the namespace name and see HowToSubmitAPatch.

I'll post here again when it's done along with a couple minor Cobra tips.
Charles
 
Posts: 2515
Location: Los Angeles, CA

Re: MultiList

Postby Charles » Sat Jul 28, 2012 11:16 am

This is checked in now. As all checkins go through me, I include a "credit:" tag in the checkin comment:

Numerous improves to MultiList including .clone, .fill, .slice, .permute, .reshape and .transpose.
credit:jaegs
Charles
 
Posts: 2515
Location: Los Angeles, CA

Re: MultiList

Postby Charles » Sat Jul 28, 2012 11:24 am

-- You don't need the line continuation character here:
throw IndexOutOfRangeException(_
    "data position of [index] plus start of [start] exceeds MultiList count of [.count]")

Like Python, an open paren allows you to continue to the next line.

-- To match the style of the rest of the Cobra compiler and std lib code base, please do the first and not the second:
# different style
def foo( a as int, b as int )

# existing style
def foo(a as int, b as int)
.foo(a, b)
foo[a]

It saves me some time. Of course, in your own projects do whatever you like.

-- FYI, I changed the .toString to put a space after a comma to match what is done for lists and arrays.

-- In pro [indices as vari int], I changed the contracts to use public members. I think I'll really need to tweak Cobra regarding that.
class ...

    pro [indices as vari int] as T
        get
            require
                indices.length == _numDims
            body
                return _data[_address(indices)]
        set
            require
                indices.length == _numDims
                not _isReadOnly
            body
                _data[_address(indices)] = value
# --->
class ...

    pro [indices as vari int] as T
        get
            require
                indices.length == .numDims
            body
                return _data[_address(indices)]
        set
            require
                indices.length == .numDims
                not .isReadOnly
            body
                _data[_address(indices)] = value


-- In the doc string, this statement doesn't feel informative to someone who is new to MultiList:
"""
All methods that return MultiList<of T> are in place except for slice
which returns a readonly view and clone which returns a shallow copy.
"""
What does it mean that "All methods that return MultiList<of T> are in place"?

That might be an implementation/historical note. Maybe we could come up with some docs for users. What would you tell a programmer about MultiList regarding how and why to use it?


Finally, great work, jaegs. Thanks for advancing MultiList! It functions like a real multi-list now instead of the basic starter class I put out there.
Charles
 
Posts: 2515
Location: Los Angeles, CA

Re: MultiList

Postby Charles » Sat Jul 28, 2012 11:42 am

The doc string here stood out to me:
class ...
    var _isReferred = false
        """ If a referrer is GC'ed, _isReferred will still be true """

Basically that var can be incorrect and additionally that var is used later in .reshape. Thinking out loud here:

-- We could add a "var _refCount = 0" which gets incremented. In the gc finalizer, notify the owner which will decrement its ref count. Change the .isReferred property to "get isReferred as bool; return _refCount > 0".

-- Although this statement bugs me: "The finalizers of two objects are not guaranteed to run in any specific order, even if one object refers to the other. That is, if Object A has a reference to Object B and both have finalizers, Object B might have already finalized when the finalizer of Object A starts." from http://msdn.microsoft.com/en-us/library ... ze(v=vs.90).aspx#Y0 ... although I think this just means an extra check somewhere.

-- Um, Cobra doesn't have finalizers yet. Though we could add "cue finalize" if we wanted to.
Charles
 
Posts: 2515
Location: Los Angeles, CA

Re: MultiList

Postby jaegs » Sat Jul 28, 2012 1:06 pm

:o Unfortunately, I think that's a bad idea.

Well, first a little back story on reshaping, because I'm realizing now that my implementation is slightly off.

Numpy has two methods, np.reshape (link) and np.resize (link).

np.reshape returns a view and requires that the total number of objects (count) in the new shape be the same as the old shape.

np.resize changes the shape in place. The new shape can have a different count. However, if the new count is less than the old count, any ML's with views on the owner might try to access now non-existing locations. Hence Numpy and Cobra check if any views refer to the current ML. np.resize has a refcheck argument to resize even if it is unsafe.

I imagine that the difference between np.resize and np.reshape will confuse most people (including me) and hence I would like to have only one method if possible. For better or for worse I've implemented most methods to be in place and return "this." So it seems the best solution is to rework the .reshape method to the following logic:
Allow the reshape if (newCount == oldCount) or (not .isReadonly and (not .isReferred or unsafe)). Internally, the method will copy _data if (newCount <> oldCount) or .isPermuted.

If you want a view with a new shape then call ml.slice.reshape(shape). If you want a (shallow) copy with a new shape then call ml.clone(shape)
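
To make the proposed rule concrete, here is a minimal Python sketch of the two conditions. The function and parameter names are hypothetical and simply mirror the properties named above. It is not the MultiList code.

```python
def may_reshape(new_count, old_count, is_read_only, is_referred, unsafe=False):
    """Return True if a reshape is allowed under the rule described above."""
    if new_count == old_count:
        # Same element count: always allowed.
        return True
    # Different count: needs a writable list that no view depends on,
    # unless the caller explicitly overrides the referral check.
    return (not is_read_only) and (not is_referred or unsafe)

def must_copy_data(new_count, old_count, is_permuted):
    """Return True if the underlying data has to be copied during the reshape."""
    return new_count != old_count or is_permuted
```

For example, `may_reshape(2, 4, is_read_only=False, is_referred=True)` is False, but passing `unsafe=True` makes it True.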

Back to finalize --
ml1 = new MultiList<of int>([2, 2], [1, 2, 3, 4]) #obj1
ml2 = ml1.slice(Pair<of int>(1, 2), Pair<of int>(1, 2)) #obj2
ml2 = new MultiList<of int>([2, 2]) #obj3
ml1.reshape([1,2])


So according to the refcount idea, ml1.isReferred will be false once obj2 is finalized. I believe "The finalizers of two objects are not guaranteed to run in any specific order, even if one object refers to the other." means that there is no guarantee when obj2 will be finalized. If the GC is quick and finalizes obj2 before the last line, then the reshape call will succeed. If the GC is busy then the refcount will still be 1 and reshape will fail. The behavior is nondeterministic.

Similar to Numpy, I added in an "unsafe" argument so that the coder can override the referral check if they know that they will no longer be using the view.

I don't think finalize should be a cue because it is a frequently misused method. I may be wrong, but I think the only time finalize should be used is when dealing with native code (Ex. using JNI in Java) and even then there is potential to misuse it.
Some links:
http://kenwublog.blogspot.com/2008/06/a ... -in-c.html
http://www.javaworld.com/jw-06-1998/jw- ... iques.html

I'm not sure if you saw this question from a post back
Also, in the equals function, is there a way to call == on primitive types and .equals on objects?

Maybe I should use a switch statement.

I'll look into making some sample code and a better introduction for the ML class.
jaegs
 
Posts: 58

Re: MultiList

Postby Charles » Sat Jul 28, 2012 2:34 pm

-- I find the difference between numpy's resize and reshape to be natural. Basically, .resize is pretty much what I'm expecting based on the name. That leaves .reshape as an optimized version for those cases where the total count will remain the same.

-- Regarding the fact that the GC does not guarantee when it will collect a multi-list view, we could add a .destroy method that would basically kill its _owner connection and various fields. But at this point, I'm probably over-thinking this. Outside of the resize+reshape question, further enhancements should probably be born out of real world anger, I mean, application.

-- Sorry I forgot about your equals question. Cobra treats == and <> in a higher level fashion than C#. It will "do the right thing" whether the objects in question are ints, strings, lists, dictionaries, etc. We have test cases to back this.

-- Also, let me show you a trick. Keep in mind that the code in question for == is on line 373. So I tweaked the namespace per my earlier instructions and issued this command:
Code:
cobra -kif -test MultiList.cobra

-kif is the abbrev. for -keep-intermediate-files which for the C# back-end means the .cs files. I then opened MultiList.cobra.cs and searched for "line 373" which gives:
Code:
#line 373
   if (!(CobraCoreInternal.CobraImp.Equals((this[indices]),(m[indices])))) {

You'll find that method in CobraWorkspace/Cobra.Core/Native.cs. Actually it's an overloaded method so that in some cases it can run faster. But in the case of type "Object" or "dynamic" it will end up invoking this override:
Code:
// C# code in Native.cs

   static public new bool Equals(object a, object b) {
      // Cobra will generate the C# "a==b" when a and b are both primitive types (int, decimal,
      // etc.) But in the event that a and b are statically typed as "Object", that does not
      // mean that equality should stop making sense. Hence, below we cover the cases where
      // a.Equals(b) fails us (there are suprisingly many).

      // decimal is retarded
...

We pay some performance penalty for this, but semantics generally take precedence over performance. Also, in many cases, one of the more efficient overloads will be used, or even the basic C# "==" for primitive types. Consequently, there have not been any performance complaints to date.

Anyway, you can use -kif to do investigations. I sometimes combine it with things like -contracts:none and -include-nil-checks:no if I want to wade through less generated code (and if those things are unrelated to my investigation). You can get a list of things to turn off by looking at "cobra -help" and especially the -turbo option gives a concise list in its description.
Charles
 
Posts: 2515
Location: Los Angeles, CA

Re: MultiList

Postby jaegs » Sun Jul 29, 2012 4:21 pm

reshape/resize
I'm not so sure, I think np.reshape has around the same efficiency as np.resize. If the count of the new shape is the same as the old, then either method could be used and neither will copy the underlying data. In this case, the sole difference is that np.reshape returns a view and np.resize is in place.

There are even Numpy methods that sometimes return a view and sometimes are in place link1
link2

So I maintain that it's confusing. I've added 2 properties to MultiList that I think solve the problem.

ml.v simply returns a view. This is a syntactic placeholder for the future implementation of ml[:]. The .v property lets any method that would normally have side effects instead be functional. Ex.
ml.v.reshape(shape).permute(order) # functional
ml.reshape(shape).permute(order) # side effects

Both cases return a MultiList, which allows for method chaining.
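
A toy Python sketch of this chaining pattern follows. The `Grid` class is hypothetical and tracks only a shape, so its `v` property returns a fresh object, not a true view over shared data. It is meant only to show how returning `self` from in-place methods makes both styles chain.

```python
class Grid:
    """Hypothetical stand-in: mutating methods work in place and return self."""

    def __init__(self, shape):
        self.shape = tuple(shape)

    def reshape(self, shape):
        self.shape = tuple(shape)
        return self

    def permute(self, order):
        self.shape = tuple(self.shape[i] for i in order)
        return self

    @property
    def v(self):
        # A new object, so chained calls leave the original untouched.
        return Grid(self.shape)

a = Grid((2, 6))
b = a.v.reshape((3, 4)).permute((1, 0))   # functional: a keeps its shape
shape_after_functional = a.shape
c = a.reshape((3, 4)).permute((1, 0))     # side effects: a itself is changed
```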

Second property is .t which returns a transposed view. Transposing is very common in Linear Algebra so it deserves a shortcut. Numpy has np.array.T and MATLAB uses a single quote (ex: X' is X transposed)

Changes to MultiList.cobra are up on github (link). Let me know what you think and if it's good, I will submit a patch.

equals
Thanks, I didn't realize == worked that way - that's definitely a good thing. However, how do you check for physical equality in Cobra? === like PHP and JS? I was wondering this for some of the test cases. You'll see that I used a sharp string.

I probably have time for an additional project before heading back to school. I think I would be better at a library or language topic than IDE integration, however. Any suggestions?
jaegs
 
Posts: 58

Re: MultiList

Postby Charles » Sun Jul 29, 2012 8:04 pm

-- I prefer .view and .transposed over .v and .t

-- In the past, I have used verbs in the present tense and past tense for distinguishing modifying in place vs. returning a copy that leaves the original unmodified:
stuff.sort  # if you 'sort' something you change it
for s in stuff.sorted # if you ask for 'sorted' you get a new object
    trace s

While it feels natural to me, I don't know how well this approach clicks with others. Comments are welcome.

-- In Cobra, "is" and "is not" are the reference/pointer comparisons (and consequently run fast and will not invoke any methods on the operands) and "==" and "<>" are the content comparisons.

Or you could call "is" and "is not" the "identity comparisons"
and "==" and "<>" the "equality comparisons".

I should really crank out a How-To on comparisons.

-- I'm slammed with client work this week, so I won't be doing much Cobra for the next few days. I'll look at MultiList later in the week.
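
For readers coming from Python, both conventions in this post have close analogues there, sketched below with illustrative variable names: `list.sort()` versus `sorted()` for in-place versus copy, and `is` versus `==` for identity versus equality.

```python
stuff = [3, 1, 2]
ordered = sorted(stuff)      # 'sorted' returns a new list; stuff is unchanged
stuff_before = list(stuff)
stuff.sort()                 # 'sort' changes the list in place and returns None

a = [1, 2]
b = [1, 2]
same_content = a == b        # equality comparison: compares contents
same_object = a is b         # identity comparison: compares references
```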
Charles
 
Posts: 2515
Location: Los Angeles, CA
