MultiList

Postby Charles » Mon Jul 02, 2012 3:23 pm

I took the collection class Matrix in the Supplements directory and placed it in the Cobra.Lang standard library with the new name MultiList. The name "Matrix" was a poor choice because it implied a mathematics orientation when in fact this is a collection class like List, Dictionary and Set.

MultiList provides an n-dimensional List-like collection. It's similar to, but more basic than, C++'s Boost.MultiArray, which serves the same purpose.

You can view the code online. It's a start and covers the basics. If anyone wants to contribute a .resize operation, slicing, etc. feel free.
Charles
 
Posts: 2515
Location: Los Angeles, CA

Re: MultiList

Postby torial » Wed Jul 04, 2012 8:39 pm

Would there be any drawbacks to making the Cobra list support this multidimensional aspect? If they are minimal, I would love to have this promoted, so I can take advantage of the nice list syntax with it.
torial
 
Posts: 229
Location: IA

Re: MultiList

Postby Charles » Fri Jul 06, 2012 9:09 am

Do you mean make the List class multi-dim? Or do you mean that nested list literals should be interpreted as MultiList under certain circumstances?

If you mean enhancing the List class, I think it does what it does well, as it is now. Plus it's provided by the virtual machine's standard library and not under our control (other than adding some extension methods).

If you mean literals like:
t = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]

...these are currently nested lists as I'm sure you know. I think there are some problems with trying to make these MultiLists. There is the question of what to do with this:
t = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9, 10],
]

...where the rows are not the same size.
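
One way to frame the ragged-rows question is as a shape-inference step that either finds a rectangular shape or rejects the literal. Below is a minimal Python sketch of that idea. The helper name infer_shape is hypothetical and is not part of Cobra or MultiList; raising an error is only one possible policy.

```python
def infer_shape(nested):
    """Return the rectangular shape of a nested list, or raise ValueError if ragged.

    Hypothetical helper for illustration; not part of Cobra's MultiList.
    """
    if not isinstance(nested, list):
        return ()  # a scalar element contributes no dimension
    if not nested:
        return (0,)
    first = infer_shape(nested[0])
    for i, item in enumerate(nested[1:], start=1):
        if infer_shape(item) != first:
            raise ValueError(f"ragged literal: element {i} has a different shape")
    return (len(nested),) + first


square = infer_shape([[1, 2, 3], [4, 5, 6], [7, 8, 9]])  # rectangular, so accepted
```

With the second literal above, the last row has four elements, so this sketch would raise ValueError rather than guess at padding or truncation.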

Also, MultiList is not that mature right now. It supports multiple dims, indexing by integers and looping through all elements, but it does not support looping through sub-multilists. Here is what I mean:
t = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
# t is a List<of List<of int>>
for row in t  # row is inferred as List<of int>
    print row
    for x in row
        print x

The above for-loops wouldn't work if t were a MultiList<of int> because MultiList doesn't have an enumerator that returns sub-multilists. I say "sub-multilists" because with the higher dims, you have to switch from the concept of "row". See Boost.MultiArray, for example.
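
To make "sub-multilists" concrete, here is a rough Python sketch of what such an enumerator could yield, assuming the elements live in one flat, row-major list alongside a shape. The function name iter_sublists is hypothetical, and each block is copied here for simplicity; this is not how MultiList is implemented.

```python
def iter_sublists(data, shape):
    """Yield (flat_block, sub_shape) for each index along the first dimension.

    Assumes `data` is a flat, row-major list with prod(shape) elements.
    For a 2-D shape the blocks are rows; for higher dims each block is
    itself (n-1)-dimensional.
    """
    block = 1
    for n in shape[1:]:
        block *= n  # number of elements in one sub-block
    for i in range(shape[0]):
        yield data[i * block:(i + 1) * block], tuple(shape[1:])


rows = list(iter_sublists([1, 2, 3, 4, 5, 6, 7, 8, 9], (3, 3)))
```

For a 3 x 3 shape this yields three rows of shape (3,); for a 2 x 2 x 2 shape it would yield two blocks of shape (2, 2), which is why "row" stops being the right word in higher dimensions.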

On a related note, we've played with the idea of allowing a type specified for collection literals like:
# we start the list with a Square,
# but want the list type to be more general:
t = [Square()] to List<of Shape>

We could then entertain this:
t = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
] to MultiList

The absence of the inner type would mean that the compiler should infer it.

Btw, your future comments and inquiries would be clearer if you showed some example code of what you meant. :)
Charles
 
Posts: 2515
Location: Los Angeles, CA

Re: MultiList

Postby torial » Fri Jul 06, 2012 9:56 am

Hi Charles,

Sorry for the ambiguous comments. I had a killer few weeks and my brain was definitely addled. Your comments revealed a few things I didn't realize were available (or hadn't thought through), like the nested list literals and the returning of all items sequentially.

I'm swamped right now, but if I can find the time to get a release of Naja out this summer, I'd be happy to take on a task or two on extending the MultiList.

In general, good comments on the disadvantages of what I was suggesting, and I'll try to include some examples in the future. I might as well have filed a bug report saying: "It doesn't work" :-O.

Perhaps for a MultiList literal (when it gets solidified enough), a syntax like
x = #[[1,2,3], [4,5,6],[7,8,9]]

would be ok. I like the # symbol as a visual cue for n-dimensional data, but the obvious drawback would be the contextual parsing needed to tell that it isn't a comment :/ Perhaps a literal format like +[] might be ok.
torial
 
Posts: 229
Location: IA

Re: MultiList

Postby Charles » Fri Jul 06, 2012 3:39 pm

The # sign is a clever idea. Unfortunately, it would be incompatible with the many syntax highlighting configs we have. Most editors don't let you say that '#' starts a single-line comment except in certain cases. And that raises the question of whether it would be good for humans as well.

Thanks for the follow up.
Charles
 
Posts: 2515
Location: Los Angeles, CA

Re: MultiList

Postby jaegs » Wed Jul 11, 2012 9:00 pm

I think the x values for each dimension could be memoized:
for j in i+1 : len, x *= _shape[j]

to optimize for a user making a lot of value lookups:
sum = 0
for i in ...
  for j in ...
    for k in ...
      for l in ...
        sum += ml[i,j,k,l]
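
To spell out the memoization idea: the product over the trailing dimensions can be computed once per dimension and reused for every lookup. Here is a minimal Python sketch, assuming row-major order and a flat backing list. The names compute_strides and flat_index are hypothetical, and this is not the actual MultiList code.

```python
def compute_strides(shape):
    """Row-major strides: strides[i] is the product of shape[i+1:]."""
    strides = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    return strides


def flat_index(indices, strides):
    """Map an n-dimensional index to a position in the flat backing list."""
    return sum(i * s for i, s in zip(indices, strides))


shape = (2, 3, 4)
strides = compute_strides(shape)   # computed once, reused for every lookup
data = list(range(2 * 3 * 4))      # flat backing storage (illustrative values)
value = data[flat_index((1, 2, 3), strides)]
```

Whether caching like this pays off in practice would still need a benchmark; the sketch only shows that each lookup reduces to one multiply-add per dimension.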
jaegs
 
Posts: 58

Re: MultiList

Postby Charles » Wed Jul 11, 2012 9:38 pm

In the simple case of summing, you can just write one for loop that goes through .items. That doesn't invalidate your point, though.

Does Boost.MultiArray cache the indexes? If we cache them, do benchmarks show a payoff?
Charles
 
Posts: 2515
Location: Los Angeles, CA

Re: MultiList

Postby jaegs » Thu Jul 12, 2012 12:30 pm

I have poor reading comprehension when it comes to C++ but it looks like Boost.MultiArray caches the indexes. The technical term to use is "strides." A stride is how far one must move in memory to get to the next element in an array along a certain dimension. In addition to accessing elements in a Boost.MultiArray directly, there are also "Views"
  array_type::array_view<3>::type myview =
    myarray[ boost::indices[range(0,2)][range(1,3)][range(0,4,2)] ];

which allows you to specify a range for each dimension (aka slicing) and doesn't copy the underlying data. I'm assuming accessing a MultiArray directly uses a view too but I'm not sure.

The stride array is a data member of a View. See View.hpp
  const index* strides() const {
    return stride_list_.data();
  }

Charles wrote:
On a related note, we've played with the idea of allowing a type specified for collection literals like:
# we start the list with a Square,
# but want the list type to be more general:
t = [Square()] to List<of Shape>

I'm in favor of the "to" syntax to specify the exact type of a literal. There are tons of dictionary implementations and using the same literal syntax for all of them is desirable. I'm not clear as to how the "to" keyword differs from "as."

If I get some time next week I'll look into extending the MultiArray class. Methods I'm considering implementing are .reshape, .compareTo, and .size (or .numElements). strides and shape will be get properties. I'd also add a constructor that takes in "data as T*".

One choice is to have an accompanying MultiArrayView class so that one can have multiple views of the same multi array like in Boost. A slice will return a MultiArrayView. Alternatively, there could be no MultiArrayView class, and then MultiArray would expose just a single view of itself, like in numpy.
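
For the first option, here is a rough Python sketch of a separate view class that shares storage with its parent instead of copying. The class layout (a flat list plus shape, strides and offset) and the method name subview are assumptions for illustration only; this is neither Boost's nor Cobra's API.

```python
class MultiArrayView:
    """A window onto a flat backing list; taking a sub-view never copies data.

    Hypothetical sketch: names and layout are assumptions for illustration.
    """

    def __init__(self, data, shape, strides, offset=0):
        self.data = data              # shared backing list
        self.shape = tuple(shape)
        self.strides = tuple(strides)
        self.offset = offset

    def _position(self, indices):
        if not isinstance(indices, tuple):
            indices = (indices,)      # allow view[i] on a 1-D view
        if len(indices) != len(self.shape):
            raise IndexError("wrong number of indices")
        pos = self.offset
        for i, n, s in zip(indices, self.shape, self.strides):
            if not 0 <= i < n:
                raise IndexError("index out of range")
            pos += i * s
        return pos

    def __getitem__(self, indices):
        return self.data[self._position(indices)]

    def __setitem__(self, indices, value):
        self.data[self._position(indices)] = value

    def subview(self, index):
        """Fix the first dimension at `index`; the result has one fewer dimension."""
        if not 0 <= index < self.shape[0]:
            raise IndexError("index out of range")
        return MultiArrayView(self.data, self.shape[1:], self.strides[1:],
                              self.offset + index * self.strides[0])


data = [1, 2, 3, 4]
a = MultiArrayView(data, shape=(2, 2), strides=(2, 1))
b = a.subview(1)   # second row, sharing storage with a
a[1, 1] = 5        # a write through a is visible through b
```

Because b holds only an offset and strides into the same list, the write through a shows up when reading b[1]. The second option (no separate view class) would fold the offset and strides into the array type itself.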

There are lots of other, less important methods on the numpy site if you're interested.

Hopefully I understand all of this correctly.

Also, is there a particular reason for not implementing getHashCode:
def getHashCode as int is override
    throw InvalidOperationException()

or is it a TODO?

BTW, what is currently the best editor to use on a Mac?
jaegs
 
Posts: 58

Re: MultiList

Postby jaegs » Thu Jul 12, 2012 1:42 pm

Interesting...
>>> a = np.array([[1,2],[3,4]])
>>> b = a[1]
>>> b
array([3, 4])
>>> a[1][1] = 5
>>> b
array([3, 5])
>>> b[1][1] = 6
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'numpy.int64' object does not support item assignment

jaegs
 
Posts: 58

Re: MultiList

Postby hopscc » Sat Jul 14, 2012 3:23 am

jaegs wrote:
I'm not clear as to how the "to" keyword differs from "as."


'to' is the keyword for a cast to an accessible Type
'as' is the keyword for a Type Declaration
The thing following the keyword is always a TypeName in both cases

# in both the following a is typed as a Number

var a as Number = 999
# the default type inferred from the init-expr would be int;
# declaring it as a wider but still assignment-compatible type lets the variable be typed
# explicitly (or typed without assignment)


var a = 999 to Number
# uses type inference to make 'a' a Number: the int 999 is cast to Number, so the type of
# the initialising expression is Number



In most languages, 'views' onto an underlying structure (or parts thereof) are read-only (for performance).
hopscc
 
Posts: 632
Location: New Plymouth, Taranaki, New Zealand
