Negative Indexes as end offsets

by **hopscc** » Mon Jun 09, 2008 3:47 am

Here's something to consider

I've been looking at the support for negative index values to indicate offset from the end of a sequence (similarly to slicing)

Unfortunately there's a performance cost for all accesses for an implementation like that for Cobra Slicing which would wrap some
index checking and adjustment around the current use of a direct map to the C# indexing-and-exception-if-out-of-range

On the basis of a reasonable but untested (leading) assumption
- That the most frequent use (of negative indexes) would be as a *literal* (to get the last or some short end offset from last values in list)
rather than than using a (negative) variable as an index for the same purpose
- I thought we maybe able to get some of the best of both.
That is; recognise and only special case (a single) negative literal index
e.g.

Code: Select all: a = [ 1,2,3,4] assert a[-1] == 4 assert a[-4] == 1

For that there's no additional cost for access using (existing working implementation) positive literals or variables
and you only incur any performance cost for the use of negative literals
(in return for removal of that bounds exception and provision of some convenient and simple to use and understand behaviour).
Its definitely not a 100% solution in that negative indices in an index variable would still not be recognised and handled and trigger
a Bounds exception.

A secondary assumption is that negative value indices ( end offset indexing) literal or variable are the minority of cases anyway.

I did some mods to Expr.cobra IndexExpr writeSharpDef()
so as to detect a single unary negative literal as an index on sequences and setup a call a C# runtime overloaded method (GetIndexed) that resolves Object
Type (for Strings arrays and ILists (and generics of same) ) and calls a sub method that offsets the (negative) arg from the Length/Count and
ensures index >=0 to give a final index..
Then index the sequence as per normal obj[i] syntax.
i.e

Code: Select all: a[-n] becomes GetIndexed(a, -n)

The slow down for this
(as calculated using an internal duration timer around 100_000 iterations of a method asserting a number of indexed accesses)
varies (widely) but is at worst about an order of magnitude.

Direct indexing Lists/Strings/arrays seems relatively fast in comparison to method invocation
Direct indexing into Lists/arrays ~ 0.3 microSec/access, Strings ~1 microSec/access.

The Performance overhead/ratio of going through a couple of methods and some
comparisons rather than a direct index (for unoptimised compiles) varies

Code: Select all: Worst Best Lists 1:11 1:2 Strings 1:2.5 1:1.7 Arrays 1:6 1:5.7

(Optimised compiles often give smaller ratios but also optimise down
to 0 durations (:-))

Interestingly enough the timings devolve down to similar values regardless of the type of sequence
~ 0.25 s/100_000 iterations or about 2.5 microSec/per access/index

BUT take the above with a lot of salt since there was a wide variance in the timings across runs or just from a recompilation and rerun
(Variances for Lists up to 5.5x, Others slightly more stable)

Having said max slowdown is order of magnitude though
that is from 0.3 millionths sec to 3 millionths sec
and 1 millionth sec to 10 millionths sec per access.....
for a few accesses I defy anyone to notice the difference.

It makes little difference if rather than calling an overloaded GetIndexed method to do indexing you still use direct indexing but
setup to wrap the index with a C# runtime function that does a similar calculation (to ofset the negative index and pull
any resulting index to be >=0 ) to the above....
e.g similarly to

Code: Select all: a[CobraImp.negIdxCalc(a, -n)]

There's about the same orders of slowdown (and variance).

If we (add some code complexity and) do negative offset calculations
from within cobra :
Preget sequence length and use the negative index applied against that

l = a.length
    ...
    v1 = a[l-1]
    v2 = a[l-2]

You get similar timings as per direct indexing
i.e same order of magnitude + a few percent (unsurprisingly)

If instead we pass the index through a Cobra method

def negIdx(len as int, idx as int) as int
        if idx <0
            idx = len + idx
        if idx < 0
        return 0
        if idx >= len
        return len-1
        return idx

v1 = a[.negIdx(a.length,-1)]
v2 = a[.negIdx(a.length,-2)]

You get similar timings as per direct indexing plus ~20-30%
- but (weirdly enough) this is still much faster than mapping onto
C# internal internal handling doing the same thing

i.e If need to care about negative indexing performance you can handle negative offsetting directly in Cobra or avoid negative/end offsets altogether

SO The bias/opinion/conclusion I bring and take from all this :
I would like Cobra to support some form of negative indexing for
a) Convenience - easy indexing from end (last item in sequence)
b) Consistancy - Cobra slicing and similarity to Python syntax
and to remove (some of) an unnecessary source of a bounds exception and cos its sorta neat.
.. I don't want to pay (performance wise) for that unless I choose to use it in my code (if there's a trade off)

I think its worthwhile supporting negative literal indexes as above if only on the basis of convenience.
As implemented there is (at worst) an order of magnitude slowdown
(only to negative indexes and only if used)
BUT I dont think this is particularly relevant given the frequency of use and absolute speed anyway

And there's an escape: If your app is such where negative indexing is used and the degradation is of importance you can resurrect
direct access performance by doing your own end ofset calculations directly (in cobra) or avoiding end/negative ofsets entirely
IF YOU THINK ITS NECESSARY.

Opinions?

CounterArguments?

I'd be really interested if anyone has any proof (from python?) that my assumptions have any (or no) validity

by **Charles** » Mon Jun 09, 2008 6:07 am

I considered this issue in the past which yielded the following thoughts and conclusions:

-- I don't want to add negative index support everywhere in Cobra due to the slow down that would result (as you yourself pointed out). Part of Cobra's charm is it's direct applicability to performance demanding applications like financial analysis, neural networks, simulation, gaming, etc. while still having most of Python's convenience.

-- The #1 use for negative indexes, as you pointed out, is to get the last element. That could be handled with: t.last

-- Of course, you might want to go back more than one, so we have: t.at(-2) or t.at(x)

-- This can be done for System.Collections.Generic.IList<of> and System.Collections.IList via extension methods. I think the generic IList also covers arrays although we could consider adding special compiler support to inline the code since people generally choose arrays over generic lists in performance sensitive situations.

-- While "t.at(-2)" is not quite as slick as Python's approach, it's not that bad and it leaves normal indexing performance unaffected. And I'm actually fond of "t.last" which I think reads quite well in the same vein as "t.count".

Cobra's extension methods don't currently support generics which is the only reason this isn't already in Cobra. The code would be something like this (untested):

extend IList<of T>

    def last as T
        return this[.count-1]  # let indexer report out of range problems

    def at(flexibleIndex as int) as T
        if flexibleIndex < 0, flexibleIndex += count
        return this[flexibleIndex]

    def at(flexibleIndex as int, default as T) as T
        count = .count
        if flexibleIndex < 0, flexibleIndex += count
        if flexibleIndex >= 0 and flexibleIndex < count
            return this[flexibleIndex]
        else
            return default

I don't care for the idea of supporting a negative index when it's a literal, but not a variable. I think it's better to provide .at which always supports it, and provide .last as a convenience.

Btw, Python gives an IndexError if your negative index is past the bounds. Only slicing silently disregards boundary violations.

Code: Select all: Python 2.5 (r25:51918, Sep 19 2006, 08:49:13) >>> t = [1, 2, 3] >>> t[-2] 2 >>> t[-20] IndexError: list index out of range

Feedback is welcome.

-Chuck

by **hopscc** » Tue Jun 10, 2008 1:38 am

Right - you'd mentioned this somewhere before..

I agree about the performance hence the half soln with negative literals in [].

The problem I have with an added method (at/last) is that this gives an additional (2) indexing syntaxes and assoc semantics.
There would now be 5 syntaxes for getting sub parts of a collection
each with their own caveats about what they swallow as arguments and generate for edge conditions
(perhaps some of that is my own perverted view on things - I view slicing as a superset of indexing ( or indexing as a subset of slicing)
slicing gives a subset collection usually multiple items, index gives a subset single item - This is probably mostly due the similarity of syntax/punctuation []
if so ignore slicing - with the added methods theres now 4 varied syntaxes and semantics for indexing)

The 5 syntaxes are ( if I've got them mostly right):
.item() inbuilt - can probably disregard since [] indexing is more common - are there any pros/cons to using this rather than the [] index syntax ?
I'd assumed it was probably faster but havent tested that
[] index - return single item - takes positive values only (start offset), negative and out of range cause exception
[::] slice - return collection of items ( usually multiple), takes positive values (front offset) or negative values (end offset), out of range values silently pulled to end of range
(really convenient (:-))
+
.at() - returns single item, takes negative (or positive?) value (end offset(or start offset)) ...TBD effect wrt out of range values
last - returns single item at end of collection.

Too many dark twisty little passages all nearly the same
Ideally I want 2 syntaxes (indexing and slicing) with the same semantics wrt values they take and the same (most convenient) behaviour for range excursion...
that allows me to use what few brain cells I have remaining on other things

by **hopscc** » Tue Jun 10, 2008 2:10 am

A thought struck me while writing the previous.
If you dont like the half solution overloading [] what about overloading it onto .item().

Leave [] indexing as is (fast and broken ( whoops sorry that slipped out(:-))
and overload .item() such that it takes negative values as end offsets and does boundary edge blocking..
That way theres no additional syntax and you can choose which behaviour you want
full performance or slower but more convenient.

If generics dont support extensions how would you support .at()/.last for generic collections ?

Another random synapse firing is that we provide an augmented [] syntax for slower, end offset, bounded indexing..

a=[1,2,3]
assert a[1]==1
assert a[3] == 3
exception assert[-1] == 3
# above is existing

assert a[[1]] == 1
assert a[[-1]] == 3
assert a[[-5] == 1
assert a[[4]]==3
# [[]] supports start offset and end offset ( and possibly boundary violation removal)  and anything else useful

by **Charles** » Tue Jun 10, 2008 5:10 am

-- While .item technically exists, we can almost entirely discount it as people just don't think about it much in the .NET world. It's more of a curiosity like examining the helper classes that implement generators or finding out that the VM just sees constructors as methods with the name .ctor.

-- I still agree that the situation is not ideal. But I accepted that Cobra would be shy of ideal if I put it on .NET and/or JVM (unless I were to ignore their standard libraries and ignore compatibility, just using them for gc and machine code gen).

-- You mentioned .at() was TBD wrt out of range values, but my example showed it would result in an exception just like indexing unless you provided a default value. Example: numbers.at(i, 0)

-- t.last is just a convenience for t[t.count-1]. No extra semantics.

-- Re: overloading .item(), I'm not sure what you mean. Overloading in Cobra (and C#) involves having different typed arguments or different number of arguments. The index argument would still be an int...

-- You asked about how we would add .at()/.last to generic collections. The lack of extension methods on generics in Cobra is a temporary situation. I'm saying once that's in place, .at() and .last will show up.

-- Re: a[[1]] I think that's ambiguous with having a list (such as [1]) being passed to an indexer. Technically, you can do this although I doubt it happens in practice. Unless you have some crazy collection that is indexed by "vectors": matrix[[i, j]].

-- Indexing (t[i]) should not return the closest element when i is out of range. Throwing an exception is the correct behavior. Otherwise, consider that when you index outside the bounds of the list (say asking for item 10 in a 9 element list), your application will just plow on, probably giving you the wrong results. You'll have to scratch your head to figure out where things went wrong. Also, the exception behavior matches dictionaries.

I do hear what you're saying, but I think with all things considered, the approach I outlined is currently the best we've got. You can, of course, always roll your own subclass of List<of> if your applications would benefit from different indexing behavior.

-Chuck

by **hopscc** » Tue Jun 10, 2008 8:15 am

re .item()
if it exists but its not used much I was thinking that makes it a nice candidate for overloading with additional (end offset) behaviour. Then it becomes useful (and perhaps more used).

by overloading I meant augmenting in the same manner as writing an augmented method in a subclass ( implementing a decorator - added behaviour behind the same name otherwise same
signature ..) If .NET extensions dont allow that I guess its not possible

Part of the attraction of [] index is that it applies equally to a myriad of collection Types - Arrays, Strings, Lists and implementations of IList - using the same syntax without having to
individually overload each .item (in this case ) method implementation..

re at/last
OK so at() throws like [] missed that..

for generic collections; future will provide one day - wonderful. I'm looking fwd to the JVM based implementation (:-)

I'm not sure why you say basing implementation on .NET/JVM is the rationale for non ideal Cobra syntax additions/augmentations.. the rest of Cobra seems a testament to how incorrect
that is in that its successfully making a clean coherent language without dragging up/exposing much of the underlying worlds baggage

t.last - Ok so same semantics - still more syntax convenience or no but I think I've already belaboured that point

re indexing over bounds throwing an exception:
yep thats the conventional wisdom - stepping outside the limits gets you barfed on. However like slicing what are you trying to do if you access off the edges?? probably access the edge
( for a single lookup anyway) - theres other clearer ways of walking the entire sequence and stopping at the edges...
Whatever, Trying to supply an 'I know you made a mistake but trying to do what I think you mean' result is probably too much of a step in this context anyway.

re augmented []
disregard the exact syntax - the [[]] was an example of what I meant to try and indicate .. that it was an extended indexing operation providing augmented semantics (but still starts off using [] as indexing punctuation)
Of course you can disambiguate that case simply by making a[[i]] - augmented indexing
be distinct from a[ [i] ] - indexing with a list of [i]
I'm goanna have another think around that and see if I can come up with something thats indicative of an augmented indexing operation but doesnt collide with any other syntax

finally - yes you can always do whatever you like to augmented subclasses ( List<of> ) but that doesnt help you any with the existing syntax and behaviour of the existing base/core classes
(array, string, ILists....)

Thanks for the clarifying comments/replies/explanations .. Interesting as always
I'm goanna be out of contact for a week or so - any followon wont get replies till then
cheers
hops

by **hopscc** » Fri Jul 04, 2008 5:42 am

I've had another think and some fiddling with this with the following results:

Introduce a new construct allowing use of positive and negative indices without performance penalty for normal [] indexing.
I think the double Bracket [[ ]] is the best syntax for this extended indexing ...
There are other possibilities [| |], [( )], [: :], which would work equally well but [[ ]] seems to indicate extended [] better for me.

You can specify a literal List as an index as in funkyColl[[1,2]] but this is (relatively) uncommon and can be easily disambiguated if necessary either with explicit insertion of
whitespace or wrapping the literal list in () - funkyColl[ [1,2] ] or funkyColl[([1,2])]

The downside of introducing a new token [[ for this is
- complexity of token scanning vs single and double [ and ]
- collision with literal arrays of lists @[[1,2], [3,5]] and [[1,2],[3,4]]
but it turns out these can be handled without too much difficulty/complexity

I have an implementation that compiles the compiler unchanged and passes all the tests including a large test for itself so I'm opening an enhancement ticket for it.

by **Charles** » Sun Jul 06, 2008 9:56 am

Regardless of whether t[[i]] can be made to work, I really dislike the syntax...

by **hopscc** » Sun Jul 06, 2008 7:56 pm

Me too - I'd prefer using [] (:-) - without a performance cost (:-)

I'm not wedded to the syntax is so long as the capability is available but (I think) it should be notationally similar to indexing (since thats what it is - [] just extended)-
We've precluded using [] for entirely valid reasons so this is the next (though less satisfactory) solution I came up with
I looked at other variants of using [ and combinations of something but the dbl [[]] seemed most
intelligible/approachable/transferable/memorable ( to me anyway) for extended-indexing-with-a-performance-cost.

( None of this precludes adding at/last() or variants its just that I think this is easier to understand,use and remember - uglier than [] but easier to
get to than a new method )
Anyway - I've already given my preferences and arguments such as they are, theres now a patch for working code and tests
do with them what you will.

by **hopscc** » Tue Jul 29, 2008 6:53 am

Before I forget I found a way to use negative indexes for an end offset using the existing constructs
It has about the same performance impact (slowdown) as my implementation ( and slicing) vs std [] indexing

Its so disgustingly ugly its almost beautiful (:-)

Code: Select all: seq = [1,2,3,4] lastIItem = seq[-1:][0] # to get equiv of seq[[-1]] assert lastItem == 4 assert lastItem == seq[seq.count-1]

Negative Indexes as end offsets

Negative Indexes as end offsets

Re: Negative Indexes as end offsets

Re: Negative Indexes as end offsets

Re: Negative Indexes as end offsets

Re: Negative Indexes as end offsets

Re: Negative Indexes as end offsets

Re: Negative Indexes as end offsets

Re: Negative Indexes as end offsets

Re: Negative Indexes as end offsets

Re: Negative Indexes as end offsets

Who is online