Wiki

Ticket #134 (closed enhancement: fixed)

Opened 16 years ago

Last modified 16 years ago

Optimization: Make "key in dict.keys" fast

Reported by: Chuck Owned by: Chuck
Priority: minor Milestone:
Component: Cobra Compiler Version: 0.8.0
Keywords: optimization Cc:

Description

Writing:

if key in dict.keys, ...

is fairly natural, but about 15 X slower than:

if dict.containsKey(key), ...

According to this program:

use System.Diagnostics


class X

	def main is shared
		d = Dictionary<of String, bool>()
		for c in 'abcdefghijklmnopqrstuvwxyz'
			d[c.toString] = false
		
		key = 'k'
		
		count = 1_000_000

		sw = Stopwatch()
		sw.start
		for i in count
			b = d.containsKey(key)
			b = d.containsKey(key)
			b = d.containsKey(key)
			b = d.containsKey(key)
			b = d.containsKey(key)
		sw.stop
		t1 = sw.elapsedMilliseconds
		print 'containsKey:', t1

		sw.reset
		sw.start
		for i in count
			b = key in d.keys
			b = key in d.keys
			b = key in d.keys
			b = key in d.keys
			b = key in d.keys
		sw.stop
		t2 = sw.elapsedMilliseconds
		print 'key in d.keys:', t2

		print t2 / t1, ' X slower'
		ct_trace d.keys
		
		CobraCore.noOp(b)

That will vary since the culprit is the linear search.

Make the in-based expression just as fast by transforming it to the containsKey expression.

Change History

Changed 16 years ago by Chuck

  • summary changed from Optimation: Make "key in dict.keys" fast to Optimization: Make "key in dict.keys" fast

Changed 16 years ago by hopscc

OTOH

if key in dict ...

Is also fairly natural and performs at approx parity to dict.containsKey.
(strangely enough)


Changed 16 years ago by Chuck

Hmmm, I was not even expecting that to work. But it appears that we have a CobraImp?.In overload for dictionaries.

I can't say I like it as "for key in dict" does not produce keys, but instead, key-value pairs.

I might change "key in dict" to complain that it should be "key in dict.keys".

Changed 16 years ago by hopscc

That seems a little retrograde. - Tell them not to use the (natural? faster) one then mangle dict.keys to do something different from what it looks like its doing ( as a system library call)

Its more likely you're checking for key inclusion in a dict than for key-value pairs
and the dict.keys is an performance hit cos its system library implementation (separate from any natural expectation for a key inclusion check) gets a list of the keys then checks (linearly) against the list....

Almost as natural perhaps is an invented (cobra) method 'dict.dictKeys' which could do the best performance

lookup but as a new invented method it has little to reccommend itself over the library dict.containsKey

I think it would be more natural (and less impact) to fiddle "for key in dict" than
"if key in dict"

Changed 16 years ago by Chuck

Dictionary<of TKey, TValue> does not implement ICollection<of TKey>, it implements ICollection<KeyValuePair?<of TKey, TValue>> (and likewise for IEnumerable<of>).

Conceptually, dictionaries are not collections of keys.

Changed 16 years ago by hopscc

Oops missed this when posted.

Conceptually dictionaries are collections of keys and collections of values tied together so that accessing a key returns the value... the rest is an implementation
detail....

If you're accessing a dict how often do you want a "KeyValuePair?" - never
you either want the key, the value for a key, all keys, all values or
the key and its value (multiarg).
The keyvaluePair is an implementation/workaround for
languages/systems that dont support tuples/multiargs. Its an unnecessary indirection and exposure of underlying implementation cruft for its use here.

Changed 16 years ago by Chuck

  • owner set to Chuck
  • status changed from new to assigned

Look at  http://msdn.microsoft.com/en-us/library/xfhwa508.aspx and  http://msdn.microsoft.com/en-us/library/s4ys34ea.aspx

A dictionary is an ICollection<of KeyValuePair?<of TKey, TValue>>. It *is* a collection of pairs.

Whether the pairs were implemented through a KeyValuePair? or a bona fide tuple type are irrelevant. It would still be a collection of pairs, not keys or values or a collection of collections.

Changed 16 years ago by Chuck

  • status changed from assigned to closed
  • resolution set to fixed

Done in changeset:2146

Note: See TracTickets for help on using tickets.