
Cobra compiler speed degradation (?)

Posted: Wed Oct 07, 2009 3:24 am
by hopscc
I was thinking about the Cobra compiler the other day (not that it's pertinent, but I was digging a posthole at the time - one of the more pointless activities known to humankind)
and idly started wondering whether the features added over the last few months had affected compilation speed much.
I tend to watch the -timeit value when the compiler compiles itself, and I don't think it's changed since the pre-0.8 release - but if it drifted up in small increments with each change, how much would I notice over time?
And as other changes get made in the future, how would you tell if things slowed down (unless it happened in one big, noticeable lump)?

Well, maybe we could get the compiler to tell us.

What if we added to the -timeit output a lines-compiled/sec calculation (#lines in all compiled files / the -timeit time, perhaps corrected for running the back-end compiler)?
I'd expect that for compiling the same files (the compiler itself) the value would drift over a small range, but it could give you an idea of any significant degradation once a baseline was established.
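
To make the idea concrete, here's a back-of-envelope version of the calculation (the names and numbers are just illustrative stand-ins for what -timeit would supply):
Code: Select all
class SpeedStat
    def main
        # hypothetical inputs: values the compiler would collect itself
        totalLines = 38854.0f   # lines in all files handed to the compiler
        elapsed = 30.87f        # -timeit wall-clock time in seconds
        print '[totalLines] lines compiled = [totalLines / elapsed] lines/sec'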

Does anyone else think this would be useful to have as a small performance sanity check on changes/additions?

The thought is the deed, so I implemented support for this on my system, but I have only been using it intermittently on my development compiler (compiling tests and little programs), not the snapshot.
The values are all over the place (from tens of lines/sec to thousands) - it looks like small single files compile much more slowly than many big files, which suggests to me that, at least for the cases I tried, the compiler is bound by its startup overhead... but maybe it's just wobble.

Is it worthwhile to continue down this path? Should I post the changes as a patch?

Re: Cobra compiler speed degradation (?)

Posted: Wed Oct 07, 2009 7:18 am
by Charles
The self-compilation time will creep up as the code base grows, so it becomes even more interesting to get stats on compilation speed that take source size into consideration. But I've always thought that a count of AST nodes would be more accurate than pure lines:
Code: Select all
# not all lines are created equal
x = 1
# vs.
if foo.bar(a.x, b.y) < foo.bar(0, 1), print foo

In terms of performance, I recently did a few optimizations using the ANTS profiler, but more speed is always welcome. Here are some ideas:

-- The tokenizer/lexer is still the slowest component of the compiler. In addition to optimizing it directly, it might be useful to run it in threads; anyone with a hyperthreaded and/or multicore machine would benefit. (There's a sketch of the shape this could take after this list.)

-- The expression "s in ['foo', 'bar']" is slow because it creates a new list on every evaluation, even though the immutable nature of the strings and the temporary nature of that list mean it wouldn't have to. This comes up in the parser quite a bit and shows up high in the profile, so InExpr could generate faster code in some cases - basically when the right-hand side is a list or set literal containing primitive literals (there may be other cases, but this is the one that comes up often). See the second sketch below.

-- The compiler phase "bind implementation" is one of the expensive phases, and it is probably amenable to threads as well due to the private nature of method implementations.

-- Locating extension methods is still a source of slowdown (although not as bad as before). Smarter caching could be done here, I think - see the third sketch below.

-- The compiler does a regular assembly load instead of a reflection-only load, which would be not only faster but also more appropriate. However, I had problems with reflection-only assembly loading on old versions of Mono (back in the 1.x days), which is why I went with the regular load. The last sketch below shows the difference.

-- If we can get the Mono C# compiler working on .NET, then on .NET we can also drop the extra disk I/O and process invocation; we already have that on Mono. AND we can explore the idea of generating their nodes directly, which would skip their lexer and parser.
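
On the threaded tokenizer: here's roughly the shape it could take. This is a hypothetical sketch only - TokenizeJob and ParallelLexSketch are made-up names, and the print is a stand-in for driving the compiler's real tokenizer:
Code: Select all
use System.Threading

class TokenizeJob
    # hypothetical per-file worker
    var _path as String

    cue init(path as String)
        base.init
        _path = path

    def run
        print 'tokenizing [_path]'  # stand-in for the real lexing work

class ParallelLexSketch
    def main
        threads = List<of Thread>()
        for path in ['a.cobra', 'b.cobra']
            job = TokenizeJob(path)
            t = Thread(ref job.run)
            threads.add(t)
            t.start
        for t in threads, t.join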
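
On InExpr: a hand-written illustration of the rewrite, not actual compiler output. KeywordCheck and its members are made-up names:
Code: Select all
class KeywordCheck

    # what "s in [...]" effectively does today: build a fresh list on
    # every evaluation, then search it
    def isKeywordSlow(s as String) as bool
        return s in ['if', 'else', 'while']

    # what InExpr could generate when the right-hand side is a literal
    # of immutable primitives: build the collection once and reuse it
    var _keywords as Set<of String> is shared = {'if', 'else', 'while'}

    def isKeywordFast(s as String) as bool
        return s in _keywords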
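
On extension methods: "smarter caching" could be as simple as memoizing each lookup. This is a hypothetical shape - ExtensionMethodCache, .find and .scanAssemblies are made up, not the compiler's actual structures:
Code: Select all
use System.Reflection

class ExtensionMethodCache

    # remember every (type, method name) lookup so the expensive scan
    # runs at most once per key per compilation
    var _cache = Dictionary<of String, MethodInfo?>()

    def find(t as Type, name as String) as MethodInfo?
        key = '[t.fullName].[name]'
        if not _cache.containsKey(key)
            _cache[key] = .scanAssemblies(t, name)  # the expensive walk
        return _cache[key]

    def scanAssemblies(t as Type, name as String) as MethodInfo?
        # stand-in for the real search through referenced assemblies
        return nil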
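
And the assembly-loading swap, using the standard .NET calls (AsmLoader is a made-up wrapper; Cobra lowercases the leading letter of .NET member names):
Code: Select all
use System.Reflection

class AsmLoader

    def loadForMetadataOnly(path as String) as Assembly
        # faster, and enough for reading types and signatures,
        # but it misbehaved on old (1.x) Mono
        return Assembly.reflectionOnlyLoadFrom(path)

    def loadFully(path as String) as Assembly
        # what the compiler does today
        return Assembly.loadFrom(path)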

Getting back to the AST node count, we could have a "def nodeCount as int" method on nodes and override it to count subnodes. Or get fancy and use reflection on properties (although we may need to mark properties to indicate subnodes). See .hasError and SubnodesAttribute for ideas on these two approaches.
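
A minimal sketch of the first approach, with hypothetical node classes:
Code: Select all
class Node
    # base case: a node counts itself
    def nodeCount as int
        return 1

class BinaryOpExpr inherits Node
    var _left as Node
    var _right as Node

    cue init(left as Node, right as Node)
        base.init
        _left = left
        _right = right

    def nodeCount as int is override
        return 1 + _left.nodeCount + _right.nodeCount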

-Chuck

Re: Cobra compiler speed degradation (?)

Posted: Sat Oct 24, 2009 4:13 am
by hopscc
Here's a patch file that provides a display of the counts of lines, nodes and tokens compiled (when -timeit is specified) and a calculation of the compilation speed in {lines,nodes,tokens}/sec.

I implemented the node count as an additional phase using a visitor rather than augmenting the node items directly (separation of functions)
- this necessitated providing some accessor properties for some of the syntax nodes.
Dunno if the node-walking calculation is 100% correct, but it is perhaps somewhat representative (and if nothing else it provides an example of accessing the AST nodes down to the expressions).
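
For flavor, the counting pass has roughly this shape (hypothetical names - HypoNode and NodeCounter are made up; the real patch visits the compiler's actual syntax node classes):
Code: Select all
class HypoNode
    # stand-in for a compiler syntax node; the real classes needed
    # new accessor properties to expose their children
    var _children = List<of HypoNode>()

    get children as List<of HypoNode>
        return _children

class NodeCounter
    var _count as int

    def count(root as HypoNode) as int
        _count = 0
        .walk(root)
        return _count

    def walk(node as HypoNode)
        _count += 1
        for child in node.children
            .walk(child)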

On my system, the results when compiling the compiler look like this:
Code: Select all
 ../Source > ./mkcobcsh
Compilation succeeded
timeit = 00:00:30.8697907
38854 lines compiled = 1258.67 lines/sec
113110 nodes compiled = 3664.19 nodes/sec
206523 tokens compiled = 6690.30 tokens/sec
../Source >

Re: Cobra compiler speed degradation (?)

Posted: Sun Oct 25, 2009 12:38 am
by Charles
Applied, but there are 2 problems:

-- The time includes the *run-time* of the program, which skews and invalidates the "per second" stats.

-- No test case. We don't need a lot of verification, but we certainly want a basic test.

I don't have time for these right now, so I added ticket:182.