So, I'll discuss briefly how the addin makes itself available to MonoDevelop; how MonoDevelop calls it; then a little bit about compilers, parsing, abstract syntax trees; and then get into the actual code-completion portion and its current state. Even if you already know all about compilers, please don't skip that section, so you can keep me honest. Compilers are not my area of expertise.
===PART 1===
A lot of MonoDevelop's developer documentation is out of date, especially with the transition from version 2.8 to 3.0 as a few interfaces changed. But, the article on how you create an addin is still accurate: http://monodevelop.com/Developers/Artic ... ple_Add-in
You don't need to read that, at least not in detail, but do glance at it. See those XML snippets? Those are parts of an XML file called a manifest. The manifest describes the addin to MonoDevelop and tells it which parts of MonoDevelop it will extend. The Cobra addin also has a manifest. Open this up in a separate tab or window: https://github.com/ramon-rocha/MonoDeve ... .addin.xml See all those <Extension path= lines? These are the various "hooks" that are available to a MonoDevelop addin. There are more available than what we have so far in the Cobra addin but I'll just point out a few important ones. The addin classifies itself in the "Language bindings" category (line 11). This means that the addin will implement the language binding interface MonoDevelop.Projects.IDotNetLanguageBinding which inherits from the MonoDevelop.Projects.ILanguageBinding interface. You can see the structure of these interfaces here:
https://github.com/mono/monodevelop/blo ... Binding.cs
https://github.com/mono/monodevelop/blo ... Binding.cs
Take a look at line 55 of the Cobra addin manifest. This defines which class in the addin will implement those interfaces, MonoDevelop.CobraBinding.CobraLanguageBinding, the source code for which is here: https://github.com/ramon-rocha/MonoDeve ... ding.cobra
The 'compile' method starting from line 73 is the most interesting thing here as this is what is called when you execute the 'Compile' command from within MonoDevelop. It's well commented so it should be easy to see what's going on. It gets the project references, determines what kind of assembly to create (an exe or dll) and which source files should be included, works out which compiler configuration options have been specified, and then launches the Cobra compiler in a separate process. When that finishes, it parses the output, adds any errors/warnings to the BuildResult, and returns that. Make sense?
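To make that flow concrete, here is a rough sketch in Python (not Cobra, and not the addin's actual code). The flag spellings, the `file(line): error: text` message shape, and every name below are assumptions for illustration only; the real method returns a MonoDevelop BuildResult rather than a plain list.

```python
import re
import subprocess
from dataclasses import dataclass


@dataclass
class BuildMessage:
    file: str
    line: int
    is_error: bool
    text: str


def build_compiler_args(source_files, references, target="exe", debug=False):
    """Turn project settings into a command line (flag spellings are assumed)."""
    args = ["cobra", "-compile", f"-target:{target}"]
    if debug:
        args.append("-debug")
    args += [f"-ref:{ref}" for ref in references]
    args += list(source_files)
    return args


# Assumed message shape: "File.cobra(12): error: some text"
MESSAGE_RE = re.compile(
    r"^(?P<file>.+?)\((?P<line>\d+)(?:,\d+)?\):\s*"
    r"(?P<kind>error|warning):\s*(?P<text>.*)$"
)


def parse_compiler_output(output):
    """Collect errors and warnings; lines that do not match are ignored."""
    messages = []
    for raw in output.splitlines():
        match = MESSAGE_RE.match(raw.strip())
        if match:
            messages.append(BuildMessage(
                file=match["file"],
                line=int(match["line"]),
                is_error=match["kind"] == "error",
                text=match["text"],
            ))
    return messages


def compile_project(source_files, references, target="exe", debug=False):
    """Run the compiler as a separate process and return the parsed messages."""
    args = build_compiler_args(source_files, references, target, debug)
    completed = subprocess.run(args, capture_output=True, text=True)
    return parse_compiler_output(completed.stdout + completed.stderr)
```

Keeping the argument building and the output parsing in their own functions means both can be checked without actually launching a compiler.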
So, at this point, you might be going back to the addin manifest to see what other extension points are defined. There are a few and you should be able to dig around now and figure out the names of the classes and see which interfaces they implement. In some cases, like for project templates, file templates, and syntax highlighting, there are no implementing classes but yet more XML files. If anyone has any questions on this part, feel free to ask.
Now, the next extension point I want to talk about is on line 73 of the addin manifest: MonoDevelop.CobraBinding.TypeSystem.Parser. But before we get to that, what is a parser anyways and what's the purpose of this class? Well, when you use a compiler to turn source code into something that is actually a runnable program, a lot of different things happen.
First the compiler reads in all the source files you have specified and breaks them down into the discrete units of the language. Each unit represents something like a keyword, a type name, a variable name, a number, etc. and is usually represented by one or more non-whitespace characters. Each of these units is referred to as a "token". In Cobra, like Python, leading whitespace is significant as are 'newline' characters so these are considered tokens as well. So, once you have a list of all the tokens in a file, you can start analyzing these tokens to see if you have any syntax errors. If there are no errors, you can create the data structures in memory that represent the elements in the given source code file(s). This entire process is called "parsing". After parsing, you end up with a tree-like data structure that describes the source code. It's called, wait for it, a parse tree! It'll be easier to understand it with an example.
class Sample
	def main
		a = 1
		print a + 2
For this code, the token list looks something like this:
class (Keyword)
Sample (Identifier)
\n (End-Of-Line)
\t (Indent)
def (Keyword)
main (Identifier)
\n (End-Of-Line)
\t (Indent)
a (Identifier)
= (Assignment-Operator)
1 (Integer)
\n (End-Of-Line)
print (Keyword)
a (Identifier)
+ (Plus-Operator)
2 (Integer)
\n (End-Of-Line)
(Dedent)
(Dedent)
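If you want to see how such a list could be produced, here is a toy tokenizer written in Python. It is only a sketch under simplifying assumptions (tab-only indentation, three keywords, two operators) and has nothing to do with how the real Cobra tokenizer is written; all names are made up for illustration.

```python
import re

KEYWORDS = {"class", "def", "print"}
TOKEN_RE = re.compile(r"\d+|\w+|[=+]")

SOURCE = "class Sample\n\tdef main\n\t\ta = 1\n\t\tprint a + 2\n"


def classify(text):
    """Decide which kind of token a piece of text is."""
    if text in KEYWORDS:
        return "Keyword"
    if text.isdigit():
        return "Integer"
    if text == "=":
        return "Assignment-Operator"
    if text == "+":
        return "Plus-Operator"
    return "Identifier"


def tokenize(source):
    """Return a list of (text, kind) pairs, including indentation tokens."""
    tokens = []
    depth = 0
    for line in source.splitlines():
        if not line.strip():
            continue  # blank lines carry no tokens in this sketch
        level = len(line) - len(line.lstrip("\t"))
        while depth < level:  # the line is indented deeper than the last one
            tokens.append(("\\t", "Indent"))
            depth += 1
        while depth > level:  # the line closes one or more blocks
            tokens.append(("", "Dedent"))
            depth -= 1
        for text in TOKEN_RE.findall(line):
            tokens.append((text, classify(text)))
        tokens.append(("\\n", "End-Of-Line"))
    while depth > 0:  # close whatever is still open at end of file
        tokens.append(("", "Dedent"))
        depth -= 1
    return tokens


for text, kind in tokenize(SOURCE):
    print(f"{text} ({kind})")
```

Running it on the sample prints the same sequence of token kinds as the list above.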
And the parse tree looks something like this:
                [Class Declaration (Sample)]
                              |
                          [Methods]
                              |
                       [Method (main)]
                              |
                        [Statements]
                       /             \
        [Assign Statement]          [Print statement]
            /        \                      |
[Identifier (a)] [Integer (1)]  [Binary Math Expression]
                                  /          |          \
                      [Identifier (a)] [Operator (+)] [Integer (2)]
The parse tree above does not reflect what Cobra really generates, but I hope the idea is clear. After successful parsing, you will end up with a bunch of Nodes that are connected in a tree structure. This also can explain why sometimes when you try and compile some code, an error message comes back stating something about "line 1, column 7: Expecting end-of-line but got 'Foo' identifier" or "line 2, column 2: Unexpected indent". This is because the parser was going through the list of tokens and encountered something that it was not expecting! Like if it just consumed a 'class' keyword token, it expects the next token to be the name of that class (an identifier).
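Here is a small recursive-descent parser in Python that shows where those "Expecting X but got Y" messages come from. It is a sketch for the tiny Sample grammar only; the node shapes, the token kinds, and the error wording are my own invention and not what the Cobra parser produces.

```python
class ParseError(Exception):
    pass


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens  # list of (kind, text) pairs
        self.pos = 0

    def peek(self):
        if self.pos < len(self.tokens):
            return self.tokens[self.pos]
        return ("End-Of-File", "")

    def expect(self, kind, text=None):
        """Consume the next token, or complain if it is not what we need."""
        got_kind, got_text = self.peek()
        if got_kind != kind or (text is not None and got_text != text):
            raise ParseError(f"Expecting {text or kind} but got {got_kind} '{got_text}'")
        self.pos += 1
        return got_text

    def parse_class(self):
        self.expect("Keyword", "class")
        name = self.expect("Identifier")
        self.expect("End-Of-Line")
        self.expect("Indent")
        methods = []
        while self.peek() == ("Keyword", "def"):
            methods.append(self.parse_method())
        self.expect("Dedent")
        return ("Class", name, methods)

    def parse_method(self):
        self.expect("Keyword", "def")
        name = self.expect("Identifier")
        self.expect("End-Of-Line")
        self.expect("Indent")
        statements = []
        while self.peek()[0] != "Dedent":
            statements.append(self.parse_statement())
        self.expect("Dedent")
        return ("Method", name, statements)

    def parse_statement(self):
        if self.peek() == ("Keyword", "print"):
            self.pos += 1
            node = ("Print", self.parse_expression())
        else:
            target = self.expect("Identifier")
            self.expect("Assignment-Operator")
            node = ("Assign", ("Identifier", target), self.parse_expression())
        self.expect("End-Of-Line")
        return node

    def parse_expression(self):
        left = self.parse_operand()
        if self.peek()[0] == "Plus-Operator":
            self.pos += 1
            return ("BinaryMath", left, "+", self.parse_operand())
        return left

    def parse_operand(self):
        kind, text = self.peek()
        if kind == "Integer":
            self.pos += 1
            return ("Integer", int(text))
        return ("Identifier", self.expect("Identifier"))


TOKENS = [
    ("Keyword", "class"), ("Identifier", "Sample"), ("End-Of-Line", "\n"),
    ("Indent", "\t"),
    ("Keyword", "def"), ("Identifier", "main"), ("End-Of-Line", "\n"),
    ("Indent", "\t"),
    ("Identifier", "a"), ("Assignment-Operator", "="), ("Integer", "1"),
    ("End-Of-Line", "\n"),
    ("Keyword", "print"), ("Identifier", "a"), ("Plus-Operator", "+"),
    ("Integer", "2"), ("End-Of-Line", "\n"),
    ("Dedent", ""), ("Dedent", ""),
]

tree = Parser(TOKENS).parse_class()
```

Every `expect` call is a spot where the parser knows what must come next. Feed it a 'class' keyword followed by anything other than an identifier and it raises a ParseError at that point.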
Now, a parse tree is not enough information to determine if the source code is a valid program. It's enough to tell you if you have any syntax errors (unexpected tokens) but it doesn't have any information about the types for the various identifiers in the tree. Why? Well, remember that these trees are being generated one file at a time, or rather, one token stream at a time. If you refer to a class that is defined in another file, or perhaps to a class that is defined in a referenced assembly, you don't know whether or not that class is valid until you have the information for ALL the files that will be compiled plus all the references. Once you have all this information, the next phase in the compilation can start: Semantic analysis.
Semantics are all about how words and symbols are used. For the purposes of a compiler, it's about analyzing the parse tree: determining the types of all the various identifiers, determining if any referenced types are not declared anywhere in the tree, generating errors for mismatched types such as 'cannot assign a Foo to a Bar', and possibly generating warnings such as 'variable 'b' in method 'foo' is never used'. You populate this information into the parse tree and possibly remove or add nodes or move them around. The end result is an 'abstract syntax tree'. It's called this because it is an abstract representation of the original source code. Now, finally, after all this work, if there are no errors, the compiler can now translate this tree into an intermediate language, which can optionally then be further optimized, and then ultimately generate a binary file for your program.
A lot happens in that last sentence and we didn't talk about pre-processing or other things a compiler can do, but for the purposes of this post, we don't care about that stuff. Especially since, eventually, Cobra uses C# (or Java) as its intermediate code and then hands that off to yet another compiler and the whole thing starts all over again. And never mind the fact that the 'final' binary is actually yet another intermediate language to be compiled by .NET/Mono.
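As a very reduced illustration of that step, the Python sketch below walks the statements of a single method, infers a type for each local variable, and collects errors and warnings. The tuple-shaped nodes, the type names, and the message wording are all assumptions made for this example; a real compiler does this across every file and reference, not one method at a time.

```python
def analyze_method(statements):
    """Return (types, errors, warnings) for a list of statement nodes."""
    types = {}      # variable name -> inferred type name
    used = set()    # variables that were read at least once
    errors = []
    warnings = []

    def type_of(expr):
        kind = expr[0]
        if kind == "Integer":
            return "int"
        if kind == "String":
            return "String"
        if kind == "Identifier":
            name = expr[1]
            if name not in types:
                errors.append(f"Cannot find '{name}'")
                return None
            used.add(name)
            return types[name]
        if kind == "BinaryMath":
            left, right = type_of(expr[1]), type_of(expr[3])
            if left is None or right is None:
                return None  # already reported above
            if left != right:
                errors.append(f"Cannot apply '{expr[2]}' to {left} and {right}")
                return None
            return left
        raise ValueError(f"Unknown expression node: {kind}")

    for statement in statements:
        if statement[0] == "Assign":
            name = statement[1][1]
            value_type = type_of(statement[2])
            if value_type is None:
                continue
            if name in types and types[name] != value_type:
                errors.append(
                    f"Cannot assign {value_type} to '{name}' of type {types[name]}"
                )
            else:
                types[name] = value_type
        elif statement[0] == "Print":
            type_of(statement[1])

    for name in types:
        if name not in used:
            warnings.append(f"Variable '{name}' is never used")
    return types, errors, warnings


STATEMENTS = [
    ("Assign", ("Identifier", "a"), ("Integer", 1)),
    ("Assign", ("Identifier", "b"), ("String", "hi")),
    ("Print", ("BinaryMath", ("Identifier", "a"), "+", ("Identifier", "c"))),
]
types, errors, warnings = analyze_method(STATEMENTS)
```

Note that none of these checks could have been made from the token stream alone: the parser was perfectly happy with `print a + c`.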
You haven't fallen asleep, have you? Okay, good. So, let's get back to the addin. The class 'MonoDevelop.CobraBinding.TypeSystem.Parser' invokes 'Cobra.Compiler.CobraParser.parseSource' to generate the parse tree. See line 62 here: https://github.com/ramon-rocha/MonoDeve ... rser.cobra
Why is this useful? Well, now that we have this tree (or rather if we don't get this tree), we will know if there are any syntax errors such as unexpected tokens in the source code and we can have MonoDevelop underline them with red squiggly lines (line 70). But what if we didn't get any errors? Well, then we got back something called a CobraModule which contains our actual parse tree. Hooray! Now we can use this information for code-completion, right? Well, not exactly. Remember, our parse tree doesn't yet have information about the types of the various variables, so if we typed "foo" and then a "." we wouldn't know which methods or member variables to list for completion proposals. However, if we typed just "." or "_" and then wanted some autocompleted members, this is actually possible at this point as we have all the required information (except for inherited members as pointed out by Charles below).
===PART 2===
Let's jump over to the completion extension now to see how we can do this. The code for it is over here: https://github.com/ramon-rocha/MonoDeve ... sion.cobra
It's a little messy so let me explain it. This extension inherits from the MonoDevelop.Ide.Gui.Content.CompletionTextEditorExtension class which in turn inherits from the TextEditorExtension class in the same namespace. You can check these out here:
https://github.com/mono/monodevelop/blo ... tension.cs
https://github.com/mono/monodevelop/blo ... tension.cs
An instance of CobraCompletionTextEditorExtension will be created for each Cobra file in the project. The inherited base class provides a few methods and properties that are useful to us including:
.document - This is an instance of MonoDevelop.Ide.Gui.Document. It acts as the bridge between a project file, the containing project (if any), the text editor and any of its extensions, and the output of the parsing service (i.e. the trees we were discussing earlier). From .document we can use .document.editor to get the current instance of Mono.TextEditor.TextEditorData which gives us some methods for setting or retrieving information such as the text at a certain line, the text in a certain region, where the cursor currently is, and it also has its own .document which is an instance of Mono.TextEditor.Document.TextDocument. Most of the methods in .editor really just pass through to the TextDocument instance it contains.
You can check these out here: https://github.com/mono/monodevelop/blo ... torData.cs
https://github.com/mono/monodevelop/blo ... ocument.cs
.editor - This is just a pass through property that returns .document.editor
.keyPress - This method determines whether the extension should handle the key that was just pressed. The base completion class we inherit from seems to handle things for us so we don't really need to do anything with it as far as I can tell. You'll notice in the cobra addin that I'm just calling the base method. I had some trace statements in there so I could see what it was doing.
.handleCodeCompletion - This method is an override for the method in the base class which really doesn't do anything. It is called when the base class determines that an autocompletion request has been triggered. It's up to us to return the list of possible completions for the current context in our overridden method. We are provided with an instance of CodeCompletionContext which tells us the line and column number that triggered the completion. It also gives us the character that was last pressed and the length of the word that triggered the completion. The triggerWordLength is an inout parameter and it's still not quite clear to me how it works. What I've been able to determine is that if the completion character was not a space (e.g. a letter) then you need to add 1 to this value otherwise the letter that was just typed will not be considered part of the word to complete.
Okay, let's back up because it's about to get interesting (finally, right?!). We have created a class called CobraCompletionTextEditorExtension which provides us with a MonoDevelop.Ide.Gui.Document via .document which contains the results of the type system service's call to our parser. You can get to that information from the completion extension via .document.parsedDocument.getAst<of CobraModule>. Our completion extension class also overrides a method in the base class called 'handleCodeCompletion' which gives us the line and column numbers and the character that triggered it. So, if 'handleCodeCompletion' is called and the triggering character was a dot or underscore, we can now use the line number from the provided CodeCompletionContext and feed that into our .document.editor methods to get the region the cursor is currently in, and then look up the members for the class corresponding to that region in the syntax tree of our .document.parsedDocument and then add them to a CompletionDataList and then FINALLY return that from 'handleCodeCompletion' and magically, a list of matching words shows up in MonoDevelop! Wow! But wait a minute, I left out one detail and now we are reaching the limits of my work so far...
Even with a line number and a tree describing all our classes and their members, how do we know which line number corresponds to which class's region? Well, we have to define those regions ourselves. Only then can we make use of the methods in .document.editor to determine which region, or regions, correspond to a given line number. There are many ways to accomplish this and it's not necessarily obvious which way is best. I'll talk about what the current code does and what I've done in the past in experiments.
Look at line 304 of https://github.com/ramon-rocha/MonoDeve ... rser.cobra . It's the body of a method called _addFolds in our parser. However you feel about the usefulness of folding code, it is useful in one respect: you have to define the regions that will fold/unfold, and these will typically correspond to the regions for classes and methods. There's room for improvement in this method but it seems to handle most cases correctly including block comments and doc strings which you also need to know the regions for so you don't provide completion results when inside these. The function builds up a list of instances of the MonoDevelop.Ide.TypeSystem.FoldingRegion class each of which contains an instance of the ICSharpCode.NRefactory.TypeSystem.DomRegion class. We'll talk about the NRefactory library later but for now just know that this includes properties for the starting line and column, and the ending line and column for a region. You can see the code for the DomRegion class here: https://github.com/icsharpcode/NRefacto ... mRegion.cs
The list of folds is added to the ParsedDocument along with the AST so it's also available from our completion extension via .document.parsedDocument.foldings. Hopefully, you are getting a picture in your head of one way this could all work. You have the current line number, you find which regions contain that line by iterating through the folds, you determine which AST nodes correspond to those containing regions (maybe a method, a class, and a namespace) and then you return the relevant completion entries.
There's a big problem with this approach. You need to have some way to resolve a region to its corresponding node. You could take the starting line for the region, examine the text at that line via .document.editor.getLineText(lineNumber).trimStart and check to see if it starts with "namespace ", "class " or "def ". Yuck! I know, right? This is what the current code was starting to do and it's gross and I hate it.
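In Python, that idea might look something like the sketch below. The `Region` class is a stand-in I made up, not the real FoldingRegion or DomRegion, and tagging each region with a `kind` and `name` is an assumption that conveniently sidesteps the question of how a region gets tied back to its node.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    kind: str        # "namespace", "class" or "method" (hypothetical tag)
    name: str
    begin_line: int
    end_line: int

    def contains(self, line):
        return self.begin_line <= line <= self.end_line


def regions_containing(folds, line):
    """Return every fold that contains the line, outermost first."""
    hits = [region for region in folds if region.contains(line)]
    return sorted(hits, key=lambda region: region.begin_line)


def complete_members(folds, members_by_class, line):
    """Propose the members of the innermost class around the given line."""
    classes = [r for r in regions_containing(folds, line) if r.kind == "class"]
    if not classes:
        return []
    return sorted(members_by_class.get(classes[-1].name, []))


FOLDS = [
    Region("class", "Sample", 1, 6),
    Region("method", "main", 2, 4),
    Region("method", "helper", 5, 6),
]
MEMBERS = {"Sample": ["main", "helper", "_count"]}

proposals = complete_members(FOLDS, MEMBERS, 3)
```

With the cursor on line 3, the containing regions are the Sample class and its main method, so the proposals are the members of Sample.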
Another way to generate regions is using the results of the parser. In the addin's Parser, you have a tree of nodes after successful parsing, some of which are declarative and each of which has a token associated with it. That token has line information. Traverse the syntax tree. When you hit a node for a namespace, class, method, etc. you have a starting line so you generate a region for it by reading through the source code and looking for the ending line by checking the indent level. I know this works because I tried it a while ago: https://github.com/ramon-rocha/MonoDeve ... mParser.cs
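The "look for the ending line by checking the indent level" part can be sketched in a few lines of Python. This is an illustration only, assuming tab indentation and ignoring comments and doc strings; the function names and the `declarations` list (name plus starting line, as you would get from each node's token) are hypothetical.

```python
def indent_level(line):
    """Count the leading tabs on a line."""
    return len(line) - len(line.lstrip("\t"))


def region_end(lines, start_line):
    """Return the 1-based last line of the block that starts at start_line."""
    base = indent_level(lines[start_line - 1])
    end = start_line
    for number in range(start_line + 1, len(lines) + 1):
        text = lines[number - 1]
        if not text.strip():
            continue  # blank lines do not close a block
        if indent_level(text) <= base:
            break     # back at (or above) the declaration's own level
        end = number
    return end


def build_regions(lines, declarations):
    """Map each declaration name to its (start_line, end_line) pair."""
    return {name: (start, region_end(lines, start)) for name, start in declarations}


LINES = [
    "class Sample",
    "\tdef main",
    "\t\ta = 1",
    "",
    "\t\tprint a + 2",
    "\tdef other",
    "\t\tpass",
    "class Next",
]
DECLARATIONS = [("Sample", 1), ("main", 2), ("other", 6), ("Next", 8)]

regions = build_regions(LINES, DECLARATIONS)
```

Because the dictionary is keyed by the declaration, the link between a node and its region comes for free, which is the main attraction of generating regions from the tree instead of from the folds.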
You now need to decide where you want to stick this Node-to-region relationship such that the completion extension class has access to it, because by default it only has access to the AST and Foldings. This can be done by extending the DefaultParsedDocument class to include a dictionary mapping regions to Nodes, by extending Cobra's SyntaxNode class to include a DomRegion, by sticking it in some shared dictionaries somewhere (tried that, btw; it's hard to test), or probably by something way simpler I haven't thought of. I'm thinking now it actually makes sense to create regions both via folds and via visiting the AST so it will be easy to equate the two regions and tie them together that way somehow.
Anyways, I got a bit of analysis paralysis at this point so this is where the completion portion of the addin is right now. It is my 4th attempt in case you are keeping track. A few more important pieces which I do not have a strong understanding of remain. This includes making use of the NRefactory library for resolving types and/or running the binding phases of the Cobra Compiler to resolve the types. Remember, we've been talking about just "." and "_" completion so far.
===PART 3===
TODO: Talk about semantic analysis with NRefactory