Indexing and searching source code with Lucene.Net
During the past week I've been working on a university project for my Information Retrieval course. The teacher's proposal was to implement an XML parsing application in Java, but I thought I would put to use the skills I've developed on my own with the .NET framework and build something more useful.
My idea was to create a homemade source code indexing and search service, so I started fiddling with Lucene.Net, CastleProject, C# Parser and a couple of other open source projects to see what I could come up with. There are already plenty of services that let you search source code online: see Krugle, Google Code Search and Koders, among others.
Well, of course I couldn't use one of them as my course project, so I started implementing my own. I called it CS2 - C Sharp Code Search, and its source code is available under the MIT license on its Google Project Hosting website. I think it's a good example of how to use Lucene.Net and CastleProject's IoC container in a would-be real-life project.
At the moment only the indexing part is implemented, and you can see it working by launching the console application project contained in the solution. The index it creates is compatible with the other implementations in the Lucene family, so until I implement the searching part it can be browsed with an application like Luke.
At the moment it implements the following features. It indexes C# source code files by parsing them and extracting information such as class, method and property names, so that these can be searched on their own alongside plain full-text search. It can be extended by implementing parsers for other languages, and I've built it so that doing this is pretty straightforward. It remembers the files it has indexed and periodically checks them for modifications or deletions. The console project comes with a full logging mechanism that shows what the program is actually doing. It is highly configurable via configuration files: see App.config and the files in the Configuration directory, which are used mostly for Castle Windsor configuration.
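To give an idea of what "remembers the files indexed and checks for modifications or deletions" can look like, here is a minimal sketch. It is not the CS2 code: it is written in Python only to keep it short and self-contained, the names FileTracker, scan and demo are made up for illustration, and it assumes that comparing file modification times is a good enough way to detect a change.

```python
import os
import tempfile


class FileTracker:
    """Hypothetical helper: remembers the modification time of every
    source file seen during the previous scan of a single root folder."""

    def __init__(self, extensions=(".cs",)):
        self.extensions = tuple(extensions)
        self.known = {}  # path -> modification time (ns) at the last scan

    def _snapshot(self, root):
        # Walk the tree and record the current mtime of each matching file.
        current = {}
        for folder, _dirs, files in os.walk(root):
            for name in files:
                if name.endswith(self.extensions):
                    path = os.path.join(folder, name)
                    current[path] = os.stat(path).st_mtime_ns
        return current

    def scan(self, root):
        """Return (to_index, to_remove) compared with the previous scan.

        to_index:  new files, or files whose modification time changed
        to_remove: files that were known before but no longer exist
        """
        current = self._snapshot(root)
        to_index = sorted(p for p, m in current.items()
                          if self.known.get(p) != m)
        to_remove = sorted(p for p in self.known if p not in current)
        self.known = current
        return to_index, to_remove


def demo():
    """Create, touch and delete one file, scanning after each step."""
    with tempfile.TemporaryDirectory() as root:
        path = os.path.join(root, "Program.cs")
        with open(path, "w") as handle:
            handle.write("class Program {}")

        tracker = FileTracker()
        first = tracker.scan(root)    # the file is new
        second = tracker.scan(root)   # nothing changed

        # Simulate an edit by pushing the modification time forward.
        info = os.stat(path)
        os.utime(path, ns=(info.st_atime_ns,
                           info.st_mtime_ns + 10_000_000_000))
        third = tracker.scan(root)    # the file is reported again

        os.remove(path)
        fourth = tracker.scan(root)   # the file is reported as deleted
        return path, first, second, third, fourth


if __name__ == "__main__":
    for result in demo()[1:]:
        print(result)
```

In a real indexer, each path in to_index would be parsed and (re)added to the index, each path in to_remove would be deleted from it, and scan would be called on a timer. The remembered state would also have to be saved to disk so that it survives a restart, which this sketch leaves out.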
Let me know what you think! I'll say more about it in the coming weeks... and please, don't hand it to your Information Retrieval teachers until I've delivered it to mine ;)