Indexing and searching source code with Lucene.Net

During the past week I've been working on a university project for my Information Retrieval course. Although the teacher's proposal was to implement an XML parsing application in Java, I decided to put to use the skills I developed on my own with the .NET Framework and build something more useful.

My idea was to create a homemade source code indexing and search service, so I started fiddling with Lucene.Net, CastleProject, C# Parser and a couple of other open source projects to see what I could come up with. There are already plenty of services that let you search source code online: see Krugle, Google Code Search and Koders, among others.

Of course I couldn't use one of them as my course project, so I started implementing my own. I called it CS2 (C Sharp Code Search), and its source code is available under the MIT license on its Google Project Hosting website. I think it's a good example of how to use Lucene.Net and CastleProject's IoC container in a would-be real-life project.

At the moment only the indexing part is implemented, and you can see it working by launching the console application project contained in the solution. The index it creates is compatible with the other implementations in the Lucene family, so until I implement the searching part it can be browsed with an application like Luke.

At the moment it indexes C# source code files by parsing them and extracting information such as class, method and property names, so that these can be searched individually alongside the full text. It can be extended by implementing parsers for other languages, and I've built it to make that pretty straightforward. It remembers the files it has indexed and periodically checks them for modifications or deletions. The console project comes with a full logging mechanism that shows what the program is actually doing. It is highly configurable via configuration files: see App.config and the files in the Configuration directory, which are used mostly for Castle Windsor configuration.
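To give a feel for what indexing names separately from the full text means, here is a minimal, self-contained sketch in Python. It is not CS2's code and it does not use Lucene.Net: the regular expressions are a naive stand-in for a real C# parser, and the names `extract_fields`, `FieldedIndex`, `INDEXER_SOURCE` and `SEARCHER_SOURCE` are made up for this illustration. Each file is split into named fields ("class", "method", "property", "content"), and every (field, term) pair points back to the files that contain it.

```python
import re
from collections import defaultdict

# Naive regexes standing in for a real C# parser (illustrative only).
_MEMBER_PREFIX = (
    r"\b(?:public|private|protected|internal)\s+(?:static\s+)?"
    r"(?!class\b)[\w<>\[\]]+\s+"  # return/property type, but not a class declaration
)
CLASS_RE = re.compile(r"\bclass\s+([A-Za-z_]\w*)")
METHOD_RE = re.compile(_MEMBER_PREFIX + r"([A-Za-z_]\w*)\s*\(")
PROPERTY_RE = re.compile(_MEMBER_PREFIX + r"([A-Za-z_]\w*)\s*\{")
WORD_RE = re.compile(r"[A-Za-z_]\w*")


def extract_fields(source):
    """Split one C# source file into named fields of lowercase terms."""
    return {
        "class": [name.lower() for name in CLASS_RE.findall(source)],
        "method": [name.lower() for name in METHOD_RE.findall(source)],
        "property": [name.lower() for name in PROPERTY_RE.findall(source)],
        "content": [word.lower() for word in WORD_RE.findall(source)],
    }


class FieldedIndex:
    """Tiny in-memory inverted index keyed by (field, term)."""

    def __init__(self):
        self._postings = defaultdict(set)  # (field, term) -> set of file paths

    def add(self, path, source):
        for field, terms in extract_fields(source).items():
            for term in terms:
                self._postings[(field, term)].add(path)

    def search(self, field, term):
        """Return the sorted paths whose `field` contains `term` (case-insensitive)."""
        return sorted(self._postings.get((field, term.lower()), set()))


# Hypothetical sample files, invented for this example.
INDEXER_SOURCE = """
public class Indexer
{
    public string RootPath { get; set; }

    public void IndexFile(string path)
    {
        // parse and index the file
    }
}
"""

SEARCHER_SOURCE = """
public class Searcher
{
    public int Search(string query)
    {
        return 0;
    }
}
"""

index = FieldedIndex()
index.add("Indexer.cs", INDEXER_SOURCE)
index.add("Searcher.cs", SEARCHER_SOURCE)

print(index.search("method", "IndexFile"))  # ['Indexer.cs']
print(index.search("content", "string"))    # ['Indexer.cs', 'Searcher.cs']
```

The point of keeping the fields apart is that a query for a method called `IndexFile` only matches files that declare it, while the "content" field still allows plain full-text search over everything in the file.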

Let me know what you think! I'll say more about it in the coming weeks... and please, don't hand it to your Information Retrieval teachers until I've delivered it to mine ;)


Published 29 June 2007 06:09 PM by simoneb



# Phil Haack said on 05 July, 2007 02:10 PM
Hi Simone! We also have a personal Pro Edition (currently in Beta) that will search your own code on the File System, SVN, and CVS.
# Phil Haack said on 05 July, 2007 02:10 PM
Whoops, I forgot to leave the URL to the Pro Edition beta signup:
# simoneb said on 05 July, 2007 08:41 PM

Thanks Phil, I'll check it out!

# ljianl said on 05 July, 2007 09:02 PM
feel good!!!
# simoneb said on 05 July, 2007 10:08 PM

I've watched the demo, Phil, and it looks very cool. Congrats on the good work; I'm now waiting to get into the beta program ;)

# SimoneB's Blog said on 13 September, 2007 11:38 AM

As I blogged some time ago I have been developing a source code search engine called CS2 . I completed
