June 2007 - Posts

Indexing and searching source code with Lucene.Net

During the past week I've been working on a university project for my Information Retrieval course. Although the teacher's project proposal was to implement an XML parsing application in Java, I thought I would put to use the skills I developed on my own with the .NET Framework and implement something more useful.

My idea was to create a homemade source code indexing and search service, so I started fiddling with Lucene.Net, CastleProject, C# Parser and a couple of other open source projects to see what I could come up with. There are already plenty of services that let you search source code online: see Krugle, Google Code Search and Koders, among others.

Well, of course I couldn't use one of them as my course project, so I started implementing my own. I called it CS2 - C Sharp Code Search, and its source code is available under the MIT license on its Google Project Hosting website. I think it's a good example of how to use Lucene.Net and CastleProject's IoC container in a would-be real-life project.

At the moment only the indexing part is implemented, and you can see it working by launching the console application project contained in the solution. The index it creates is compatible with the other implementations in the Lucene family, so until I implement the searching part it can be browsed with an application like Luke.

At the moment it indexes C# source code files by parsing them and extracting information such as class, method and property names so that they can be searched, and it indexes them for full-text search as well. It can be extended by implementing parsers for other languages, which I've tried to make pretty straightforward. It remembers the files it has indexed and periodically checks them for modifications or deletions. The console project comes with a full logging mechanism which shows what the program is actually doing. It is highly configurable via configuration files: see App.config and the files in the Configuration directory, which are used mostly for Castle Windsor configuration.
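To give an idea of what "remembering the indexed files and checking for modifications or deletions" can look like, here is a minimal sketch. It is not the CS2 code: the IndexChangeDetector class, its Diff method and the file paths are hypothetical names made up for illustration, and I'm assuming that changes are detected by comparing last-write timestamps.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of CS2: compares the files recorded at
// indexing time against a fresh snapshot of the file system.
static class IndexChangeDetector
{
    // known:   path -> last-write time recorded when the file was indexed
    // current: path -> last-write time observed right now
    // Paths that are new or modified are added to toIndex,
    // paths that disappeared are added to toRemove.
    public static void Diff(
        IDictionary<string, DateTime> known,
        IDictionary<string, DateTime> current,
        List<string> toIndex,
        List<string> toRemove)
    {
        foreach (KeyValuePair<string, DateTime> entry in current)
        {
            DateTime indexedAt;
            bool isNew = !known.TryGetValue(entry.Key, out indexedAt);
            if (isNew || entry.Value > indexedAt)
                toIndex.Add(entry.Key);
        }

        foreach (string path in known.Keys)
        {
            if (!current.ContainsKey(path))
                toRemove.Add(path);
        }
    }

    static void Main()
    {
        DateTime t0 = new DateTime(2007, 6, 1);
        DateTime t1 = new DateTime(2007, 6, 2);

        Dictionary<string, DateTime> known = new Dictionary<string, DateTime>();
        known["A.cs"] = t0;   // unchanged since it was indexed
        known["B.cs"] = t0;   // modified afterwards
        known["C.cs"] = t0;   // deleted afterwards

        Dictionary<string, DateTime> current = new Dictionary<string, DateTime>();
        current["A.cs"] = t0;
        current["B.cs"] = t1;
        current["D.cs"] = t1; // newly created

        List<string> toIndex = new List<string>();
        List<string> toRemove = new List<string>();
        Diff(known, current, toIndex, toRemove);

        Console.WriteLine("Index:  " + string.Join(", ", toIndex));
        Console.WriteLine("Remove: " + string.Join(", ", toRemove));
    }
}
```

In a real service the current snapshot would come from walking the watched directories, and the known map would be persisted between runs.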

Let me know what you think! I'll say more about it in the coming weeks... and please, don't hand it to your Information Retrieval teachers until I've delivered it to mine ;)


Posted 29 June 2007 06:09 PM by simoneb | 9 comment(s)
How to sort a generic List<T>

After reading this post from Steven Smith I thought I should write something about it.

Sorting a generic List<T> is pretty straightforward if you know how to do it. With C# 2.0, anonymous methods come in handy, as does the little-known Comparison<T> delegate (check out this post for more information about this delegate as well as other useful types new to .NET 2.0).

Ok, let's suppose we have a product class (let me save some space by using C# 3.0 syntax).

class Product
{
    public int ProductID { get; set; }
    public string ProductName { get; set; }
    public decimal UnitPrice { get; set; }
}

When we have a list of products we may want to sort it on the ProductName property before displaying it to the user. This can be accomplished with the Sort method of the List<T> class, which has several overloads. The handiest in this case is Sort(Comparison<T>), and the result is easily achieved with a couple of lines of code.

List<Product> products = new List<Product>();

products.Sort(delegate(Product p1, Product p2)
{
    return p1.ProductName.CompareTo(p2.ProductName);
});
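The list in the snippet above is empty, so here is a small self-contained sketch with some sample data. The product names and prices, the SortByNameDemo class and its SortedNames helper are made up for illustration. I've also swapped CompareTo for string.Compare with StringComparison.Ordinal, which is my own choice here: it tolerates null names and doesn't depend on the current culture.

```csharp
using System;
using System.Collections.Generic;

class Product
{
    public int ProductID { get; set; }
    public string ProductName { get; set; }
    public decimal UnitPrice { get; set; }
}

static class SortByNameDemo
{
    // Sorts the list in place by ProductName and returns the names in their new order.
    public static List<string> SortedNames(List<Product> products)
    {
        products.Sort(delegate(Product p1, Product p2)
        {
            // string.Compare accepts null arguments, unlike p1.ProductName.CompareTo(...)
            return string.Compare(p1.ProductName, p2.ProductName, StringComparison.Ordinal);
        });

        List<string> names = new List<string>();
        foreach (Product p in products)
            names.Add(p.ProductName);
        return names;
    }

    static void Main()
    {
        List<Product> products = new List<Product>
        {
            new Product { ProductID = 3, ProductName = "Tofu", UnitPrice = 23.25m },
            new Product { ProductID = 1, ProductName = "Chai", UnitPrice = 18.00m },
            new Product { ProductID = 2, ProductName = "Chang", UnitPrice = 19.00m }
        };

        // Prints: Chai, Chang, Tofu
        Console.WriteLine(string.Join(", ", SortedNames(products)));
    }
}
```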

So far so good, but what if we need to sort our list in several places during the execution of our program? Do we have to write that code each time? Actually no, since we can use the parameterless Sort() method of the list class. This method sorts the list using the default comparer, Comparer<T>.Default. So what's this default comparer? It's a comparer that relies on the element type's own IComparable<T> implementation (if the type implements neither IComparable<T> nor the non-generic IComparable, Sort() throws an InvalidOperationException). This way we can centralize the sorting logic in our class, and just call the parameterless Sort() method on the list whenever we need it sorted on the ProductName property.

public class Product : IComparable<Product>
{
    [...]

    public int CompareTo(Product other)
    {
        return ProductName.CompareTo(other.ProductName);
    }
}
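Here is a runnable sketch of the parameterless Sort() relying on IComparable<Product>. The sample data, the Unsortable class and the DefaultSortDemo helper are invented for illustration, and the null check and the ordinal string.Compare are my own additions rather than part of the snippet above. The second half shows what happens when the element type gives the default comparer nothing to work with.

```csharp
using System;
using System.Collections.Generic;

class Product : IComparable<Product>
{
    public int ProductID { get; set; }
    public string ProductName { get; set; }
    public decimal UnitPrice { get; set; }

    public int CompareTo(Product other)
    {
        // By convention, any instance sorts after null.
        if (other == null) return 1;
        return string.Compare(ProductName, other.ProductName, StringComparison.Ordinal);
    }
}

// Hypothetical type that implements neither IComparable<T> nor IComparable.
class Unsortable { }

static class DefaultSortDemo
{
    // Returns true if the parameterless Sort() fails for a type without a default ordering.
    public static bool SortThrows()
    {
        List<Unsortable> items = new List<Unsortable> { new Unsortable(), new Unsortable() };
        try
        {
            items.Sort();
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }

    static void Main()
    {
        List<Product> products = new List<Product>
        {
            new Product { ProductID = 3, ProductName = "Tofu", UnitPrice = 23.25m },
            new Product { ProductID = 1, ProductName = "Chai", UnitPrice = 18.00m },
            new Product { ProductID = 2, ProductName = "Chang", UnitPrice = 19.00m }
        };

        products.Sort(); // uses Product.CompareTo through the default comparer

        foreach (Product p in products)
            Console.WriteLine(p.ProductName); // Chai, Chang, Tofu (one per line)

        Console.WriteLine(SortThrows()); // True
    }
}
```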

Ok, now what if we want to be able to sort on the other two properties, ProductID and UnitPrice? Do we have to write an anonymous method each time as we did in the beginning? Of course not, since there's a useful trick which saves us from doing that. We can define two static Comparison<Product> fields in our product class, and pass them to the Sort(Comparison<T>) method of our list whenever we need it sorted by something other than the default sorting logic.

public class Product : IComparable<Product>
{
    [...]

    public static Comparison<Product> PriceComparison =
        delegate(Product p1, Product p2)
        {
            return p1.UnitPrice.CompareTo(p2.UnitPrice);
        };

    public static Comparison<Product> IDComparison =
        delegate(Product p1, Product p2)
        {
            return p1.ProductID.CompareTo(p2.ProductID);
        };

    [...]
}

Since they are static they can be used simply like so: products.Sort(Product.PriceComparison) or products.Sort(Product.IDComparison), which will respectively sort the list by price and id.
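A short sketch of the two named comparisons in use. The sample data, the NamedComparisonDemo class and its IdsSortedBy helper are made up for illustration, and I've marked the fields readonly so they can't be reassigned by accident, which is a small deviation from the code above.

```csharp
using System;
using System.Collections.Generic;

class Product
{
    public int ProductID { get; set; }
    public string ProductName { get; set; }
    public decimal UnitPrice { get; set; }

    public static readonly Comparison<Product> PriceComparison =
        delegate(Product p1, Product p2) { return p1.UnitPrice.CompareTo(p2.UnitPrice); };

    public static readonly Comparison<Product> IDComparison =
        delegate(Product p1, Product p2) { return p1.ProductID.CompareTo(p2.ProductID); };
}

static class NamedComparisonDemo
{
    // Sorts a copy of the list with the given comparison and returns the product IDs in order.
    public static List<int> IdsSortedBy(List<Product> products, Comparison<Product> comparison)
    {
        List<Product> copy = new List<Product>(products);
        copy.Sort(comparison);

        List<int> ids = new List<int>();
        foreach (Product p in copy)
            ids.Add(p.ProductID);
        return ids;
    }

    static void Main()
    {
        List<Product> products = new List<Product>
        {
            new Product { ProductID = 2, ProductName = "Chang", UnitPrice = 19.00m },
            new Product { ProductID = 3, ProductName = "Tofu", UnitPrice = 10.00m },
            new Product { ProductID = 1, ProductName = "Chai", UnitPrice = 18.00m }
        };

        // By price: prints 3, 1, 2
        Console.WriteLine(string.Join(", ", IdsSortedBy(products, Product.PriceComparison)));
        // By ID: prints 1, 2, 3
        Console.WriteLine(string.Join(", ", IdsSortedBy(products, Product.IDComparison)));
    }
}
```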

Below is the full code of the Product class.

public class Product : IComparable<Product>
{
    private int id;
    private string prodName;
    private decimal price;

    public static Comparison<Product> PriceComparison = delegate(Product p1, Product p2)
    {
        return p1.price.CompareTo(p2.price);
    };

    public static Comparison<Product> IDComparison = delegate(Product p1, Product p2)
    {
        return p1.id.CompareTo(p2.id);
    };

    public int ProductID
    {
        get { return id; }
        set { id = value; }
    }

    public string ProductName
    {
        get { return prodName; }
        set { prodName = value; }
    }

    public decimal UnitPrice
    {
        get { return price; }
        set { price = value; }
    }

    public Product(int id, string prodName, decimal price)
    {
        this.id = id;
        this.prodName = prodName;
        this.price = price;
    }

    #region IComparable<Product> Members

    public int CompareTo(Product other)
    {
        return ProductName.CompareTo(other.ProductName);
    }

    #endregion

    public override string ToString()
    {
        return string.Format("Id: {0} Name: {1} Price: {2}", id, prodName, price);
    }
}


Posted 20 June 2007 04:19 AM by simoneb | 27 comment(s)
ASP.NET Internals - the second article is available for reading

The second article of my series about ASP.NET Internals is up. I recommend taking a look at the first one before reading it, however.

It took me a while to write since I had to dig into the ASP.NET infrastructure with several profilers to understand the undocumented parts, but it was fun. In this article I talk about the interaction between ISAPI extensions and managed code, as well as the setup mechanism of AppDomains.

The next part will cover the principal managed part of ASP.NET, the HTTP Pipeline.
