May 2007 - Posts

Type Visualizers : Visualizing non-serializable types.

Well, thank you PShaffer for your comment. You can indeed create type visualizers for non-serializable types using the Microsoft.VisualStudio.DebuggerVisualizers.VisualizerObjectSource class. This object is responsible for serializing the type, in binary, from the process being debugged (the debuggee) to the type visualizer running in Visual Studio (the debugger).

It's fairly straightforward: to do your own serialization, create a custom class that inherits from this class and override the GetData(object target, Stream outgoingData) method. The target is the object being debugged and outgoingData is the stream you serialize the object to; the visualizer reads this stream back through the object provider passed to its overridden Show() method.

When you use the DebuggerVisualizerAttribute to map your visualizer to the type, use the overload that also accepts the custom VisualizerObjectSource type; it is then used instead of the default. For example:

[assembly: DebuggerVisualizer(typeof(XmlDebugTypeVisualizer.XmlTypeVisualizer), 
typeof(XmlDebugTypeVisualizer.XmlVisualizerObjectSource),
Target= typeof(System.Xml.XmlDocument), Description="View String as DOM tree")]
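
As a rough illustration of the two classes that attribute refers to, here is a minimal sketch. It is not the code from the downloadable visualizer: the choice to send OuterXml as UTF-8 text and the use of a plain message box are assumptions made to keep the example short. It needs a reference to Microsoft.VisualStudio.DebuggerVisualizers.dll and System.Windows.Forms.dll.

```csharp
using System.IO;
using System.Text;
using System.Windows.Forms;
using System.Xml;
using Microsoft.VisualStudio.DebuggerVisualizers;

namespace XmlDebugTypeVisualizer
{
    // Runs in the debuggee: turns the non-serializable XmlDocument into bytes.
    public class XmlVisualizerObjectSource : VisualizerObjectSource
    {
        public override void GetData(object target, Stream outgoingData)
        {
            XmlDocument document = (XmlDocument)target;

            // Assumption: sending the markup as UTF-8 text is good enough here.
            byte[] bytes = Encoding.UTF8.GetBytes(document.OuterXml);
            outgoingData.Write(bytes, 0, bytes.Length);
        }
    }

    // Runs in the debugger: reads the stream written by GetData above.
    public class XmlTypeVisualizer : DialogDebuggerVisualizer
    {
        protected override void Show(IDialogVisualizerService windowService,
                                     IVisualizerObjectProvider objectProvider)
        {
            using (StreamReader reader = new StreamReader(objectProvider.GetData(), Encoding.UTF8))
            {
                string xml = reader.ReadToEnd();

                // Placeholder UI; a real visualizer would show its own form.
                MessageBox.Show(xml, "XmlDocument");
            }
        }
    }
}
```

Because the object source decides what goes on the stream, the target type itself never needs the Serializable attribute.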

Visualizer Architecture

How to: Write a Visualizer

It's too late in the day to go updating the XmlVisualizer I posted before, so in the meantime here's a better one.

Conchango Xml Visualizer for Visual Studio 2005 (RTM)

I'd just like to add that I do not mind being wrong, so if you find that something I write is incorrect or not quite complete, please do tell me so that I can correct it and learn some more. Thanks again PShaffer, that comment was appreciated.

XML Debug Type Visualizer

I've created an XML debugger visualizer for Visual Studio 2005 which you can download here. Just unzip and place the DLL in 'C:\Program Files\Microsoft Visual Studio 8\Common7\Packages\Debugger\Visualizers'. You can view the XML as text and as a DOM-like tree, as well as run XPath expressions as a filter mechanism.

A couple of points worth noting....

1) The visualizer borrows heavily from an application I wrote a couple of years ago. Performance on large XML files might not be very good and I haven't really reviewed the code. I've gone very much with the "it works, so why break it" approach.

2) You can only develop type visualizers for types that are serializable. Nothing I have read on visualizers has stated this explicitly, so you've heard it here first. Unfortunately the XmlDocument type is not serializable; it doesn't have a Serializable attribute. So this visualizer works only on strings, the idea being that you use it to view the OuterXml property of an XmlDocument (InnerText returns only the concatenated text, not the markup) or any other well-formed XML string.

Update: I've been told point 2 is incorrect. I will update once I have more information.

3) XPath expressions use a prefix of 'default' for default namespaces. So use /default:root/default:child to run an XPath expression against a document with a default namespace. The application hasn't been run on particularly complex XML, but I've never had any problems with it so far.

4) You can modify the XML in the visualizer but it won't update the application.
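
To illustrate point 3, here is a small sketch of how a 'default' prefix can be mapped to a document's default namespace in .NET. This is an assumption about how such a mapping might be done, not the visualizer's actual code; the sample XML, the namespace URI and the class and method names are made up for the example.

```csharp
using System;
using System.Xml;

class DefaultNamespaceXPath
{
    // Returns the text of /default:root/default:child, or null if not found.
    public static string SelectChildText(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);

        // XPath 1.0 has no notion of a default namespace, so the namespace
        // of the root element is registered under the prefix "default".
        XmlNamespaceManager ns = new XmlNamespaceManager(doc.NameTable);
        ns.AddNamespace("default", doc.DocumentElement.NamespaceURI);

        XmlNode child = doc.SelectSingleNode("/default:root/default:child", ns);
        return child == null ? null : child.InnerText;
    }

    static void Main()
    {
        string xml = "<root xmlns='urn:example'><child>hello</child></root>";
        Console.WriteLine(SelectChildText(xml));
    }
}
```

Without the prefix, /root/child would select nothing in that document, because the elements live in the urn:example namespace.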

I'll work on it if and when time and projects permit. It's really something I've developed for myself rather than thinking of other people.

Benchmarking Regular Expressions

I've been helping out quite a bit on the MSDN Forums over the last month in preparation for the approaching MCAD -> MCPD Windows exam. There is only so much book reading you can do, and the good thing about forums is that they are a great study tool.

Tonight I got into a discussion with TaDa about which is faster: parsing text using regular expressions or parsing text by iterating character by character. The goal was to extract a large number of telephone numbers in the format 0809 983 398, as an example, mixed in with some unwanted text. Before I go into any more detail: thanks TaDa. I very rarely get to discuss any sort of development with another developer, as I'm the only .NET developer in my company.

So the original benchmark was 100,000 records, file size 3.81 MB.

The results, not including the time to load the file...

Using regular expressions it took 78 milliseconds.

Using iterative method it took 222 milliseconds.

Regular expressions were 2.846 times faster, but the time taken by the iterative method wasn't huge either. In terms of the application (which had a maximum of 50,000 records), and in terms of the discussion, there wasn't that much to be concerned about. Either way would have done the job.

So the benchmark was increased to 1,000,000 records, file size 31.8 MB.

TaDa's benchmark method and my benchmark method changed somewhat: TaDa read the file line by line while I just loaded the file in one big 31.8 MB hit. The results were interesting...

TaDa's method (using Reader.Peek and ReadLine) resulted in the following times (including file access).

Using regular expressions it took 2760 milliseconds.

Using iterative method it took 4232 milliseconds.

My method after reading the whole file into a string....

Using regular expressions it took 235 milliseconds.

Using iterative method it took 14052 milliseconds.

But then I ran my application again....

Using regular expressions it took 3 milliseconds.

Using iterative method it took 2136 milliseconds.

. o O (WTF? 3 milliseconds!) I thought, how was that possible !!

I expected a bit of difference between the times for the iterative approaches, as quite a lot of memory was being used and the GC would have kicked in at random, but how can parsing a 31 MB file take only 3 milliseconds? It turns out regular expressions are compiled into assemblies, the same as XSLT stylesheets, so the initial call takes a bit of a performance hit but after that things are better. I presume that this is the case here and that this assembly is cached for later application runs, but that is quite a saving.

So I re-ran TaDa's methods, the ones that used Peek and ReadLine, wondering why the same saving wasn't being made when the regular expression was run line by line. In TaDa's methods, after reading each line he called the Regex.Matches() method, passing in the string to be searched, and it dawned on me.

My theory: the regular expression must be getting compiled based on both the pattern and the string. If the string changes then a new compile occurs, and this compile must be saved to a single assembly that is overwritten on each compile. TaDa's code passed a different string to the Matches() method each time, so each call caused a new compile: a little bit of time, again and again, for each different string passed to Regex.Matches().

In my method I just threw the whole string into the Regex.Matches() method. The string never changed, so no recompile was ever needed. The first time the application ran there was a hit, but after that no compile happened, and that gives the 3 millisecond result.

The problem is.... when the regular expression returns a Matches collection containing over 1,000,000 records, it takes a bit of time to return the count of that collection. However, I am not sure whether a similar performance hit applies to the iterative method.

So I think this is important when it comes to using regular expressions. They are faster than looping character by character. On small amounts of data there isn't much of a difference, not enough to cause any great problems. As the data set grows, regular expressions are still faster, but how you use them can have a massive effect on the time taken: do you go line by line, do you batch your lines, or do you throw the whole lot at the regex? I think the answer, as with most situations, depends on the project and the data, but after this test it turns out that with large amounts of data you really need to think about it.
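
One thing worth ruling out when reading timings like these: the .NET documentation describes Regex.Matches() as lazily evaluated, so the call itself can return before the matching work is done, and the cost then shows up when Count is read or the collection is enumerated. The sketch below is not the original benchmark code; it simply times the two steps separately. The pattern, the generated test data and all of the names are assumptions made for the example.

```csharp
using System;
using System.Diagnostics;
using System.Text;
using System.Text.RegularExpressions;

class LazyMatchesTiming
{
    // Builds hypothetical test data: one phone number per line, plus filler.
    public static string BuildInput(int records)
    {
        StringBuilder builder = new StringBuilder();
        for (int i = 0; i < records; i++)
        {
            builder.Append("junk text 0809 983 398 more junk\n");
        }
        return builder.ToString();
    }

    // Times the Matches() call and the Count access separately.
    public static int TimeMatches(string input, out long callMs, out long countMs)
    {
        Regex regex = new Regex(@"\d{4} \d{3} \d{3}");

        Stopwatch callTimer = Stopwatch.StartNew();
        MatchCollection matches = regex.Matches(input);
        callTimer.Stop();

        // Reading Count forces the whole collection to be populated.
        Stopwatch countTimer = Stopwatch.StartNew();
        int count = matches.Count;
        countTimer.Stop();

        callMs = callTimer.ElapsedMilliseconds;
        countMs = countTimer.ElapsedMilliseconds;
        return count;
    }

    static void Main()
    {
        long callMs, countMs;
        int count = TimeMatches(BuildInput(1000000), out callMs, out countMs);
        Console.WriteLine("Matches() call: {0} ms", callMs);
        Console.WriteLine("Count ({0} matches): {1} ms", count, countMs);
    }
}
```

If most of the time lands in the Count line, a benchmark that stops its timer right after Matches() is measuring very little of the actual parsing.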

Posted by dsmyth

VSTO - Code Snippet Downloads

Just some download links to VSTO code snippets over on MSDN.

Visual Studio 2005 Tools for Office Sample: Outlook Snippets

Visual Studio 2005 Tools for Office IntelliSense Code Snippets

Hopefully these will prove useful to someone.

 
