Where to begin... at the moment I am unbelievably busy. To cut a long story short, some promises have been made by some managerial types at the top of the ladder, which means us developer grunts at the bottom of said ladder have to deliver by the end of March. Fortunately for me I'm the lead developer, so responsibility rests with me to roll the gear out on time... Today the project hit a bit of a show stopper: the files the application stores (XML from a strongly typed dataset) were growing in size at quite an alarming rate.
The application performs a lot of heavy engineering calculations. These calculations produce an amazing amount of data, which is then stored in a dataset. On top of that, a subsection of other data (like a time-based snapshot) has to be stored along with each result for producing reports. The specification gave an average of about 30 results per file, and after only 3 results the files generated were storing over 2MB of information... multiply this up and it looked like files of around 60MB were going to be 'the norm'. Not on my shift.
So the solution: serialise the dataset to binary and, while it's serialising, run it through GZip compression. The result: the 2MB file became 520KB... superb. 60 results, maybe around 3MB...
Here are the code snippets that were used.
The main namespaces involved.
Imports System.Data
Imports System.IO
Imports System.IO.Compression
Imports System.Runtime.Serialization.Formatters.Binary
Serialisation of a DataSet (named DocumentDataSet)
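' Serialise the DataSet as binary rather than XML, and leave the schema out
DocumentDataSet.RemotingFormat = SerializationFormat.Binary
DocumentDataSet.SchemaSerializationMode = SchemaSerializationMode.ExcludeSchema
' The Using blocks close both streams even if Serialize throws
Using stream As Stream = File.Create(fileName)
    Using compress As New GZipStream(stream, CompressionMode.Compress)
        Dim bin As New BinaryFormatter()
        bin.Serialize(compress, DocumentDataSet)
    End Using ' closing the GZipStream writes out the last of the compressed data
End Using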
Make sure you exclude the schema; I found that including the schema produced a binary file larger than the XML file. Also make sure you close the GZipStream, because the last of the compressed data is only written out when the stream is closed. If you don't, you'll get an error on deserialization (something about reading past the end of the stream).
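If you want to check the saving on your own data before committing to the change, something like the rough sketch below should do. It assumes the .NET Framework (where BinaryFormatter is available), and the module, the helper names XmlSize and CompressedBinarySize and the sample table are all made up for illustration. It also uses a plain untyped DataSet, so the schema is not excluded here and the numbers will not match the ones quoted above.

```vbnet
Imports System.Data
Imports System.IO
Imports System.IO.Compression
Imports System.Runtime.Serialization.Formatters.Binary

Module SizeCheck
    ' Hypothetical helper: size in bytes of the DataSet written out as XML.
    Function XmlSize(ByVal ds As DataSet) As Long
        Using ms As New MemoryStream()
            ds.WriteXml(ms)
            Return ms.Length
        End Using
    End Function

    ' Hypothetical helper: size in bytes after binary serialisation plus GZip.
    Function CompressedBinarySize(ByVal ds As DataSet) As Long
        ds.RemotingFormat = SerializationFormat.Binary
        Using ms As New MemoryStream()
            ' leaveOpen:=True keeps the MemoryStream readable after the GZipStream closes
            Using gz As New GZipStream(ms, CompressionMode.Compress, True)
                Dim bin As New BinaryFormatter()
                bin.Serialize(gz, ds)
            End Using ' closing the GZipStream writes the last of the compressed data
            Return ms.Length
        End Using
    End Function

    Sub Main()
        ' Made-up sample data, standing in for real calculation results
        Dim ds As New DataSet("Sample")
        Dim table As DataTable = ds.Tables.Add("Results")
        table.Columns.Add("Id", GetType(Integer))
        table.Columns.Add("Value", GetType(Double))
        For i As Integer = 0 To 9999
            table.Rows.Add(i, i * 0.5)
        Next

        Console.WriteLine("XML: {0} bytes", XmlSize(ds))
        Console.WriteLine("Binary + GZip: {0} bytes", CompressedBinarySize(ds))
    End Sub
End Module
```

How much you save will depend heavily on how repetitive the data is, so it is worth running this against a representative file rather than trusting the sample table.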
Deserialisation of a DataSet (named DocumentDataSet)
DocumentDataSet.RemotingFormat = SerializationFormat.Binary
DocumentDataSet.SchemaSerializationMode = SchemaSerializationMode.ExcludeSchema
' The Using blocks close both streams even if Deserialize throws
Using stream As Stream = File.Open(fileName, FileMode.Open)
    Using decompress As New GZipStream(stream, CompressionMode.Decompress)
        Dim bin As New BinaryFormatter()
        ' Deserialize returns Object, so cast back to the typed DataSet
        DocumentDataSet = CType(bin.Deserialize(decompress), DocumentStructure)
    End Using
End Using
The deserialization code is fairly straightforward. Remember to close the GZipStream when you're finished. DocumentStructure is the DataSet's type; DocumentDataSet is an instance of that type. This is well worth knowing if you're sending datasets over the wire or storing them in cache. You'll also notice a performance gain, as binary serialization is quicker than XML serialization. One side effect, however, is that you can only deserialize the DataSet using a .NET language.
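For the over-the-wire or cache case there is no file involved, so the same idea can be pointed at a byte array instead. Below is a minimal sketch of that, again assuming the .NET Framework. The module and the function names ToCompressedBytes and FromCompressedBytes are hypothetical, and it works on a plain DataSet rather than the typed DocumentStructure, so the schema travels with the data.

```vbnet
Imports System.Data
Imports System.IO
Imports System.IO.Compression
Imports System.Runtime.Serialization.Formatters.Binary

Module DataSetBytes
    ' Hypothetical helper: DataSet -> GZip-compressed binary bytes.
    Function ToCompressedBytes(ByVal ds As DataSet) As Byte()
        ds.RemotingFormat = SerializationFormat.Binary
        Using ms As New MemoryStream()
            Using gz As New GZipStream(ms, CompressionMode.Compress)
                Dim bin As New BinaryFormatter()
                bin.Serialize(gz, ds)
            End Using ' closing the GZipStream writes the last of the compressed data
            ' ToArray still works once the MemoryStream has been closed
            Return ms.ToArray()
        End Using
    End Function

    ' Hypothetical helper: GZip-compressed binary bytes -> DataSet.
    Function FromCompressedBytes(ByVal data As Byte()) As DataSet
        Using ms As New MemoryStream(data)
            Using gz As New GZipStream(ms, CompressionMode.Decompress)
                Dim bin As New BinaryFormatter()
                Return CType(bin.Deserialize(gz), DataSet)
            End Using
        End Using
    End Function
End Module
```

A call would look something like Dim copy As DataSet = DataSetBytes.FromCompressedBytes(DataSetBytes.ToCompressedBytes(original)), with the byte array being the thing you actually put in the cache or send.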