Treating separate worksheets as relational data in Excel.
A request on the MSDN VBA forum got me thinking. The problem was this; in Excel there are two ADO recordsets, (a) and (b), populated from SQL Server both containing a different list of ID's. How do you find the items that appear in (a) but not (b) and vice versa.
My inital thought was write a stored procedure and return the actual required data from the server rather doing any local processing however this wasn't possible, so how do you do it locally. As far as I know there is no way to create a relational link between recordsets and no way to then run an SQL query over the recordsets. So another option would have been to drop the recordsets into an Access database and do a local query, but the recordset were in Excel and it felt like a hassle bringing in Access to do one task. It looked like the only choice left was a process of loops and filters, not nice.
Then it dawned on me, if you can query a single worksheet with SQL using ADO as though it was a table, would it be possible to query two worksheets joined together as though they were related tables. Turns out it is possible.
Before I expand on the approach and post the code there are some inital steps to take into consideration. Since each worksheet acts like a table the first thing to do is to drop each recordset into seperate worksheets, which can be done using the Range.CopyFromRecordset() method. i.e. MySheet.Range("A1").CopyFromRecordset MyRecordset.
Another step you need to do is save the workbook once the data has been populated. The connection string used to open Excel as a database only works from a file and not from a in memory workbook. So thats an important step to take.
I should point out here that a header row in each worksheet could be used to define the fields used in the SQL statements however it's not necessary. To save an additional step of placing a header row you can specify F1, F2, F3 for the field names in the SQL statements
So presume that the steps above have been done and I have a spreadsheet that contains three worksheets. A sheet called Recordset1 contains the data dumped from an in-memory recordset, Recordset2 contains the data dumped from the other recordset, neither of worksheets contain a header row as the data for each recordet is dumped to cell A1, and the worksheet has been saved.
'connection object needed to connect to Excel
Dim adoConnection As New ADODB.Connection
'connection string to Excel/Access
adoConnection.ConnectionString = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & sWorkbook & ";Extended Properties=""Excel 8.0;HDR=Yes;"""
adoConnection.Open
Dim adoCommand As New ADODB.Command
adoCommand.CommandType = adCmdText
adoCommand.ActiveConnection = adoConnection
'unmatch query between recordset1 and recordset2, i.e. find all ID's in recordset1 that aren't in recordset2
adoCommand.CommandText = "SELECT Recordset1$.F1 FROM [Recordset1$] LEFT JOIN [Recordset2$] ON Recordset1$.F1 = Recordset2$.F1 WHERE (((Recordset2$.F1) Is Null));"
Dim adoInRst1NotRs2 As ADODB.Recordset
Set adoInRst1NotRs2 = adoCommand.Execute
'unmatch query between recordset2 and recordset1, i.e. find all ID's in recordset2 that aren't in recordset1
adoCommand.CommandText = "SELECT Recordset2$.F1 FROM [Recordset2$] LEFT JOIN [Recordset1$] ON Recordset2$.F1 = Recordset1$.F1 WHERE (((Recordset1$.F1) Is Null));"
Dim adoInRst2NotRs1 As ADODB.Recordset
Set adoInRst2NotRs1 = adoCommand.Execute
'dump results to another worksheet to test that the approach works
Worksheets("Actual Results").Range("A1").CopyFromRecordset adoInRst1NotRs2
Worksheets("Actual Results").Range("B1").CopyFromRecordset adoInRst2NotRs1
'close the connection
adoConnection.Close
Basically whats happening is an adoConnection is set up to the workbook and two non matching queries are ran. It's the LEFT JOIN of the queries that makes the relationships between the sheets work. The end result is still two recordsets but at least they only contain the information thats required.
*Many thanks to Keith Turnerk for letting me know the following, A small suggestion to correct a potential error in the above code...because you have no header row the connection string needs to read HDR=No rather than HDR=Yes. Otherwise the first value in Recordset1 & 2 are ignored in the queries*