Removing Gaps and Duplicates from a Numeric Column in Microsoft SQL Server

Posted by: Scott on Writing, on 18 Jan 2011 | View original | Bookmarked: 0 time(s)

Heres the scenario: you have a database table with an integral numeric column used for sort order of some other non-identifying purpose. Lets call this column SortOrder. There are a many rows in this table. Every row should have a unique, sequentially increasing value in its SortOrder column, but this may not be the case there may be gaps and/or duplicate values in this column.

For example, consider a table with the following schema and data:




1 Scott 1
2 Jisun 8
3 Alice 7
4 Sam 7
5 Benjamin 3
6 Aaron 9
7 Alexis 4
8 Barney 5
9 Jim 5

Note how the SortOrder column has some gaps and duplicates. Ideally, the SortOrder column values for these nine rows would be 1, 2, 3, , 9, but this isnt the case. Instead, the current values (in ascending order) are: 1, 3, 4, 5, 5, 7, 7, 8, 9.

Our task is to take the existing SortColumn values and get them into the ideal format. That is, after our modifications, the tables data should look like so:




1 Scott 1
2 Jisun 8
3 Alice 6
4 Sam 7
5 Benjamin 2
6 Aaron 9
7 Alexis 3
8 Barney 4
9 Jim 5

Note how now there are now no gaps or duplicates in SortOrder.

The Solution: Ranking Functions, Multi-Table UPDATE Statements and Common Table Expressions (CTEs)

Microsoft SQL Server 2005 added a number of ranking functions that simplify assigning ranks to query results, such as associating a sequentially increasing row number with each record returned from a query or assigning a rank to each result. For example, the following query which uses SQL Servers ROW_NUMBER() function returns the records from the Employees table with a sequentially increasing number associated with each record:

       ROW_NUMBER() OVER (ORDER BY SortOrder) AS NoGapsNoDupsSortOrder
FROM Employees

The above query would return the following results. Note how the data is sorted by SortOrder. Theres also a new, materialized column (NoGapsNoDupsSortOrder) that returns sequentially increasing values.




Scott 1 1
Benjamin 3 2
Alexis 4 3
Barney 5 4
Jim 5 5
Alice 7 6
Sam 7 7
Jisun 8 8
Aaron 9 9

What we need to do now is take the value in NoGapsNoDupsSortOrder and assign it to the SortOrder column. If we had the above results in a separate table we could perform such an UPDATE, as SQL Server makes it possible to update records in one database table with data from another table. (See HOWTO: Update Records in a Database Table With Data From Another Table.)

While the results in the above grid are not in a table (but are rather the results from a query), the good news is that we can treat those results as if they were results in another table using a Common Table Expression (CTE). CTEs, which were introduced in SQL Server 2005, can be thought of as a one-off view; that is, a view that is created, defined, and used in a single SQL statement.

Putting it all together, we end up with the following UPDATE statement:

WITH OrderedResults(EmployeeId, NoGapNoDupSortOrder) AS 
    SELECT EmployeeId, 
              ROW_NUMBER() OVER (ORDER BY SortOrder) AS NoGapNoDupSortOrder
    FROM Employees
UPDATE Employees
    SET SortOrder = OrderedResults.NoGapNoDupSortOrder
FROM OrderedResults
WHERE Employees.EmployeeId = OrderedResults.EmployeeId AND 
       Employees.SortOrder <> OrderedResults.NoGapNoDupSortOrder

The above query starts by defining a CTE named OrderedResults that returns two column values: EmployeeId and NoGapNoDupSortOrder. It then updates the Employees table, setting its SortOrder column value to the NoGapNoDupSortOrder value where the Employees tables EmployeeId value matches the OrderedResults CTEs EmployeeId value (and where the SortOrder does not equal the NoGapNoDupSortOrder).

For more information on CTEs, ranked results, and updating one table (Employees) with data from another table or CTE (OrderedResults), check out the following resources:

Happy Programming!

Category: SQL | Other Posts: View all posts by this blogger | Report as irrelevant | View bloggers stats | Views: 918 | Hits: 43

News Categories

.NET | Agile | Ajax | Architecture | ASP.NET | BizTalk | C# | Certification | Data | DataGrid | DataSet | Debugger | DotNetNuke | Events | GridView | IIS | Indigo | JavaScript | Mobile | Mono | Patterns and Practices | Performance | Podcast | Refactor | Regex | Security | Sharepoint | Silverlight | Smart Client Applications | Software | SQL | VB.NET | Visual Studio | W3 | WCF | WinFx | WPF | WSE | XAML | XLinq | XML | XSD