Wednesday, June 16, 2010

SortedSet vs HashSet


HashSet<T> is very good at add and search operations. Any search operation (Contains, Remove, and similar operations) are O(1). That's great. However, on the minus side, the HashSet<T> is not a sorted collection. Therefore, enumerating the elements in a sorted order forces you to copy the items to a different collection (like a List<T>) and sort the resulting list. You could construct a LINQ query to order the elements, however internally that query will likely use some form of temporary storage to create the sorted sequence. That means every sort will be an expensive operation. Sort is typically an O(n ln n) operation, Also, because the HashSet<T> does not have a sort method, you'll also have increased memory pressure and time cost to copy the elements.

SortedSet is new to .NET 4.0 System.Collections.Generic namespace. SortedSet<T> has different characteristics. The sorted set ensures that the elements in the set are always in sorted order. Every Add operation places the new element in the correct location in the set. That means Add is an O(ln n) operation. The SortedSet<T> must perform a binary search to find the correct location for the new element. The search happens on any of the search actions (Contains, Remove, etc). Those operations also have an O(ln n) performance characteristic. That sounds like the SortedSet<T> is always slower than the HashSet<T>. No one would use it if it was always slower. SortedSet<T> is much faster for iterating the set in sorted order. It's already in the correct order, so the enumeration becomes an O(n) operation.

Conclusion
SortedSet<T> will typically be faster than HashSet<T> when the majority of your operations require enumerating the set in one particular order. If, instead, most of the operations are searching, you'll find better performance using the HashSet<T>. The frequency of insert operations also has an effect on which collection would be better. The more frequently insert operations occur, the more likely HashSet<T> will be faster.

No comments:

10 Take-aways Moving MSSQL to PostgreSQL

Background:  Based on my recent project, here are the 10 takeaways of MS SQL Server to PostgreSQL migration. Save the operating cost on SQL ...