Set package:containers

Finite Sets

The Set e type represents a set of elements of type e. Most operations require that e be an instance of the Ord class. A Set is strict in its elements. For a walkthrough of the most commonly used functions see the sets introduction. This module is intended to be imported qualified, to avoid name clashes with Prelude functions, e.g.
import Data.Set (Set)
import qualified Data.Set as Set
Note that the implementation is generally left-biased. Functions that take two sets as arguments and combine them, such as union and intersection, prefer the entries in the first argument to those in the second. Of course, this bias can only be observed when equality is an equivalence relation instead of structural equality.

Warning

The size of the set must not exceed maxBound::Int. Violation of this condition is not detected and if the size limit is exceeded, its behaviour is undefined.

Implementation

The implementation of Set is based on size balanced binary trees (or trees of bounded balance) as described by: Bounds for union, intersection, and difference are as given by

Performance information

The time complexity is given for each operation in big-O notation, with <math> referring to the number of entries in the set. Operations like member, insert, and delete take <math> time. Binary set operations like union and intersection take <math> time, where <math> and <math> are the sizes of the smaller and larger input sets respectively.
A set of values a.
Reverse ordering of topSort. See note in topSort.
Build a map from a set of keys and a function which for each key computes its value.
fromSet (\k -> replicate k 'a') (Data.IntSet.fromList [3, 5]) == fromList [(5,"aaaaa"), (3,"aaa")]
fromSet undefined Data.IntSet.empty == empty
The set of all keys of the map.
keysSet (fromList [(5,"a"), (3,"b")]) == Data.IntSet.fromList [3,5]
keysSet empty == Data.IntSet.empty

Finite Int Sets

The IntSet type represents a set of elements of type Int. An IntSet is strict in its elements. For a walkthrough of the most commonly used functions see their sets introduction. These modules are intended to be imported qualified, to avoid name clashes with Prelude functions, e.g.
import Data.IntSet (IntSet)
import qualified Data.IntSet as IntSet

Implementation

The implementation is based on big-endian patricia trees. This data structure performs especially well on binary operations like union and intersection. Additionally, benchmarks show that it is also (much) faster on insertions and deletions when compared to a generic size-balanced set implementation (see Data.Set). Additionally, this implementation places bitmaps in the leaves of the tree. Their size is the natural size of a machine word (32 or 64 bits) and greatly reduces the memory footprint and execution times for dense sets, e.g. sets where it is likely that many values lie close to each other. The asymptotics are not affected by this optimization.

Performance information

The time complexity is given for each operation in big-O notation, with <math> referring to the number of entries in the map and <math> referring to the number of bits in an Int (32 or 64). Operations like member, insert, and delete have a worst-case complexity of <math>. This means that the operation can become linear in the number of elements with a maximum of <math> -- the number of bits in an Int (32 or 64). These peculiar asymptotics are determined by the depth of the Patricia trees:
  • even for an extremely unbalanced tree, the depth cannot be larger than the number of elements <math>,
  • each level of a Patricia tree determines at least one more bit shared by all subelements, so there could not be more than <math> levels.
If all <math> elements in the tree are between 0 and <math> (or, say, between <math> and <math>), the estimate can be refined to <math>. If the set is sufficiently "dense", this becomes <math> or simply the familiar <math>, matching balanced binary trees. The most performant scenario for IntSet are elements from a contiguous subset, in which case the complexity is proportional to <math>, capped by <math>. The worst scenario are exponentially growing elements (1,2,4, ldots,2^n), for which complexity grows as fast as <math> but again is capped by <math>. Binary set operations like union and intersection take <math> time, where <math> and <math> are the sizes of the smaller and larger input sets respectively.
A set of integers.
Is this a proper subset? (ie. a subset but not equal).
Is this a subset? (s1 `isSubsetOf` s2) tells whether s1 is a subset of s2.
The set of all elements of the map contained in Args.
argSet (fromList [(5,"a"), (3,"b")]) == Data.Set.fromList [Arg 3 "b",Arg 5 "a"]
argSet empty == Data.Set.empty
Build a map from a set of elements contained inside Args.
fromArgSet (Data.Set.fromList [Arg 3 "aaa", Arg 5 "aaaaa"]) == fromList [(5,"aaaaa"), (3,"aaa")]
fromArgSet Data.Set.empty == empty
Build a map from a set of keys and a function which for each key computes its value.
fromSet (\k -> replicate k 'a') (Data.Set.fromList [3, 5]) == fromList [(5,"aaaaa"), (3,"aaa")]
fromSet undefined Data.Set.empty == empty
The set of all keys of the map.
keysSet (fromList [(5,"a"), (3,"b")]) == Data.Set.fromList [3,5]
keysSet empty == Data.Set.empty
(s1 `isProperSubsetOf` s2) indicates whether s1 is a proper subset of s2.
s1 `isProperSubsetOf` s2 = s1 `isSubsetOf` s2 && s1 /= s2
(s1 `isSubsetOf` s2) indicates whether s1 is a subset of s2.
s1 `isSubsetOf` s2 = all (`member` s2) s1
s1 `isSubsetOf` s2 = null (s1 `difference` s2)
s1 `isSubsetOf` s2 = s1 `union` s2 == s2
s1 `isSubsetOf` s2 = s1 `intersection` s2 == s1
Calculate the power set of a set: the set of all its subsets.
t `member` powerSet s == t `isSubsetOf` s
Example:
powerSet (fromList [1,2,3]) =
fromList $ map fromList [[],[1],[1,2],[1,2,3],[1,3],[2],[2,3],[3]]