Set package:containers

Finite Sets

The Set e type represents a set of elements of type e. Most operations require that e be an instance of the Ord class. A Set is strict in its elements. For a walkthrough of the most commonly used functions see the sets introduction. This module is intended to be imported qualified, to avoid name clashes with Prelude functions, e.g.

import Data.Set (Set)
import qualified Data.Set as Set

Note that the implementation is generally left-biased. Functions that take two sets as arguments and combine them, such as union and intersection, prefer the entries in the first argument to those in the second. Of course, this bias can only be observed when equality is an equivalence relation instead of structural equality.

Warning

The size of the set must not exceed maxBound::Int. Violation of this condition is not detected and if the size limit is exceeded, its behaviour is undefined.

Implementation

The implementation of Set is based on size balanced binary trees (or trees of bounded balance) as described by:

Stephen Adams, "Efficient sets—a balancing act", Journal of Functional Programming 3(4):553-562, October 1993, https://doi.org/10.1017/S0956796800000885, https://groups.csail.mit.edu/mac/users/adams/BB/index.html.
J. Nievergelt and E.M. Reingold, "Binary search trees of bounded balance", SIAM journal of computing 2(1), March 1973. https://doi.org/10.1137/0202005.
Yoichi Hirai and Kazuhiko Yamamoto, "Balancing weight-balanced trees", Journal of Functional Programming 21(3):287-307, 2011, https://doi.org/10.1017/S0956796811000104

Bounds for union, intersection, and difference are as given by

Guy Blelloch, Daniel Ferizovic, and Yihan Sun, "Parallel Ordered Sets Using Join", https://arxiv.org/abs/1602.02120v4.

Performance information

The time complexity is given for each operation in big-O notation, with <math> referring to the number of entries in the set. Operations like member, insert, and delete take <math> time. Binary set operations like union and intersection take <math> time, where <math> and <math> are the sizes of the smaller and larger input sets respectively.

data Set a

containers Data.Set Data.Set.Internal

A set of values a.

reverseTopSort :: Graph -> [Vertex]

containers Data.Graph

Reverse ordering of topSort. See note in topSort.

fromSet :: (Key -> a) -> IntSet -> IntMap a

containers Data.IntMap.Internal Data.IntMap.Lazy Data.IntMap.Strict Data.IntMap.Strict.Internal

Build a map from a set of keys and a function which for each key computes its value.

fromSet (\k -> replicate k 'a') (Data.IntSet.fromList [3, 5]) == fromList [(5,"aaaaa"), (3,"aaa")]
fromSet undefined Data.IntSet.empty == empty

keysSet :: IntMap a -> IntSet

containers Data.IntMap.Internal Data.IntMap.Lazy Data.IntMap.Strict Data.IntMap.Strict.Internal

The set of all keys of the map.

keysSet (fromList [(5,"a"), (3,"b")]) == Data.IntSet.fromList [3,5]
keysSet empty == Data.IntSet.empty

module Data.IntSet

containers

Finite Int Sets

The IntSet type represents a set of elements of type Int. An IntSet is strict in its elements. For a walkthrough of the most commonly used functions see their sets introduction. These modules are intended to be imported qualified, to avoid name clashes with Prelude functions, e.g.

import Data.IntSet (IntSet)
import qualified Data.IntSet as IntSet

Implementation

The implementation is based on big-endian patricia trees. This data structure performs especially well on binary operations like union and intersection. Additionally, benchmarks show that it is also (much) faster on insertions and deletions when compared to a generic size-balanced set implementation (see Data.Set).

Chris Okasaki and Andy Gill, "Fast Mergeable Integer Maps", Workshop on ML, September 1998, pages 77-86, https://web.archive.org/web/20150417234429/https://ittc.ku.edu/~andygill/papers/IntMap98.pdf.
D.R. Morrison, "PATRICIA -- Practical Algorithm To Retrieve Information Coded In Alphanumeric", Journal of the ACM, 15(4), October 1968, pages 514-534, https://doi.org/10.1145/321479.321481.

Additionally, this implementation places bitmaps in the leaves of the tree. Their size is the natural size of a machine word (32 or 64 bits) and greatly reduces the memory footprint and execution times for dense sets, e.g. sets where it is likely that many values lie close to each other. The asymptotics are not affected by this optimization.

Performance information

The time complexity is given for each operation in big-O notation, with <math> referring to the number of entries in the map and <math> referring to the number of bits in an Int (32 or 64). Operations like member, insert, and delete have a worst-case complexity of <math>. This means that the operation can become linear in the number of elements with a maximum of <math> -- the number of bits in an Int (32 or 64). These peculiar asymptotics are determined by the depth of the Patricia trees:

even for an extremely unbalanced tree, the depth cannot be larger than the number of elements <math>,
each level of a Patricia tree determines at least one more bit shared by all subelements, so there could not be more than <math> levels.

If all <math> elements in the tree are between 0 and <math> (or, say, between <math> and <math>), the estimate can be refined to <math>. If the set is sufficiently "dense", this becomes <math> or simply the familiar <math>, matching balanced binary trees. The most performant scenario for IntSet are elements from a contiguous subset, in which case the complexity is proportional to <math>, capped by <math>. The worst scenario are exponentially growing elements (1,2,4, ldots,2^n), for which complexity grows as fast as <math> but again is capped by <math>. Binary set operations like union and intersection take <math> time, where <math> and <math> are the sizes of the smaller and larger input sets respectively.

data IntSet

containers Data.IntSet Data.IntSet.Internal

A set of integers.

isProperSubsetOf :: IntSet -> IntSet -> Bool

containers Data.IntSet Data.IntSet.Internal

Is this a proper subset? (ie. a subset but not equal).

isSubsetOf :: IntSet -> IntSet -> Bool

containers Data.IntSet Data.IntSet.Internal

Is this a subset? (s1 `isSubsetOf` s2) tells whether s1 is a subset of s2.

argSet :: Map k a -> Set (Arg k a)

containers Data.Map.Internal Data.Map.Lazy Data.Map.Strict Data.Map.Strict.Internal

The set of all elements of the map contained in Args.

argSet (fromList [(5,"a"), (3,"b")]) == Data.Set.fromList [Arg 3 "b",Arg 5 "a"]
argSet empty == Data.Set.empty

fromArgSet :: Set (Arg k a) -> Map k a

containers Data.Map.Internal Data.Map.Lazy Data.Map.Strict Data.Map.Strict.Internal

Build a map from a set of elements contained inside Args.

fromArgSet (Data.Set.fromList [Arg 3 "aaa", Arg 5 "aaaaa"]) == fromList [(5,"aaaaa"), (3,"aaa")]
fromArgSet Data.Set.empty == empty

fromSet :: (k -> a) -> Set k -> Map k a

containers Data.Map.Internal Data.Map.Lazy Data.Map.Strict Data.Map.Strict.Internal

Build a map from a set of keys and a function which for each key computes its value.

fromSet (\k -> replicate k 'a') (Data.Set.fromList [3, 5]) == fromList [(5,"aaaaa"), (3,"aaa")]
fromSet undefined Data.Set.empty == empty

keysSet :: Map k a -> Set k

containers Data.Map.Internal Data.Map.Lazy Data.Map.Strict Data.Map.Strict.Internal

The set of all keys of the map.

keysSet (fromList [(5,"a"), (3,"b")]) == Data.Set.fromList [3,5]
keysSet empty == Data.Set.empty

isProperSubsetOf :: Ord a => Set a -> Set a -> Bool

containers Data.Set Data.Set.Internal

(s1 `isProperSubsetOf` s2) indicates whether s1 is a proper subset of s2.

s1 `isProperSubsetOf` s2 = s1 `isSubsetOf` s2 && s1 /= s2

isSubsetOf :: Ord a => Set a -> Set a -> Bool

containers Data.Set Data.Set.Internal

(s1 `isSubsetOf` s2) indicates whether s1 is a subset of s2.

s1 `isSubsetOf` s2 = all (`member` s2) s1
s1 `isSubsetOf` s2 = null (s1 `difference` s2)
s1 `isSubsetOf` s2 = s1 `union` s2 == s2
s1 `isSubsetOf` s2 = s1 `intersection` s2 == s1

powerSet :: Set a -> Set (Set a)

containers Data.Set Data.Set.Internal

Calculate the power set of a set: the set of all its subsets.

t `member` powerSet s == t `isSubsetOf` s

Example:

powerSet (fromList [1,2,3]) =
fromList $ map fromList [[],[1],[1,2],[1,2,3],[1,3],[2],[2,3],[3]]