Finite Int Sets
The
IntSet type represents a set of elements of type
Int. An
IntSet is strict in its elements.
For a walkthrough of the most commonly used functions see their
sets introduction.
These modules are intended to be imported qualified, to avoid name
clashes with Prelude functions, e.g.
import Data.IntSet (IntSet)
import qualified Data.IntSet as IntSet
Implementation
The implementation is based on
big-endian patricia trees. This
data structure performs especially well on binary operations like
union and
intersection. Additionally, benchmarks show
that it is also (much) faster on insertions and deletions when
compared to a generic size-balanced set implementation (see
Data.Set).
- Chris Okasaki and Andy Gill, "Fast Mergeable Integer Maps",
Workshop on ML, September 1998, pages 77-86,
https://web.archive.org/web/20150417234429/https://ittc.ku.edu/~andygill/papers/IntMap98.pdf.
- D.R. Morrison, "PATRICIA -- Practical Algorithm To Retrieve
Information Coded In Alphanumeric", Journal of the ACM, 15(4),
October 1968, pages 514-534,
https://doi.org/10.1145/321479.321481.
Additionally, this implementation places bitmaps in the leaves of the
tree. Their size is the natural size of a machine word (32 or 64 bits)
and greatly reduces the memory footprint and execution times for dense
sets, e.g. sets where it is likely that many values lie close to each
other. The asymptotics are not affected by this optimization.
Performance information
The time complexity is given for each operation in
big-O
notation, with <math> referring to the number of entries in
the map and <math> referring to the number of bits in an
Int (32 or 64).
Operations like
member,
insert, and
delete have a
worst-case complexity of <math>. This means that the operation
can become linear in the number of elements with a maximum of
<math> -- the number of bits in an
Int (32 or 64). These
peculiar asymptotics are determined by the depth of the Patricia
trees:
- even for an extremely unbalanced tree, the depth cannot be larger
than the number of elements <math>,
- each level of a Patricia tree determines at least one more bit
shared by all subelements, so there could not be more than
<math> levels.
If all <math> elements in the tree are between 0 and
<math> (or, say, between <math> and <math>), the
estimate can be refined to <math>. If the set is sufficiently
"dense", this becomes <math> or simply the familiar
<math>, matching balanced binary trees.
The most performant scenario for
IntSet are elements from a
contiguous subset, in which case the complexity is proportional to
<math>, capped by <math>. The worst scenario are
exponentially growing elements (1,2,4, ldots,2^n), for which
complexity grows as fast as <math> but again is capped by
<math>.
Binary set operations like
union and
intersection take
<math> time, where <math> and <math> are the sizes
of the smaller and larger input sets respectively.