Builder

Builders are used to efficiently construct sequences of bytes from smaller parts. Typically, such a construction is part of the implementation of an encoding, i.e., a function for converting Haskell values to sequences of bytes. Examples of encodings are the generation of the sequence of bytes representing a HTML document to be sent in a HTTP response by a web application or the serialization of a Haskell value using a fixed binary format. For an efficient implementation of an encoding, it is important that (a) little time is spent on converting the Haskell values to the resulting sequence of bytes and (b) that the representation of the resulting sequence is such that it can be consumed efficiently. Builders support (a) by providing an O(1) concatentation operation and efficient implementations of basic encodings for Chars, Ints, and other standard Haskell values. They support (b) by providing their result as a LazyByteString, which is internally just a linked list of pointers to chunks of consecutive raw memory. LazyByteStrings can be efficiently consumed by functions that write them to a file or send them over a network socket. Note that each chunk boundary incurs expensive extra work (e.g., a system call) that must be amortized over the work spent on consuming the chunk body. Builders therefore take special care to ensure that the average chunk size is large enough. The precise meaning of large enough is application dependent. The current implementation is tuned for an average chunk size between 4kb and 32kb, which should suit most applications. As a simple example of an encoding implementation, we show how to efficiently convert the following representation of mixed-data tables to an UTF-8 encoded Comma-Separated-Values (CSV) table.
data Cell = StringC String
| IntC Int
deriving( Eq, Ord, Show )

type Row   = [Cell]
type Table = [Row]
We use the following imports.
import qualified Data.ByteString.Lazy               as L
import           Data.ByteString.Builder
import           Data.List                            (intersperse)
CSV is a character-based representation of tables. For maximal modularity, we could first render Tables as Strings and then encode this String using some Unicode character encoding. However, this sacrifices performance due to the intermediate String representation being built and thrown away right afterwards. We get rid of this intermediate String representation by fixing the character encoding to UTF-8 and using Builders to convert Tables directly to UTF-8 encoded CSV tables represented as LazyByteStrings.
encodeUtf8CSV :: Table -> L.LazyByteString
encodeUtf8CSV = toLazyByteString . renderTable

renderTable :: Table -> Builder
renderTable rs = mconcat [renderRow r <> charUtf8 '\n' | r <- rs]

renderRow :: Row -> Builder
renderRow []     = mempty
renderRow (c:cs) =
renderCell c <> mconcat [ charUtf8 ',' <> renderCell c' | c' <- cs ]

renderCell :: Cell -> Builder
renderCell (StringC cs) = renderString cs
renderCell (IntC i)     = intDec i

renderString :: String -> Builder
renderString cs = charUtf8 '"' <> foldMap escape cs <> charUtf8 '"'
where
escape '\\' = charUtf8 '\\' <> charUtf8 '\\'
escape '\"' = charUtf8 '\\' <> charUtf8 '\"'
escape c    = charUtf8 c
Note that the ASCII encoding is a subset of the UTF-8 encoding, which is why we can use the optimized function intDec to encode an Int as a decimal number with UTF-8 encoded digits. Using intDec is more efficient than stringUtf8 . show, as it avoids constructing an intermediate String. Avoiding this intermediate data structure significantly improves performance because encoding Cells is the core operation for rendering CSV-tables. See Data.ByteString.Builder.Prim for further information on how to improve the performance of renderString. We demonstrate our UTF-8 CSV encoding function on the following table.
strings :: [String]
strings =  ["hello", "\"1\"", "λ-wörld"]

table :: Table
table = [map StringC strings, map IntC [-3..3]]
The expression encodeUtf8CSV table results in the following lazy LazyByteString.
Chunk "\"hello\",\"\\\"1\\\"\",\"\206\187-w\195\182rld\"\n-3,-2,-1,0,1,2,3\n" Empty
We can clearly see that we are converting to a binary format. The 'λ' and 'ö' characters, which have a Unicode codepoint above 127, are expanded to their corresponding UTF-8 multi-byte representation. We use the criterion library (http://hackage.haskell.org/package/criterion) to benchmark the efficiency of our encoding function on the following table.
import Criterion.Main     -- add this import to the ones above

maxiTable :: Table
maxiTable = take 1000 $ cycle table

main :: IO ()
main = defaultMain
[ bench "encodeUtf8CSV maxiTable (original)" $
whnf (L.length . encodeUtf8CSV) maxiTable
]
On a Core2 Duo 2.20GHz on a 32-bit Linux, the above code takes 1ms to generate the 22'500 bytes long LazyByteString. Looking again at the definitions above, we see that we took care to avoid intermediate data structures, as otherwise we would sacrifice performance. For example, the following (arguably simpler) definition of renderRow is about 20% slower.
renderRow :: Row -> Builder
renderRow  = mconcat . intersperse (charUtf8 ',') . map renderCell
Similarly, using O(n) concatentations like ++ or the equivalent concat operations on strict and LazyByteStrings should be avoided. The following definition of renderString is also about 20% slower.
renderString :: String -> Builder
renderString cs = charUtf8 $ "\"" ++ concatMap escape cs ++ "\""
where
escape '\\' = "\\"
escape '\"' = "\\\""
escape c    = return c
Apart from removing intermediate data-structures, encodings can be optimized further by fine-tuning their execution parameters using the functions in Data.ByteString.Builder.Extra and their "inner loops" using the functions in Data.ByteString.Builder.Prim.
Builders denote sequences of bytes. They are Monoids where mempty is the zero-length sequence and mappend is concatenation, which runs in O(1).
Warning: this is an internal module, and does not have a stable API or name. Functions in this module may not check or enforce preconditions expected by public modules. Use at your own risk! Efficient construction of lazy Text values. The principal operations on a Builder are singleton, fromText, and fromLazyText, which construct new builders, and mappend, which concatenates two builders. To get maximum performance when building lazy Text values using a builder, associate mappend calls to the right. For example, prefer
singleton 'a' `mappend` (singleton 'b' `mappend` singleton 'c')
to
singleton 'a' `mappend` singleton 'b' `mappend` singleton 'c'
as the latter associates mappend to the left.
A Builder is an efficient way to build lazy Text values. There are several functions for constructing builders, but only one to inspect them: to extract any data, you have to turn them into lazy Text values using toLazyText. Internally, a builder constructs a lazy Text by filling arrays piece by piece. As each buffer is filled, it is 'popped' off, to become a new chunk of the resulting lazy Text. All this is hidden from the user of the Builder.
Efficient construction of lazy Text values. The principal operations on a Builder are singleton, fromText, and fromLazyText, which construct new builders, and mappend, which concatenates two builders. To get maximum performance when building lazy Text values using a builder, associate mappend calls to the right. For example, prefer
singleton 'a' `mappend` (singleton 'b' `mappend` singleton 'c')
to
singleton 'a' `mappend` singleton 'b' `mappend` singleton 'c'
as the latter associates mappend to the left. Or, equivalently, prefer
singleton 'a' <> singleton 'b' <> singleton 'c'
since the <> from recent versions of Monoid associates to the right.
Efficient constructions of lazy bytestrings. This now re-exports Builder.
drastically, or be entirely removed, in a future release.
A set of helpers that should make construction of ParseErrors more concise. This is primarily useful in test suites and for debugging.
Blaze.ByteString.Builder is the main module, which you should import as a user of the blaze-builder library.
import Blaze.ByteString.Builder
It provides you with a type Builder that allows to efficiently construct lazy bytestrings with a large average chunk size. Intuitively, a Builder denotes the construction of a part of a lazy bytestring. Builders can either be created using one of the primitive combinators in Blaze.ByteString.Builder.Write or by using one of the predefined combinators for standard Haskell values (see the exposed modules of this package). Concatenation of builders is done using mappend from the Monoid typeclass. Here is a small example that serializes a list of strings using the UTF-8 encoding.
import Blaze.ByteString.Builder.Char.Utf8
strings :: [String]
strings = replicate 10000 "Hello there!"
The function fromString creates a Builder denoting the UTF-8 encoded argument. Hence, UTF-8 encoding and concatenating all strings can be done follows.
concatenation :: Builder
concatenation = mconcat $ map fromString strings
The function toLazyByteString can be used to execute a Builder and obtain the resulting lazy bytestring.
result :: L.ByteString
result = toLazyByteString concatenation
The result is a lazy bytestring containing 10000 repetitions of the string "Hello there!" encoded using UTF-8. The corresponding 120000 bytes are distributed among three chunks of 32kb and a last chunk of 6kb. A note on history. This serialization library was inspired by the Data.Binary.Builder module provided by the binary package. It was originally developed with the specific needs of the blaze-html package in mind. Since then it has been restructured to serve as a drop-in replacement for Data.Binary.Builder, which it improves upon both in speed as well as expressivity.
Convert a stream of blaze-builder Builders into a stream of ByteStrings. Works with both blaze-builder < 0.4's Builders and Builder. Adapted from blaze-builder-enumerator, written by myself and Simon Meier. Note that the functions here can work in any monad built on top of IO or ST. Since 1.1.7.0
Build an interface from an XML UI definition All functions in this module are only available in Gtk 2.12 or higher.
Low-level bytestring builders. Most users want to use the more type-safe Data.Csv.Incremental module instead.
A builder for building the CSV data incrementally. Just like the ByteString builder, this builder should be used in a right-associative, foldr style. Using <> to compose builders in a left-associative, foldl' style makes the building not be incremental.
Builders denote sequences of bytes. They are Monoids where mempty is the zero-length sequence and mappend is concatenation, which runs in O(1).
Buffering for output streams based on bytestring builders. Buffering an output stream can often improve throughput by reducing the number of system calls made through the file descriptor. The bytestring package provides an efficient monoidal datatype used for serializing values directly to an output buffer, called a Builder, originally implemented in the blaze-builder package by Simon Meier. When compiling with bytestring versions older than 0.10.4, (i.e. GHC <= 7.6) users must depend on the bytestring-builder library to get the new builder implementation. Since we try to maintain compatibility with the last three GHC versions, the dependency on bytestring-builder can be dropped after the release of GHC 7.12. Using this module Given an OutputStream taking ByteString:
someOutputStream :: OutputStream ByteString
You create a new output stream wrapping the original one that accepts Builder values:
do
newStream <- Streams.builderStream someOutputStream
Streams.write (Just $ byteString "hello") newStream
....
You can flush the output buffer using flush:
....
Streams.write (Just flush) newStream
....
As a convention, builderStream will write the empty string to the wrapped OutputStream upon a builder buffer flush. Output streams which receive ByteString should either ignore the empty string or interpret it as a signal to flush their own buffers, as the handleToOutputStream and System.IO.Streams.Zlib functions do. Example
example :: IO [ByteString]
example = do
let l1 = intersperse " " ["the", "quick", "brown", "fox"]
let l2 = intersperse " " ["jumped", "over", "the"]
let l  = map byteString l1 ++ [flush] ++ map byteString l2
is          <- Streams.fromList l
(os0, grab) <- Streams.listOutputStream
os          <- Streams.builderStream os0
Streams.connect is os >> grab

ghci> example
["the quick brown fox","","jumped over the"]
Convenience functions for building pandoc documents programmatically. Example of use (with OverloadedStrings pragma):
import Text.Pandoc.Builder

myDoc :: Pandoc
myDoc = setTitle "My title" $ doc $
para "This is the first paragraph" <>
para ("And " <> emph "another" <> ".") <>
bulletList [ para "item one" <> para "continuation"
, plain ("item two and a " <>
link "/url" "go to url" "link")
]
Isn't that nicer than writing the following?
import Text.Pandoc.Definition
import Data.Map (fromList)

myDoc :: Pandoc
myDoc = Pandoc (Meta {unMeta = fromList [("title",
MetaInlines [Str "My",Space,Str "title"])]})
[Para [Str "This",Space,Str "is",Space,Str "the",Space,Str "first",
Space,Str "paragraph"],Para [Str "And",Space,Emph [Str "another"],
Str "."]
,BulletList [
[Para [Str "item",Space,Str "one"]
,Para [Str "continuation"]]
,[Plain [Str "item",Space,Str "two",Space,Str "and",Space,
Str "a",Space,Link nullAttr [Str "link"] ("/url","go to url")]]]]
And of course, you can use Haskell to define your own builders:
import Text.Pandoc.Builder
import Text.JSON
import Control.Arrow ((***))
import Data.Monoid (mempty)

-- | Converts a JSON document into 'Blocks'.
json :: String -> Blocks
json x =
case decode x of
Ok y    -> jsValueToBlocks y
Error y -> error y
where jsValueToBlocks x =
case x of
JSNull         -> mempty
JSBool x       -> plain $ text $ show x
JSRational _ x -> plain $ text $ show x
JSString x     -> plain $ text $ fromJSString x
JSArray xs     -> bulletList $ map jsValueToBlocks xs
JSObject x     -> definitionList $
map (text *** (:[]) . jsValueToBlocks) $
fromJSObject x
Convert a stream of bytestring Builders into a stream of ByteStrings. Adapted from blaze-builder-enumerator, written by Michael Snoyman and Simon Meier. Note that the functions here can work in any monad built on top of IO or ST. Also provides toByteStringIO* like Blaze.ByteString.Builders, for Data.ByteString.Builder. Since 0.1.9
A Builder is an efficient way to build lazy Text values. There are several functions for constructing builders, but only one to inspect them: to extract any data, you have to turn them into lazy Text values using toLazyText. Internally, a builder constructs a lazy Text by filling arrays piece by piece. As each buffer is filled, it is 'popped' off, to become a new chunk of the resulting lazy Text. All this is hidden from the user of the Builder.