Builder is:module

Builders are used to efficiently construct sequences of bytes from smaller parts. Typically, such a construction is part of the implementation of an encoding, i.e., a function for converting Haskell values to sequences of bytes. Examples of encodings are the generation of the sequence of bytes representing a HTML document to be sent in a HTTP response by a web application or the serialization of a Haskell value using a fixed binary format. For an efficient implementation of an encoding, it is important that (a) little time is spent on converting the Haskell values to the resulting sequence of bytes and (b) that the representation of the resulting sequence is such that it can be consumed efficiently. Builders support (a) by providing an O(1) concatentation operation and efficient implementations of basic encodings for Chars, Ints, and other standard Haskell values. They support (b) by providing their result as a LazyByteString, which is internally just a linked list of pointers to chunks of consecutive raw memory. LazyByteStrings can be efficiently consumed by functions that write them to a file or send them over a network socket. Note that each chunk boundary incurs expensive extra work (e.g., a system call) that must be amortized over the work spent on consuming the chunk body. Builders therefore take special care to ensure that the average chunk size is large enough. The precise meaning of large enough is application dependent. The current implementation is tuned for an average chunk size between 4kb and 32kb, which should suit most applications. As a simple example of an encoding implementation, we show how to efficiently convert the following representation of mixed-data tables to an UTF-8 encoded Comma-Separated-Values (CSV) table.
data Cell = StringC String
| IntC Int
deriving( Eq, Ord, Show )

type Row   = [Cell]
type Table = [Row]
We use the following imports.
import qualified Data.ByteString.Lazy               as L
import           Data.ByteString.Builder
import           Data.List                            (intersperse)
CSV is a character-based representation of tables. For maximal modularity, we could first render Tables as Strings and then encode this String using some Unicode character encoding. However, this sacrifices performance due to the intermediate String representation being built and thrown away right afterwards. We get rid of this intermediate String representation by fixing the character encoding to UTF-8 and using Builders to convert Tables directly to UTF-8 encoded CSV tables represented as LazyByteStrings.
encodeUtf8CSV :: Table -> L.LazyByteString
encodeUtf8CSV = toLazyByteString . renderTable

renderTable :: Table -> Builder
renderTable rs = mconcat [renderRow r <> charUtf8 '\n' | r <- rs]

renderRow :: Row -> Builder
renderRow []     = mempty
renderRow (c:cs) =
renderCell c <> mconcat [ charUtf8 ',' <> renderCell c' | c' <- cs ]

renderCell :: Cell -> Builder
renderCell (StringC cs) = renderString cs
renderCell (IntC i)     = intDec i

renderString :: String -> Builder
renderString cs = charUtf8 '"' <> foldMap escape cs <> charUtf8 '"'
where
escape '\\' = charUtf8 '\\' <> charUtf8 '\\'
escape '\"' = charUtf8 '\\' <> charUtf8 '\"'
escape c    = charUtf8 c
Note that the ASCII encoding is a subset of the UTF-8 encoding, which is why we can use the optimized function intDec to encode an Int as a decimal number with UTF-8 encoded digits. Using intDec is more efficient than stringUtf8 . show, as it avoids constructing an intermediate String. Avoiding this intermediate data structure significantly improves performance because encoding Cells is the core operation for rendering CSV-tables. See Data.ByteString.Builder.Prim for further information on how to improve the performance of renderString. We demonstrate our UTF-8 CSV encoding function on the following table.
strings :: [String]
strings =  ["hello", "\"1\"", "λ-wörld"]

table :: Table
table = [map StringC strings, map IntC [-3..3]]
The expression encodeUtf8CSV table results in the following lazy LazyByteString.
Chunk "\"hello\",\"\\\"1\\\"\",\"\206\187-w\195\182rld\"\n-3,-2,-1,0,1,2,3\n" Empty
We can clearly see that we are converting to a binary format. The 'λ' and 'ö' characters, which have a Unicode codepoint above 127, are expanded to their corresponding UTF-8 multi-byte representation. We use the criterion library (http://hackage.haskell.org/package/criterion) to benchmark the efficiency of our encoding function on the following table.
import Criterion.Main     -- add this import to the ones above

maxiTable :: Table
maxiTable = take 1000 $ cycle table

main :: IO ()
main = defaultMain
[ bench "encodeUtf8CSV maxiTable (original)" $
whnf (L.length . encodeUtf8CSV) maxiTable
]
On a Core2 Duo 2.20GHz on a 32-bit Linux, the above code takes 1ms to generate the 22'500 bytes long LazyByteString. Looking again at the definitions above, we see that we took care to avoid intermediate data structures, as otherwise we would sacrifice performance. For example, the following (arguably simpler) definition of renderRow is about 20% slower.
renderRow :: Row -> Builder
renderRow  = mconcat . intersperse (charUtf8 ',') . map renderCell
Similarly, using O(n) concatentations like ++ or the equivalent concat operations on strict and LazyByteStrings should be avoided. The following definition of renderString is also about 20% slower.
renderString :: String -> Builder
renderString cs = charUtf8 $ "\"" ++ concatMap escape cs ++ "\""
where
escape '\\' = "\\"
escape '\"' = "\\\""
escape c    = return c
Apart from removing intermediate data-structures, encodings can be optimized further by fine-tuning their execution parameters using the functions in Data.ByteString.Builder.Extra and their "inner loops" using the functions in Data.ByteString.Builder.Prim.
Warning: this is an internal module, and does not have a stable API or name. Functions in this module may not check or enforce preconditions expected by public modules. Use at your own risk! Efficient construction of lazy Text values. The principal operations on a Builder are singleton, fromText, and fromLazyText, which construct new builders, and mappend, which concatenates two builders. To get maximum performance when building lazy Text values using a builder, associate mappend calls to the right. For example, prefer
singleton 'a' `mappend` (singleton 'b' `mappend` singleton 'c')
to
singleton 'a' `mappend` singleton 'b' `mappend` singleton 'c'
as the latter associates mappend to the left.
Efficient construction of lazy Text values. The principal operations on a Builder are singleton, fromText, and fromLazyText, which construct new builders, and mappend, which concatenates two builders. To get maximum performance when building lazy Text values using a builder, associate mappend calls to the right. For example, prefer
singleton 'a' `mappend` (singleton 'b' `mappend` singleton 'c')
to
singleton 'a' `mappend` singleton 'b' `mappend` singleton 'c'
as the latter associates mappend to the left. Or, equivalently, prefer
singleton 'a' <> singleton 'b' <> singleton 'c'
since the <> from recent versions of Monoid associates to the right.
Efficient constructions of lazy bytestrings. This now re-exports Builder.
drastically, or be entirely removed, in a future release.
A set of helpers that should make construction of ParseErrors more concise. This is primarily useful in test suites and for debugging.
Blaze.ByteString.Builder is the main module, which you should import as a user of the blaze-builder library.
import Blaze.ByteString.Builder
It provides you with a type Builder that allows to efficiently construct lazy bytestrings with a large average chunk size. Intuitively, a Builder denotes the construction of a part of a lazy bytestring. Builders can either be created using one of the primitive combinators in Blaze.ByteString.Builder.Write or by using one of the predefined combinators for standard Haskell values (see the exposed modules of this package). Concatenation of builders is done using mappend from the Monoid typeclass. Here is a small example that serializes a list of strings using the UTF-8 encoding.
import Blaze.ByteString.Builder.Char.Utf8
strings :: [String]
strings = replicate 10000 "Hello there!"
The function fromString creates a Builder denoting the UTF-8 encoded argument. Hence, UTF-8 encoding and concatenating all strings can be done follows.
concatenation :: Builder
concatenation = mconcat $ map fromString strings
The function toLazyByteString can be used to execute a Builder and obtain the resulting lazy bytestring.
result :: L.ByteString
result = toLazyByteString concatenation
The result is a lazy bytestring containing 10000 repetitions of the string "Hello there!" encoded using UTF-8. The corresponding 120000 bytes are distributed among three chunks of 32kb and a last chunk of 6kb. A note on history. This serialization library was inspired by the Data.Binary.Builder module provided by the binary package. It was originally developed with the specific needs of the blaze-html package in mind. Since then it has been restructured to serve as a drop-in replacement for Data.Binary.Builder, which it improves upon both in speed as well as expressivity.
Convert a stream of blaze-builder Builders into a stream of ByteStrings. Works with both blaze-builder < 0.4's Builders and Builder. Adapted from blaze-builder-enumerator, written by myself and Simon Meier. Note that the functions here can work in any monad built on top of IO or ST. Since 1.1.7.0
Build an interface from an XML UI definition All functions in this module are only available in Gtk 2.12 or higher.
Low-level bytestring builders. Most users want to use the more type-safe Data.Csv.Incremental module instead.
Buffering for output streams based on bytestring builders. Buffering an output stream can often improve throughput by reducing the number of system calls made through the file descriptor. The bytestring package provides an efficient monoidal datatype used for serializing values directly to an output buffer, called a Builder, originally implemented in the blaze-builder package by Simon Meier. When compiling with bytestring versions older than 0.10.4, (i.e. GHC <= 7.6) users must depend on the bytestring-builder library to get the new builder implementation. Since we try to maintain compatibility with the last three GHC versions, the dependency on bytestring-builder can be dropped after the release of GHC 7.12. Using this module Given an OutputStream taking ByteString:
someOutputStream :: OutputStream ByteString
You create a new output stream wrapping the original one that accepts Builder values:
do
newStream <- Streams.builderStream someOutputStream
Streams.write (Just $ byteString "hello") newStream
....
You can flush the output buffer using flush:
....
Streams.write (Just flush) newStream
....
As a convention, builderStream will write the empty string to the wrapped OutputStream upon a builder buffer flush. Output streams which receive ByteString should either ignore the empty string or interpret it as a signal to flush their own buffers, as the handleToOutputStream and System.IO.Streams.Zlib functions do. Example
example :: IO [ByteString]
example = do
let l1 = intersperse " " ["the", "quick", "brown", "fox"]
let l2 = intersperse " " ["jumped", "over", "the"]
let l  = map byteString l1 ++ [flush] ++ map byteString l2
is          <- Streams.fromList l
(os0, grab) <- Streams.listOutputStream
os          <- Streams.builderStream os0
Streams.connect is os >> grab

ghci> example
["the quick brown fox","","jumped over the"]
Convenience functions for building pandoc documents programmatically. Example of use (with OverloadedStrings pragma):
import Text.Pandoc.Builder

myDoc :: Pandoc
myDoc = setTitle "My title" $ doc $
para "This is the first paragraph" <>
para ("And " <> emph "another" <> ".") <>
bulletList [ para "item one" <> para "continuation"
, plain ("item two and a " <>
link "/url" "go to url" "link")
]
Isn't that nicer than writing the following?
import Text.Pandoc.Definition
import Data.Map (fromList)

myDoc :: Pandoc
myDoc = Pandoc (Meta {unMeta = fromList [("title",
MetaInlines [Str "My",Space,Str "title"])]})
[Para [Str "This",Space,Str "is",Space,Str "the",Space,Str "first",
Space,Str "paragraph"],Para [Str "And",Space,Emph [Str "another"],
Str "."]
,BulletList [
[Para [Str "item",Space,Str "one"]
,Para [Str "continuation"]]
,[Plain [Str "item",Space,Str "two",Space,Str "and",Space,
Str "a",Space,Link nullAttr [Str "link"] ("/url","go to url")]]]]
And of course, you can use Haskell to define your own builders:
import Text.Pandoc.Builder
import Text.JSON
import Control.Arrow ((***))
import Data.Monoid (mempty)

-- | Converts a JSON document into 'Blocks'.
json :: String -> Blocks
json x =
case decode x of
Ok y    -> jsValueToBlocks y
Error y -> error y
where jsValueToBlocks x =
case x of
JSNull         -> mempty
JSBool x       -> plain $ text $ show x
JSRational _ x -> plain $ text $ show x
JSString x     -> plain $ text $ fromJSString x
JSArray xs     -> bulletList $ map jsValueToBlocks xs
JSObject x     -> definitionList $
map (text *** (:[]) . jsValueToBlocks) $
fromJSObject x
Convert a stream of bytestring Builders into a stream of ByteStrings. Adapted from blaze-builder-enumerator, written by Michael Snoyman and Simon Meier. Note that the functions here can work in any monad built on top of IO or ST. Also provides toByteStringIO* like Blaze.ByteString.Builders, for Data.ByteString.Builder. Since 0.1.9
Optimised encode function for Osc packets.
Build a lazy storable vector by incrementally adding an element. This is analogous to Data.Binary.Builder for Data.ByteString.Lazy. Attention: This implementation is still almost 3 times slower than constructing a lazy storable vector using unfoldr in our Chorus speed test.
A GtkBuilder reads XML descriptions of a user interface and instantiates the described objects. To create a GtkBuilder from a user interface description, call builderNewFromFile, builderNewFromResource or builderNewFromString. In the (unusual) case that you want to add user interface descriptions from multiple sources to the same GtkBuilder you can call builderNew to get an empty builder and populate it by (multiple) calls to builderAddFromFile, builderAddFromResource or builderAddFromString. A GtkBuilder holds a reference to all objects that it has constructed and drops these references when it is finalized. This finalization can cause the destruction of non-widget objects or widgets which are not contained in a toplevel window. For toplevel windows constructed by a builder, it is the responsibility of the user to call windowDestroy to get rid of them and all the widgets they contain. The functions builderGetObject and builderGetObjects can be used to access the widgets in the interface by the names assigned to them inside the UI description. Toplevel windows returned by these functions will stay around until the user explicitly destroys them with windowDestroy. Other widgets will either be part of a larger hierarchy constructed by the builder (in which case you should not have to worry about their lifecycle), or without a parent, in which case they have to be added to some container to make use of them. Non-widget objects need to be reffed with objectRef to keep them beyond the lifespan of the builder.

GtkBuilder UI Definitions

GtkBuilder parses textual descriptions of user interfaces which are specified in XML format. We refer to these descriptions as “GtkBuilder UI definitions” or just “UI definitions” if the context is clear.

Structure of UI definitions

UI definition files are always encoded in UTF-8. The toplevel element is <interface>. It optionally takes a “domain” attribute, which will make the builder look for translated strings using dgettext() in the domain specified. This can also be done by calling builderSetTranslationDomain on the builder. For example:

xml code

<?xml version="1.0" encoding="UTF-8">
<interface domain="your-app">
...
</interface>

Requirements

The target toolkit version(s) are described by <requires> elements, the “lib” attribute specifies the widget library in question (currently the only supported value is “gtk”) and the “version” attribute specifies the target version in the form “<major>.<minor>”. GtkBuilder will error out if the version requirements are not met. For example:

xml code

<?xml version="1.0" encoding="UTF-8">
<interface domain="your-app">
<requires lib="gtk" version="4.0" />
</interface>

Objects

Objects are defined as children of the <interface> element. Objects are described by <object> elements, which can contain <property> elements to set properties, <signal> elements which connect signals to handlers, and <child> elements, which describe child objects. Typically, the specific kind of object represented by an <object> element is specified by the “class” attribute. If the type has not been loaded yet, GTK tries to find the get_type() function from the class name by applying heuristics. This works in most cases, but if necessary, it is possible to specify the name of the get_type() function explicitly with the "type-func" attribute. If your UI definition is referencing internal types, you should make sure to call g_type_ensure() for each object type before parsing the UI definition. Objects may be given a name with the “id” attribute, which allows the application to retrieve them from the builder with builderGetObject. An id is also necessary to use the object as property value in other parts of the UI definition. GTK reserves ids starting and ending with ___ (three consecutive underscores) for its own purposes.

Properties

Setting properties of objects is pretty straightforward with the <property> element: the “name” attribute specifies the name of the property, and the content of the element specifies the value:

xml code

<object class="GtkButton">
<property name="label">Hello, world</property>
</object>
If the “translatable” attribute is set to a true value, GTK uses gettext() (or dgettext() if the builder has a translation domain set) to find a translation for the value. This happens before the value is parsed, so it can be used for properties of any type, but it is probably most useful for string properties. It is also possible to specify a context to disambiguate short strings, and comments which may help the translators:

xml code

<object class="GtkButton">
<property name="label" translatable="yes" context="button">Hello, world</property>
</object>
GtkBuilder can parse textual representations for the most common property types:
  • characters
  • strings
  • integers
  • floating-point numbers
  • booleans (strings like “TRUE”, “t”, “yes”, “y”, “1” are interpreted as true values, strings like “FALSE”, “f”, “no”, “n”, “0” are interpreted as false values)
  • enumeration types (can be specified by their full C identifier their short name used when registering the enumeration type, or their integer value)
  • flag types (can be specified by their C identifier, short name, integer value, and optionally combined with “|” for bitwise OR, e.g. “GTK_INPUT_HINT_EMOJI|GTK_INPUT_HINT_LOWERCASE”, or “emoji|lowercase”)
  • colors (in a format understood by rGBAParse)
  • GVariant (can be specified in the format understood by variantParse)
  • pixbufs (can be specified as an object id, a resource path or a filename of an image file to load relative to the Builder file or the CWD if builderAddFromString was used)
  • GFile (like pixbufs, can be specified as an object id, a URI or a filename of a file to load relative to the Builder file or the CWD if builderAddFromString was used)
Objects can be referred to by their name and by default refer to objects declared in the local XML fragment and objects exposed via builderExposeObject. In general, GtkBuilder allows forward references to objects declared in the local XML; an object doesn’t have to be constructed before it can be referred to. The exception to this rule is that an object has to be constructed before it can be used as the value of a construct-only property.

Child objects

Many widgets have properties for child widgets, such as Expander:child. In this case, the preferred way to specify the child widget in a ui file is to simply set the property:

xml code

<object class="GtkExpander">
<property name="child">
<object class="GtkLabel">
...
</object>
</property>
</object>
Generic containers that can contain an arbitrary number of children, such as Box instead use the <child> element. A <child> element contains an <object> element which describes the child object. Most often, child objects are widgets inside a container, but they can also be, e.g., actions in an action group, or columns in a tree model. Any object type that implements the Buildable interface can specify how children may be added to it. Since many objects and widgets that are included with GTK already implement the GtkBuildable interface, typically child objects can be added using the <child> element without having to be concerned about the underlying implementation. See the `GtkWidget` documentation for many examples of using GtkBuilder with widgets, including setting child objects using the <child> element. A noteworthy special case to the general rule that only objects implementing GtkBuildable may specify how to handle the <child> element is that GtkBuilder provides special support for adding objects to a ListStore by using the <child> element. For instance:

xml code

<object class="GListStore">
<property name="item-type">MyObject</property>
<child>
<object class="MyObject" />
</child>
...
</object>

Property bindings

It is also possible to bind a property value to another object's property value using the attributes "bind-source" to specify the source object of the binding, and optionally, "bind-property" and "bind-flags" to specify the source property and source binding flags respectively. Internally, GtkBuilder implements this using Binding objects. For instance, in the example below the “label” property of the bottom_label widget is bound to the “label” property of the top_button widget:

xml code

<object class="GtkBox">
<property name="orientation">vertical</property>
<child>
<object class="GtkButton" id="top_button">
<property name="label">Hello, world</property>
</object>
</child>
<child>
<object class="GtkLabel" id="bottom_label">
<property name="label"
bind-source="top_button"
bind-property="label"
bind-flags="sync-create" />
</object>
</child>
</object>
For more information, see the documentation of the objectBindProperty method. Please note that another way to set up bindings between objects in .ui files is to use the GtkExpression methodology. See the `GtkExpression` documentation for more information.

Internal children

Sometimes it is necessary to refer to widgets which have implicitly been constructed by GTK as part of a composite widget, to set properties on them or to add further children (e.g. the content area of a GtkDialog). This can be achieved by setting the “internal-child” property of the <child> element to a true value. Note that GtkBuilder still requires an <object> element for the internal child, even if it has already been constructed.

Specialized children

A number of widgets have different places where a child can be added (e.g. tabs vs. page content in notebooks). This can be reflected in a UI definition by specifying the “type” attribute on a <child> The possible values for the “type” attribute are described in the sections describing the widget-specific portions of UI definitions.

Signal handlers and function pointers

Signal handlers are set up with the <signal> element. The “name” attribute specifies the name of the signal, and the “handler” attribute specifies the function to connect to the signal.

xml code

<object class="GtkButton" id="hello_button">
<signal name="clicked" handler="hello_button__clicked" />
</object>
The remaining attributes, “after”, “swapped” and “object”, have the same meaning as the corresponding parameters of the GObject.signal_connect_object or GObject.signal_connect_data functions:
  • “after” matches the G_CONNECT_AFTER flag, and will ensure that the handler is called after the default class closure for the signal
  • “swapped” matches the G_CONNECT_SWAPPED flag, and will swap the instance and closure arguments when invoking the signal handler
  • “object” will bind the signal handler to the lifetime of the object referenced by the attribute
By default "swapped" will be set to "yes" if not specified otherwise, in the case where "object" is set, for convenience. A “last_modification_time” attribute is also allowed, but it does not have a meaning to the builder. When compiling applications for Windows, you must declare signal callbacks with the G_MODULE_EXPORT decorator, or they will not be put in the symbol table:

c code

G_MODULE_EXPORT void
hello_button__clicked (GtkButton *button,
gpointer data)
{
// ...
}
On Linux and Unix, this is not necessary; applications should instead be compiled with the -Wl,--export-dynamic argument inside their compiler flags, and linked against gmodule-export-2.0.

Example UI Definition

xml code

<interface>
<object class="GtkDialog" id="dialog1">
<child internal-child="content_area">
<object class="GtkBox">
<child internal-child="action_area">
<object class="GtkBox">
<child>
<object class="GtkButton" id="ok_button">
<property name="label" translatable="yes">_Ok</property>
<property name="use-underline">True</property>
<signal name="clicked" handler="ok_button_clicked"/>
</object>
</child>
</object>
</child>
</object>
</child>
</object>
</interface>

Using GtkBuildable for extending UI definitions

Objects can implement the Buildable interface to add custom elements and attributes to the XML. Typically, any extension will be documented in each type that implements the interface.

Templates

When describing a Widget, you can use the <template> tag to describe a UI bound to a specific widget type. GTK will automatically load the UI definition when instantiating the type, and bind children and signal handlers to instance fields and function symbols. For more information, see the `GtkWidget` documentation for details.
Block builder