parser is:module

This module's API is unstable: it may change drastically, or be entirely removed, in a future release.
This module provides the generated Happy parser for Haskell. It exports a number of parsers which may be used in any library that uses the GHC API. A common usage pattern is to initialize the parser state with a given string and then parse that string:
runParser :: ParserOpts -> String -> P a -> ParseResult a
runParser opts str parser = unP parser parseState
  where
    filename   = "<interactive>"
    location   = mkRealSrcLoc (mkFastString filename) 1 1
    buffer     = stringToStringBuffer str
    parseState = initParserState opts buffer location
Parsers for unit/module identifiers
Annotated parser for Haskell with extensions.
A CSV parser. The parser defined here is RFC 4180 compliant, with the following extensions:
  • Empty lines are ignored.
  • Non-escaped fields may contain any characters except double-quotes, commas, carriage returns, and newlines.
  • Escaped fields may contain any characters (but double-quotes need to be escaped).
The functions in this module can be used to implement e.g. a resumable parser that is fed input incrementally.
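As a minimal sketch (assuming this entry is cassava's Data.Csv.Parser; the csv parser is run here with attoparsec's parseOnly, and the two-record input is a hypothetical sample):

{-# LANGUAGE OverloadedStrings #-}
import Data.Attoparsec.ByteString.Char8 (parseOnly)
import Data.Csv (defaultDecodeOptions)
import Data.Csv.Parser (csv)

-- Prints Right [["name","age"],["alice","30"]] (a Vector of records,
-- each a Vector of ByteString fields).
main :: IO ()
main = print (parseOnly (csv defaultDecodeOptions) "name,age\r\nalice,30\r\n")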
The Parser monad.
This module contains Dhall's parsing logic
The class XmlContent is a kind of replacement for Read and Show: it provides conversions between a generic XML tree representation and your own more specialised typeful Haskell data trees.

If you are starting with a set of Haskell datatypes, use DrIFT (http://repetae.net/john/computer/haskell/DrIFT) to derive instances of this class for you. If you are starting with an XML DTD, use HaXml's tool DtdToHaskell to generate both the Haskell types and the corresponding instances.

This unified class interface replaces two previous (somewhat similar) classes: Haskell2Xml and Xml2Haskell. There was no real reason to have separate classes depending on how you originally defined your datatypes. However, some instances for basic types like lists will depend on which direction you are using. See Text.XML.HaXml.XmlContent and Text.XML.HaXml.XmlContent.Haskell.
A custom parsing monad, optimized for speed.
Haskell parser.
Efficiently and correctly parse a JSON string. The string must be encoded as UTF-8. It can be useful to think of parsing as occurring in two phases:
  • Identification of the textual boundaries of a JSON value. This is always strict, so that an invalid JSON document can be rejected as soon as possible.
  • Conversion of a JSON value to a Haskell value. This may be either immediate (strict) or deferred (lazy); see below for details.
The question of whether to choose a lazy or strict parser is subtle, but it can have significant performance implications, resulting in changes in CPU use and memory footprint of 30% to 50%, or occasionally more. Measure the performance of your application with each!
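For example (a minimal sketch using aeson's top-level decode functions, where decode defers conversion of the value and decode' performs it immediately):

{-# LANGUAGE OverloadedStrings #-}
import Data.Aeson (Value, decode, decode')

main :: IO ()
main = do
  -- Lazy: the JSON value's textual boundaries are checked strictly,
  -- but conversion to a Haskell value is deferred.
  print (decode "{\"a\":1}" :: Maybe Value)
  -- Strict: conversion to a Haskell value happens immediately.
  print (decode' "{\"a\":1}" :: Maybe Value)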
Parser for text to TOML AST.
This module contains the definitions for a generic parser, without running state. These are the parts that are shared between the Plain and Lazy variations. Do not import this module directly, but only via T.P.Poly.Plain or T.P.Poly.Lazy.
Parse non-resumable sequence of bytes. To parse a byte sequence as text, use the Ascii, Latin, and Utf8 modules instead. Functions for parsing decimal-encoded numbers are found in those modules.
Parsers are stream consumers like folds with the following differences:
  • folds cannot fail but parsers can fail and backtrack.
  • folds can be composed as a Tee but parsers cannot.
  • folds can be used for scanning but parsers cannot.
  • folds can be converted to parsers.
This module implements parsers with stream fusion which compile to efficient loops comparable to the speed of C.

Using Parsers

This module provides elementary parsers and parser combinators that can be used to parse a stream of data. Additionally, all the folds from the Streamly.Data.Fold module can be converted to parsers using fromFold. All the parsing functionality provided by popular parsing libraries, and more, is available. Also see the Streamly.Unicode.Parser module for Char stream parsers. A data stream can be transformed into a stream of parsed data elements. Parser combinators can be used to create a pipeline of folds or parsers such that the next fold or parser consumes the result of the previous parser. See parse and parseMany to run these parsers on a stream.
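For instance, an elementary parser can be run on a stream with parse (a minimal sketch, assuming Stream, Parser, and Fold are imported qualified as in the examples below; takeWhile takes a fold for its output):
>>> Stream.parse (Parser.takeWhile (/= ',') Fold.toList) $ Stream.fromList "ab,cd"
Right "ab"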

Parser vs ParserK

There are two functionally equivalent parsing modules, Streamly.Data.Parser (this module) and Streamly.Data.ParserK. The latter is a CPS based wrapper over the former, and can be used for parsing in general. Streamly.Data.Parser enables stream fusion and should be preferred over Streamly.Data.ParserK for high performance stream parsing use cases. However, there are a few cases where this module is not suitable, and ParserK should be used instead. For static fusion, parser combinators have to use strict pattern matching on arguments of type Parser. This leads to an infinite loop when a parser is defined recursively, due to strict evaluation of the recursive call. For example, the following implementation loops infinitely because of the recursive use of parser p in the *> combinator:
>>> import Streamly.Data.Parser (Parser)
>>> import qualified Streamly.Data.Fold as Fold
>>> import qualified Streamly.Data.Parser as Parser
>>> import qualified Streamly.Data.Stream as Stream
>>> import Control.Applicative ((<|>))

>>> :{
>>> p :: Monad m => Parser Char m String
>>> p = Parser.satisfy (== '(') *> p <|> Parser.fromFold Fold.toList
>>> :}
Use ParserK when recursive use is required:
>>> import Streamly.Data.ParserK (ParserK)
>>> import qualified Streamly.Data.StreamK as StreamK
>>> import qualified Streamly.Internal.Data.StreamK as StreamK (parse)
>>> import qualified Streamly.Internal.Data.ParserK as ParserK (adapt)

>>> :{
>>> p :: Monad m => ParserK Char m String
>>> p = ParserK.adapt (Parser.satisfy (== '(')) *> p <|> ParserK.adapt (Parser.fromFold Fold.toList)
>>> :}

>>> StreamK.parse p $ StreamK.fromStream $ Stream.fromList "hello"
Right "hello"
For this reason, Applicative, Alternative, or Monad compositions with recursion cannot be used with the Parser type. Alternative type class based operations like asum, and generic Alternative based parser combinators, use recursion. Similarly, Applicative type class based operations like sequence use recursion. Custom implementations of many such operations are provided in this module (e.g. some, many), and those should be used instead. Another limitation of the Parser type is quadratic complexity, which causes a slowdown when too many nested compositions are used: the Applicative, Monad, and Alternative instances, and sequenced parsing operations (e.g. nested one, and splitWith), degrade performance quadratically (O(n^2)) when combined n times; roughly 8 or fewer sequenced parsers are fine. Read the docs of the Applicative, Monad, and Alternative instances for details.
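For example, the fold-accepting many defined in this module can replace the recursive Alternative-based many (a small sketch, reusing the qualified imports above):
>>> Stream.parse (Parser.many (Parser.satisfy (== 'a')) Fold.toList) $ Stream.fromList "aab"
Right "aa"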

Streaming Parsers

With ParserK you can use the generic Alternative type class based parsers from the parser-combinators library or similar. However, we recommend that you use the equivalent functionality from this module for better performance and for streaming behavior. Firstly, the combinators in this module are faster due to stream fusion. Secondly, they are streaming in nature, as the results can be passed directly to other stream consumers (folds or parsers); the Alternative type class based parsers would end up buffering all the results in lists before they can be consumed. When recursion or heavy nesting is needed, use ParserK.

Error Reporting

These parsers do not report the error context (e.g. line number or column). This may be supported in the future.

Monad Transformer Stack

A MonadTrans instance is not provided. If the Parser type is the topmost layer (which should almost always be the case), you can use fromEffect to execute effects from the underlying monad.
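For example (a minimal sketch using fromEffect and fromPure from this module, with the qualified imports from the examples above):
>>> Stream.parse (Parser.fromEffect (putStrLn "effect") *> Parser.fromPure 'x') $ Stream.fromList ""
effect
Right 'x'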

Parser vs ParserK Implementation

The Parser type represents a stream consumer by composing state as data, which enables stream fusion. Stream fusion generates a tight loop without any constructor allocations between the stages, providing C-like performance for the loop. Stream fusion works when multiple functions are combined in a pipeline statically; therefore, the operations in this module must be inlined and must not be used recursively, to allow for stream fusion. The ParserK type represents a stream consumer by composing function calls, so a function call overhead is incurred at each composition. It is quite fast in general but may be a few times slower than a fused parser. However, it allows for scalable dynamic composition; in particular, parsers can be used in recursive calls. With the ParserK type, operations like splitWith provide linear (O(n)) performance with respect to the number of compositions.

Experimental APIs

Please refer to Streamly.Internal.Data.Parser for functions that have not yet been released.
Decode Haskell data types from byte streams. It would be inefficient to use this to compose parsers for general algebraic data types. For general deserialization of ADTs please use the Serialize type class instances. The fastest way to deserialize byte streams representing Haskell data types is to write them to arrays and deserialize the array using the Serialize type class.