Note: this module's API may change drastically, or be entirely removed, in a future release.
This module provides the generated Happy parser for Haskell. It exports a number of parsers which may be used in any library that uses the GHC API. A common usage pattern is to initialize the parser state with a given string and then parse that string:
import GHC.Data.FastString (mkFastString)
import GHC.Data.StringBuffer (stringToStringBuffer)
import GHC.Parser.Lexer
import GHC.Types.SrcLoc (mkRealSrcLoc)

runParser :: ParserOpts -> String -> P a -> ParseResult a
runParser opts str parser = unP parser parseState
  where
    filename = "<interactive>"
    location = mkRealSrcLoc (mkFastString filename) 1 1
    buffer = stringToStringBuffer str
    parseState = initParserState opts buffer location
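As a hedged usage sketch (assuming a recent GHC, where GHC.Parser exports entry points such as parseExpression and ParseResult's constructors are POk and PFailed):

import GHC.Parser (parseExpression)

-- Does the string parse as a Haskell expression?
parsesAsExpr :: ParserOpts -> String -> Bool
parsesAsExpr opts str = case runParser opts str parseExpression of
  POk _ _   -> True   -- success; the second field is the parsed AST
  PFailed _ -> False  -- failure; the PState carries the error messages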
Parsers for unit/module identifiers
Annotated parser for Haskell with extensions.
A CSV parser. The parser defined here is RFC 4180 compliant, with the following extensions (a usage sketch follows the list):
  • Empty lines are ignored.
  • Non-escaped fields may contain any characters except double-quotes, commas, carriage returns, and newlines.
  • Escaped fields may contain any characters (but double-quotes need to be escaped).
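For illustration, a minimal sketch of the escaping rules via cassava's high-level decode (an assumption: this blurb describes cassava's Data.Csv.Parser, whose behavior is exposed through Data.Csv):

{-# LANGUAGE OverloadedStrings #-}
import Data.Csv (HasHeader (NoHeader), decode)
import Data.Vector (Vector)

-- The second field is escaped, so it may contain commas and
-- doubled (escaped) double-quotes.
parsed :: Either String (Vector (String, String))
parsed = decode NoHeader "greeting,\"says \"\"hi\"\", twice\"\r\n"
-- Right [("greeting","says \"hi\", twice")]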
The functions in this module can be used to implement e.g. a resumable parser that is fed input incrementally.
The Parser monad.
This module contains Dhall's parsing logic.
The class XmlContent is a kind of replacement for Read and Show: it provides conversions between a generic XML tree representation and your own more specialised typeful Haskell data trees. If you are starting with a set of Haskell datatypes, use DrIFT to derive instances of this class for you (http://repetae.net/john/computer/haskell/DrIFT). If you are starting with an XML DTD, use HaXml's tool DtdToHaskell to generate both the Haskell types and the corresponding instances. This unified class interface replaces two previous (somewhat similar) classes, Haskell2Xml and Xml2Haskell: there was no real reason to have separate classes depending on how you originally defined your datatypes. However, some instances for basic types like lists depend on which direction you are using. See Text.XML.HaXml.XmlContent and Text.XML.HaXml.XmlContent.Haskell.
A custom parsing monad, optimized for speed.
Haskell parser.
Efficiently and correctly parse a JSON string. The string must be encoded as UTF-8. It can be useful to think of parsing as occurring in two phases:
  • Identification of the textual boundaries of a JSON value. This is always strict, so that an invalid JSON document can be rejected as soon as possible.
  • Conversion of a JSON value to a Haskell value. This may be either immediate (strict) or deferred (lazy); see below for details.
The question of whether to choose a lazy or strict parser is subtle, but it can have significant performance implications, resulting in changes in CPU use and memory footprint of 30% to 50%, or occasionally more. Measure the performance of your application with each!
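For example, both phases are visible in aeson's attoparsec-level API (a sketch, assuming this blurb describes aeson, which exports json for deferred conversion and json' for immediate conversion):

{-# LANGUAGE OverloadedStrings #-}
import Data.Aeson (Value)
import Data.Aeson.Parser (json, json')
import Data.Attoparsec.ByteString (parseOnly)

lazyValue, strictValue :: Either String Value
lazyValue   = parseOnly json  "{\"a\":[1,2,3]}"  -- conversion deferred (lazy)
strictValue = parseOnly json' "{\"a\":[1,2,3]}"  -- conversion immediate (strict)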
This module contains the definitions for a generic parser, without running state. These are the parts that are shared between the Plain and Lazy variations. Do not import this module directly, but only via T.P.Poly.Plain or T.P.Poly.Lazy.
Parsers are more powerful but less general than Folds:
  • folds cannot fail but parsers can fail and backtrack.
  • folds can be composed as a Tee but parsers cannot.
  • folds can be converted to parsers.
Streamly parsers support all operations offered by popular Haskell parser libraries. Unlike other parser libraries, streamly parsers (1) can operate on any Haskell type as input, not just bytes, (2) natively support streaming, and (3) are faster.
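For instance, converting a fold into a parser is a one-liner (a sketch assuming the released streamly-core modules):

>>> import qualified Streamly.Data.Fold as Fold
>>> import qualified Streamly.Data.Parser as Parser
>>> import qualified Streamly.Data.Stream as Stream
>>> Stream.parse (Parser.fromFold Fold.sum) (Stream.fromList [1..5 :: Int])
Right 15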

High Performance by Static Parser Fusion

Like folds, parsers are designed to utilize stream fusion, compiling to efficient low-level code comparable to the speed of C. Parsers are suitable for high-performance parsing of streams. Operations in this module are designed to be composed statically rather than dynamically. They are inlined to enable static fusion. More importantly, they are not designed to be used recursively. Recursive use will break fusion and lead to quadratic performance slowdown. For dynamic and recursive compositions use the continuation passing style (CPS) operations from the Streamly.Data.ParserK module. Parser and ParserK types are interconvertible.

How to parse a stream?

Parser combinators can be used to create a pipeline of parsers such that the next parser consumes the result of the previous parser. Such a composed pipeline of parsers can then be driven by one of many parser drivers available in the Stream and Array modules. Use Streamly.Data.Stream.parse or Streamly.Data.Stream.parseBreak to run a parser on an input stream and return the parsed result. Use Streamly.Data.Stream.parseMany or Streamly.Data.Stream.parseIterate to transform an input data stream to an output stream of parsed data elements using a parser.
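
For example (a sketch, reusing the imports from the snippet above):

>>> Stream.parse (Parser.takeWhile (/= ' ') Fold.toList) (Stream.fromList "hello world")
Right "hello"

>>> Stream.fold Fold.toList $ Stream.parseMany (Parser.takeEQ 2 Fold.toList) (Stream.fromList "hello!")
[Right "he",Right "ll",Right "o!"]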

Parser vs ParserK

There are two functionally equivalent parsing modules, Streamly.Data.Parser (this module) and Streamly.Data.ParserK. The latter is a CPS-based wrapper over the former, and can be used for parsing in general. Streamly.Data.Parser enables stream fusion and, where possible, should be preferred over Streamly.Data.ParserK for high-performance stream parsing use cases. However, there are a few cases where this module is not suitable and ParserK should be used instead. As a rule of thumb, use ParserK when recursion or heavy nesting is needed.

Parser: suitable for non-recursive static fusion

The Parser type is suitable only for non-recursive static fusion. It can be problematic for recursive definitions. To enable static fusion, parser combinators use strict pattern matching on arguments of type Parser. This leads to an infinite loop when a parser is defined recursively, due to strict evaluation of the recursive call. For example, the following implementation loops infinitely because of the recursive use of parser p in the *> combinator:
>>> import Streamly.Data.Parser (Parser)
>>> import qualified Streamly.Data.Fold as Fold
>>> import qualified Streamly.Data.Parser as Parser
>>> import qualified Streamly.Data.Stream as Stream
>>> import Control.Applicative ((<|>))

>>> :{
>>> p, p1, p2 :: Monad m => Parser Char m String
>>> p1 = Parser.satisfy (== '(') *> p
>>> p2 = Parser.fromFold Fold.toList
>>> p = p1 <|> p2
>>> :}
Another limitation of the Parser type is quadratic performance slowdown when too many nested compositions are used. In particular, the Applicative, Monad, and Alternative instances, and sequenced parsing operations (e.g. nested one and splitWith), exhibit quadratic (O(n^2)) slowdown when combined n times; roughly eight or fewer sequenced parsers usually work fine. Read the documentation of the Applicative, Monad, and Alternative instances.

ParserK: suitable for recursive definitions

ParserK is suitable for recursive definitions:
>>> import Streamly.Data.ParserK (ParserK)
>>> import Streamly.Data.StreamK (toParserK)
>>> import qualified Streamly.Data.StreamK as StreamK

>>> :{
>>> p, p1, p2 :: Monad m => ParserK Char m String
>>> p1 = toParserK (Parser.satisfy (== '(')) *> p
>>> p2 = toParserK (Parser.fromFold Fold.toList)
>>> p = p1 <|> p2
>>> :}

>>> StreamK.parse p $ StreamK.fromStream $ Stream.fromList "hello"
Right "hello"
For this reason, Applicative, Alternative, or Monad compositions with recursion cannot be used with the Parser type. Alternative type class based operations like asum, and generic Alternative-based parser combinators, use recursion. Similarly, Applicative type class based operations like sequence use recursion. This module provides custom implementations of many such operations (e.g. some, many); those should be used instead.
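
For instance, this module's many collects parse results through a Fold rather than a list (a sketch using the definitions above):

>>> Stream.parse (Parser.many (Parser.satisfy (== 'a')) Fold.toList) (Stream.fromList "aaab")
Right "aaa"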

Parsers Galore!

Streamly provides all the parsing functionality offered by popular parsing libraries, and much more, with higher performance. This module provides most of the elementary parsers and parser combinators; the sections below cover what lies beyond them.

Generic Parser Combinators

With ParserK you can use the Applicative and Alternative type class based generic parser combinators from the parser-combinators library or similar. However, where performance and streaming behavior matter, we recommend the equivalent functionality from this module when it is available. Firstly, the combinators in this module are faster due to stream fusion. Secondly, they are streaming in nature, as results can be passed directly to other stream consumers (folds or parsers); the Alternative type class based parsers end up buffering all results in lists before they can be consumed.
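
For example, a hypothetical sketch using count from the parser-combinators package (an assumption: that library is installed separately) together with the ParserK definitions above:

>>> import Control.Applicative.Combinators (count)
>>> StreamK.parse (count 3 (toParserK (Parser.satisfy (== 'a')))) (StreamK.fromStream (Stream.fromList "aaab"))
Right "aaa"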

Error Reporting

There are two types of parser drivers available: the parse and parseBreak drivers do not track stream position, whereas the parsePos and parseBreakPos drivers track and report stream position information with slightly more performance overhead. When an error occurs, the stream position is reported: for byte streams or unboxed array streams this is the byte position, and for generic element parsers or generic array parsers it is the element position in the stream. These parsers do not report case-specific error context (e.g. line number or column). If you need line or column information, you can read the stream again (if it is immutable) and count lines to translate the reported byte position into a line number and column. More elaborate support for building arbitrary custom error context is planned for the future.

Monad Transformer Stack

A MonadTrans instance is not provided. If the Parser type is the topmost layer (which should almost always be the case), you can just use fromEffect to execute the effects of the lower-layer monad.
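
For example (a sketch; fromEffect lifts an action of the base monad into a parser):

>>> Stream.parse (Parser.fromEffect (putStrLn "running in the base monad") *> Parser.fromFold Fold.toList) (Stream.fromList "hi")
running in the base monad
Right "hi"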

Experimental APIs

Please refer to Streamly.Internal.Data.Parser for functions that have not yet been released.
Decode Haskell data types from byte streams. It would be inefficient to use this to compose parsers for general algebraic data types. For general deserialization of ADTs please use the Serialize type class instances. The fastest way to deserialize byte streams representing Haskell data types is to write them to arrays and deserialize the array using the Serialize type class.
To parse a text input, use the decode routines from Streamly.Unicode.Stream module to convert an input byte stream to a Unicode Char stream and then use these parsers on the Char stream.
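For example (a sketch assuming Streamly.Unicode.Stream's encodeUtf8/decodeUtf8 and the parser imports above; a generic parser from Streamly.Data.Parser stands in for illustration):
>>> import qualified Streamly.Unicode.Stream as Unicode
>>> bytes = Unicode.encodeUtf8 (Stream.fromList "hi there")  -- a Word8 stream
>>> Stream.parse (Parser.takeWhile (/= ' ') Fold.toList) (Unicode.decodeUtf8 bytes)
Right "hi"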
Parser for text to TOML AST.