Parsers are more powerful but less general than
Folds:
- folds cannot fail but parsers can fail and backtrack.
- folds can be composed as a Tee but parsers cannot.
- folds can be converted to parsers.
Streamly parsers support all operations offered by popular Haskell
parser libraries. Unlike other parser libraries, (1) streamly parsers
can operate on any Haskell type as input - not just bytes, (2)
natively support streaming, (3) and are faster.
High Performance by Static Parser Fusion
Like folds, parsers are designed to utilize stream fusion, compiling
to efficient low-level code comparable to the speed of C. Parsers are
suitable for high-performance parsing of streams.
Operations in this module are designed to be composed statically
rather than dynamically. They are inlined to enable static fusion.
More importantly, they are not designed to be used recursively.
Recursive use will break fusion and lead to quadratic performance
slowdown. For dynamic and recursive compositions use the continuation
passing style (CPS) operations from the
Streamly.Data.ParserK
module.
Parser and
ParserK types are interconvertible.
How to parse a stream?
Parser combinators can be used to create a pipeline of parsers such
that the next parser consumes the result of the previous parser. Such
a composed pipeline of parsers can then be driven by one of many
parser drivers available in the Stream and Array modules.
Use Streamly.Data.Stream.
parse or
Streamly.Data.Stream.
parseBreak to run a parser on an input
stream and return the parsed result.
Use Streamly.Data.Stream.
parseMany or
Streamly.Data.Stream.
parseIterate to transform an input data
stream to an output stream of parsed data elements using a parser.
Parser vs ParserK
There are two functionally equivalent parsing modules,
Streamly.Data.Parser (this module) and
Streamly.Data.ParserK. The latter is a CPS based wrapper over
the former, and can be used for parsing in general.
Streamly.Data.Parser enables stream fusion and where possible
it should be preferred over
Streamly.Data.ParserK for high
performance stream parsing use cases. However, there are a few cases
where this module is not suitable and ParserK should be used instead.
As a thumb rule, when recursion or heavy nesting is needed use
ParserK.
Parser: suitable for non-recursive static fusion
The
Parser type is suitable only for non-recursive static
fusion. It could be problematic for recursive definitions. To enable
static fusion, parser combinators use strict pattern matching on
arguments of type Parser. This leads to infinte loop when a parser is
defined recursively, due to strict evaluation of the recursive call.
For example, the following implementation loops infinitely because of
the recursive use of parser
p in the
*>
combinator:
>>> import Streamly.Data.Parser (Parser)
>>> import qualified Streamly.Data.Fold as Fold
>>> import qualified Streamly.Data.Parser as Parser
>>> import qualified Streamly.Data.Stream as Stream
>>> import Control.Applicative ((<|>))
>>> :{
>>> p, p1, p2 :: Monad m => Parser Char m String
>>> p1 = Parser.satisfy (== '(') *> p
>>> p2 = Parser.fromFold Fold.toList
>>> p = p1 <|> p2
>>> :}
Another limitation of Parser type quadratic performance slowdown when
too many nested compositions are used. Especially Applicative, Monad,
Alternative instances, and sequenced parsing operations (e.g. nested
one, and
splitWith) exhibit quadratic slowdown (O(n^2)
complexity) when combined
n times, roughly 8 or less
sequenced parsers usually work fine. READ THE DOCS OF APPLICATIVE,
MONAD AND ALTERNATIVE INSTANCES.
ParserK: suitable for recursive definitions
ParserK is suitable for recursive definitions:
>>> import Streamly.Data.ParserK (ParserK)
>>> import Streamly.Data.StreamK (toParserK)
>>> import qualified Streamly.Data.StreamK as StreamK
>>> :{
>>> p, p1, p2 :: Monad m => ParserK Char m String
>>> p1 = toParserK (Parser.satisfy (== '(')) *> p
>>> p2 = toParserK (Parser.fromFold Fold.toList)
>>> p = p1 <|> p2
>>> :}
>>> StreamK.parse p $ StreamK.fromStream $ Stream.fromList "hello"
Right "hello"
For this reason Applicative, Alternative or Monad compositions with
recursion cannot be used with the
Parser type. Alternative type
class based operations like
asum and Alternative based
generic parser combinators use recursion. Similarly, Applicative type
class based operations like
sequence use recursion. Custom
implementations of many such operations are provided in this module
(e.g.
some,
many), and those should be used instead.
Parsers Galore!
Streamly provides all the parsing functionality provided by popular
parsing libraries, and much more with higher performance. This module
provides most of the elementary parsers and parser combinators.
Additionally,
Generic Parser Combinators
With
ParserK you can use the
Applicative and
Alternative type class based generic parser combinators from
the
parser-combinators library or similar. However, if
available, we recommend that you use the equivalent functionality from
this module where performance and streaming behavior matters. Firstly,
the combinators in this module are faster due to stream fusion.
Secondly, these are streaming in nature as the results can be passed
directly to other stream consumers (folds or parsers). The Alternative
type class based parsers would end up buffering all the results in
lists before they can be consumed.
Error Reporting
There are two types of parser drivers available,
parse and
parseBreak drivers do not track stream position, whereas
parsePos and
parseBreakPos drivers track and report
stream position information with slightly more performance overhead.
When an error occurs the stream position is reported, in case of byte
streams or unboxed array streams this is the byte position, in case of
generic element parsers or generic array parsers this is the element
position in the stream.
These parsers do not report a case specific error context (e.g. line
number or column). If you need line number or column information you
can read the stream again (if it is immutable) and this count the
lines to translate the reported byte position to line number and
column. More elaborate support for building arbitrary and custom error
context information is planned to be added in future.
Monad Transformer Stack
MonadTrans instance is not provided. If the
Parser
type is the top most layer (which should be the case almost always)
you can just use
fromEffect to execute the lower layer monad
effects.
Experimental APIs
Please refer to
Streamly.Internal.Data.Parser for functions
that have not yet been released.