Parsers are stream consumers like folds with the following
differences:
- folds cannot fail but parsers can fail and backtrack.
- folds can be composed as a Tee but parsers cannot.
- folds can be used for scanning but parsers cannot.
- folds can be converted to parsers.
This module implements parsers with stream fusion which compile to
efficient loops comparable to the speed of C.
Using Parsers
This module provides elementary parsers and parser combinators that
can be used to parse a stream of data. Additionally, all the folds
from the
Streamly.Data.Fold module can be converted to parsers
using
fromFold. All the parsing functionality provided by
popular parsing libraries, and more is available. Also see
Streamly.Unicode.Parser module for Char stream parsers.
A data stream can be transformed to a stream of parsed data elements.
Parser combinators can be used to create a pipeline of folds or
parsers such that the next fold or parser consumes the result of the
previous parser. See
parse and
parseMany to run these
parsers on a stream.
Parser vs ParserK
There are two functionally equivalent parsing modules,
Streamly.Data.Parser (this module) and
Streamly.Data.ParserK. The latter is a CPS based wrapper over
the former, and can be used for parsing in general.
Streamly.Data.Parser enables stream fusion and should be
preferred over
Streamly.Data.ParserK for high performance
stream parsing use cases. However, there are a few cases where this
module is not suitable and ParserK should be used instead.
For static fusion, parser combinators have to use strict pattern
matching on arguments of type Parser. This leads to infinte loop when
a parser is defined recursively, due to strict evaluation of the
recursive call. For example, the following implementation loops
infinitely because of the recursive use of parser
p in the
*> combinator:
>>> import Streamly.Data.Parser (Parser)
>>> import qualified Streamly.Data.Fold as Fold
>>> import qualified Streamly.Data.Parser as Parser
>>> import qualified Streamly.Data.Stream as Stream
>>> import Control.Applicative ((<|>))
>>> :{
>>> p :: Monad m => Parser Char m String
>>> p = Parser.satisfy (== '(') *> p <|> Parser.fromFold Fold.toList
>>> :}
Use ParserK when recursive use is required:
>>> import Streamly.Data.ParserK (ParserK)
>>> import qualified Streamly.Data.StreamK as StreamK
>>> import qualified Streamly.Internal.Data.StreamK as StreamK (parse)
>>> import qualified Streamly.Internal.Data.ParserK as ParserK (adapt)
>>> :{
>>> p :: Monad m => ParserK Char m String
>>> p = ParserK.adapt (Parser.satisfy (== '(')) *> p <|> ParserK.adapt (Parser.fromFold Fold.toList)
>>> :}
>>> StreamK.parse p $ StreamK.fromStream $ Stream.fromList "hello"
Right "hello"
For this reason Applicative, Alternative or Monad compositions with
recursion cannot be used with the
Parser type. Alternative type
class based operations like
asum and Alternative based
generic parser combinators use recursion. Similarly, Applicative type
class based operations like
sequence use recursion. Custom
implementations of many such operations are provided in this module
(e.g.
some,
many), and those should be used instead.
Another limitation of Parser type is due to the quadratic complexity
causing slowdown when too many nested compositions are used.
Especially Applicative, Monad, Alternative instances, and sequenced
parsing operations (e.g. nested
one, and
splitWith)
degrade the performance quadratically (O(n^2)) when combined
n times, roughly 8 or less sequenced parsers are fine. READ
THE DOCS OF APPLICATIVE, MONAD AND ALTERNATIVE INSTANCES.
Streaming Parsers
With
ParserK you can use the generic Alternative type class
based parsers from the
parser-combinators library or similar.
However, we recommend that you use the equivalent functionality from
this module for better performance and for streaming behavior.
Firstly, the combinators in this module are faster due to stream
fusion. Secondly, these are streaming in nature as the results can be
passed directly to other stream consumers (folds or parsers). The
Alternative type class based parsers would end up buffering all the
results in lists before they can be consumed.
When recursion or heavy nesting is needed use ParserK.
Error Reporting
These parsers do not report the error context (e.g. line number or
column). This may be supported in future.
Monad Transformer Stack
MonadTrans instance is not provided. If the
Parser
type is the top most layer (which should be the case almost always)
you can just use
fromEffect to execute the lower layer monad
effects.
Parser vs ParserK Implementation
The
Parser type represents a stream consumer by composing state
as data which enables stream fusion. Stream fusion generates a tight
loop without any constructor allocations between the stages, providing
C like performance for the loop. Stream fusion works when multiple
functions are combined in a pipeline statically. Therefore, the
operations in this module must be inlined and must not be used
recursively to allow for stream fusion.
The
ParserK type represents a stream consumer by composing
function calls, therefore, a function call overhead is incurred at
each composition. It is quite fast in general but may be a few times
slower than a fused parser. However, it allows for scalable dynamic
composition especially parsers can be used in recursive calls. Using
the
ParserK type operations like
splitWith provide
linear (O(n)) performance with respect to the number of compositions.
Experimental APIs
Please refer to
Streamly.Internal.Data.Parser for functions
that have not yet been released.