Stream package:streaming

an elementary streaming prelude and general stream type. This package contains two modules, Streaming and Streaming.Prelude. The principal module, Streaming.Prelude, exports an elementary streaming prelude focused on a simple "source" or "producer" type, namely Stream (Of a) m r. This is a sort of effectful version of ([a],r) in which successive elements of type a arise from some sort of monadic action before the succession ends with a value of type r. Everything in the library is organized to make programming with this type as simple as possible, by the simple expedient of making it as close to Prelude and Data.List as possible. Thus for example the trivial program

>>> S.sum $ S.take 3 (S.readLn :: Stream (Of Int) IO ())
1<Enter>
2<Enter>
3<Enter>
6 :> ()

sums the first three valid integers from user input. Similarly,

>>> S.stdoutLn $ S.map (map toUpper) $ S.take 2 S.stdinLn
hello<Enter>
HELLO
world!<Enter>
WORLD!

upper-cases the first two lines from stdin as they arise, and sends them to stdout. And so on, with filtering, mapping, breaking, chunking, zipping, unzipping, replicating and so forth: we program with streams of Ints or Strings directly as if they constituted something like a list. That's because streams really do constitute something like a list, and the associated operations can mostly have the same names. (A few, like reverse, don't stream and thus disappear; others like unzip are here given properly streaming formulation for the first time.) And we everywhere oppose "extracting a pure list from IO", which is the origin of typical Haskell memory catastrophes. Basically any case where you are tempted to use mapM, replicateM, traverse or sequence with Haskell lists, you would do better to use something like Stream (Of a) m r. The type signatures are a little fancier, but the programs themselves are mostly the same. In fact, they are mostly simpler. Thus, consider the trivial demo program mentioned in this SO question

main = mapM newIORef [1..10^8::Int] >>= mapM readIORef >>= mapM_ print

The new user notices that this exhausts memory, and worries about the efficiency of Haskell IORefs. But of course it exhausts memory! Look what it says! The problem is immediately cured by writing

main = S.print $ S.mapM readIORef $ S.mapM newIORef $ S.each [1..10^8::Int]

which really does what the other program was meant to do, uses no more memory than hello-world, and is simpler anyway, since it doesn't involve the detour of "extracting a list from IO". Almost every use of list mapM, replicateM, traverse and sequence produces this problem on a smaller scale. People get used to it, as if it were characteristic of Haskell programs to use a lot of memory. But in truth "extracting a list or sequence from IO" is mostly just bad practice pure and simple. Of course, mapM, replicateM, traverse and sequence make sense for lists, under certain conditions! But unsafePerformIO also makes sense under certain conditions. The Streaming module exports the general type, Stream f m r, which can be used to stream successive distinct steps characterized by any functor f, though we are mostly interested in organizing computations of the form Stream (Of a) m r. The streaming-IO libraries have various devices for dealing with effectful variants of [a] or ([a],r) in which the emergence of successive elements somehow depends on IO. But it is only with the general type Stream f m r, or some equivalent, that one can envisage (for example) the connected streaming of their sorts of stream - as one makes lists of lists in the Haskell Prelude and Data.List. One needs some such type if we are to express properly streaming equivalents of e.g.

group :: Ord a => [a] -> [[a]]
chunksOf :: Int -> [a] -> [[a]]
lines :: [Char] -> [[Char]] -- but similarly with byte streams, etc.

to mention a few obviously desirable operations. (This is explained more elaborately in the readme below.) One could of course throw something like the present Stream type on top of a prior stream concept: this is how pipes and pipes-group (which are very much our model here) use FreeT. But once one grasps the iterable stream concept needed to express those functions then one will also see that, with it, one is already in possession of a complete elementary streaming library - since one possesses Stream ((,) a) m r or equivalently Stream (Of a) m r. This is the type of a 'generator' or 'producer' or 'source' or whatever you call an effectful stream of items. The present Streaming.Prelude is thus the simplest streaming library that can replicate anything like the API of the Prelude and Data.List. The emphasis of the library is on interoperation; for the rest its advantages are: extreme simplicity, re-use of intuitions the user has gathered from mastery of Prelude and Data.List, and a total and systematic rejection of type synonyms. The two conceptual pre-requisites are some comprehension of monad transformers and some familiarity with 'rank 2 types'. It is hoped that experimentation with this simple material, starting with the ghci examples in Streaming.Prelude, will give people who are new to these concepts some intuition about their importance. The most fundamental purpose of the library is to express elementary streaming ideas without reliance on a complex framework, but in a way that integrates transparently with the rest of Haskell, using ideas - e.g. rank 2 types, which are here implicit or explicit in most mapping - that the user can carry elsewhere, rather than chaining her understanding to the curiosities of a so-called streaming IO framework (as necessary as that is for certain purposes.) See the readme below for further explanation, including the examples linked there. Elementary usage can be divined from the ghci examples in Streaming.Prelude and perhaps from this rough beginning of a tutorial. Note also the streaming bytestring and streaming utils packages. Questions about usage can be put raised on StackOverflow with the tag [haskell-streaming], or as an issue on Github, or on the pipes list (the package understands itself as part of the pipes 'ecosystem'.) The simplest form of interoperation with pipes is accomplished with this isomorphism:

Pipes.unfoldr Streaming.next        :: Stream (Of a) m r   -> Producer a m r
Streaming.unfoldr Pipes.next        :: Producer a m r      -> Stream (Of a) m r

Interoperation with io-streams is thus:

Streaming.reread IOStreams.read     :: InputStream a       -> Stream (Of a) IO ()
IOStreams.unfoldM Streaming.uncons  :: Stream (Of a) IO () -> IO (InputStream a)

With conduit one might use, e.g.:

Conduit.unfoldM Streaming.uncons                        :: Stream (Of a) m () -> Source m a
\str -> Streaming.mapM_ Conduit.yield (hoist lift str)  :: Stream (Of o) m r  -> ConduitM i o m r
\src -> hoist lift str $$ Conduit.mapM_ Streaming.yield :: Source m a         -> Stream (Of a) m ()

These conversions should never be more expensive than a single >-> or =$=. The simplest interoperation with regular Haskell lists is provided by, say

Streaming.each                                 :: [a]               -> Stream (Of a) m ()
Streaming.toList_                              :: Stream (Of a) m r -> m [a]

The latter of course accumulates the whole list in memory, and is mostly what we are trying to avoid. Every use of Prelude.mapM f should be reconceived as using the composition Streaming.toList_ . Streaming.mapM f . Streaming.each with a view to considering whether the accumulation required by Streaming.toList_ is really necessary. Here are the results of some microbenchmarks based on the benchmarks included in the machines package: Because these are microbenchmarks for individual functions, they represent a sort of "worst case"; many other factors can influence the speed of a complex program.

streamBuild :: (forall b . (r -> b) -> (m b -> b) -> (f b -> b) -> b) -> Stream f m r

streaming Streaming Streaming.Internal

Reflect a church-encoded stream; cp. GHC.Exts.build

streamFold return_ effect_ step_ (streamBuild psi) = psi return_ effect_ step_

streamFold :: (Functor f, Monad m) => (r -> b) -> (m b -> b) -> (f b -> b) -> Stream f m r -> b

streaming Streaming Streaming.Internal

streamFold reorders the arguments of destroy to be more akin to foldr It is more convenient to query in ghci to figure out what kind of 'algebra' you need to write.

>>> :t streamFold return join
(Monad m, Functor f) =>
(f (m a) -> m a) -> Stream f m a -> m a        -- iterT

>>> :t streamFold return (join . lift)
(Monad m, Monad (t m), Functor f, MonadTrans t) =>
(f (t m a) -> t m a) -> Stream f m a -> t m a  -- iterTM

>>> :t streamFold return effect
(Monad m, Functor f, Functor g) =>
(f (Stream g m r) -> Stream g m r) -> Stream f m r -> Stream g m r

>>> :t \f -> streamFold return effect (wrap . f)
(Monad m, Functor f, Functor g) =>
(f (Stream g m a) -> g (Stream g m a))
-> Stream f m a -> Stream g m a                 -- maps

>>> :t \f -> streamFold return effect (effect . fmap wrap . f)
(Monad m, Functor f, Functor g) =>
(f (Stream g m a) -> m (g (Stream g m a)))
-> Stream f m a -> Stream g m a                 -- mapped

streamFold done eff construct
= eff . iterT (return . construct . fmap eff) . fmap done