fork package:pooled-io

This runs an action in parallel to the starting thread. Since T is an Applicative Functor and not a Monad, there are no data dependencies between the actions, and thus all actions in a T can be run in parallel. Only the IO actions are parallelised, not the combining function passed to liftA2 et al. That is, the main work must be done in the IO actions in order to benefit from parallelisation.
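The point about liftA2 can be illustrated with a minimal sketch. It assumes that the module Control.Concurrent.PooledIO.Final exports a function run :: T a -> IO a alongside fork; the helper names sumTo and parallelSum are hypothetical and exist only for this illustration.

```haskell
import qualified Control.Concurrent.PooledIO.Final as Pool
import Control.Applicative (liftA2)

-- Hypothetical workload: the expensive part happens inside the IO action,
-- which is where parallelisation can pay off.
sumTo :: Int -> IO Int
sumTo n = return $! sum [1 .. n]

-- Both sums may be computed in parallel. The (+) passed to liftA2 only
-- combines the two results and is not itself parallelised.
-- Assumption: Pool.run :: Pool.T a -> IO a is exported by the module.
parallelSum :: Int -> Int -> IO Int
parallelSum m n =
   Pool.run $ liftA2 (+) (Pool.fork $ sumTo m) (Pool.fork $ sumTo n)
```

In GHCi, parallelSum 10 20 would run both sums through the pool and then add the results. If the summation were moved into the combining function instead, nothing would be left to parallelise.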
fork runs an IO action in parallel while respecting a maximum number of threads. Evaluating the result of T waits for the termination of the corresponding thread. Unfortunately, this means that sometimes threads sit idle:
foo a b = do
   c <- fork $ f a
   d <- fork $ g c
   e <- fork $ h b
   return (d, e)
Here the execution of g c reserves a thread but starts by waiting for the evaluation of c. It would certainly be better to execute h b first. You can mitigate this problem by moving dependent actions as far apart from one another as possible. It would be optimal to have an OutOfOrder monad, but this is more difficult to implement.
Although we fork all actions in order, the fork itself might re-order the actions. Thus the actions must not rely on a particular order other than the order imposed by data dependencies. We enforce with the NFData constraint that the computation is actually completed when the thread terminates.
Currently the monad does not handle exceptions. It is certainly best to use a package with explicit exception handling, like explicit-exception, in order to tunnel exception information from the forked action to the main thread.
Although fork has almost the same type signature as liftIO, we do not define instance MonadIO InOrder.T, since this definition would not satisfy the laws required by the MonadIO class.
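Two of the suggestions above can be sketched as follows. The sketch assumes that Control.Concurrent.PooledIO.InOrder exports run :: T a -> IO a alongside fork, and it needs the deepseq package. The actions f, g and h are hypothetical stand-ins for those of the foo example, fooReordered and forkEither are names invented for this illustration, and plain Either is used as a simple substitute for the explicit-exception package.

```haskell
import qualified Control.Concurrent.PooledIO.InOrder as Pool
import Control.DeepSeq (NFData, force)
import Control.Exception (SomeException, evaluate, try)

-- Hypothetical stand-ins for the f, g and h of the foo example.
f, g, h :: Int -> IO Int
f = return . (+ 1)
g = return . (* 2)
h = return . subtract 3

-- Same data dependencies as foo, but the independent action h b is
-- forked between f a and g c, so the thread that runs g c is less
-- likely to sit idle while waiting for c.
fooReordered :: Int -> Int -> Pool.T (Int, Int)
fooReordered a b = do
   c <- Pool.fork $ f a
   e <- Pool.fork $ h b
   d <- Pool.fork $ g c
   return (d, e)

-- Tunnel exceptions to the caller as a Left value. The result is fully
-- evaluated inside try, so exceptions hidden in lazy parts of the
-- result are caught as well.
forkEither :: NFData a => IO a -> Pool.T (Either String a)
forkEither act =
   Pool.fork $ fmap (either (Left . describe) Right) $
      try (act >>= evaluate . force)
  where
   describe :: SomeException -> String
   describe = show
```

Under the stated assumption, Pool.run (fooReordered 1 10) executes the three actions through the pool, and Pool.run (forkEither someAction) yields a Left with the exception message if someAction throws, instead of leaving the exception unhandled in the forked thread.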