% Pipes
% Paul Martinez
# Pipes 5/1/14

This guest lecture was given by Gabriel Gonzalez, creator of the Haskell library Pipes.

Consider the following functions:

~~~ {.haskell}
-- Simplified signatures; modern base generalizes these to Applicative/Traversable
replicateM :: Monad m => Int -> m a -> m [a]
mapM       :: Monad m => (a -> m b) -> [a] -> m [b]
sequence   :: Monad m => [m a] -> m [a]
~~~

All three run a sequence of monadic actions and collect the results in a list. The problem with these functions is that they don't return until every element has been processed, so you can't consume any of the results until the whole list is done. This is inefficient in both time and memory, and it doesn't work at all for infinite lists.

A potential solution is lazy IO, but it is disappointing for several reasons. It only works in the `IO` monad, and only for sources of data, not for sinks or transformations. Worse, it undermines equational reasoning about Haskell programs, because the order in which pure code happens to be evaluated now determines when effects occur. It feels like an admission of defeat: a declaration that monads are too difficult and awkward to use for streaming.


What we would like to do is separate the production of values from their consumption.
Pipes is a coroutine library that emulates this paradigm in a manner
similar to Unix pipes.

~~~ {.haskell}
import Pipes
import System.IO (isEOF)

-- Producer designates a generator of values
stdinLn :: Producer String IO ()
stdinLn = do
    eof <- lift isEOF
    if eof
        then return ()
        else do
            str <- lift getLine
            -- Special function yield hands off the value and blocks
            -- until the value is used
            yield str
            stdinLn

-- For every call to "yield str", a corresponding call to "useString str" is made
useString :: String -> Effect IO ()
useString str = lift (putStrLn str)

-- Echoes back string inputs from user
echo :: Effect IO ()
echo = for stdinLn useString

main :: IO ()
main = runEffect echo
~~~
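To exercise the same `for`/`runEffect` pattern without a terminal, here is a small sketch assuming the `pipes` and `transformers` packages. It swaps `stdinLn` for `each` over a fixed list and swaps `putStrLn` for a `Writer` log; the names `inputs`, `logString` and `echoAll` are hypothetical.

~~~ {.haskell}
import Pipes
import Control.Monad.Trans.Writer.Strict (Writer, execWriter, tell)

-- Hypothetical pure stand-in for stdinLn: yield each list element in turn
inputs :: Monad m => [String] -> Producer String m ()
inputs = each

-- Hypothetical pure stand-in for useString: record the line instead of printing it
logString :: String -> Effect (Writer [String]) ()
logString str = lift (tell [str])

-- Same shape as echo: loop over the producer, handling every yielded value
echoAll :: [String] -> [String]
echoAll xs = execWriter (runEffect (for (inputs xs) logString))
~~~

Here `echoAll ["hello", "world"]` should log both lines in order, just as `echo` prints them.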


How can we build something like this? We can think of the `Producer` type as
a list whose elements can be interleaved with effects:

~~~ {.haskell}
import Control.Monad (ap, liftM)
import Control.Monad.Trans.Class (MonadTrans(lift))

data Producer a m r
    = Yield a (Producer a m r)  -- "Cons" of a list
    | M (m (Producer a m r))    -- An effect that produces the rest of the list
    | Return r                  -- Empty list

yield :: a -> Producer a m ()
yield a = Yield a (Return ())

-- Modern GHC requires Functor and Applicative instances for every Monad
instance Monad m => Functor (Producer a m) where
    fmap = liftM

instance Monad m => Applicative (Producer a m) where
    -- return :: Monad m => r -> Producer a m r
    pure r = Return r
    (<*>)  = ap

instance Monad m => Monad (Producer a m) where
    -- (>>=) :: Monad m
    --       => Producer a m r -> (r -> Producer a m s) -> Producer a m s
    (Yield a p) >>= return' = Yield a (p >>= return')
    (M m)       >>= return' = M (m >>= \p -> return (p >>= return'))
    (Return r)  >>= return' = return' r

instance MonadTrans (Producer a) where
    -- lift :: Monad m => m r -> Producer a m r
    lift m = M (liftM Return m)
~~~
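To make the "list with effects" reading concrete, here is a minimal sketch that repeats the simplified `Producer` type and adds two hypothetical helpers: `fromList` builds a producer from a list (one `Yield` per element), and `collect` runs the effects and gathers every yielded value. Neither helper is part of the lecture or of the real pipes API.

~~~ {.haskell}
import Data.Functor.Identity (Identity(..))

-- Same shape as the simplified Producer above (a sketch, not the real pipes type)
data Producer a m r
    = Yield a (Producer a m r)
    | M (m (Producer a m r))
    | Return r

-- Hypothetical helper: one Yield per list element, then Return ()
fromList :: [a] -> Producer a m ()
fromList = foldr Yield (Return ())

-- Hypothetical helper: run every effect and collect the yielded values
collect :: Monad m => Producer a m r -> m [a]
collect (Yield a p) = fmap (a :) (collect p)
collect (M m)       = m >>= collect
collect (Return _)  = return []

-- A producer with an (Identity) effect node between its two yields
example :: Producer Int Identity ()
example = Yield 1 (M (Identity (Yield 2 (Return ()))))
~~~

Stripping out the `M` nodes leaves exactly a list, which is why `Yield` plays the role of cons and `Return` the role of the empty list.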


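Alternatively, a `Producer` can be thought of as a syntax tree of `Yield` nodes ending in a
`Return` leaf. In this view, `for` connects two syntax trees to create a new one: it replaces
every `Yield a` node of the first tree with the tree that the loop body builds from `a`.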

~~~ {.haskell}
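for :: Monad m
    => Producer a m ()
    -> (a -> Producer b m ())
    -> Producer b m ()
-- Replace the Yield node with the loop body, then continue with the rest
for (Yield a p) yield' = yield' a >> for p yield'
-- Run the effect, then keep looping over the producer it returns
for (M m)       yield' = M (m >>= \p -> return (for p yield'))
-- Nothing left to loop over
for (Return r)  _      = Return r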
~~~
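A quick way to see the "replace every `Yield`" behaviour is to run `for` on a small tree. This sketch repeats the simplified `Producer` and `for` from above so it stands alone; `fromList`, `collect` and `expanded` are hypothetical helpers, not pipes functions.

~~~ {.haskell}
import Control.Monad (ap, liftM)
import Data.Functor.Identity (Identity(..))

data Producer a m r
    = Yield a (Producer a m r)
    | M (m (Producer a m r))
    | Return r

instance Monad m => Functor (Producer a m) where
    fmap = liftM

instance Monad m => Applicative (Producer a m) where
    pure  = Return
    (<*>) = ap

instance Monad m => Monad (Producer a m) where
    Yield a p >>= k = Yield a (p >>= k)
    M m       >>= k = M (m >>= \p -> return (p >>= k))
    Return r  >>= k = k r

for :: Monad m => Producer a m () -> (a -> Producer b m ()) -> Producer b m ()
for (Yield a p) yield' = yield' a >> for p yield'
for (M m)       yield' = M (m >>= \p -> return (for p yield'))
for (Return r)  _      = Return r

-- Hypothetical helpers for building and inspecting producers
fromList :: [a] -> Producer a m ()
fromList = foldr Yield (Return ())

collect :: Monad m => Producer a m r -> m [a]
collect (Yield a p) = fmap (a :) (collect p)
collect (M m)       = m >>= collect
collect (Return _)  = return []

-- Each Yield x in fromList [1, 2, 3] is replaced by the tree fromList [x, 10 * x]
expanded :: Producer Int Identity ()
expanded = for (fromList [1, 2, 3]) (\x -> fromList [x, 10 * x])
~~~

Collecting `expanded` should give `[1, 10, 2, 20, 3, 30]`: the outer tree's shape is kept, but every leaf value has been grafted with a new subtree.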


`runEffect` is the function that actually performs the actions inside a `Producer`.
An `Effect` is a `Producer Void`, where `Void` is a type with no constructors.
Since no value of type `Void` can ever be built, an `Effect` can never yield: every
yield has already been handled (for example by `for`), so what remains is an entirely
self-contained producer-consumer cycle that `runEffect` can run in the base monad.
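In the simplified model, `runEffect` can be sketched as below, assuming `Data.Void` from base and `State` from the `transformers` package. The `Yield` case is unreachable because it would need a value of type `Void`, which `absurd` makes explicit. The `bump` effect is a hypothetical example.

~~~ {.haskell}
import Control.Monad.Trans.State.Strict (State, execState, modify)
import Data.Void (Void, absurd)

-- Same shape as the simplified Producer above (a sketch, not the real pipes type)
data Producer a m r
    = Yield a (Producer a m r)
    | M (m (Producer a m r))
    | Return r

-- An Effect is a Producer that can never yield
type Effect m r = Producer Void m r

runEffect :: Monad m => Effect m r -> m r
runEffect (Yield v _) = absurd v         -- unreachable: no value of type Void exists
runEffect (M m)       = m >>= runEffect  -- perform the effect, then keep going
runEffect (Return r)  = return r         -- done: hand back the result

-- Hypothetical Effect that bumps a counter n times
bump :: Int -> Effect (State Int) ()
bump 0 = Return ()
bump n = M (modify (+ 1) >> return (bump (n - 1)))
~~~

Running `execState (runEffect (bump 3)) 0` should leave the counter at `3`.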



## Theory behind Pipes

A little bit about the theory behind Pipes: one of the cool things about Haskell
is that it uses design patterns inspired by category theory. We see these
in the typeclasses `Monoid`, `Applicative`, `Monad`, etc. We use these patterns because
we want to *reduce software complexity*. In software we hook up a bunch of components,
and the more components there are, the harder it is to keep track of everything.
We can keep complexity in check if combining components always gives back something
of the same type, so the result can be combined again in the same way. That is exactly
what a monoid provides: an associative operation that combines two values of a type into
a third value of that type, plus an identity element.


~~~ {.haskell}
import Prelude hiding (Monoid (..), (<>))

class Monoid m where
    mappend :: m -> m -> m
    mempty  :: m

(<>) :: Monoid m => m -> m -> m
(<>) = mappend

-- Monoids must obey the following laws:
-- Associativity
--   (x <> y) <> z = x <> (y <> z)
-- Identity
--   mempty <> x = x
--   x <> mempty = x
~~~



A `Producer` fits into this mold. `return ()` is the empty producer: it yields nothing.
`p1 >> p2` is the producer that yields everything `p1` yields, followed by everything
`p2` yields. This works because, in any monad, `(>>)` and `return ()` form a monoid on
actions of type `m ()`.

~~~ {.haskell}
(>>)      :: Producer a IO ()  -- (<>)   :: m
          -> Producer a IO ()  --        -> m
          -> Producer a IO ()  --        -> m

return () :: Producer a IO ()  -- mempty :: m
~~~
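The monoid laws for `(>>)` and `return ()` can be spot-checked with the real library. This sketch assumes the `pipes` package (`each` and `Pipes.Prelude.toListM`) and runs producers in `Identity`; `yields`, `appended` and the law names are hypothetical.

~~~ {.haskell}
import Pipes
import qualified Pipes.Prelude as P
import Data.Functor.Identity (Identity(..))

-- Collect everything a pure Producer yields
yields :: Producer Int Identity () -> [Int]
yields p = runIdentity (P.toListM p)

-- (>>) appends producers: everything from the left, then everything from the right
appended :: [Int]
appended = yields (each [1, 2] >> each [3])

-- The monoid laws, checked on small sample producers
associative, leftId, rightId :: Bool
associative = yields ((each [1] >> each [2]) >> each [3])
           == yields (each [1] >> (each [2] >> each [3]))
leftId      = yields (return () >> each [1, 2]) == yields (each [1, 2])
rightId     = yields (each [1, 2] >> return ()) == yields (each [1, 2])
~~~

These are checks on particular inputs, not proofs, but they illustrate the claim that `(>>)` concatenates producers with `return ()` as the neutral element.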



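We can generalize monoids even further with *categories*. A monoid is a category with
only one object; a category relaxes this by letting `(.)` compose only arrows whose types line up.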

~~~ {.haskell}
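-- base ships this class as Control.Category
import Prelude hiding (id, (.))

class Category cat where
    (.) :: cat b c -> cat a b -> cat a c
    id  :: cat a a

-- Left-to-right composition
(>>>) :: Category cat => cat a b -> cat b c -> cat a c
(>>>) = flip (.)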
~~~


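For any monad, `(>=>)` and `return` form a category (the Kleisli category): `return` is
the identity arrow and `(>=>)` composes arrows of the form `a -> m b`.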
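As a concrete illustration, here is a sketch of the Kleisli category for `Maybe`, using `(>=>)` from `Control.Monad`. The arrows `safeRecip`, `safeSqrt` and `halve` are hypothetical examples, and the laws are checked only at a few sample points.

~~~ {.haskell}
import Control.Monad ((>=>))

-- Three Kleisli arrows in the Maybe monad
safeRecip :: Double -> Maybe Double
safeRecip 0 = Nothing
safeRecip x = Just (1 / x)

safeSqrt :: Double -> Maybe Double
safeSqrt x
    | x < 0     = Nothing
    | otherwise = Just (sqrt x)

halve :: Double -> Maybe Double
halve x = Just (x / 2)

-- (>=>) composes arrows; a Nothing anywhere short-circuits the rest
recipThenSqrt :: Double -> Maybe Double
recipThenSqrt = safeRecip >=> safeSqrt

-- Category laws: return is the identity, and (>=>) is associative
leftIdentity, rightIdentity, associativity :: Double -> Bool
leftIdentity x  = (return >=> safeSqrt) x == safeSqrt x
rightIdentity x = (safeSqrt >=> return) x == safeSqrt x
associativity x = ((safeRecip >=> safeSqrt) >=> halve) x
               == (safeRecip >=> (safeSqrt >=> halve)) x
~~~

For example, `recipThenSqrt 4` should be `Just 0.5`, while `recipThenSqrt 0` is `Nothing`.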


We now define `(~>)`, a point-free composition operator for functions that return `Producer`s.
We would like `(~>)` and `yield` to form a category, which means obeying the category laws
(given on the ensuing slides): `yield ~> f = f`, `f ~> yield = f`, and
`(f ~> g) ~> h = f ~> (g ~> h)`.

~~~ {.haskell}
(f ~> g) x = for (f x) g
~~~
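The pipes library exports `(~>)` directly, so the laws can be tried on small generators. This sketch assumes the `pipes` package; `twice`, `countTo`, `runGen` and the law names are hypothetical.

~~~ {.haskell}
import Pipes
import qualified Pipes.Prelude as P
import Data.Functor.Identity (Identity(..))

-- Hypothetical generators: each takes a seed and yields some values
twice :: Monad m => Int -> Producer Int m ()
twice x = yield x >> yield x

countTo :: Monad m => Int -> Producer Int m ()
countTo n = each [1 .. n]

-- Run a generator on an input and collect what it yields
runGen :: (Int -> Producer Int Identity ()) -> Int -> [Int]
runGen f x = runIdentity (P.toListM (f x))

-- countTo ~> twice: every value countTo yields is replaced by twice's output
composed :: [Int]
composed = runGen (countTo ~> twice) 3

-- yield is the identity of (~>), and (~>) is associative
yieldLeft, yieldRight, arrowAssoc :: Bool
yieldLeft  = runGen (yield ~> twice) 5 == runGen twice 5
yieldRight = runGen (twice ~> yield) 5 == runGen twice 5
arrowAssoc = runGen ((countTo ~> twice) ~> countTo) 2
          == runGen (countTo ~> (twice ~> countTo)) 2
~~~

`composed` should be `[1, 1, 2, 2, 3, 3]`, matching the definition `(f ~> g) x = for (f x) g`.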


## Pipes API

In addition to a producer that generates values, we can create a consumer
that receives values and can carry state between them. This example echoes back the user's
input as before, but prefixes each line with a line number:

~~~ {.haskell}
import Pipes
import Pipes.Prelude (stdinLn)

numbered :: Int -> Consumer String IO r
numbered n = do
    str <- await
    let str' = show n ++ ": " ++ str
    lift (putStrLn str')
    numbered (n + 1)

giveString :: Effect IO String
giveString = lift getLine

nl :: Effect IO ()
nl = giveString >~ numbered 0

main :: IO ()
main = runEffect nl
~~~
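`giveString >~ numbered 0` re-runs `giveString` once for every `await` in `numbered`. This can be seen without a terminal in the following sketch, which assumes the `pipes` and `transformers` packages and swaps `getLine` for a hypothetical `tick` action that returns and increments a `State` counter.

~~~ {.haskell}
import Pipes
import Control.Monad.Trans.State.Strict (State, evalState, state)

-- Hypothetical feed action standing in for giveString:
-- return the current counter, then increment it
tick :: Effect (State Int) Int
tick = lift (state (\n -> (n, n + 1)))

-- A Consumer that awaits exactly three values and returns them
takeThree :: Monad m => Consumer Int m [Int]
takeThree = do
    a <- await
    b <- await
    c <- await
    return [a, b, c]

-- Each await in takeThree is replaced by a fresh run of tick
fed :: [Int]
fed = evalState (runEffect (tick >~ takeThree)) 0
~~~

Because `tick` runs three separate times, `fed` should be `[0, 1, 2]` rather than three copies of the same value.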



In the simplified model, `Consumer` is a data type defined much like `Producer`, with `Await` in place of `Yield`:

~~~ {.haskell}
data Consumer a m r
    = Await (a -> Consumer a m r)  -- Wait for an input, then continue
    | M (m (Consumer a m r))       -- An effect that produces the rest
    | Return r                     -- Finished, with result r

await :: Consumer a m a
await = Await (\a -> Return a)
~~~
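A `Consumer` in this model is a tree of `Await` continuations, so it can be driven by hand. This sketch repeats the simplified type and adds a hypothetical `feedList` driver that answers each `Await` with the next list element; `sumTwo` is a hypothetical example consumer.

~~~ {.haskell}
import Data.Functor.Identity (Identity(..))

-- Same shape as the simplified Consumer above (a sketch, not the real pipes type)
data Consumer a m r
    = Await (a -> Consumer a m r)
    | M (m (Consumer a m r))
    | Return r

-- Hypothetical driver: answer each Await with the next list element.
-- Returns Nothing if the Consumer asks for more input than the list holds.
feedList :: Monad m => [a] -> Consumer a m r -> m (Maybe r)
feedList (x : xs) (Await k)  = feedList xs (k x)
feedList []       (Await _)  = return Nothing
feedList xs       (M m)      = m >>= feedList xs
feedList _        (Return r) = return (Just r)

-- Awaits two numbers and returns their sum
sumTwo :: Consumer Int m Int
sumTwo = Await (\x -> Await (\y -> Return (x + y)))
~~~

Feeding `[3, 4, 5]` to `sumTwo` should give `Just 7`; the unused `5` is simply never requested.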

`Consumer`'s counterpart to `Producer`'s `for` is `(>~)`, the feed operator: `p >~ c` replaces every `await` in `c` with `p`.

~~~ {.haskell}
(>~) :: Monad m
     => Consumer a m b
     -> Consumer b m c
     -> Consumer a m c
~~~
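In the simplified model, `(>~)` can be implemented by structural recursion on the downstream consumer, mirroring `for`. This is a sketch, not the pipes source: it repeats the `Consumer` type, gives it a `Monad` instance, and reuses the hypothetical `feedList` driver; `pairOfSums` is a hypothetical example.

~~~ {.haskell}
import Control.Monad (ap, liftM)
import Data.Functor.Identity (Identity(..))

data Consumer a m r
    = Await (a -> Consumer a m r)
    | M (m (Consumer a m r))
    | Return r

instance Monad m => Functor (Consumer a m) where
    fmap = liftM

instance Monad m => Applicative (Consumer a m) where
    pure  = Return
    (<*>) = ap

instance Monad m => Monad (Consumer a m) where
    Await k  >>= f = Await (\a -> k a >>= f)
    M m      >>= f = M (m >>= \c -> return (c >>= f))
    Return r >>= f = f r

await :: Consumer a m a
await = Await Return

-- p >~ c: every Await in c is replaced by a run of p
(>~) :: Monad m => Consumer a m b -> Consumer b m c -> Consumer a m c
p >~ Await k  = p >>= \b -> p >~ k b
p >~ M m      = M (m >>= \c -> return (p >~ c))
_ >~ Return r = Return r

-- Hypothetical driver: answer each Await with the next list element
feedList :: Monad m => [a] -> Consumer a m r -> m (Maybe r)
feedList (x : xs) (Await k)  = feedList xs (k x)
feedList []       (Await _)  = return Nothing
feedList xs       (M m)      = m >>= feedList xs
feedList _        (Return r) = return (Just r)

-- Each downstream value is the sum of two upstream values
pairOfSums :: Monad m => Consumer Int m (Int, Int)
pairOfSums = ((+) <$> await <*> await) >~ ((,) <$> await <*> await)
~~~

Feeding `[1, 2, 3, 4]` should give `Just (3, 7)`: the first downstream `await` is answered by `1 + 2`, the second by `3 + 4`.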


We can connect `Producer`s and `Consumer`s with the pipe operator `(>->)`.

~~~ {.haskell}
-- Mix Producers and Consumers using >->
-- (type simplified here; the real operator is more general)
(>->) :: Producer a IO r
      -> Consumer a IO r
      -> Effect IO r

main :: IO ()
main = runEffect (stdinLn >-> numbered 0)
~~~
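To check `(>->)` without a terminal, here is a sketch assuming the `pipes` and `transformers` packages. `numberedW` is a hypothetical pure variant of `numbered` that logs to a `Writer` instead of printing, and `each` stands in for `stdinLn`.

~~~ {.haskell}
import Pipes
import Control.Monad.Trans.Writer.Strict (Writer, execWriter, tell)

-- Hypothetical pure variant of numbered: log each line instead of printing it
numberedW :: Int -> Consumer String (Writer [String]) r
numberedW n = do
    str <- await
    lift (tell [show n ++ ": " ++ str])
    numberedW (n + 1)

-- The Effect ends as soon as the upstream Producer runs out of values
numberLines :: [String] -> [String]
numberLines xs = execWriter (runEffect (each xs >-> numberedW 0))
~~~

Even though `numberedW` loops forever, the pipeline terminates because `>->` is pull-based: when `numberedW` awaits and the producer has returned, the whole `Effect` returns too. So `numberLines ["a", "b"]` should be `["0: a", "1: b"]`.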


In addition to mixing `Producer`s and `Consumer`s, we also have the `Pipe` type,
which can both `yield` and `await`. Conceptually, `Consumer`s and `Producer`s are just
`Pipe`s with one end sealed off:

~~~ {.haskell}
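-- In the pipes library, Pipe, Producer, Consumer and Effect are all
-- type synonyms for a single underlying Proxy type
type Consumer a = Pipe a Void  -- the output end can never yield
type Producer b = Pipe () b    -- Almost, the real implementation is a bit more clever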
~~~
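A `Pipe` sits in the middle of a pipeline. This sketch assumes the `pipes` package (`each`, `Pipes.Prelude.toListM` and `Pipes.Prelude.take`); `double`, `doubled` and `firstThree` are hypothetical names. Note that `Producer >-> Pipe` is again a `Producer`, so its output can be collected directly.

~~~ {.haskell}
import Pipes
import qualified Pipes.Prelude as P
import Control.Monad (forever)
import Data.Functor.Identity (Identity(..))

-- A Pipe both awaits from upstream and yields downstream
double :: Monad m => Pipe Int Int m r
double = forever $ do
    x <- await
    yield (x * 2)

-- Producer >-> Pipe is a Producer, so toListM can collect it
doubled :: [Int]
doubled = runIdentity (P.toListM (each [1, 2, 3] >-> double))

-- Works on an infinite source: P.take stops pulling after three values
firstThree :: [Int]
firstThree = runIdentity (P.toListM (each [1 ..] >-> double >-> P.take 3))
~~~

`firstThree` terminates even though the source is infinite, which is exactly what the strict `mapM` from the start of the lecture could not do.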



The Pipes API is inspired by category theory: just as `(>=>)` is paired with `return`,
each composition operator is paired with an identity, `(~>)` with `yield`, `(>~)` with
`await`, and `(>->)` with `cat`. A neat advantage of this pairing is that the category
laws then act as small test cases for the library.
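As an example of the laws doubling as tests, here is a sketch that checks the `(>->)`/`cat` laws on one sample input, assuming the `pipes` package (`cat`, `Pipes.Prelude.map`, `Pipes.Prelude.toListM`). `outputs`, `sample` and `addOne` are hypothetical names.

~~~ {.haskell}
import Pipes
import qualified Pipes.Prelude as P
import Data.Functor.Identity (Identity(..))

-- Collect the output of a pure Producer
outputs :: Producer Int Identity () -> [Int]
outputs p = runIdentity (P.toListM p)

sample :: Producer Int Identity ()
sample = each [1, 2, 3]

-- A hypothetical Pipe under test
addOne :: Pipe Int Int Identity ()
addOne = P.map (+ 1)

-- cat is the identity of (>->) on both sides
catLeft, catRight, pipeAssoc :: Bool
catLeft   = outputs (sample >-> (cat >-> addOne)) == outputs (sample >-> addOne)
catRight  = outputs (sample >-> (addOne >-> cat)) == outputs (sample >-> addOne)
-- (>->) is associative
pipeAssoc = outputs ((sample >-> addOne) >-> addOne)
         == outputs (sample >-> (addOne >-> addOne))
~~~

A property-testing library could generalize these checks to random inputs; here they are fixed examples only.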