From 1d9ced6277899fc13a33fff1bc2f73dfa030eb30 Mon Sep 17 00:00:00 2001 From: Adelbert Chang Date: Mon, 24 Oct 2016 21:13:04 -0700 Subject: [PATCH] New blog post: EDSLs as functions --- posts/2016-09-21-edsls-part-1.md | 3 + posts/2016-10-26-edsls-part-2.md | 450 +++++++++++++++++++++++++++++++ 2 files changed, 453 insertions(+) create mode 100644 posts/2016-10-26-edsls-part-2.md diff --git a/posts/2016-09-21-edsls-part-1.md b/posts/2016-09-21-edsls-part-1.md index 9a48fdc2..3b4929b2 100644 --- a/posts/2016-09-21-edsls-part-1.md +++ b/posts/2016-09-21-edsls-part-1.md @@ -78,3 +78,6 @@ axes: interpret our mini-programs. How EDSLs are encoded will affect how they perform and therefore affect any downstream consumers of our programs, be it other programs or end users. + +In the [next post]({% post_url 2016-10-26-edsls-part-2 %}) we'll take a look +at the first of these approaches. diff --git a/posts/2016-10-26-edsls-part-2.md b/posts/2016-10-26-edsls-part-2.md new file mode 100644 index 00000000..4118dbb8 --- /dev/null +++ b/posts/2016-10-26-edsls-part-2.md @@ -0,0 +1,450 @@ +--- +layout: post +title: EDSLs as functions + +meta: + nav: blog + author: adelbertc + pygments: true +--- + +*This is the second of a series of articles on "Monadic EDSLs in Scala."* + +Perhaps the most direct way to start writing an EDSL is to start writing +functions. Let's say we want a language for talking about sets of integers. + +```scala +trait SetLang { + def add(i: Int, set: Set[Int]): Set[Int] + def remove(i: Int, set: Set[Int]): Set[Int] + def exists(i: Int, set: Set[Int]): Boolean +} +``` + +This works... to the extent that we want only to work with +`scala.collection.Set`s. As it stands we cannot talk about +other sets such as bloom filters or sets controlled by other threads. +Our language isn't *abstract* enough, so let's remove +all traces of `Set`. + +```scala +trait SetLang[F[_]] { + def add(i: Int, set: F[Int]): F[Int] + def remove(i: Int, set: F[Int]): F[Int] + def exists(i: Int, set: F[Int]): Boolean + + // Given unknown F we no longer know how to create an empty set + // so we add the capability to our language + def empty: F[Int] +} +``` + +We've parameterized our language with a [higher-kinded type][hkt] which +represents the context of our set. A similar parameterization could be +done with a *-kinded type (e.g. `SetLang[A]`) but since this series +focuses on **monadic** EDSLs, the choice is made for us. + +Now we can write mini-programs which talk about some abstract set +yet to be determined. + +```scala +def program[F[_]](lang: SetLang[F]): Boolean = { + import lang._ + exists(10, remove(5, add(10, add(5, empty)))) +} +``` + +Interpretation of our program is done by implementing `SetLang` and +passing an instance into `program`. + +However, our language is still not abstract enough. Replacing `Set` +with `F` allows us to swap in implementations of sets, but doesn't +allow us to talk about the context. Consider the behavior of `exists` if `F` +represents some remote set. Since `exists` returns a `Boolean`, +checking membership must be a synchronous operation despite the set living +on another node. + +It's also tedious to thread the set through each method manually. + +We can solve both problems by generalizing the use of `F` to some +context that is able to read and write to some set +(think `Set[Int] => (Set[Int], A)`). + +```scala +trait SetLang[F[_]] { + def add(i: Int): F[Unit] + def remove(i: Int): F[Unit] + def exists(i: Int): F[Boolean] + + // No longer need `empty` since the "context" has it already +} +``` + +`SetLang` can now talk about the **effects** around interpretation, such as +asynchronity. + +```scala +import scala.concurrent.Future + +type AsyncSet[A] = Set[Int] => Future[(Set[Int], A)] + +object AsyncSet extends SetLang[AsyncSet] { + def add(i: Int): Set[Int] => Future[(Set[Int], Unit)] = ??? + + def remove(i: Int): Set[Int] => Future[(Set[Int], Unit)] = ??? + + def exists(i: Int): Set[Int] => Future[(Set[Int], Boolean)] = ??? +} +``` + +This new encoding introduces a new but important problem: how do we +combine the results of multiple calls to `SetLang` methods? In the previous +encoding we could add and remove by threading the set from one call to +the next. With this change to represent a context, it's not clear how to do +that. + +Fortunately we are now in a position to leverage a powerful tool: +[monads][monads]. By extending our set language to be monadic +we recover composition in an elegant way. The [Cats][cats] library is used +for demonstration purposes, but the discussion applies equally to +[Scalaz][scalaz]. + +```scala +import cats.Monad +import cats.implicits._ + +trait SetLang[F[_]] { + // See [this post][typeClassSubType] for why the `Monad` instance is defined + // as a member as opposed to through inheritance. + def monad: Monad[F] + + def add(i: Int): F[Unit] + def remove(i: Int): F[Unit] + def exists(i: Int): F[Boolean] +} + +def program[F[_]](lang: SetLang[F]): F[Boolean] = { + import lang._ + implicit val monadInstance = monad + for { + _ <- add(5) + _ <- add(10) + _ <- remove(5) + b <- exists(10) + } yield b +} +``` + +Defining an interpreter starts by identifying a target context. Since the context +computes values while updating state, this suggests the state monad. + +```scala +import cats.data.State + +object ScalaSet extends SetLang[State[Set[Int], ?]] { + val monad = Monad[State[Set[Int], ?]] + + def add(i: Int): State[Set[Int], Unit] = + State.modify(_ + i) + + def remove(i: Int): State[Set[Int], Unit] = + State.modify(_ - i) + + def exists(i: Int): State[Set[Int], Boolean] = + State.inspect(_(i)) +} +``` + +```scala +val state = program[State[Set[Int], ?]](ScalaSet) +// state: cats.data.StateT[cats.Eval,scala.collection.immutable.Set[Int],Boolean] = cats.data.StateT@ce9f626 + +state.run(Set.empty).value +// res5: (scala.collection.immutable.Set[Int], Boolean) = (Set(10),true) +``` + +Note that calling `program` did not require any context-specific knowledge - +we could define another interpreter, perhaps one that talks to a set +concurrently. + +```scala +import cats.data.StateT +import scala.concurrent.{ExecutionContext, Future} + +// Asynchronous state +def AsyncSet(implicit ec: ExecutionContext): SetLang[StateT[Future, Set[Int], ?]] = + new SetLang[StateT[Future, Set[Int], ?]] { + val monad = Monad[StateT[Future, Set[Int], ?]] + + def add(i: Int): StateT[Future, Set[Int], Unit] = + StateT.modify(_ + i) + + def remove(i: Int): StateT[Future, Set[Int], Unit] = + StateT.modify(_ - i) + + def exists(i: Int): StateT[Future, Set[Int], Boolean] = + StateT.inspect(_(i)) + } +``` + +```scala +// No changes to `program` required +val result = program(AsyncSet(ExecutionContext.global)) +// result: cats.data.StateT[scala.concurrent.Future,scala.collection.immutable.Set[Int],Boolean] = cats.data.StateT@1c029382 +``` + +`SetLang` captures the *structure* of a computation, but leaves open +its *interpretation*. + +# Monad transformers and classes + +As it turns out, `SetLang` is an example of an encoding often referred to as +[MTL-style][mtl]. + +## Monads in monads + +Among the motivations for monad classes is to remove the need to specify +monad transformer stacks. The following example is adapted from +[Functional Programming with Overloading and Higher-Order Polymorphism][mtlPaper] +by Professor Mark P. Jones. + +Consider a program that is open to failure and computes with some state. This +suggests a combinator of `Either` and `State`, both of which have +monad transformers. All that is left is to decide which transformer to use. + +```scala +type App1[A] = EitherT[State[S, ?], Error, A] + // State[S, Either[Error, A]] + // S => (S, Either[Error, A]) + +type App2[A] = StateT[Either[Error, ?], S, A] + // S => Either[Error, (S, A)] +``` + +While `App1` and `App2` are both valid compositions, the +semantics of the compositions differ. `App1` describes a program where +the computation of a *value* at each transition may fail - but any changes +are preserved - whereas `App2` describes a program where the *entire* +transition may fail. + +We can abstract away the difference by creating a type class which provides +the relevant operations we need. + +```scala +trait MonadError[F[_], E] { + def monad: Monad[F] + + def raiseError[A](e: E): F[A] + def handleErrorWith[A](fa: F[A])(f: E => F[A]): F[A] +} + +trait MonadState[F[_], S] { + def monad: Monad[F] + + def get: F[S] + def set(s: S): F[Unit] +} +``` + +Similar type classes exist for the `Reader` and `Writer` data types. +These type classes are provided in both [Cats][cats] and [Scalaz][scalaz], +[with some caveats][typeClassSubType]. + +With these type classes in place we can write functions against these as +opposed to specific transformer stacks. Furthermore our functions can specify +exactly what operations they need which helps correctness and +[parametricity][parametricity]. + +```scala +import cats.{MonadError, MonadState} +import cats.data.{EitherT, State, StateT} + +def program[F[_]](implicit F0: MonadError[F, String], + F1: MonadState[F, Int]): F[Int] = + F0.flatMap(F1.get) { i => + F0.raiseError[Int]("fail") + } +``` + +Our program can then be instantiated with either transformer stack. + +```scala +import cats.implicits._ + +// At the time of this writing Cats does not have these instances +// so they are defined here. +// +// Additionally, both Cats and Scalaz 7 have encoding issues +// with these MTL type classes which requires us to redefine Monad when +// defining MonadState instances, despite there already being one. +implicit def eitherTMonadState[F[_], E, S](implicit F: MonadState[F, S]): MonadState[EitherT[F, E, ?], S] = + new MonadState[EitherT[F, E, ?], S] { + def get: EitherT[F, E, S] = + EitherT(F.get.map(Right(_))) + + def set(s: S): EitherT[F, E, Unit] = + EitherT(F.set(s).map(Right(_))) + + def flatMap[A, B](fa: EitherT[F, E, A]) + (f: A => EitherT[F, E, B]): EitherT[F, E, B] = + fa.flatMap(f) + + def pure[A](x: A): EitherT[F, E, A] = + EitherT.pure(x) + + def tailRecM[A, B](a: A)(f: A => EitherT[F, E, Either[A, B]]): EitherT[F, E, B] = + EitherT.catsDataMonadErrorForEitherT[F, E].tailRecM(a)(f) + } + +implicit def stateTMonadError[F[_], E, S](implicit F: MonadError[F, E]): MonadError[StateT[F, S, ?], E] = + new MonadError[StateT[F, S, ?], E] { + def handleErrorWith[A](fa: StateT[F, S, A])(f: E => StateT[F, S, A]): StateT[F, S, A] = + StateT[F, S, A] { (s: S) => + val state: F[(S, A)] = fa.run(s) + F.handleErrorWith(state)(e => f(e).run(s)) + } + + def raiseError[A](e: E): StateT[F, S, A] = + StateT.lift(F.raiseError(e)) + + def flatMap[A, B](fa: StateT[F, S, A])(f: A => StateT[F, S, B]): StateT[F, S, B] = + fa.flatMap(f) + + def pure[A](x: A): StateT[F, S, A] = StateT.pure(x) + + def tailRecM[A, B](a: A)(f: A => StateT[F, S, Either[A, B]]): StateT[F, S, B] = + StateT.catsDataMonadStateForStateT[F, S].tailRecM(a)(f) + } + +type App1[A] = EitherT[State[Int, ?], String, A] + +type App2[A] = StateT[Either[String, ?], Int, A] +``` + +```scala +val app1 = program[App1] +// app1: App1[Int] = EitherT(cats.data.StateT@5fdc056d) + +val app2 = program[App2] +// app2: App2[Int] = cats.data.StateT@72493a33 +``` + +# Composing languages + +From one angle we can view our set language, or more generally any EDSL +in MTL-style, as an effect like `MonadError` and `MonadState`. From another +angle we can view `MonadError` and `MonadState` as EDSLs that talk about errors +and stateful computations. We can eliminate the distinctions by renaming +`SetLang` to `MonadSet` and treating it as a type class. + +```scala +import cats.Monad +import cats.implicits._ + +trait MonadSet[F[_]] { + def monad: Monad[F] + + def add(i: Int): F[Unit] + def remove(i: Int): F[Unit] + def exists(i: Int): F[Boolean] +} +``` + +Composing multiple languages then becomes adding constraints to functions, and +interpretation becomes instantiating type parameters that satisfy the +constraints. + +```scala +trait MonadCalc[F[_]] { + def monad: Monad[F] + + def lit(i: Int): F[Int] + def plus(l: F[Int], r: F[Int]): F[Int] +} + +def setProgram[F[_]: MonadSet](i: Int): F[Boolean] = + implicitly[MonadSet[F]].exists(i) + +def calcProgram[F[_]: MonadCalc]: F[Int] = { + val calc = implicitly[MonadCalc[F]] + calc.plus(calc.lit(1), calc.lit(2)) +} + +def composedProgram[F[_]: MonadCalc: MonadSet]: F[Boolean] = { + implicit val monad: Monad[F] = implicitly[MonadCalc[F]].monad + for { + i <- calcProgram[F] + b <- setProgram(i) + } yield b +} + +// Instance + +// Instances are defined together but nothing is stopping us from defining +// these separately, perhaps one in the MonadSet object and another in the +// SetState object. +implicit val stateInstance: MonadSet[State[Set[Int], ?]] with MonadCalc[State[Set[Int], ?]] = + new MonadSet[State[Set[Int], ?]] with MonadCalc[State[Set[Int], ?]] { + val monad = Monad[State[Set[Int], ?]] + + def add(i: Int): State[Set[Int], Unit] = State.modify(_ + i) + + def remove(i: Int): State[Set[Int], Unit] = State.modify(_ - i) + + def exists(i: Int): State[Set[Int], Boolean] = State.inspect(_(i)) + + def lit(i: Int): State[Set[Int], Int] = State.pure(i) + def plus(l: State[Set[Int], Int], r: State[Set[Int], Int]): State[Set[Int], Int] = + (l |@| r).map(_ + _) + } +``` + +```scala +val result = composedProgram[State[Set[Int], ?]].run(Set.empty[Int]).value +// result: (scala.collection.immutable.Set[Int], Boolean) = (Set(),false) +``` + +As before, `composedProgram`, `calcProgram`, and `setProgram` are defined +independent of interpretation, so alternative interpretations simply require +defining appropriate instances. + +# A note about laws + +Type classes should come with laws - this lets us give meaning to their use. +The `Monoid` type class requires data types to have an **associative** binary +operation and a corresponding identity element. These laws allow us to +parallelize batch operations, such as partitioning a `List[A]` into +multiple chunks to be scattered across threads or machines and gathered +back. + +Since our EDSLs are type classes, we should think about what laws we expect +to hold. Below are some possible candidates for laws: + +``` +// MonadSet +set *> add(i) *> remove(i) = set +set *> remove(i) *> exists(i) = false +set *> add(i) *> exists(i) = true + +// MonadCalc - these are just the Monoid laws +plus(lit(0), x) = plus(x, lit(0)) = x +plus(x, plus(y, z)) = plus(plus(x, y), z) +``` + +Next up we'll take a look at some pitfalls of this approach, and a modified +encoding that solves some of them. + +*This article was tested with Scala 2.11.8, Cats 0.7.2, kind-projector 0.9.0, +and si2712fix-plugin 1.2.0 using [tut][tut].* + +[cats]: https://github.com/typelevel/cats "Typelevel Cats" +[hkt]: http://typelevel.org/blog/2016/08/21/hkts-moving-forward.html "Higher-kinded types: the difference between giving up, and moving forward" +[hkt]: {% post_url 2016-08-21-hkts-moving-forward %} "Higher-kinded types: the difference between giving up, and moving forward" +[mtl]: https://hackage.haskell.org/package/mtl "Monad classes" +[mtlPaper]: http://web.cecs.pdx.edu/~mpj/pubs/springschool.html "Functional Programming with Overloading and Higher-Order Polymorphism" +[monads]: http://homepages.inf.ed.ac.uk/wadler/papers/marktoberdorf/baastad.pdf "Monads for functional programming" +[parametricity]: https://www.mpi-sws.org/~dreyer/tor/papers/wadler.pdf "Theorems for free!" +[scalaz]: https://github.com/scalaz/scalaz/tree/series/7.3.x "Scalaz 7" +[tagless]: http://okmij.org/ftp/tagless-final/ "Typed final (tagless-final) style" +[tut]: https://github.com/tpolecat/tut "tut: doc/tutorial generator for scala" +[typeClassSubType]: {% post_url 2016-09-30-subtype-typeclasses %} "Subtype type classes don't work"