Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scala Wart: Convoluted de-sugaring of for-comprehensions #2573

Open
lihaoyi opened this issue May 29, 2017 · 38 comments
Open

Scala Wart: Convoluted de-sugaring of for-comprehensions #2573

lihaoyi opened this issue May 29, 2017 · 38 comments

Comments

@lihaoyi
Copy link
Contributor

lihaoyi commented May 29, 2017

Opening this issue, as suggested by Martin, to provide a place to discuss the individual warts brought up in the blog post Warts of the Scala Programming Language and the possibility of mitigating/fixing them in Dotty (and perhaps later in Scala 2.x). These are based on Scala 2.x behavior, which I understand Dotty follows closely, apologies in advance if it has already been fixed


Scala lets you write for-comprehensions, which are converted into a chain
of flatMaps an maps as shown below:

@ val (x, y, z) = (Some(1), Some(2), Some(3))
x: Some[Int] = Some(1)
y: Some[Int] = Some(2)
z: Some[Int] = Some(3)
@ for{
    i <- x
    j <- y
    k <- z
  } yield i + j + k
res40: Option[Int] = Some(6)
@ desugar{
    for{
      i <- x
      j <- y
      k <- z
    } yield i + j + k
  }
res41: Desugared = x.flatMap{ i => 
  y.flatMap{ j => 
    z.map{ k => 
      i + j + k
    }
  }
}

I have nicely formatted the desugared code for you, but you can try this
yourself in the Ammonite Scala REPL to
verify that this is what the for-comprehension gets transformed into.

This is a convenient way of implementing nested loops over lists, and happily
works with things that aren't lists: Options (as shown above), Futures,
and many other things.

You can also assign local values within the for-comprehension, e.g.

@ for{
    i <- x
    j <- y
    foo = 5
    k <- z
  } yield i + j + k + foo
res42: Option[Int] = Some(11)

The syntax is a bit wonky (you don't need a val, you can't define defs or
classes or run imperative commands without _ = println("debug")) but for
simple local assignments it works. You may expect the above code to be
transformed into something like this

res43: Desugared = x.flatMap{ i => 
  y.flatMap{ j =>
    val foo = 5
    z.map{ k => 
      i + j + k
    }
  }
}

But it turns out it instead gets converted into something like this:

@ desugar{
    for{
      i <- x
      j <- y
      foo = 5
      k <- z
    } yield i + j + k + foo
  }
res43: Desugared = x.flatMap(i => 
  y.map{ j =>
    val foo = 5
    scala.Tuple2(j, foo)
  }.flatMap((x$1: (Int, Int)) => 
    (x$1: @scala.unchecked) match {
    case Tuple2(j, foo) => z.map(k => i + j + k + foo)
    }
  )
)

Although it is roughly equivalent, and ends up with the same result in most
cases, this output format is tremendously convoluted and wastefully inefficient
(e.g. creating and taking-apart unnecessary Tuple2s). As far as I can tell,
there is no reason at all not to generated the simpler version of the code
shown above.

PostScript:

Notably, these two desugarings do not always produce the same results! The current desugaring behaves weirdly in certain cases; here is one that just bit me an hour ago:

Welcome to Scala 2.11.11 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_112).
Type in expressions for evaluation. Or try :help.

scala>  for{
     |    x <- Right(1).right
     |    y = 2
     |    z <- Right(3).right
     |  } yield x + y + z
<console>:13: error: value flatMap is not a member of Product with Serializable with scala.util.Either[Nothing,(Int, Int)]
          x <- Right(1).right
            ^
<console>:16: error: type mismatch;
 found   : Any
 required: String
        } yield x + y + z
                    ^

This specific problem has gone away in 2.12 because Either doesn't need .right anymore, but the language-level problem is still there: y = 2 ends up causing strange, difficult-to-debug errors due to the weirdness of the desugaring. This would not be an issue at all given the desugaring I proposed.

@odersky
Copy link
Contributor

odersky commented Jun 19, 2017

It would be great if we could use a more efficient and intuitive desugaring. But there's the problem with conditions. How would you translate the following:

for {
  x <- List(1, 2, 3)
  y = x * x
  if y * x > 4
} yield y + x

I don't think there's a way to do it that does not essentially fall back to the scheme we have. We could think of using different schemes for for-comprehensions with val defs before tests and for-comprehensions without. But then the whole thing would get more complex, not less so.

Alternatively, we could outlaw val bindings in front of tests. Then we could use the simple desugaring (or so I hope, I have not checked all cases!). But some Scala programs would no longer compile. It's a value judgement whether this is worth it. I would be interested in people's opinions on this.

@lihaoyi
Copy link
Contributor Author

lihaoyi commented Jun 19, 2017

Here's one idea for solving the "val assignment before if" problem. Not sure if it's too crazy, but what if your example

for {
  x <- List(1, 2, 3)
  y = x * x
  if y * x > 4
} yield y + x

Desugared into

{
  val tmp = List(1, 2, 3)
  tmp.flatMap{ x =>
    val y = x * x
    tmp.point(()).filter(y * x > 4).map{ _ =>
      y + x
    }
  }
}

Where in addition to needing .withFilter on a type to make if-guards work, we would ask for a monadic .point. List().point(x) would be List(x), and similar things could be defined for Option, Future, etc.. like Scalaz/Cats already do

After .point and .filter are called, the "normal" .map or .flatMap takes over from there.

The choice of methods is a bit arbitrary, there are other pairs of methods that could work, e.g. .pointUnit: M[Unit], rather than a more generic .point: M[T] or .point and .empty rather than .point and .filter. We could also make it implicit based-on-type (e.g. with some sort of def point[T](t: T)(implicit point: Point[T]) = point and point(tmp)) rather than as a method, but all that is orthogonal to the core idea.

I think this should work for all arrangements of val assignments, if-guards and <- bindings, since in this scheme if-guards:

if foo

basically desugar into a

_ <- tmp.point(()).filter(foo)

binding on tmp object that was last <-ed.

There are probably other subtleties with this (performance???) that I haven't thought through, and of course it's a much bigger change than the initial post describes. But the fundamental idea seems like it might work, and even with the unusualness of .point it would still be a simpler desugaring than the status quo.

EDIT: on further thought, .point feels a lot less sketchy than .forUnit, so restructured the comment around that

@smarter
Copy link
Member

smarter commented Jun 19, 2017

@lihaoyi This is pretty close to the way guard works in Haskell: https://en.wikibooks.org/wiki/Haskell/Alternative_and_MonadPlus#guard

@odersky
Copy link
Contributor

odersky commented Jun 19, 2017

It would indeed be interesting to look into how we can use something like guard for an alternative desugaring.

@lihaoyi
Copy link
Contributor Author

lihaoyi commented Jun 19, 2017

Thanks for the pointer @smarter! If we follow Haskell's guard, and thus choose to go with point and empty rather than point and filter, the desugaring ends up being quite nice:

for {
  x <- List(1, 2, 3)
  y = x * x
  if y * x > 4
} yield y + x
{
  val tmp = List(1, 2, 3)
  tmp.flatMap{ x =>
    val y = x * x
    if (!(y * x > 4)) tmp.empty
    else tmp.point(()).map(_ => y + x) // or flatMap if there's more stuff
  }
}

The tmp.point(()).map is a bit redundant: it could be just tmp.point(...) in this case:

{
  val tmp = List(1, 2, 3)
  tmp.flatMap{ x =>
    val y = x * x
    if (!(y * x > 4)) tmp.empty
    else tmp.point(y + x)
  }
}

and if there are subsequent generators we wouldn't need it at all:

for {
  x <- List(1, 2, 3)
  y = x * x
  if y * x > 4
  z <- 0 until y
} yield y + x + z
{
  val tmp = List(1, 2, 3)
  tmp.flatMap{ x =>
    val y = x * x
    if (!(y * x > 4)) tmp.empty
    else (0 until y).map(z => y + x + z)
  }
}

I think this beautifully captures the symmetry between the if-guard within the for-comprehension and the equivalent if-statement in the desugared flatMap/map code.

Alternately, we could go with flatMap and always include a point at the end, and not use map at all. I think that's what Haskell basically does?

I guess there's some value in making sure both point and map always end up being called, even if just for the not-changing-too-many-things-at-once value? If so we could stick with tmp.point(()).map. Not really sure what the tradeoffs here are

Anyway, I think any of these variants, if we could make it work, would be superior to the status quo desugaring

@odersky
Copy link
Contributor

odersky commented Jun 20, 2017

It would be great if this could work, but there's a problem.

for {
  x <- List(1, 2, 3)
  y = x * x
  if y * x > 4
  z <- 0 until y
} yield y + x + z
{
  val tmp = List(1, 2, 3)
  tmp.flatMap{ x =>
    val y = x * x
    if (!(y * x > 4)) tmp.empty
    else (0 until y).map(z => y + x + z)
  }
}

What is the type of the if-else? It's the lub of the type of tmp.empty, which is List[Int] and the type of the mapped range, which is IndexedSeq[Int]. So the result type is Seq here which is fine. But in general the type of the argument to flatMap can be completely unrelated to the type of the original generator. Our translations work regardless of types, so we cannot assume that the generated if-else would be typable. Logically we'd have to pick the empty on the argument to flatmap argument instead of the empty of the original generator. But I don't see a general way to get that.

@odersky
Copy link
Contributor

odersky commented Jun 20, 2017

EDIT: Maybe it's not so bad. We could simply require that empty is of a type which is compatible with the argument type of the flatMap. Is there a use for-expressions where this would not be the case?

So if we accept that, the next problem is how to retrofit all collections and other types on which for-expressions with ifs are performed so that they have an empty. That's still a large problem.

@ShaneDelmore
Copy link
Contributor

What is the empty value for Either[A, B]? It seems it must be Left[A] but I don’t know where I could get an A value from, unless there was the ability to put an implicit left in scope that would be used by the for comprehension desugaring.

@lihaoyi
Copy link
Contributor Author

lihaoyi commented Jul 20, 2017

Either[A, B] does not support .filter even today, so we probably don't need to worry about it not supporting .filter due to lack of empty values or whatever

@ Right(1).filter(_ => false)
cmd0.sc:1: value filter is not a member of scala.util.Right[Nothing,Int]
val res0 = Right(1).filter(_ => false)
                    ^
Compilation Failed

@ShaneDelmore
Copy link
Contributor

It’s true, but it does confuse the heck out of everyone the first time they hit it so if we were able to come up with a solution at the same time it would be useful. The only idea I have so far is as mentioned above, allow an implicit empty to be found in scope which would allow the use of filter as long as the user is able to supply an empty in scope, which for me would have been possible every time I have run into the issue.

@lihaoyi
Copy link
Contributor Author

lihaoyi commented Jul 20, 2017

I think the correct answer to the confusion is to get people used to the fact that you can't always filter on a monad. There is a huge number of monads for which filter doesn't make sense: State, Reader, Writer, Function0, etc.. In a way I feel that Scala's "put .filter on anything" convention is itself a bit of a wart

@ShaneDelmore
Copy link
Contributor

You are winning me over, I agree, the confusion has always arisen from the expectation that it would just work and eliminating that expectation is probably a better way to go.

@pathikrit
Copy link

Actually ran into a big wart recently related to this. See @frosforever's SO post here: https://stackoverflow.com/questions/45333889/regex-unapply-in-for-comprehension-with-if-guard-not-compiling

tl;dr -

This does not compile:

for {
    s <- List.empty[String]
    regex <- List.empty[scala.util.matching.Regex]
    regex(ss) = s
    if ss == "foo"
} yield s

But removing the if:

for {
    s <- List.empty[String]
    regex <- List.empty[scala.util.matching.Regex]
    regex(ss) = s
} yield s

or rearranging the order of the two lists in the for comprehension:

for {
    regex <- List.empty[scala.util.matching.Regex]
    s <- List.empty[String]
    regex(ss) = s
    if ss == "foo"
} yield s

makes it compile

@odersky
Copy link
Contributor

odersky commented Jan 24, 2018

It would be nice if we could improve on this. But we'd need someone to put in the effort.

@LPTK
Copy link
Contributor

LPTK commented Apr 21, 2018

Can we at least do something about imperative for comprehensions? They have no justification for generating such horrible code.

The following:

for { x <- xs; y = x; if y > 0; z <- zs } println(x + z)

should generate:

xs.foreach { x =>
  val y = x
  if (y > 0) zs.foreach { z =>
    println(x + z)
  }
}

instead of the current convoluted and inefficient:

xs.map { x =>
  val y = x
  (x, y)
}.withFilter {
  case (x,y) => y > 0
}.foreach {
  case (x,y) => zs.foreach { z => println(x + z) }
}

which creates an intermediate list, allocates tuples, and calls four closure-consuming methods instead of just 2.

No wonder people are surprised how much more efficient it is to use while loops rather than for loops in Scala 😅

I believe the fix for this one is a low-hanging fruit. I could try to do it at some point if people agree with the idea.

@LPTK
Copy link
Contributor

LPTK commented Apr 21, 2018

As for functional loops, I think it could be solved in a less artificial way by using flatten and a simple method called flattenOptions, of type M[Option[A]] => M[A].

This code:

for { x <- xs; t = x+1; if t > 0; y <- ys } yield x + y

would produce:

xs.map { x =>
  val t = x+1
  if (t > 0) Some(ys.map { y =>
    x + y
  })
  else None
}.flattenOptions.flatten

and this code:

for { x <- xs; y <- ys; t = y+1; if x > t } yield x + y

would produce:

xs.flatMap { x =>
  ys.map { y =>
    val t = y+1
    if (x > t) Some(x + y)
    else None
  }.flattenOptions
}

Compared to point and empty, which have no business whatsoever being defined on monad/collection values, flattenOptions is an independently useful method. In the case of Scala standard collections, it can actually be the same as flatten (which only requires the element type to be traversable, and so is Option).

In addition, flattenOptions can somewhat easily be defined as an extension method:

implicit class FlattenOptions[A,B](xs: FilterMonadic[Option[A],B]) {
  def flattenOptions[That](implicit cbf: CanBuildFrom[B,A,That]) =
    xs.withFilter(_.isDefined).map(_.get)
}

Also, not sure if it would fit in the complexity budget because it makes things less regular, but I would still prefer something easier like:

for { x <- xs; y <- ys if x > 0 } yield x + y

to desugar to the more natural:

xs.flatMap { x =>
  ys.filter(y => x > 0).map { y =>
    x + y
  }
}

which does not allocate any options, instead of:

xs.flatMap { x =>
  ys.map { y =>
    if (x > 0) Some(x + y)
    else None
  }.flattenOptions
}

@smarter
Copy link
Member

smarter commented Apr 21, 2018

Related: https://github.com/oleg-py/better-monadic-for

@mdmilosz
Copy link

mdmilosz commented Apr 30, 2018

Hi all.

I've been thinking about this issue for a while and it looks like the main issue stems from how assignment and condition clauses are desugared in the for-comprehensions. With that assumption, I started considering a plausible unification, so to speak, of the different clauses.

Let be given the following for-comprehension:

      for {
        x <- List(1, 2, 3)
        z = 1
        if x > z
        y <- List(10, 20, 30)
      } yield x + y

Let's assume the existence of such a unary constructor single(a: A): F[A] that we can rewrite z = 1 as z <- single(1). (For instance, for the List type, single(a: A): List[A] = List(a).)

Analogically, let's assume the existence of such a unary constructor cond(b: Boolean): F[Unit] that we can rewrite if x > z as _ <- cond(x > z). (For instance, for the List type, cond(b: Boolean): List[Unit] = List(()).filter(_ => b).)

Then we can write the above for-comprehension as follows:

      for {
        x <- List(1, 2, 3)
        z <- single(1)
        _ <- cond(x > z)
        y <- List(10, 20, 30)
      } yield x + y

Or, after replacing the constructors with their implementation:

    for {
      x <- List(1, 2, 3)
      z <- List(1)
      _ <- List(()).filter(_ => x > z)
      y <- List(10, 20, 30)
    } yield x + y

It might look that, since we only added boilerplate, the resulting desugared code would be longer and more complex. However, if my IDE and Lihaoyi's Ammonite shell are to be believed, the opposite happens. (In fact, the desugared version of the original, generated by IntelliJ IDEA with the Scala plugin, doesn't even compile!)

I've tried to write a minimalistic type class system in 2.12 that would encompass implicit resolution of the single and cond constructors, but encountered either type mismatch errors or ambiguous implicit values errors whenever I tried to implement the constructors for more than one type.

(Here's a small semi-working example using one type — GenTraversable.)

Regardless of the need of providing simple and cond for every type that might need to be used in a for-comprehension, there are some other drawbacks to this concept:

  • the source code needs to be pre-processed even before applying the desugaring stage (perhaps a bit of this logic could be refined and integrated into the desugaring proper?);

  • the source after desugaring might need to be post-processed in order to simplify some specific cases (e.g. single(x).flatMap(f)f(x), single(x).map(f)single(f(x)) etc; as above, maybe it could be turned into a part of a larger piece of logic).

I would be happy to receive feedback on whether there is any merit to this idea.

@pathikrit
Copy link

What is the downside to just desugaring into what this does: https://github.com/oleg-py/better-monadic-for? This is the least surprising behaviour (it is what an user expects in her mental model) and does not break current code (even if it does you will get a compile error)

@Blaisorblade
Copy link
Contributor

Blaisorblade commented May 1, 2018

This is the least surprising behaviour (it is what an user expects in her mental model)

"Desugar bindings as vals instead of tuples" might be fine, but desugaring without withFilter isn't what I expect as an user.

and does not break current code (even if it does you will get a compile error)

Which compile error? If some members fail the match you get a runtime failure instead of skipping the members — nothing in the README makes me expect errors here:

object TestExtract { def unapply(x: Any): Boolean = false }
1 match { case TestExtract() => 1; case _ => 0 }
for (TestExtract() <- List(1, 2, 3)) yield ()

The only error I anticipate is for extractors that affect the expected type, unlike the above one. Tho the README doesn't present any such example.

(_: List[Int]).map( { case (x, y) => x + y })

EDIT: just confirmed that code gives a runtime failure.

// without plugin
scala> for (TestExtract() <- List(1, 2, 3)) yield ()
res1: List[Unit] = List()

// with plugin
scala> for (TestExtract() <- List(1, 2, 3)) yield ()
scala.MatchError: 1 (of class java.lang.Integer)
  at .$anonfun$res2$1(<console>:13)
  at scala.runtime.java8.JFunction1$mcVI$sp.apply(JFunction1$mcVI$sp.java:12)
  at scala.collection.immutable.List.map(List.scala:283)
  ... 36 elided

Of course that is not meant as a realistic example, but using an extractor with type unapply(x: T) on a List[T] will not give compiler errors, whether the extractor is going to always succeed, or not. What you'd want is to only omit withFilter for irrefutable patterns, that is (maybe) those built out of the following elementary irrefutable patterns:

  • extractors returning type Some[...] or true.type
  • matches on the only case class of a sealed hierarchy
  • maybe others ???

@tarsa
Copy link

tarsa commented Jan 22, 2019

Alternatively, we could outlaw val bindings in front of tests. Then we could use the simple desugaring (or so I hope, I have not checked all cases!). But some Scala programs would no longer compile. It's a value judgement whether this is worth it. I would be interested in people's opinions on this.

Here's my opinion: I like this idea :)

I think scanning through some big Scala codebase (Scala community build?) would reveal how much code is going to be affected. Maybe the affected code wouldn't be hard to adjust?

julienrf added a commit to scalacenter/dotty that referenced this issue Jan 29, 2019
This will allow us to experiment with a different desugaring of `for`
expressions, as described in scala#2573.
@linasm
Copy link

linasm commented Oct 23, 2019

What is the current status of this issue?
For the record, I want to mention one more wart, caused by current desugaring of value definitions in for loop when used in very imperative/mutable style:

val xs = Array(1, 0, 0)

for {
  i <- 1 until xs.length
  previous = xs(i - 1)
} xs(i) = previous

Expected (before one looks at the language spec) result: 1 1 1.
Actual result (because of intermediate tupling): 1 1 0.

@smarter
Copy link
Member

smarter commented Oct 23, 2019

What is the current status of this issue?

The current status is: no one has come up with a full proposal here, no one at EPFL is working on this so this will have to come from the community, see also https://contributors.scala-lang.org/t/pre-sip-improve-for-comprehensions-functionality/3509/21?u=smarter.

@djspiewak
Copy link

My two cents on this…

I think there are a number of historical problems here which have compounded to create the situation we're in today. I'm not entirely sure how to unravel it, but in the interests of clarity of discussion, it's worth stating them as I see them.

Untyped Desugaring

Rather than the desugaring of for being in terms of some typeclass like Monad and MonadPlus (or similar), the desugaring has always been a syntax sugar. That sugar is so unconstrained that whole libraries have been built around the idea of using for as a DSL for entirely non-monadic things (like SQL statements), and that ecosystem cannot be simply dismissed.

All of this means that, while it's theoretically possible to leverage enrichments and typeclasses to push for desugaring into something a bit more type-driven (similar to what Cats does with its syntax.flatMap._), it's always going to be a bit second-class.

I don't see a realistic way to correct this issue without breaking vast swathes of the ecosystem, which is to say, I think we're stuck with it.

Inconsistent User Expectations

Due to some of the vagaries of how for-comprehensions have always been treated, many users now expect behavior that is, when looked at in isolation, actually rather strange. @Blaisorblade's example from above is a good one to look at:

for (TestExtract() <- List(1, 2, 3)) yield ()

Why would you expect this to produce an empty list?

val TestExtract() = List(1, 2, 3)

That produces a MatchError. So would using match directly. So why are for-comprehensions magical in this way? The only reason I can think of is to unnecessarily lean into the syntactic legacy of for as a construct for iterating over collections. But if you think of for as a mechanism for iterating over collections, then libraries such as the aforementioned DSL tricks (leveraging the untyped nature of the desugaring) make absolutely no sense, and more semantically-rigorous constructs such as monads look even stranger.

What I'm saying is that this kind of user expectation paints us into a very inconsistent semantic corner. At the risk of sounding dismissive, the expectation is wrong and does not align with the rest of the language.

Side Effects

And here's the other problem: many users have some wildly problematic expectations on how side effects are handled within the desguaring:

for {
  x <- List(1, 2, 3)
  y = x * x
  if y * x > 4
  z <- 0 until y
} yield y + x + z

There's no particular reason that this cannot be desugared into the following:

List(1, 2, 3).filter(x => (x * x) * x > 4) flatMap { x =>
  val y = x * x
  0.until(y).map(y + x + _)
}

The user certainly cannot see a difference except when using a stop watch. The problem is that this cannot be thus desugared without breaking expectations:

for {
  x <- List(1, 2, 3)
  y = {
    println("derp")
    x * x
  }
  if y * x > 4
  z <- 0 until y
} yield y + x + z

Users expect that the println is only going to run once per x, which is of course not the case when you have the if in this context. This is kind of a crazy expectation, but one which is pushed rather heavily by non-yielding for-comprehensions, so I don't really blame people for believing it should work in this way. It is, however, still an expectation which creates significant friction when attempting to improve things.

We could also feasibly do something like this:

List(1, 2, 3).map(x => (x, x * x)).filter(_._2 > 4) flatMap {
  case (x, y) => 0.until(y).map(y + x + _)
}

This works (even with side effects)! But it also assumes the container is fully polymorphic and imposes extra boxing (as well as other, more deeply subtle and now mostly obsolete issues, such as confusing CanBuildFrom when filtering a String). And here again we run into the issues with untyped things, because we cannot simply assume that the F[_] in question forms a lawful monad (or even functor), and thus we cannot assume that it's safe to just shovel tuples around.

…and more

This is a real problem that's going to be a very serious issue for a meaningful chunk of the ecosystem when migrating to Scala 3. For example: typelevel/cats-effect#749 In case it isn't clear, the problem there is being caused by the unnecessary trailing map generated by the for, which effectively creates an unexpected memory "leak" due to the stack of continuations which must be maintained. Basically the only solution that I can think of at present is a scalafix rule which removes for-comprehensions from most code which touches these kinds of libraries, but that's… very frustrating and not a great experience. It effectively takes a major language feature (for) entirely off the table.

So really, I don't want this to be underestimated: this is going to cause a lot of friction in the Scala 3 migration if not addressed in some way.

A Modest Proposal

Trailing Map

Eliminating trailing identity maps is an easy fix:

for (x <- List(1, 2, 3)) yield x

…should desugar into simply List(1, 2, 3). This would resolve the majority of the migration friction.

Partial Patterns

Silent filtering instead of a more uniform match semantic on falsifiable patterns should be removed. In other words, @Blaisorblade's example should indeed produce a MatchError, consistent with val, match, and absolutely everything else in the language. I can think of dozens of times over the past decade that I've had bugs that were caused by this feature, including at least three separate times where it required meaningful amounts of time to track down exactly what was going on (most recently, about 18 months ago on Scala 2.12). I have never even once wanted this feature. I realize I'm not a representative sample of the entire Scala ecosystem, but if someone has never seen merit to a language feature over the course of 10+ years of experience and has lost man-weeks of cumulative time to tracking down bugs it has caused, it seems like the feature's continued existence past a breaking migration is probably at least somewhat dubious.

Limited Polymorphic Assumption

We should bite the bullet and make the assumption that tuples can be stuffed into whatever we're mapping over. Thus:

for {
  x <- List(1, 2, 3)
  y = x * x
  if y * x > 4
  z <- 0 until y
} yield y + x + z

// should desugar into...

List(1, 2, 3).map(x => (x, x * x)).filter(_._2 > 4) flatMap {
  case (x, y) => 0.until(y).map(y + x + _)
}

This is no more problematic than the current assumptions around withFilter, and considerably easier to explain. It imposes no meaningful practical performance penalty (the added boxing of the Tuple2 is entirely dwarfed by the raw collection allocation overhead, and if someone really cares about eliminating this, they can do so by making their map a macro).

Taken together, I believe these last two changes entirely remove the need for withFilter.

Summary

We still are left with a for-comprehension desugaring which has some unfortunate backwaters and inconsistencies, but these changes will, at the least, eliminate the majority of the current foot-guns while considerably easing the migration from Scala 2.

Needless to say, we could keep the old desugaring available under the -language:Scala2 flag.

@scf37
Copy link

scf37 commented Jan 16, 2020

Can new syntax or keyword be added to yield monad, not value?
e.g.

for {
  a <- readHello
  b <- readWorld
} yieldF say(a + b)

// desugars to
readHello.flatMap { a =>
  readWorld.flatMap { b =>
    say(a + b)
  }
}

@smarter
Copy link
Member

smarter commented Jan 16, 2020

Silent filtering instead of a more uniform match semantic on falsifiable patterns should be removed.

I think -strict in Dotty (which contains things slated to become default after 3.0) has the behavior you want: http://dotty.epfl.ch/docs/reference/changed-features/pattern-bindings.html#pattern-bindings-in-for-expressions

We should bite the bullet and make the assumption that tuples can be stuffed into whatever we're mapping over.

I like it, it'd be interesting to tweak scalac to do this and see what breaks in the community build.

@smarter
Copy link
Member

smarter commented Jan 16, 2020

@scf37 we'd like to fix the existing stuff before adding new stuff on top of it :).

@djspiewak
Copy link

djspiewak commented Jan 16, 2020

@smarter I agree that would be a fascinating experiment, and very helpful in determining the right approach here. I think such an experiment should be done with both the tupling and the trailing identity map elimination (which seems the most likely to be innocuous). If -strict already has sane semantics for non-falsifiable patterns in for, then I’m quite happy on that front.

@LPTK
Copy link
Contributor

LPTK commented Jan 17, 2020

We should bite the bullet and make the assumption that tuples can be stuffed into whatever we're mapping over.

I do not understand what you are proposing. The code example you show is already desugared in the way you suggest. You can check it as follows:

Welcome to Scala 2.13.1 (OpenJDK 64-Bit Server VM, Java 1.8.0_121).
Type in expressions for evaluation. Or try :help.

scala> scala.reflect.runtime.universe.reify {
     | for {
     |   x <- List(1, 2, 3)
     |   y = x * x
     |   if y * x > 4
     |   z <- 0 until y
     | } yield y + x + z
     | }
res0: reflect.runtime.universe.Expr[List[Int]] =
Expr[List[Int]](List.apply(1, 2, 3).map(((x) => {
  val y = x.$times(x);
  Tuple2.apply(x, y)
})).withFilter(((x$1) => x$1: @unchecked match {
  case Tuple2((x @ _), (y @ _)) => y.$times(x).$greater(4)
})).flatMap(((x$2) => x$2: @unchecked match {
  case Tuple2((x @ _), (y @ _)) => Predef.intWrapper(0).until(y).map(((z) => y.$plus(x).$plus(z)))
})))

The only meaningful difference with your desugaring is the use of withFilter instead of filter.

What do you mean by "the current assumptions around withFilter" when you say "this is no more problematic than the current assumptions around withFilter, and considerably easier to explain"?

There's no particular reason that this cannot be desugared into the following:

List(1, 2, 3).filter(x => (x * x) * x > 4) flatMap { x =>
  val y = x * x
  0.until(y).map(y + x + _)
}

Actually, there are very good reasons not to do that: not only do you duplicate code (what if we had big_expression(x) instead of x * x?), but more importantly you also duplicate runtime work: what if you had expensive_computation(x) instead of x * x?

@djspiewak
Copy link

I do not understand what you are proposing. The code example you show is already desugared in the way you suggest. You can check it as follows:

Cool, so let's just remove withFilter then. If tuples are already the mechanism in play here then I don't see a reason to change it.

What do you mean by "the current assumptions around withFilter" when you say "this is no more problematic than the current assumptions around withFilter, and considerably easier to explain"?

Simply referring to the fact that, in order to participate in if within a for comprehension, types must currently define withFilter, with its own type foibles and such.

Actually, there are very good reasons not to do that: not only do you duplicate code (what if we had big_expression(x) instead of x * x?), but more importantly you also duplicate runtime work: what if you had expensive_computation(x) instead of x * x?

As I said, effects. Expensive work in this case is effectively a sort of effect. The point being that people expect that val bindings immediately prior to a filter are not repeatedly evaluated. That's a weird expectation, but it is what they expect, so it's fair to treat it as a constraint.

What I'm trying to avoid in all of this is a .empty method on values, which frankly doesn't make any sense (neither would an analogous point method) and would pollute the public APIs of any type which wants to enable if within for.

@neko-kai
Copy link
Contributor

neko-kai commented Jan 17, 2020

I'm abolutely all for removing the trailing map when the yielded expression is a value of the previous expression – the trailing map makes no sense at all, it's harmful AND there's no way to abuse it meaningfully even if you wanted to. (It can't be used as a "finalizing" symbol for a builder because you can't distinguish between the final map and all the other maps in the middle of the expression.)

@LPTK
Copy link
Contributor

LPTK commented Jan 19, 2020

Cool, so let's just remove withFilter then. If tuples are already the mechanism in play here then I don't see a reason to change it.

What is your rationale for removing withFilter? For comprehensions with guards desugar to calls to withFilter, there is nothing surprising or problematic about that. Changing it to calling filter will not improve anything; it will just break existing code.

types must currently define withFilter, with its own type foibles and such

What foibles and such?

people expect that val bindings immediately prior to a filter are not repeatedly evaluated. That's a weird expectation

Why is that a weird expectation? Do you know of any other language that have for constructs which implicitly re-executes code because it's before a guard? Even Haskell does not do that.

@Blaisorblade
Copy link
Contributor

Do you know of any other language that have for constructs which implicitly re-executes code because it's before a guard? Even Haskell does not do that.

Well, in Haskell, purity ensures you can't tell — so that's up to the optimizer.

@LPTK
Copy link
Contributor

LPTK commented Jan 19, 2020

Yes, the GHC optimizer could duplicate runtime work if it wanted, but it generally tries very hard not to, because this could have very bad consequences on program performance — for example, see the comments in this paper about not floating let bindings into lambdas (which is similar to what was proposed here). I am not aware of GHC optimizations which duplicate runtime work, and users can safely assume that their expensive computations will not be done several times. This is even the original motivations for the monomorphism restriction.

And the Haskell language reference defines the semantics of list comprehension (on which monad comprehension is based, which is the closest thing to Scala's for with guards) as not duplicating the work of guards.

@djspiewak
Copy link

What is your rationale for removing withFilter? For comprehensions with guards desugar to calls to withFilter, there is nothing surprising or problematic about that. Changing it to calling filter will not improve anything; it will just break existing code.

It considerably complicates the definition of what if does within for. You can't just say "it uses filter", because it doesn't, and the only reason it doesn't is to magically enrich strict datatypes with non-strict semantics for this one specific use-case.

This kind of "eager, except when magically lazy, but only right here" type of thing has been tried and I think is broadly considered to be a failed experiment by the ecosystem. Maybe I'm being a bit bold in asserting that, but flip-flopping between runtime semantics is something that most of the "pure FP Scala" ecosystem avoids very aggressively because it's just hard to reason about. Eager datatypes are eager, lazy datatypes are lazy, and the only exceptions to this are very, very, very carefully reasoned and justified (e.g. the ifM operator in Cats), and tend to not affect the data itself.

In other words, I think the expectation of laziness even when the underlying datatype is eager is in and of itself a problem.

I mean, there are more important windmills to tilt, so if this is going to become a Thing™, then I'll just drop it. Removal of the trailing map along with the -strict mode for patterns resolves 90% of my objections to the current semantics. Compared to those, withFilter is quite ugly and (IMO) unnecessary, but not so much in the way.

What foibles and such?

You can just define withFilter to be an alias for filter and it, broadly-speaking, works. If you follow the example from the collections though you end up with this weird builder type off to the side.

Why is that a weird expectation? Do you know of any other language that have for constructs which implicitly re-executes code because it's before a guard? Even Haskell does not do that.

Haskell has uniformly by-need semantics. This means that at the very least, all of its data structures are lazy, meaning that filter is lazy by definition, meaning that work isn't going to be repeated in any case. Scala has a more complex problem here because it mixes eager and lazy evaluation within a syntactically uniform framework, which can in turn mean that the semantics of higher level constructs (such as for) can be radically different depending on which semantic is underneath.

@LPTK
Copy link
Contributor

LPTK commented Jan 20, 2020

It considerably complicates the definition of what if does within for. You can't just say "it uses filter", because it doesn't

Well, you can just say "it uses withFilter"... because it does.

I don't think withFilter is particularly confusing or hard. It's conceptually similar to .view.filter in the new collections infrastructure, and can be explained as such.

flip-flopping between runtime semantics is something that most of the "pure FP Scala" ecosystem avoids very aggressively because it's just hard to reason about. Eager datatypes are eager, lazy datatypes are lazy

The fact that withFilter is lazy is not a problem in terms of semantics. I believe that this laziness cannot be observed, and cannot cause surprises.

The ironic thing is that the above holds precisely because of our pesky terminating map call, which always forces evaluation on strict types. Look at the type of List(1,2,3).withFilter(_ => true).map(identity), which is List[Int]. This is why in practice we don't normally "end up with this weird builder type off to the side".

So it seems we're getting to the bottom of the problem. I would agree with you if you made the following point: using withFilter is problematic if we decide to get rid of the terminating map.

I don't know if it would be feasible to change for comprehension to the extent this suggests, without breaking and degrading the performance of too much existing code.

Maybe an alternative comprehension syntax could be used, like do { a <- ... ; b <- ... } yield e(a, b), which could be type-class-based and incorporate other improvements proposed in the contributors thread.


Haskell has uniformly by-need semantics. This means that at the very least, all of its data structures are lazy, meaning that filter is lazy by definition

(A nitpick, but this is just plain false. Haskell is non-strict, and has plenty of ways to control the amount of strictness you want in your programs, both in the language itself and in the compiler options — see: bang patterns, seq, strict data types...)

meaning that work isn't going to be repeated in any case

What we are talking about here is whether or not to duplicate code, which is what you originally suggested. Of course, duplicating code is going to duplicate runtime work in the situation at hand, even in Haskell.

@sideeffffect
Copy link
Contributor

sideeffffect commented Jan 20, 2020

Maybe an alternative comprehension syntax could be used

Inventing a new comprehension syntax from scratch, would give us an opportunity to learn from successes in this area in other language, for example F# Computation Expressions. There the syntax for monadic bindings is very similar to normal one, you just append the ! character. For example, let x = y would be Scala's val x = y and let! x = y would be x <- y. And there are also features which currently don't have a Scala counterpart, like do! x, which would be something like (): Unit <- x, or use! x = y which would be quite useful with cats.effect.Resource, or match! and many others. The details would probably be for a longer discussion elsewhere, but you can at least check out the linked website.

@SethTisue
Copy link
Member

there is a new SIP in this area: scala/improvement-proposals#79

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests