Scala Wart: Convoluted de-sugaring of for-comprehensions #2573
Comments
It would be great if we could use a more efficient and intuitive desugaring. But there's the problem with conditions. How would you translate the following:
I don't think there's a way to do it that does not essentially fall back to the scheme we have. We could think of using different schemes for for-comprehensions with val defs before tests and for-comprehensions without. But then the whole thing would get more complex, not less so. Alternatively, we could outlaw val bindings in front of tests. Then we could use the simple desugaring (or so I hope, I have not checked all cases!). But some Scala programs would no longer compile. It's a value judgement whether this is worth it. I would be interested in people's opinions on this.
Here's one idea for solving the "val assignment before if" problem. Not sure if it's too crazy, but what if your example

for {
  x <- List(1, 2, 3)
  y = x * x
  if y * x > 4
} yield y + x

desugared into

{
  val tmp = List(1, 2, 3)
  tmp.flatMap { x =>
    val y = x * x
    tmp.point(()).filter(_ => y * x > 4).map { _ =>
      y + x
    }
  }
}

Where in addition to the usual `map`/`flatMap`/`withFilter`, we would need a `point` operation that lifts a value into the container. The choice of methods is a bit arbitrary; there are other pairs of methods that could work. I think this should work for all arrangements of clauses: an `if foo` guard would basically desugar into a `_ <- tmp.point(()).filter(_ => foo)` binding on the current container. There are probably other subtleties with this (performance???) that I haven't thought through, and of course it's a much bigger change than the initial post describes. But the fundamental idea seems like it might work, even with the unusualness of `point`. EDIT: on further thought, …
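The proposal above can be sketched concretely for `List`, where a hypothetical `point` is just the one-element constructor. The helper object and the name `point` are assumptions for illustration, not actual compiler output:

```scala
// Sketch of the proposed `point`-based desugaring, specialized to List.
// `point` lifts a value into the container; a guard becomes a filter on point(()).
object PointDesugaring {
  def point[A](a: A): List[A] = List(a)

  // Hand-written version of the proposed desugaring
  def proposed: List[Int] = {
    val tmp = List(1, 2, 3)
    tmp.flatMap { x =>
      val y = x * x
      point(()).filter(_ => y * x > 4).map(_ => y + x)
    }
  }

  // What the for-comprehension computes today
  def reference: List[Int] =
    for { x <- List(1, 2, 3); y = x * x; if y * x > 4 } yield y + x
}
```

Both produce `List(6, 12)` here, so at least for `List` the scheme is observationally equivalent on this example.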
@lihaoyi This is pretty close to the way Haskell desugars its monad comprehensions.
It would indeed be interesting to look into how we can use something like Haskell's scheme here.
Thanks for the pointer @smarter! If we follow Haskell's lead,

for {
  x <- List(1, 2, 3)
  y = x * x
  if y * x > 4
} yield y + x

would desugar into

{
  val tmp = List(1, 2, 3)
  tmp.flatMap { x =>
    val y = x * x
    if (!(y * x > 4)) tmp.empty
    else tmp.point(()).map(_ => y + x) // or flatMap if there's more stuff
  }
}

The trailing `point(()).map(...)` could then be fused into a single `point` call:

{
  val tmp = List(1, 2, 3)
  tmp.flatMap { x =>
    val y = x * x
    if (!(y * x > 4)) tmp.empty
    else tmp.point(y + x)
  }
}

and if there are subsequent generators we wouldn't need it at all:

for {
  x <- List(1, 2, 3)
  y = x * x
  if y * x > 4
  z <- 0 until y
} yield y + x + z

{
  val tmp = List(1, 2, 3)
  tmp.flatMap { x =>
    val y = x * x
    if (!(y * x > 4)) tmp.empty
    else (0 until y).map(z => y + x + z)
  }
}

I think this beautifully captures the symmetry between the guard and the other clauses. Alternately, we could go with the `point`-and-`filter` encoding from my earlier comment. I guess there's some value in making sure both variants behave the same. Anyway, I think any of these variants, if we could make it work, would be superior to the status quo desugaring.
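This `empty`-based scheme can be sketched for `List`, with `empty` and `point` written as standalone helpers since `List` has no members under those names:

```scala
// Sketch of the Haskell-style empty/point desugaring, specialized to List.
object EmptyDesugaring {
  def empty[A]: List[A] = Nil
  def point[A](a: A): List[A] = List(a)

  // Hand-written version of the proposed desugaring
  def proposed: List[Int] = {
    val tmp = List(1, 2, 3)
    tmp.flatMap { x =>
      val y = x * x
      if (!(y * x > 4)) empty[Int]
      else (0 until y).toList.map(z => y + x + z)
    }
  }

  // What the for-comprehension computes today
  def reference: List[Int] =
    for { x <- List(1, 2, 3); y = x * x; if y * x > 4; z <- 0 until y } yield y + x + z
}
```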
It would be great if this could work, but there's a problem.

for {
  x <- List(1, 2, 3)
  y = x * x
  if y * x > 4
  z <- 0 until y
} yield y + x + z

{
  val tmp = List(1, 2, 3)
  tmp.flatMap { x =>
    val y = x * x
    if (!(y * x > 4)) tmp.empty
    else (0 until y).map(z => y + x + z)
  }
}

What is the type of the if-else? It's the lub of the type of `tmp.empty` and the type of the else branch.
EDIT: Maybe it's not so bad. We could simply require that the type of `empty` conforms to the type of the rest of the expression. So if we accept that, the next problem is how to retrofit all collections and other types on which for-expressions with ifs are performed so that they have an `empty` method.
What is the empty value for Either[A, B]? It seems it must be Left[A], but I don’t know where I could get an A value from, unless there was the ability to put an implicit left in scope that would be used by the for-comprehension desugaring.
@ Right(1).filter(_ => false)
cmd0.sc:1: value filter is not a member of scala.util.Right[Nothing,Int]
val res0 = Right(1).filter(_ => false)
^
Compilation Failed
It’s true, but it does confuse the heck out of everyone the first time they hit it, so if we were able to come up with a solution at the same time it would be useful. The only idea I have so far is, as mentioned above, to allow an implicit empty to be found in scope, which would allow the use of filter as long as the user is able to supply an empty in scope — which for me would have been possible every time I have run into the issue.
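For reference, Scala 2.12+ already exposes something in this spirit on `Either`: `filterOrElse` makes you pass the "empty" `Left` value explicitly instead of conjuring it from nowhere:

```scala
// Either has no `filter` because there is no canonical empty value for Left.
// filterOrElse (since 2.12) sidesteps this by taking the fallback explicitly.
object EitherFilterDemo {
  val r: Either[String, Int] = Right(1)

  val kept: Either[String, Int]    = r.filterOrElse(_ > 0, "no value")
  val dropped: Either[String, Int] = r.filterOrElse(_ => false, "no value")
}
```

`kept` stays `Right(1)` while `dropped` becomes `Left("no value")`, which is essentially the "user-supplied empty" idea, just passed as an argument instead of an implicit.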
I think the correct answer to the confusion is to get people used to the fact that you can't always `filter`.
You are winning me over; I agree. The confusion has always arisen from the expectation that it would just work, and eliminating that expectation is probably a better way to go.
Actually ran into a big wart recently related to this. See @frosforever's SO post here: https://stackoverflow.com/questions/45333889/regex-unapply-in-for-comprehension-with-if-guard-not-compiling

tl;dr - This does not compile:

for {
s <- List.empty[String]
regex <- List.empty[scala.util.matching.Regex]
regex(ss) = s
if ss == "foo"
} yield s

But removing the if:

for {
s <- List.empty[String]
regex <- List.empty[scala.util.matching.Regex]
regex(ss) = s
} yield s

or rearranging the order of the two lists in the for comprehension:

for {
regex <- List.empty[scala.util.matching.Regex]
s <- List.empty[String]
regex(ss) = s
if ss == "foo"
} yield s

makes it compile.
It would be nice if we could improve on this. But we'd need someone to put in the effort.
Can we at least do something about imperative `for` loops? The following:

for { x <- xs; y = x; if y > 0; z <- zs } println(x + z)

should generate:

xs.foreach { x =>
val y = x
if (y > 0) zs.foreach { z =>
println(x + z)
}
}

instead of the current convoluted and inefficient:

xs.map { x =>
val y = x
(x, y)
}.withFilter {
case (x,y) => y > 0
}.foreach {
case (x,y) => zs.foreach { z => println(x + z) }
}

which creates an intermediate list, allocates tuples, and calls four closure-consuming methods instead of just two. No wonder people are surprised by how much more efficient plain `while` loops are. I believe the fix for this one is a low-hanging fruit. I could try to do it at some point if people agree with the idea.
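The two desugarings can be compared side by side; this is a hand-written sketch (collecting into a buffer instead of printing so the results are checkable), not compiler output:

```scala
import scala.collection.mutable.ListBuffer

object ForeachDesugaring {
  val xs = List(1, -2, 3)
  val zs = List(10, 20)

  // Proposed: plain nested foreach with an inline val and test
  def simple: List[Int] = {
    val out = ListBuffer.empty[Int]
    xs.foreach { x =>
      val y = x
      if (y > 0) zs.foreach { z => out += x + z }
    }
    out.toList
  }

  // Current scheme: map to tuples, withFilter, then foreach
  def current: List[Int] = {
    val out = ListBuffer.empty[Int]
    xs.map { x => val y = x; (x, y) }
      .withFilter { case (_, y) => y > 0 }
      .foreach { case (x, _) => zs.foreach { z => out += x + z } }
    out.toList
  }
}
```

Both visit the same elements in the same order; the difference is purely the intermediate list and tuple allocations.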
As for functional loops, I think it could be solved in a less artificial way by using `Option`. This code:

for { x <- xs; t = x+1; if t > 0; y <- ys } yield x + y

would produce:

xs.map { x =>
val t = x+1
if (t > 0) Some(ys.map { y =>
x + y
})
else None
}.flattenOptions.flatten

and this code:

for { x <- xs; y <- ys; t = y+1; if x > t } yield x + y

would produce:

xs.flatMap { x =>
ys.map { y =>
val t = y+1
if (x > t) Some(x + y)
else None
}.flattenOptions
}

Compared to the current desugaring, this avoids the intermediate tuples. In addition, `flattenOptions` does not need any language support; it could be defined as (Scala 2.12 collections API):

implicit class FlattenOptions[A,B](xs: FilterMonadic[Option[A],B]) {
def flattenOptions[That](implicit cbf: CanBuildFrom[B,A,That]) =
xs.withFilter(_.isDefined).map(_.get)
}

Also, not sure if it would fit in the complexity budget because it makes things less regular, but I would still prefer something easier, like:

for { x <- xs; y <- ys if x > 0 } yield x + y

to desugar to the more natural:

xs.flatMap { x =>
ys.filter(y => x > 0).map { y =>
x + y
}
}

which does not allocate any options, instead of:

xs.flatMap { x =>
ys.map { y =>
if (x > 0) Some(x + y)
else None
}.flattenOptions
}
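On 2.13 collections (where `FilterMonadic` and `CanBuildFrom` are gone), the same idea can be sketched for `List` with `collect`; `flattenOptions` is the commenter's proposed helper name, not a standard library method:

```scala
object FlattenOptionsDemo {
  // 2.13-style stand-in for the FilterMonadic/CanBuildFrom version above
  implicit class FlattenOptions[A](private val xs: List[Option[A]]) {
    def flattenOptions: List[A] = xs.collect { case Some(a) => a }
  }

  val xs = List(1, 2, 3)
  val ys = List(10, 20)

  // Hand-written version of the Option-based desugaring
  def proposed: List[Int] =
    xs.map { x =>
      val t = x + 1
      if (t > 0) Some(ys.map(y => x + y)) else None
    }.flattenOptions.flatten

  // What the for-comprehension computes today
  def reference: List[Int] =
    for { x <- xs; t = x + 1; if t > 0; y <- ys } yield x + y
}
```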
Hi all. I've been thinking about this issue for a while and it looks like the main issue stems from how assignment and condition clauses are desugared in the `for` comprehension. Consider the following `for` expression:

for {
x <- List(1, 2, 3)
z = 1
if x > z
y <- List(10, 20, 30)
} yield x + y

Let's assume the existence of a unary constructor `single`, which wraps a value in a one-element container (for `List`, `single(a)` would be `List(a)`). Analogically, let's assume the existence of a unary constructor `cond`, which produces a one-element container of `Unit` when its condition holds and an empty container otherwise (for `List`, `cond(p)` would be `List(()).filter(_ => p)`). Then we can write the above as:

for {
x <- List(1, 2, 3)
z <- single(1)
_ <- cond(x > z)
y <- List(10, 20, 30)
} yield x + y

Or, after replacing the constructors with their implementation:

for {
x <- List(1, 2, 3)
z <- List(1)
_ <- List(()).filter(_ => x > z)
y <- List(10, 20, 30)
} yield x + y

It might look that, since we only added boilerplate, the resulting desugared code would be longer and more complex. However, if my IDE and Lihaoyi's Ammonite shell are to be believed, the opposite happens. (In fact, the desugared version of the original, generated by IntelliJ IDEA with the Scala plugin, doesn't even compile!) I've tried to write a minimalistic type class system in 2.12 that would encompass implicit resolution of these constructors. (Here's a small semi-working example using one type.) Regardless of the need of providing such instances,
I would be happy to receive feedback on whether there is any merit to this idea. |
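For `List`, the two constructors are easy to write down, and the rewritten comprehension computes the same result as the original (the names `single` and `cond` are the ones proposed in the comment above):

```scala
object SingleCond {
  def single[A](a: A): List[A] = List(a)
  def cond(p: => Boolean): List[Unit] = List(()).filter(_ => p)

  // Comprehension rewritten with only generators
  def rewritten: List[Int] =
    for {
      x <- List(1, 2, 3)
      z <- single(1)
      _ <- cond(x > z)
      y <- List(10, 20, 30)
    } yield x + y

  // Original comprehension with a value binding and a guard
  def reference: List[Int] =
    for {
      x <- List(1, 2, 3)
      z = 1
      if x > z
      y <- List(10, 20, 30)
    } yield x + y
}
```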
What is the downside to just desugaring into what this does: https://github.com/oleg-py/better-monadic-for? This is the least surprising behaviour (it is what a user expects in her mental model) and does not break current code (even if it does, you will get a compile error).
"Desugar bindings as vals instead of tuples" might be fine, but desugaring pattern generators without filtering is a semantic change:
Which compile error? If some members fail the match you get a runtime failure instead of skipping the members — nothing in the README makes me expect errors here:

object TestExtract { def unapply(x: Any): Boolean = false }
1 match { case TestExtract() => 1; case _ => 0 }
for (TestExtract() <- List(1, 2, 3)) yield ()

The only error I anticipate is for extractors that affect the expected type, unlike the above one. Tho the README doesn't present any such example.

(_: List[Int]).map( { case (x, y) => x + y })

EDIT: just confirmed that code gives a runtime failure.

// without plugin
scala> for (TestExtract() <- List(1, 2, 3)) yield ()
res1: List[Unit] = List()
// with plugin
scala> for (TestExtract() <- List(1, 2, 3)) yield ()
scala.MatchError: 1 (of class java.lang.Integer)
at .$anonfun$res2$1(<console>:13)
at scala.runtime.java8.JFunction1$mcVI$sp.apply(JFunction1$mcVI$sp.java:12)
at scala.collection.immutable.List.map(List.scala:283)
  ... 36 elided

Of course that is not meant as a realistic example, but using an extractor that can fail to match is.
Here's my opinion: I like this idea :) I think scanning through some big Scala codebase (Scala community build?) would reveal how much code is going to be affected. Maybe the affected code wouldn't be hard to adjust?
This will allow us to experiment with a different desugaring of `for` expressions, as described in scala#2573.
What is the current status of this issue?

val xs = Array(1, 0, 0)
for {
i <- 1 until xs.length
previous = xs(i - 1)
} xs(i) = previous

Expected (before one looks at the language spec) result:
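The surprise here is checkable: because the value binding is desugared through an intermediate `map` that builds `(i, previous)` tuples, every `previous` is read before any write happens, so the result is `Array(1, 1, 0)` rather than the naively expected `Array(1, 1, 1)` (behavior as observed on Scala 2.13):

```scala
object ValBindingSurprise {
  def run(): Array[Int] = {
    val xs = Array(1, 0, 0)
    for {
      i <- 1 until xs.length
      previous = xs(i - 1) // evaluated for all i before any assignment below runs
    } xs(i) = previous
    xs
  }
}
```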
The current status is: no one has come up with a full proposal here, and no one at EPFL is working on this, so this will have to come from the community; see also https://contributors.scala-lang.org/t/pre-sip-improve-for-comprehensions-functionality/3509/21?u=smarter.
My two cents on this… I think there are a number of historical problems here which have compounded to create the situation we're in today. I'm not entirely sure how to unravel it, but in the interests of clarity of discussion, it's worth stating them as I see them.

Untyped Desugaring

Rather than being specified in terms of the types involved, the desugaring of `for` is purely syntactic. All of this means that, while it's theoretically possible to leverage enrichments and typeclasses to push `for` in various directions, in practice the shape of the desugaring is fixed. I don't see a realistic way to correct this issue without breaking vast swathes of the ecosystem, which is to say, I think we're stuck with it.

Inconsistent User Expectations

Due to some of the vagaries of how patterns on the left of a generator are handled, users have come to expect that the following silently filters:

for (TestExtract() <- List(1, 2, 3)) yield ()

Why would you expect this to produce an empty list? Compare:

val TestExtract() = List(1, 2, 3)

That produces a `MatchError`. What I'm saying is that this kind of user expectation paints us into a very inconsistent semantic corner. At the risk of sounding dismissive, the expectation is wrong and does not align with the rest of the language.

Side Effects

And here's the other problem: many users have some wildly problematic expectations on how side effects are handled within the desugaring:
x <- List(1, 2, 3)
y = x * x
if y * x > 4
z <- 0 until y
} yield y + x + z

There's no particular reason that this cannot be desugared into the following:

List(1, 2, 3).filter(x => (x * x) * x > 4) flatMap { x =>
val y = x * x
0.until(y).map(y + x + _)
}

The user certainly cannot see a difference except when using a stop watch. The problem is that this cannot be thus desugared without breaking expectations:

for {
x <- List(1, 2, 3)
y = {
println("derp")
x * x
}
if y * x > 4
z <- 0 until y
} yield y + x + z

Users expect that the `println("derp")` runs exactly once per element. We could also feasibly do something like this:

List(1, 2, 3).map(x => (x, x * x)).filter { case (x, y) => y * x > 4 } flatMap {
case (x, y) => 0.until(y).map(y + x + _)
}

This works (even with side effects)! But it also assumes the container is fully polymorphic and imposes extra boxing (as well as other, more deeply subtle and now mostly obsolete issues, such as confusing `CanBuildFrom` resolution).

…and more

This is a real problem that's going to be a very serious issue for a meaningful chunk of the ecosystem when migrating to Scala 3. For example: typelevel/cats-effect#749 In case it isn't clear, the problem there is being caused by the unnecessary trailing `map`. So really, I don't want this to be underestimated: this is going to cause a lot of friction in the Scala 3 migration if not addressed in some way.

A Modest Proposal

Trailing Map

Eliminating the trailing identity `map`:

for (x <- List(1, 2, 3)) yield x

…should desugar into simply `List(1, 2, 3)`.

Partial Patterns

Silent filtering of partial patterns should be dropped in favor of a more uniform `MatchError`.

Limited Polymorphic Assumption

We should bite the bullet and make the assumption that tuples can be stuffed into whatever we're mapping over. Thus:

for {
x <- List(1, 2, 3)
y = x * x
if y * x > 4
z <- 0 until y
} yield y + x + z
// should desugar into...
List(1, 2, 3).map(x => (x, x * x)).filter { case (x, y) => y * x > 4 } flatMap {
case (x, y) => 0.until(y).map(y + x + _)
}

This is no more problematic than the current assumptions around `withFilter`. Taken together, I believe these last two changes entirely remove the need for special-casing here.

Summary

We still are left with a desugaring that is untyped, but the above changes remove the worst surprises. Needless to say, we could keep the old desugaring available under a legacy flag.
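Under the stated assumption (tuples can be stuffed into the container being mapped over), the tupled desugaring can be written out by hand and checked against the current semantics; note the filter must test the full guard `y * x > 4`:

```scala
object TuplingDesugaring {
  // Hand-written tupled desugaring: map to (x, y), filter on the guard, flatMap
  def proposed: List[Int] =
    List(1, 2, 3)
      .map(x => (x, x * x))
      .filter { case (x, y) => y * x > 4 }
      .flatMap { case (x, y) => (0 until y).map(z => y + x + z) }

  // What the for-comprehension computes today
  def reference: List[Int] =
    for { x <- List(1, 2, 3); y = x * x; if y * x > 4; z <- 0 until y } yield y + x + z
}
```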
Can a new syntax or keyword be added to yield a monadic value rather than a plain value?

for {
a <- readHello
b <- readWorld
} yieldF say(a + b)
// desugars to
readHello.flatMap { a =>
readWorld.flatMap { b =>
say(a + b)
}
}
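A hypothetical `yieldF` can already be emulated by writing the `flatMap` chain by hand; sketched here with `Option` standing in for the effect type (`readHello`, `readWorld`, and `say` are assumed placeholders from the comment above):

```scala
object YieldFSketch {
  def readHello: Option[String] = Some("hello ")
  def readWorld: Option[String] = Some("world")
  def say(s: String): Option[String] = Some(s)

  // What `yieldF say(a + b)` would desugar to: the last step uses flatMap, not map
  def desugared: Option[String] =
    readHello.flatMap { a =>
      readWorld.flatMap { b =>
        say(a + b)
      }
    }
}
```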
I think
I like it, it'd be interesting to tweak scalac to do this and see what breaks in the community build.
@scf37 we'd like to fix the existing stuff before adding new stuff on top of it :).
@smarter I agree that would be a fascinating experiment, and very helpful in determining the right approach here. I think such an experiment should be done with both the tupling and the trailing identity map elimination (which seems the most likely to be innocuous). If -strict already has sane semantics for non-falsifiable patterns in for, then I’m quite happy on that front.
I do not understand what you are proposing. The code example you show is already desugared in the way you suggest. You can check it as follows:

Welcome to Scala 2.13.1 (OpenJDK 64-Bit Server VM, Java 1.8.0_121).
Type in expressions for evaluation. Or try :help.
scala> scala.reflect.runtime.universe.reify {
| for {
| x <- List(1, 2, 3)
| y = x * x
| if y * x > 4
| z <- 0 until y
| } yield y + x + z
| }
res0: reflect.runtime.universe.Expr[List[Int]] =
Expr[List[Int]](List.apply(1, 2, 3).map(((x) => {
val y = x.$times(x);
Tuple2.apply(x, y)
})).withFilter(((x$1) => x$1: @unchecked match {
case Tuple2((x @ _), (y @ _)) => y.$times(x).$greater(4)
})).flatMap(((x$2) => x$2: @unchecked match {
case Tuple2((x @ _), (y @ _)) => Predef.intWrapper(0).until(y).map(((z) => y.$plus(x).$plus(z)))
})))

The only meaningful difference with your desugaring is the use of `withFilter` instead of `filter`. What do you mean by "the current assumptions around `withFilter`"?
Actually, there are very good reasons not to do that: not only do you duplicate code (what if the bound expression were large?), you also duplicate runtime work.
Cool, so let's just remove `withFilter` then?
Simply referring to the fact that, in order to participate in a `for` comprehension containing a guard, a type must define `withFilter`.
As I said, effects. Expensive work in this case is effectively a sort of effect. The point being that people expect the right-hand side of a binding to be evaluated exactly once per element. What I'm trying to avoid in all of this is a desugaring that silently duplicates that work.
I'm absolutely all for removing the trailing map when the yielded expression is a value of the previous expression — the trailing map makes no sense at all, it's harmful AND there's no way to abuse it meaningfully even if you wanted to. (It can't be used as a "finalizing" symbol for a builder because you can't distinguish between the final map and all the other maps in the middle of the expression.)
What is your rationale for removing it?
What foibles and such?
Why is that a weird expectation? Do you know of any other language that duplicates work like that?
Well, in Haskell, purity ensures you can't tell — so that's up to the optimizer.
Yes, the GHC optimizer could duplicate runtime work if it wanted, but it generally tries very hard not to, because this could have very bad consequences on program performance — for example, see the comments in this paper about not floating let bindings into lambdas (which is similar to what was proposed here). I am not aware of GHC optimizations which duplicate runtime work, and users can safely assume that their expensive computations will not be done several times. This is even the original motivation for the monomorphism restriction. And the Haskell language reference defines the semantics of list comprehensions (on which monad comprehensions are based, which are the closest thing to Scala's `for`).
It considerably complicates the definition of what a `for` expression means. This kind of "eager, except when magically lazy, but only right here" type of thing has been tried and I think is broadly considered to be a failed experiment by the ecosystem. Maybe I'm being a bit bold in asserting that, but flip-flopping between runtime semantics is something that most of the "pure FP Scala" ecosystem avoids very aggressively because it's just hard to reason about. Eager datatypes are eager, lazy datatypes are lazy, and the only exceptions to this are very, very, very carefully reasoned and justified. In other words, I think the expectation of laziness even when the underlying datatype is eager is in and of itself a problem. I mean, there are more important windmills to tilt, so if this is going to become a Thing™, then I'll just drop it. Removal of the trailing map along with the tupling change would already be a significant improvement.
You can just define it yourself.
Haskell has uniformly by-need semantics. This means that, at the very least, all of its data structures are lazy, meaning that evaluation order is largely unobservable there.
Well, you can just say "it uses `map`, `flatMap` and `withFilter`". I don't think that is particularly complicated.
The fact that … The ironic thing is that the above holds precisely because of our pesky terminating `map`. So it seems we're getting to the bottom of the problem. I would agree with you if you made the following point: using … I don't know if it would be feasible to change … Maybe an alternative comprehension syntax could be used, like …
(A nitpick, but this is just plain false. Haskell is non-strict, and has plenty of ways to control the amount of strictness you want in your programs, both in the language itself and in the compiler options — see: bang patterns, `seq`, strictness annotations on data fields, and the `Strict`/`StrictData` extensions.)
What we are talking about here is whether or not to duplicate code, which is what you originally suggested. Of course, duplicating code is going to duplicate runtime work in the situation at hand, even in Haskell.
Inventing a new comprehension syntax from scratch would give us an opportunity to learn from successes in this area in other languages, for example F# Computation Expressions. There the syntax for monadic bindings is very similar to the normal one; you just append the `!` character to the keyword (`let!`, `do!`, `return!`).
there is a new SIP in this area: scala/improvement-proposals#79
Opening this issue, as suggested by Martin, to provide a place to discuss the individual warts brought up in the blog post Warts of the Scala Programming Language and the possibility of mitigating/fixing them in Dotty (and perhaps later in Scala 2.x). These are based on Scala 2.x behavior, which I understand Dotty follows closely; apologies in advance if something has already been fixed.
Scala lets you write for-comprehensions, which are converted into a chain of `flatMap`s and `map`s as shown below. I have nicely formatted the desugared code for you, but you can try this yourself in the Ammonite Scala REPL to verify that this is what the for-comprehension gets transformed into.
This is a convenient way of implementing nested loops over lists, and happily works with things that aren't lists: `Option`s (as shown above), `Future`s, and many other things.
You can also assign local values within the for-comprehension. The syntax is a bit wonky (you don't need a `val`, you can't define `def`s or `class`es or run imperative commands without `_ = println("debug")`), but for simple local assignments it works. You may expect the above code to be transformed into something like this. But it turns out it instead gets converted into something like this:
Although it is roughly equivalent, and ends up with the same result in most cases, this output format is tremendously convoluted and wastefully inefficient (e.g. creating and taking apart unnecessary `Tuple2`s). As far as I can tell, there is no reason at all not to generate the simpler version of the code shown above.
PostScript:

Notably, these two desugarings do not always produce the same results! The current desugaring behaves weirdly in certain cases; here is one that just bit me an hour ago. This specific problem has gone away in 2.12 because `Either` doesn't need `.right` anymore, but the language-level problem is still there: `y = 2` ends up causing strange, difficult-to-debug errors due to the weirdness of the desugaring. This would not be an issue at all given the desugaring I proposed.