-
Notifications
You must be signed in to change notification settings - Fork 30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SIP-62 - For comprehension improvements #79
Merged
+381
−0
Merged
Changes from all commits
Commits
Show all changes
3 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,381 @@ | ||
--- | ||
layout: sip | ||
permalink: /sips/:title.html | ||
stage: design | ||
status: submitted | ||
title: SIP-62 - For comprehension improvements | ||
--- | ||
|
||
**By: Kacper Korban (VirtusLab)** | ||
|
||
## History | ||
|
||
| Date | Version | | ||
|---------------|--------------------| | ||
| June 6th 2023 | Initial Draft | | ||
| Feb 15th 2024 | Reviewed Version | | ||
|
||
## Summary | ||
|
||
`for`-comprehensions in Scala 3 improved their usability in comparison to Scala 2, but there are still some pain points relating both usability of `for`-comprehensions and simplicity of their desugaring. | ||
|
||
This SIP tries to address some of those problems, by changing the specification of `for`-comprehensions. From user perspective, the biggest change is allowing aliases at the start of the `for`-comprehensions. e.g. | ||
|
||
``` | ||
for { | ||
x = 1 | ||
y <- Some(2) | ||
} yield x + y | ||
``` | ||
|
||
## Motivation | ||
|
||
There are some clear pain points related to Scala'3 `for`-comprehensions and those can be divided into two categories: | ||
|
||
1. User-facing and code simplicity problems | ||
|
||
Specifically, for the following example written in a Haskell-style do-comprehension | ||
|
||
```haskell | ||
do | ||
a = largeExpr(arg) | ||
b <- doSth(a) | ||
combineM(a, b) | ||
``` | ||
in Scala we would have to write | ||
|
||
```scala | ||
val a = largeExpr(b) | ||
for | ||
b <- doSth(a) | ||
x <- combineM(a, b) | ||
yield x | ||
``` | ||
|
||
This complicates the code, even in this simple example. | ||
2. The simplicity of desugared code | ||
|
||
The second pain point is that the desugared code of `for`-comprehensions can often be surprisingly complicated. | ||
|
||
e.g. | ||
```scala | ||
for | ||
a <- doSth(arg) | ||
b = a | ||
yield a + b | ||
``` | ||
|
||
Intuition would suggest for the desugared code will be of the form | ||
|
||
```scala | ||
doSth(arg).map { a => | ||
val b = a | ||
a + b | ||
} | ||
``` | ||
|
||
But because of the possibility of an `if` guard being immediately after the pure alias, the desugared code is of the form | ||
|
||
```scala | ||
doSth(arg).map { a => | ||
val b = a | ||
(a, b) | ||
}.map { case (a, b) => | ||
a + b | ||
} | ||
``` | ||
|
||
These unnecessary assignments and additional function calls not only add unnecessary runtime overhead but can also block other optimizations from being performed. | ||
|
||
## Proposed solution | ||
|
||
This SIP suggests the following changes to `for` comprehensions: | ||
|
||
1. Allow `for` comprehensions to start with pure aliases | ||
|
||
e.g. | ||
```scala | ||
for | ||
a = 1 | ||
b <- Some(2) | ||
c <- doSth(a) | ||
yield b + c | ||
``` | ||
2. Simpler conditional desugaring of pure aliases. i.e. whenever a series of pure aliases is not immediately followed by an `if`, use a simpler way of desugaring. | ||
|
||
e.g. | ||
```scala | ||
for | ||
a <- doSth(arg) | ||
b = a | ||
yield a + b | ||
``` | ||
|
||
will be desugared to | ||
|
||
```scala | ||
doSth(arg).map { a => | ||
val b = a | ||
a + b | ||
} | ||
``` | ||
|
||
but | ||
|
||
```scala | ||
for | ||
a <- doSth(arg) | ||
b = a | ||
if b > 1 | ||
yield a + b | ||
``` | ||
|
||
will be desugared to | ||
|
||
```scala | ||
doSth(arg).map { a => | ||
val b = a | ||
(a, b) | ||
}.withFilter { case (a, b) => | ||
b > 1 | ||
}.map { case (a, b) => | ||
a + b | ||
} | ||
``` | ||
|
||
3. Avoiding redundant `map` calls if the yielded value is the same as the last bound value. | ||
|
||
e.g. | ||
```scala | ||
for | ||
a <- List(1, 2, 3) | ||
yield a | ||
``` | ||
|
||
will just be desugared to | ||
|
||
```scala | ||
List(1, 2, 3) | ||
``` | ||
|
||
### Detailed description | ||
|
||
#### Ad 1. Allow `for` comprehensions to start with pure aliases | ||
|
||
Allowing `for` comprehensions to start with pure aliases is a straightforward change. | ||
|
||
The Enumerators syntax will be changed from: | ||
|
||
``` | ||
Enumerators ::= Generator {semi Enumerator | Guard} | ||
``` | ||
to | ||
``` | ||
Enumerators ::= {Pattern1 `=' Expr semi} Generator {semi Enumerator | Guard} | ||
``` | ||
Which will allow adding 0 or more aliases before the first generator. | ||
When desugaring is concerned, a for comprehension starting with pure aliases will generate a block with those aliases as `val` declarations and the rest of the desugared `for` as an expression. Unless the aliases are followed by a guard, then the desugaring should result in an error. | ||
New desugaring rule will be added: | ||
```scala | ||
For any N: | ||
for (P_1 = E_1; ... P_N = E_N; ...) | ||
==> | ||
{ | ||
val x_2 @ P_2 = E_2 | ||
... | ||
val x_N @ P_N = E_N | ||
for (...) | ||
} | ||
``` | ||
|
||
e.g. | ||
|
||
```scala | ||
for | ||
a = 1 | ||
b <- Some(2) | ||
c <- doSth(a) | ||
yield b + c | ||
``` | ||
|
||
will desugar to | ||
|
||
```scala | ||
{ | ||
val a = 1 | ||
for | ||
b <- Some(2) | ||
c <- doSth(a) | ||
yield b + c | ||
} | ||
``` | ||
|
||
#### Ad 2. Simpler conditional desugaring of pure aliases. i.e. whenever a series of pure aliases is not immediately followed by an `if`, use a simpler way of desugaring. | ||
|
||
Currently, for consistency, all pure aliases are desugared as if they are followed by an `if` condition. Which makes the desugaring more complicated than expected. | ||
|
||
e.g. | ||
|
||
The following code: | ||
|
||
```scala | ||
for | ||
a <- doSth(arg) | ||
b = a | ||
yield a + b | ||
``` | ||
|
||
will be desugared to: | ||
|
||
```scala | ||
doSth(arg).map { a => | ||
val b = a | ||
(a, b) | ||
}.map { case (a, b) => | ||
a + b | ||
} | ||
``` | ||
|
||
The proposed change is to introduce a simpler desugaring for common cases, when aliases aren't followed by a guard, and keep the old desugaring method for the other cases. | ||
|
||
A new desugaring rules will be introduced for simple desugaring. | ||
|
||
```scala | ||
For any N: | ||
for (P <- G; P_1 = E_1; ... P_N = E_N; ...) | ||
==> | ||
G.flatMap (P => for (P_1 = E_1; ... P_N = E_N; ...)) | ||
|
||
And: | ||
|
||
for () yield E ==> E | ||
|
||
(Where empty for-comprehensions are excluded by the parser) | ||
``` | ||
|
||
It delegares desugaring aliases to the newly introduced rule from the previous impreovement. i.e. | ||
|
||
```scala | ||
For any N: | ||
for (P_1 = E_1; ... P_N = E_N; ...) | ||
==> | ||
{ | ||
val x_2 @ P_2 = E_2 | ||
... | ||
val x_N @ P_N = E_N | ||
for (...) | ||
} | ||
``` | ||
|
||
One other rule also has to be changed, so that the current desugaring method, of passing all the aliases in a tuple with the result, will only be used when desugaring a generator, followed by some aliases, followed by a guard. | ||
|
||
```scala | ||
For any N: | ||
for (P <- G; P_1 = E_1; ... P_N = E_N; if E; ...) | ||
==> | ||
for (TupleN(P, P_1, ... P_N) <- | ||
for (x @ P <- G) yield { | ||
val x_1 @ P_1 = E_2 | ||
... | ||
val x_N @ P_N = E_N | ||
TupleN(x, x_1, ..., x_N) | ||
}; if E; ...) | ||
``` | ||
|
||
This changes will make the desugaring work in the following way: | ||
|
||
```scala | ||
for | ||
a <- doSth(arg) | ||
b = a | ||
yield a + b | ||
``` | ||
|
||
will be desugared to | ||
|
||
```scala | ||
doSth(arg).map { a => | ||
val b = a | ||
a + b | ||
} | ||
``` | ||
|
||
but | ||
|
||
```scala | ||
for | ||
a <- doSth(arg) | ||
b = a | ||
if b > 1 | ||
yield a + b | ||
``` | ||
|
||
will be desugared to | ||
KacperFKorban marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
```scala | ||
doSth(arg).map { a => | ||
val b = a | ||
(a, b) | ||
}.withFilter { case (a, b) => | ||
b > 1 | ||
}.map { case (a, b) => | ||
a + b | ||
} | ||
``` | ||
|
||
#### Ad 3. Avoiding redundant `map` calls if the yielded value is the same as the last bound value. | ||
|
||
This change is strictly an optimization. This allows for the compiler to get rid of the final `map` call, if the yielded value is the same as the last bound pattern. The pattern can be either a single variable binding or a tuple. | ||
|
||
One desugaring rule has to be modified for this purpose. | ||
|
||
```scala | ||
for (P <- G) yield P ==> G | ||
If P is a variable or a tuple of variables and G is not a withFilter. | ||
|
||
for (P <- G) yield E ==> G.map (P => E) | ||
Otherwise | ||
``` | ||
|
||
e.g. | ||
```scala | ||
for | ||
a <- List(1, 2, 3) | ||
yield a | ||
``` | ||
|
||
will just be desugared to | ||
|
||
```scala | ||
List(1, 2, 3) | ||
``` | ||
|
||
### Compatibility | ||
|
||
This change may change the semantics of some programs. It may remove some `map` calls in the desugared code, which may change the program semantics (if the `map` implementation was side-effecting). | ||
|
||
For example the following code will now have only one `map` call, instead of two: | ||
```scala | ||
for | ||
a <- doSth(arg) | ||
b = a | ||
yield a + b | ||
``` | ||
|
||
### Other concerns | ||
|
||
As far as I know, there are no widely used Scala 3 libraries that depend on the desugaring specification of `for`-comprehensions. | ||
|
||
## Links | ||
|
||
1. Scala contributors discussion thread (pre-SIP): https://contributors.scala-lang.org/t/pre-sip-improve-for-comprehensions-functionality/3509/51 | ||
2. Github issue discussion about for desugaring: https://github.com/lampepfl/dotty/issues/2573 | ||
3. Scala 2 implementation of some of the improvements: https://github.com/oleg-py/better-monadic-for | ||
4. Implementation of one of the simplifications: https://github.com/lampepfl/dotty/pull/16703 | ||
5. Draft implementation branch: https://github.com/dotty-staging/dotty/tree/improved-fors |
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Elsewhere, we discussed how making the final
map
optional would be a breaking change. But what if we went the haskell route, and only removed the finalmap
if the finalyield
is also removed?map
from the generated codeyield
from the source code!map
andyield
is direct:yield
meansmap
, no-yield
means no-map
. No "magic" map-removal depending on the shape of the finalyield
expressiondo
or F#'sreturn!
in computation expressionsSeems like a win-win-win-win-win-win overall
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's an interesting possibility. Parsing will be a problem since we don't know anymore whether the next token sequence is a pattern or an expression. I am sure it can be overcome, but don't know how much complexity it would add to the parser.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did consider this change, but I didn't manage to get it to work in the parser. (I tried going in the reflect-like direction to get better syntax for this case)
But if we were to be able to distinguish a pattern from an expression, then yet another problem is automatically solved -- we can get rid of the unnecessary
_ <-
with monadic 'unit' expressions.One workaround from a pre-SIP is to have a
yield <- monadicExpr
syntax.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yield <-
looks super ugly syntactically IMO, but semantically it's exactly the same asyield from
in python andreturn!
in F#. Maybe we can tweak the syntax to make it prettier, but the idea of using a keyword to work around parser ambiguity seems reasonable if a 'direct' syntwx proves difficult to implementOne issue here is we have two things we may want to simplify
_ <- too
_ = foo
For non-trailing raw expressions in a
for
block, we can only assign one meaning, and the other meaning will still have to be explicitly spelled out as above. Maybe that's ok, but it's something worth consideringThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would it be controversial to reuse
return
as inThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would be against reviving
return
for several reasons, including: 1) yet another syntax to learn besidesdo
andyield
, but worse: 2) my experience is that many learners have all sorts of connotations on thereturn
keyword with all sorts of misconceptions about how it works in Scala so I think it is best to say "don't use return" and for some it takes a while not to end everything with return if they are used to that...