Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SIP-62 - For comprehension improvements #79

Merged
merged 3 commits into from
Aug 19, 2024
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
381 changes: 381 additions & 0 deletions content/better-fors.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,381 @@
---
layout: sip
permalink: /sips/:title.html
stage: design
status: submitted
title: SIP-62 - For comprehension improvements
---

**By: Kacper Korban (VirtusLab)**

## History

| Date | Version |
|---------------|--------------------|
| June 6th 2023 | Initial Draft |
| Feb 15th 2024 | Reviewed Version |

## Summary

`for`-comprehensions in Scala 3 improved their usability in comparison to Scala 2, but there are still some pain points relating both usability of `for`-comprehensions and simplicity of their desugaring.

This SIP tries to address some of those problems, by changing the specification of `for`-comprehensions. From user perspective, the biggest change is allowing aliases at the start of the `for`-comprehensions. e.g.

```
for {
x = 1
y <- Some(2)
} yield x + y
```

## Motivation

There are some clear pain points related to Scala'3 `for`-comprehensions and those can be divided into two categories:

1. User-facing and code simplicity problems

Specifically, for the following example written in a Haskell-style do-comprehension

```haskell
do
a = largeExpr(arg)
b <- doSth(a)
combineM(a, b)
```
in Scala we would have to write

```scala
val a = largeExpr(b)
for
b <- doSth(a)
x <- combineM(a, b)
yield x
```

Copy link
Contributor

@lihaoyi lihaoyi Feb 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Elsewhere, we discussed how making the final map optional would be a breaking change. But what if we went the haskell route, and only removed the final map if the final yield is also removed?

    val a = largeExpr(b)
    for
      b <- doSth(a)
      combineM(a, b)
  1. This is currently invalid syntax, so no backwards compat concerns
  2. We get to remove the useless map from the generated code
  3. We also get to remove the useless yield from the source code!
  4. The correspondence between the map and yield is direct: yield means map, no-yield means no-map. No "magic" map-removal depending on the shape of the final yield expression
  5. The final source code is shorter and more concise
  6. The final source code looks more like other languages with similar syntax, e.g. Haskell's do or F#'s return! in computation expressions

Seems like a win-win-win-win-win-win overall

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's an interesting possibility. Parsing will be a problem since we don't know anymore whether the next token sequence is a pattern or an expression. I am sure it can be overcome, but don't know how much complexity it would add to the parser.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did consider this change, but I didn't manage to get it to work in the parser. (I tried going in the reflect-like direction to get better syntax for this case)
But if we were to be able to distinguish a pattern from an expression, then yet another problem is automatically solved -- we can get rid of the unnecessary _ <- with monadic 'unit' expressions.

One workaround from a pre-SIP is to have a yield <- monadicExpr syntax.

Copy link
Contributor

@lihaoyi lihaoyi Feb 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yield <- looks super ugly syntactically IMO, but semantically it's exactly the same as yield from in python and return! in F#. Maybe we can tweak the syntax to make it prettier, but the idea of using a keyword to work around parser ambiguity seems reasonable if a 'direct' syntwx proves difficult to implement

we can get rid of the unnecessary _ <- with monadic 'unit' expressions.

One issue here is we have two things we may want to simplify

  • _ <- too
  • _ = foo

For non-trailing raw expressions in a for block, we can only assign one meaning, and the other meaning will still have to be explicitly spelled out as above. Maybe that's ok, but it's something worth considering

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it be controversial to reuse return as in

for x <- List(1,2,3) return List(x, x)

Copy link

@bjornregnell bjornregnell Feb 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would be against reviving return for several reasons, including: 1) yet another syntax to learn besides do and yield, but worse: 2) my experience is that many learners have all sorts of connotations on the return keyword with all sorts of misconceptions about how it works in Scala so I think it is best to say "don't use return" and for some it takes a while not to end everything with return if they are used to that...

This complicates the code, even in this simple example.
2. The simplicity of desugared code

The second pain point is that the desugared code of `for`-comprehensions can often be surprisingly complicated.

e.g.
```scala
for
a <- doSth(arg)
b = a
yield a + b
```

Intuition would suggest for the desugared code will be of the form

```scala
doSth(arg).map { a =>
val b = a
a + b
}
```

But because of the possibility of an `if` guard being immediately after the pure alias, the desugared code is of the form

```scala
doSth(arg).map { a =>
val b = a
(a, b)
}.map { case (a, b) =>
a + b
}
```

These unnecessary assignments and additional function calls not only add unnecessary runtime overhead but can also block other optimizations from being performed.

## Proposed solution

This SIP suggests the following changes to `for` comprehensions:

1. Allow `for` comprehensions to start with pure aliases

e.g.
```scala
for
a = 1
b <- Some(2)
c <- doSth(a)
yield b + c
```
2. Simpler conditional desugaring of pure aliases. i.e. whenever a series of pure aliases is not immediately followed by an `if`, use a simpler way of desugaring.

e.g.
```scala
for
a <- doSth(arg)
b = a
yield a + b
```

will be desugared to

```scala
doSth(arg).map { a =>
val b = a
a + b
}
```

but

```scala
for
a <- doSth(arg)
b = a
if b > 1
yield a + b
```

will be desugared to

```scala
doSth(arg).map { a =>
val b = a
(a, b)
}.withFilter { case (a, b) =>
b > 1
}.map { case (a, b) =>
a + b
}
```

3. Avoiding redundant `map` calls if the yielded value is the same as the last bound value.

e.g.
```scala
for
a <- List(1, 2, 3)
yield a
```

will just be desugared to

```scala
List(1, 2, 3)
```

### Detailed description

#### Ad 1. Allow `for` comprehensions to start with pure aliases

Allowing `for` comprehensions to start with pure aliases is a straightforward change.

The Enumerators syntax will be changed from:

```
Enumerators ::= Generator {semi Enumerator | Guard}
```
to
```
Enumerators ::= {Pattern1 `=' Expr semi} Generator {semi Enumerator | Guard}
```
Which will allow adding 0 or more aliases before the first generator.
When desugaring is concerned, a for comprehension starting with pure aliases will generate a block with those aliases as `val` declarations and the rest of the desugared `for` as an expression. Unless the aliases are followed by a guard, then the desugaring should result in an error.
New desugaring rule will be added:
```scala
For any N:
for (P_1 = E_1; ... P_N = E_N; ...)
==>
{
val x_2 @ P_2 = E_2
...
val x_N @ P_N = E_N
for (...)
}
```

e.g.

```scala
for
a = 1
b <- Some(2)
c <- doSth(a)
yield b + c
```

will desugar to

```scala
{
val a = 1
for
b <- Some(2)
c <- doSth(a)
yield b + c
}
```

#### Ad 2. Simpler conditional desugaring of pure aliases. i.e. whenever a series of pure aliases is not immediately followed by an `if`, use a simpler way of desugaring.

Currently, for consistency, all pure aliases are desugared as if they are followed by an `if` condition. Which makes the desugaring more complicated than expected.

e.g.

The following code:

```scala
for
a <- doSth(arg)
b = a
yield a + b
```

will be desugared to:

```scala
doSth(arg).map { a =>
val b = a
(a, b)
}.map { case (a, b) =>
a + b
}
```

The proposed change is to introduce a simpler desugaring for common cases, when aliases aren't followed by a guard, and keep the old desugaring method for the other cases.

A new desugaring rules will be introduced for simple desugaring.

```scala
For any N:
for (P <- G; P_1 = E_1; ... P_N = E_N; ...)
==>
G.flatMap (P => for (P_1 = E_1; ... P_N = E_N; ...))

And:

for () yield E ==> E

(Where empty for-comprehensions are excluded by the parser)
```

It delegares desugaring aliases to the newly introduced rule from the previous impreovement. i.e.

```scala
For any N:
for (P_1 = E_1; ... P_N = E_N; ...)
==>
{
val x_2 @ P_2 = E_2
...
val x_N @ P_N = E_N
for (...)
}
```

One other rule also has to be changed, so that the current desugaring method, of passing all the aliases in a tuple with the result, will only be used when desugaring a generator, followed by some aliases, followed by a guard.

```scala
For any N:
for (P <- G; P_1 = E_1; ... P_N = E_N; if E; ...)
==>
for (TupleN(P, P_1, ... P_N) <-
for (x @ P <- G) yield {
val x_1 @ P_1 = E_2
...
val x_N @ P_N = E_N
TupleN(x, x_1, ..., x_N)
}; if E; ...)
```

This changes will make the desugaring work in the following way:

```scala
for
a <- doSth(arg)
b = a
yield a + b
```

will be desugared to

```scala
doSth(arg).map { a =>
val b = a
a + b
}
```

but

```scala
for
a <- doSth(arg)
b = a
if b > 1
yield a + b
```

will be desugared to
KacperFKorban marked this conversation as resolved.
Show resolved Hide resolved

```scala
doSth(arg).map { a =>
val b = a
(a, b)
}.withFilter { case (a, b) =>
b > 1
}.map { case (a, b) =>
a + b
}
```

#### Ad 3. Avoiding redundant `map` calls if the yielded value is the same as the last bound value.

This change is strictly an optimization. This allows for the compiler to get rid of the final `map` call, if the yielded value is the same as the last bound pattern. The pattern can be either a single variable binding or a tuple.

One desugaring rule has to be modified for this purpose.

```scala
for (P <- G) yield P ==> G
If P is a variable or a tuple of variables and G is not a withFilter.

for (P <- G) yield E ==> G.map (P => E)
Otherwise
```

e.g.
```scala
for
a <- List(1, 2, 3)
yield a
```

will just be desugared to

```scala
List(1, 2, 3)
```

### Compatibility

This change may change the semantics of some programs. It may remove some `map` calls in the desugared code, which may change the program semantics (if the `map` implementation was side-effecting).

For example the following code will now have only one `map` call, instead of two:
```scala
for
a <- doSth(arg)
b = a
yield a + b
```

### Other concerns

As far as I know, there are no widely used Scala 3 libraries that depend on the desugaring specification of `for`-comprehensions.

## Links

1. Scala contributors discussion thread (pre-SIP): https://contributors.scala-lang.org/t/pre-sip-improve-for-comprehensions-functionality/3509/51
2. Github issue discussion about for desugaring: https://github.com/lampepfl/dotty/issues/2573
3. Scala 2 implementation of some of the improvements: https://github.com/oleg-py/better-monadic-for
4. Implementation of one of the simplifications: https://github.com/lampepfl/dotty/pull/16703
5. Draft implementation branch: https://github.com/dotty-staging/dotty/tree/improved-fors