-
Notifications
You must be signed in to change notification settings - Fork 28
SIP-60 - Alternative bind patterns #74
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
8888174
cf84351
50e2c8e
bd103cf
c81471b
5d039d5
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,331 @@ | ||
--- | ||
layout: sip | ||
permalink: /sips/:title.html | ||
stage: pre-sip | ||
status: submitted | ||
presip-thread: https://contributors.scala-lang.org/t/pre-sip-bind-variables-for-alternative-patterns/6321/13 | ||
title: SIP-60 - Bind variables within alternative patterns | ||
--- | ||
|
||
**By: Yilin Wei** | ||
|
||
## History | ||
|
||
| Date | Version | | ||
|---------------|--------------------| | ||
| Sep 17th 2023 | Initial Draft | | ||
| Jan 16th 2024 | Amendments | | ||
|
||
## Summary | ||
|
||
Pattern matching is one of the most commonly used features in Scala by beginners and experts alike. Most of | ||
the features of pattern matching compose beautifully — for example, a user who learns about bind variables | ||
and guard patterns can mix the two features intuitively. | ||
|
||
One of the few outstanding cases where this is untrue, is when mixing bind variables and alternative patterns. The part of | ||
current [specification](https://scala-lang.org/files/archive/spec/2.13/08-pattern-matching.html) which we are concerned with is under section **8.1.12** and is copied below, with the relevant clause | ||
highlighted. | ||
|
||
> … All alternative patterns are type checked with the expected type of the pattern. **They may not bind variables other than wildcards**. The alternative … | ||
|
||
We propose that this restriction be lifted and this corner case be eliminated. | ||
|
||
Removing the corner case would make the language easier to teach, reduce friction and allow users to express intent in a more natural manner. | ||
|
||
## Motivation | ||
|
||
## Scenario | ||
|
||
The following scenario is shamelessly stolen from [PEP 636](https://peps.python.org/pep-0636), which introduces pattern matching to the | ||
Python language. | ||
|
||
Suppose a user is writing classic text adventure game such as [Zork](https://en.wikipedia.org/wiki/Zork). For readers unfamiliar with | ||
text adventure games, the player typically enters freeform text into the terminal in the form of commands to interact with the game | ||
world. Examples of commands might be `"pick up rabbit"` or `"open door"`. | ||
|
||
Typically, the commands are tokenized and parsed. After a parsing stage we may end up with a encoding which is similar to the following: | ||
|
||
```scala | ||
enum Word | ||
case Get, North, Go, Pick, Up | ||
case Item(name: String) | ||
|
||
case class Command(words: List[Word]) | ||
``` | ||
|
||
In this encoding, the string `pick up jar`, would be parsed as `Command(List(Pick, Up, Item("jar")))`. | ||
|
||
Once the command is parsed, we want to actually *do* something with the command. With this particular encoding, | ||
we would naturally reach for a pattern match — in the simplest case, we could get away with a single recursive function for | ||
our whole program. | ||
|
||
Suppose we take the simplest example where we want to match on a command like `"north"`. The pattern match consists of | ||
matching on a single stable identifier, `North` and the code would look like this: | ||
|
||
~~~ scala | ||
import Command.* | ||
|
||
def loop(cmd: Command): Unit = | ||
cmd match | ||
case Command(North :: Nil) => // Code for going north | ||
~~~ | ||
|
||
However as we begin play-testing the actual text adventure, we observe that users type `"go north"`. We decide | ||
our program should treat the two distinct commands as synonyms. At this point we would reach for an alternative pattern `|` and | ||
refactor the code like so: | ||
|
||
~~~ scala | ||
case Command(North :: Nil | Go :: North :: Nil) => // Code for going north | ||
~~~ | ||
|
||
This clearly expresses our intent that the two commands map to the same underlying logic. | ||
|
||
Later we decide that we want more complex logic in our game; perhaps allowing the user to pick up | ||
items with a command like `pick up jar`. We would then extend our function with another case, binding the variable `name`: | ||
|
||
~~~ scala | ||
case Command(Pick :: Up :: Item(name) :: Nil) => // Code for picking up items | ||
~~~ | ||
|
||
Again, we might realise through our play-testing that users type `get` as a synonym for `pick up`. After playing around | ||
with alternative patterns, we may reasonably write something like: | ||
|
||
~~~ scala | ||
case Command(Pick :: Up :: Item(name) :: Nil | Get :: Item(name) :: Nil) => // Code for picking up items | ||
~~~ | ||
|
||
Unfortunately at this point, we are stopped in our tracks by the compiler. The bind variable for `name` cannot be used in conjunction with alternative patterns. | ||
We must either choose a different encoding. We carefully consult the specification and that this is not possible. | ||
|
||
We can, of course, work around it by hoisting the logic to a helper function to the nearest scope which function definitions: | ||
|
||
~~~ scala | ||
def loop(cmd: Cmd): Unit = | ||
def pickUp(item: String): Unit = // Code for picking up item | ||
cmd match | ||
case Command(Pick :: Up :: Item(name)) => pickUp(name) | ||
case Command(Get :: Item(name)) => pickUp(name) | ||
~~~ | ||
|
||
Or any number of different encodings. However, all of them are less intuitive and less obvious than the code we tried to write. | ||
|
||
## Commentary | ||
|
||
Removing the restriction leads to more obvious encodings in the case of alternative patterns. Arguably, the language | ||
would be simpler and easier to teach — we do not have to remember that bind patterns and alternatives | ||
do not mix and need to teach newcomers the workarounds. | ||
|
||
For languages which have pattern matching, a significant number also support the same feature. Languages such as [Rust](https://github.com/rust-lang/reference/pull/957) and [Python](https://peps.python.org/pep-0636/#or-patterns) have | ||
supported it for some time. While | ||
this is not a great reason for Scala to do the same, having the feature exist in other languages means that users | ||
that are more likely to expect the feature. | ||
|
||
A smaller benefit for existing users, is that removing the corner case leads to code which is | ||
easier to review; the absolute code difference between adding a bind variable within an alternative versus switching to a different | ||
encoding entirely is smaller and conveys the intent of such changesets better. | ||
|
||
It is acknowledged, however, that such cases where we share the same logic with an alternative branches are relatively rare compared to | ||
the usage of pattern matching in general. The current restrictions are not too arduous to workaround for experienced practitioners, which | ||
can be inferred from the relatively low number of comments from the original [issue](https://github.com/scala/bug/issues/182) first raised in 2007. | ||
|
||
To summarize, the main arguments for the proposal are to make the language more consistent, simpler and easier to teach. The arguments | ||
against a change are that it will be low impact for the majority of existing users. | ||
|
||
## Proposed solution | ||
|
||
Removing the alternative restriction means that we need to specify some additional constraints. Intuitively, we | ||
need to consider the restrictions on variable bindings within each alternative branch, as well as the types inferred | ||
for each binding within the scope of the pattern. | ||
|
||
## Bindings | ||
|
||
The simplest case of mixing an alternative pattern and bind variables, is where we have two `UnApply` methods, with | ||
a single alternative pattern. For now, we specifically only consider the case where each bind variable is of the same | ||
type, like so: | ||
|
||
~~~ scala | ||
enum Foo: | ||
case Bar(x: Int) | ||
case Baz(y: Int) | ||
|
||
def fun = this match | ||
case Bar(z) | Baz(z) => ... // z: Int | ||
~~~ | ||
|
||
For the expression to make sense with the current semantics around pattern matches, `z` must be defined in both branches; otherwise the | ||
case body would be nonsensical if `z` was referenced within it (see [missing variables](#missing-variables) for a proposed alternative). | ||
|
||
Removing the restriction would also allow recursive alternative patterns: | ||
|
||
~~~ scala | ||
enum Foo: | ||
case Bar(x: Int) | ||
case Baz(x: Int) | ||
|
||
enum Qux: | ||
case Quux(y: Int) | ||
case Corge(x: Foo) | ||
|
||
def fun = this match | ||
case Quux(z) | Corge(Bar(z) | Baz(z)) => ... // z: Int | ||
~~~ | ||
|
||
Using an `Ident` within an `UnApply` is not the only way to introduce a binding within the pattern scope. | ||
We also expect to be able to use an explicit binding using an `@` like this: | ||
|
||
~~~ scala | ||
enum Foo: | ||
case Bar() | ||
case Baz(bar: Bar) | ||
|
||
def fun = this match | ||
case Baz(x) | x @ Bar() => ... // x: Foo.Bar | ||
~~~ | ||
|
||
## Types | ||
|
||
We propose that the type of each variable introduced in the scope of the pattern be the least upper-bound of the type | ||
inferred within within each branch. | ||
|
||
~~~ scala | ||
enum Foo: | ||
case Bar(x: Int) | ||
case Baz(y: String) | ||
|
||
def fun = this match | ||
case Bar(x) | Baz(x) => // x: Int | String | ||
~~~ | ||
|
||
We do not expect any inference to happen between branches. For example, in the case of a GADT we would expect the second branch of | ||
the following case to match all instances of `Bar`, regardless of the type of `A`. | ||
|
||
~~~ scala | ||
enum Foo[A]: | ||
case Bar(a: A) | ||
case Baz(i: Int) extends Foo[Int] | ||
|
||
def fun = this match | ||
case Baz(x) | Bar(x) => // x: Int | A | ||
~~~ | ||
|
||
### Given bind variables | ||
|
||
It is possible to introduce bindings to the contextual scope within a pattern match branch. | ||
|
||
Since most bindings will be anonymous but be referred to within the branches, we expect the _types_ present in the contextual scope for each branch to be the same rather than the _names_. | ||
|
||
~~~ scala | ||
case class Context() | ||
|
||
def run(using ctx: Context): Unit = ??? | ||
|
||
enum Foo: | ||
case Bar(ctx: Context) | ||
case Baz(i: Int, ctx: Context) | ||
|
||
def fun = this match | ||
case Bar(given Context) | Baz(_, given Context) => run // `Context` appears in both branches | ||
~~~ | ||
|
||
This begs the question of what to do in the case of an explicit `@` binding where the user binds a variable to the same _name_ but to different types. We can either expose a `String | Int` within the contextual scope, or simply reject the code as invalid. | ||
|
||
~~~ scala | ||
enum Foo: | ||
case Bar(s: String) | ||
case Baz(i: Int) | ||
|
||
def fun = this match | ||
case Bar(x @ given String) | Baz(x @ given Int) => ??? | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I would stay simple here and require the types to match, the user can always write There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That was originally my intention, but I didn't like the fact that it "clashes" with the rules set up for named bindings and thought it might be better to be consistent with the named binding rule and hence wouldn't have to carve out an exception in the specification. I don't feel strongly either way; in my opinion if I was binding a There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. After discussing it in the SIP meeting, we feel that it might be best to not include support for given bind variables in alternatives at least initially, since it opens a can of worms and it's not clear if there is a lot of usecases for it. (But for the record, it seems that if we wanted to support them we would have to require the types to be identical to handle |
||
~~~ | ||
|
||
To be consistent with the named bindings, we argue that the code should compile and a contextual variable added to the scope with the type of `String | Int`. | ||
|
||
### Quoted patterns | ||
|
||
[Quoted patterns](https://docs.scala-lang.org/scala3/guides/macros/quotes.html#quoted-patterns) will not be supported in this SIP and the behaviour of quoted patterns will remain the same as currently i.e. any quoted pattern appearing in an alternative pattern binding a variable or type variable will be rejected as illegal. | ||
|
||
### Alternatives | ||
|
||
#### Enforcing a single type for a bound variable | ||
|
||
We could constrain the type for each bound variable within each alternative branch to be the same type. Notably, this is what languages such as Rust, which do not have sub-typing do. | ||
|
||
However, since untagged unions are part of Scala 3 and the fact that both are represented by the `|`, it felt more natural to discard this restriction. | ||
|
||
#### Type ascriptions in alternative branches | ||
|
||
Another suggestion is that an _explicit_ type ascription by a user ought to be defined for all branches. For example, in the currently proposed rules, the following code would infer the return type to be `Int | A` even though the user has written the statement `id: Int`. | ||
|
||
~~~scala | ||
enum Foo[A]: | ||
case Bar[A](a: A) | ||
case Baz[A](a: A) | ||
|
||
def test = this match | ||
case Bar(id: Int) | Baz(id) => id | ||
~~~ | ||
|
||
In the author's subjective opinion, it is more natural to view the alternative arms as separate branches — which would be equivalent to the function below. | ||
|
||
~~~scala | ||
def test = this match | ||
case Bar(id: Int) => id | ||
case Baz(id) => id | ||
~~~ | ||
|
||
On the other hand, if it is decided that each bound variable ought to be the same type, then arguably "sharing" explicit type ascriptions across branches would reduce boilerplate. | ||
|
||
#### Missing variables | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Regarding missing variables, we need a decision. The spec should either specify this is an error or assign the type There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Even with explicit nulls I would not mix |
||
|
||
Unlike in other languages, we could assign a type, `A | Null`, to a bind variable which is not present in all of the alternative branches. Rust, for example, is constrained by the fact that the size of a variable must be known and untagged unions do not exist. | ||
|
||
Arguably, missing a variable entirely is more likely to be an error — the absence of a requirement for `var` declarations before assigning variables in Python means that beginners can easily assign variables to the wrong variable. | ||
|
||
It may be, that the enforcement of having to have the same bind variables within each branch ought to be left to a linter rather thana a hard restriction within the language itself. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Typo "thana" |
||
|
||
## Specification | ||
|
||
We do not believe there are any syntax changes since the current specification already allows the proposed syntax. | ||
|
||
We propose that the following clauses be added to the specification: | ||
|
||
Let $`p_1 | \ldots | p_n`$ be an alternative pattern at an arbitrary depth within a case pattern and $`\Gamma_n`$ is the named scope associated with each alternative. | ||
|
||
If `p_i` is a quoted pattern binding a variable or type variable, the alternative pattern is considered invalid. Otherwise, let the named variables introduced within each alternative $`p_n`$, be $`x_i \in \Gamma_n`$ and the unnamed contextual variables within each alternative have the type $`T_i \in \Gamma_n`$. | ||
|
||
Each $`p_n`$ must introduce the same set of bindings, i.e. for each $`n`$, $`\Gamma_n`$ must have the same **named** members $`\Gamma_{n+1}`$ and the set of $`{T_0, ... T_n}`$ must be the same. | ||
|
||
If $`X_{n,i}`$, is the type of the binding $`x_i`$ within an alternative $`p_n`$, then the consequent type, $`X_i`$, of the | ||
variable $`x_i`$ within the pattern scope, $`\Gamma`$ is the least upper-bound of all the types $`X_{n, i}`$ associated with | ||
the variable, $`x_i`$ within each branch. | ||
|
||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Looks like the spec doesn't mention the effect of the presence of Ti on the pattern scope. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Since the specification isn't complete for Scala 3 (in order for the amendments to be complete, I'd first need to change the specification to reflect the current behaviour in dotty), should I include the changes to the spec to reflect the current behaviour within this SIP as well? Or is a separate PR to the spec enough? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. A separate PR to the spec would be great and highly appreciated! There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Relevant PR is here. For the record, should the synthesized names part of the spec or are they an implementation detail i.e. should the following be valid code according to the spec? It currently compiles. type MyString = String
("foo", "foo") match:
case (given MyString, given String) => ??? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good question, I'd say they should be part of the spec since they have to be for non-local definitions anyway (for ABI stability) and since it doesn't seem like we need to treat them differently in local patterns. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I would be in favour of that interpretation. It leads to a more straightforward interpretation of both:
and
Both are invalid now due to fact that the names introduced in each branch are different (if we don't take the alternative for missing variables), regardless of whatever instances are available in the implicit scope. I can change the given bind variables section to better reflect this interpretation if that would be useful. |
||
## Compatibility | ||
|
||
We believe the changes would be backwards compatible. | ||
|
||
# Related Work | ||
|
||
The language feature exists in multiple languages. Of the more popular languages, Rust added the feature in [2021](https://github.com/rust-lang/reference/pull/957) and | ||
Python within [PEP 636](https://peps.python.org/pep-0636/#or-patterns), the pattern matching PEP in 2020. Of course, Python is untyped and Rust does not have sub-typing | ||
but the semantics proposed are similar to this proposal. | ||
|
||
Within Scala, the [issue](https://github.com/scala/bug/issues/182) first raised in 2007. The author is also aware of attempts to fix this issue by [Lionel Parreaux](https://github.com/dotty-staging/dotty/compare/main...LPTK:dotty:vars-in-pat-alts) and the associated [feature request](https://github.com/lampepfl/dotty-feature-requests/issues/12) which | ||
was not submitted to the main dotty repository. | ||
|
||
The associated [thread](https://contributors.scala-lang.org/t/pre-sip-bind-variables-for-alternative-patterns/6321) has some extra discussion around semantics. Historically, there have been multiple similar suggestions — in [2023](https://contributors.scala-lang.org/t/qol-sound-binding-in-pattern-alternatives/6226) by Quentin Bernet and in [2021](https://contributors.scala-lang.org/t/could-it-be-possible-to-allow-variable-binging-in-patmat-alternatives-for-scala-3-x/5235) by Alexey Shuksto. | ||
|
||
## Implementation | ||
|
||
The author has a current in-progress implementation focused on the typer which compiles the examples with the expected types. Interested | ||
parties are welcome to see the WIP [here](https://github.com/lampepfl/dotty/compare/main...yilinwei:dotty:main). | ||
|
||
### Further work | ||
|
||
#### Quoted patterns | ||
|
||
More investigation is needed to see how quoted patterns with bind variables in alternative patterns could be supported. | ||
|
||
## Acknowledgements | ||
|
||
Many thanks to **Zainab Ali** for proof-reading the draft, **Nicolas Stucki** and **Guillaume Martres** for their pointers on the dotty | ||
compiler codebase. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The link to the "Missing variable" section does not seem to work.