Skip to content

Syntactic issues in pattern matching

graydon edited this page May 30, 2011 · 6 revisions

Syntactic issues in pattern matching

This page captures a bunch of axes in the space of pattern matching syntax.

Allow pattern guards, using if or when or where or |.

Pros

  • very expressive

Cons

  • requires conservative approximation for exhaustiveness checking

Use = or == for non-pattern expressions whose result is matched by ==.

Pros

  • very expressive

Cons

  • requires conservative approximation for exhaustiveness checking
  • could be simulated with just pattern guards

Use and or as for compound patterns.

This is important for destructuring but still being able to bind a name to the destructured thing:

var (x, y) as t = foo();

The right-hand side can in fact be an arbitrary pattern.

Distinguish bindings from assignments by context vs by keyword.

We could allow individual variables in a pattern to be distinguished as assignments vs. bindings by use of a keyword, but always requiring the keyword (such as var) is very verbose (see the discussion of distinguishing variables from tags below). The alternative is to have binding positions for patterns and assignment positions for patterns, and not to distinguish them.

A hybrid approach is to distinguish binding positions and assignment positions, but to optionally allow binding variables in an assignment position. For example:

var (a, b) = foo(); // always binding
(a, b) = bar();         // always assignment
(a, var c) = baz();     // assignment by default but 'var' binds a new variable

For the sake of cleanliness, the var pattern form shouldn’t be allowed in a binding pattern. This suggests two different non-terminals in the grammar:

BindingPattern ::= identifier | BindingPattern 'and' BindingPattern | ...
AssignmentPattern ::= identifier | AssignmentPattern 'and' AssignmentPattern | ...
                   |  'var' BindingPattern

How to distinguish variables from tags.

This is the biggest and subtlest issue in this space. Following are a bunch of alternatives we’ve discussed.

Use ? to mark variables, or to mark variable bindings only.

Cons

  • pattern matching becomes much more noisy and verbose
  • binding positions in the language can’t be defined as just patterns; it depends on whether they are a single variable (no sigil) or a pattern (sigils)
  • maybe looks weird for variables in non-binding position, when used in destructuring assignment

Use var to mark variables, or to mark variable bindings only.

Pros

  • possible to use arbitrary expressions without an = sigil (this is arguably not a pro)

Cons

  • Dave absolutely hates this option :)
  • pattern matching becomes extremely verbose

Use a sigil like $ to mark tags, or to mark nullary tags only.

Cons

  • papercut: nullary patterns are common; if they are a special case that needs disambiguation, programmers will curse us eternally

Use distinct lexical classes for tag names and variables.

This is what ML and Haskell do.

Pros

  • simple, relatively tasteful

Cons

  • Rust has so far avoided any special lexical classes
  • in the C tradition, CapitalizedNames are typically type, module, or class names
  • in the C tradition, ALL_CAPS_NAMES are typically constants

Use == to mark nullary tags.

Pros

  • since nullary tags don’t bind variables, the = case happens to coincide with the right meaning

Cons

  • again, papercut: special case for common case of nullary tags means death of a thousand papercuts

Use the type of the pattern to disambiguate, and disallow the use of tag names of the current pattern’s type as variables.

This is more hostile to tools, and trickier to define and implement, because it imposes a two-way dependency between type checking and variable resolution.

Use the current scope to disambiguate, and disallow the use of tag names as variables.

This is currently Patrick and Dave’s favorite.

Pros

  • better phase separation
  • no lexical classes
  • no special sigils
  • the clashes are probably unlikely to occur in practice
  • all binding and lvalue positions can be specified as patterns

Cons

  • somewhat of a refactoring hazard: introducing a new tag into scope can break a pattern match

Marijn’s feedback:

I was initially enthusiastic about the bottom proposal (disallow tag names as bound names), but after mulling it over for a while I think it is too awkward. Context-insensitive fresh binding names (I’m sure there’s a lambda-calculus word for this, but I unfortunately don’t know it, so I can’t throw it around) are a very nice thing to have. Not just for refactoring, but also for being able to import or define new tags without having some code a few pages down break.

I’m leaning towards the “USE A SIGIL TO MARK TAGS” option. If we mandate a single dot (which currently has no meaning in patterns) after nullary tag names, we have a pretty light-weight way of unambiguously distinguishing them from new bindings. Consider the dot to replace the parenthesized argument list. So you get something like

alt (mylist) {
  case (cons(head, tail)) { /* */ }
  case (nil.)             { /* */ }
}

Graydon’s odd proposal:

Delete tuples from the language as well as implicit concept of tag-args-are-tuples. Tuples are the villain in this whole conversation. Apply a tag to a record if you want them to carry arguments. Support the same sugar we have for tag declarators mixed with args as we have now, just use records not tuples.

Pros

  • No more “expected 7 args but found 6” error-message hazard
  • No more “I added a single field now I have to change 300 occurrences of foo(_,_) to foo(_,_,_) in the code” hazard
  • No more “did I mean foo._6 or foo._7?” hazard
  • No more weird parse cases like “1-arg tuple not possible” or “parens make comma change nature”
  • No more worry about the pattern matching thing on tags. If a tag carries an arg, you name the arg (with or without parens wrapping the juxtaposed arg, depending on whether you want it to look more or less like a function call, I don’t care) and access the field names using arg.foo, arg.bar etc.
  • No more worry about pattern matching on tuples, which (sadly) revive all the same issues as this page is about wrt. tag-args.

Cons

  • Someone who thinks tuples are more mathematically fundamental will cry out in misery and shame?
  • I can’t think of any others.

Seriously, almost all the meaningful code paths in the compiler for record and tuple are identical now anyways, with the exception that the tuple ones are littered with usability hazards. Lots of languages get by just fine with named record-fields only. I think the situation gets much nicer without them.

Pattern match would look like any or all of these:

alt (mylist) {
  case (c: cons) { c.head = ...; }
  case (nil)     { /* */ }
}
alt (mylist) {
  case (cons(c)) { c.head = ...; }
  case (nil)     { /* */ }
}
alt (mylist) {
  case (cons c)  { c.head = ...; }
  case (nil)     { /* */ }
}
alt (mylist) {
  case (cons {head: hd, tail: tl}) { hd = ...; }
  case (nil)     { /* */ }
}
alt (mylist) {
  case (cons {head, tail}) { head = ...; }
  case (nil)     { /* */ }
}

Anyone have more convincing “con” arguments I’m missing? I am really having a hard time pulling together a strong “we need it or else …” motive for keeping tuples.

Clone this wiki locally