Exhaustive integer matching #50912

varkor · 2018-05-20T01:36:27Z

This adds a new feature flag exhaustive_integer_patterns that enables exhaustive matching of integer types by their values. For example, the following is now accepted:

#![feature(exhaustive_integer_patterns)]
#![feature(exclusive_range_pattern)]

fn matcher(x: u8) {
  match x { // ok
    0 .. 32 => { /* foo */ }
    32 => { /* bar */ }
    33 ..= 255 => { /* baz */ }
  }
}

This matching is permitted on all integer (signed/unsigned and char) types. Sensible error messages are also provided. For example:

fn matcher(x: u8) {
  match x { //~ ERROR
    0 .. 32 => { /* foo */ }
  }
}

results in:

error[E0004]: non-exhaustive patterns: `32u8...255u8` not covered
 --> matches.rs:3:9
  |
6 |   match x {
  |         ^ pattern `32u8...255u8` not covered

This implements rust-lang/rfcs#1550 for #50907. While there hasn't been a full RFC for this feature, it was suggested that this might be a feature that obviously complements the existing exhaustiveness checks (e.g. for bool) and so a feature gate would be sufficient for now.

rust-highfive · 2018-05-20T01:36:37Z

r? @michaelwoerister

(rust_highfive has picked a reviewer for you, use r? to override)

scottmcm

I'm definitely fine with merging this without an RFC (feature-gated of course, as you did).

^ pattern 32u8...255u8 not covered

This is amazing even if all it did was give that error instead of just "pattern _ not covered".

scottmcm · 2018-05-21T02:35:26Z

src/test/ui/exhaustive_integer_patterns.rs

+
+    // Let's test other types too!
+    match '\u{0}' {
+        '\u{0}' ..= char::MAX => {} // ok


Since char is a USV, it doesn't actually need to cover that entire range. If it's feasible, could this handle the example in rust-lang/rfcs#1550 (comment) too? (Feel free to punt if hard.)

scottmcm · 2018-05-21T02:38:49Z

src/test/ui/exhaustive_integer_patterns.stderr

+  --> $DIR/exhaustive_integer_patterns.rs:50:11
+   |
+LL |     match x { //~ ERROR non-exhaustive patterns
+   |           ^ patterns `-128i8...-5i8`, `120i8...121i8` and `121i8...127i8` not covered


Could adjacent ranges be merged, or at least not overlap? 121 is in two of the uncovered ranges, and it would be nice if the message was "patterns -128i8...-5i8 and 120i8...127i8 not covered" instead.

scottmcm · 2018-05-21T02:42:11Z

src/test/ui/exhaustive_integer_patterns.stderr

+LL |     match x { //~ ERROR non-exhaustive patterns
+   |           ^ patterns `-128i8...-5i8`, `120i8...121i8` and `121i8...127i8` not covered
+
+error: aborting due to 5 previous errors


All the holes in the examples are non-singleton. Consider a UI test for something like -127i8..=127 or match 0 { i16::MIN..=-1 => {} 1..=i16::MAX => {} }.

scottmcm · 2018-05-21T02:43:54Z

src/librustc_mir/hair/pattern/_match.rs

+            value_constructors = true;
+            vec![ConstantRange(ty::Const::from_bits(cx.tcx, min, cx.tcx.types.char),
+                               ty::Const::from_bits(cx.tcx, max, cx.tcx.types.char),
+                               RangeEnd::Included)]


Hopefully the char request is just to initialize this as a two-element vec instead...

michaelwoerister · 2018-05-21T10:30:14Z

r? @eddyb maybe?

eddyb · 2018-05-21T10:32:45Z

src/librustc_mir/hair/pattern/_match.rs

+                I16   => min_max_ty!(i16, u16, cx.tcx.types.i16),
+                I32   => min_max_ty!(i32, u32, cx.tcx.types.i32),
+                I64   => min_max_ty!(i64, u64, cx.tcx.types.i64),
+                I128  => min_max_ty!(i128, u128, cx.tcx.types.i128),


cc @oli-obk This sort of thing can be done from the bit width (which you can get from the layout of these types).

eddyb · 2018-05-21T10:34:22Z

src/librustc_mir/hair/pattern/_match.rs

+struct Interval<'tcx> {
+    pub lo: u128,
+    pub hi: u128,
+    pub ty: Ty<'tcx>,


You should use ty::layout::Integer here or similar.

eddyb · 2018-05-21T10:36:13Z

src/librustc_mir/hair/pattern/_match.rs

+/// An inclusive interval, used for precise integer exhaustiveness checking.
+struct Interval<'tcx> {
+    pub lo: u128,
+    pub hi: u128,


Maybe you can use RangeInclusive<u128> instead?

eddyb · 2018-05-21T10:39:27Z

src/librustc_mir/hair/pattern/_match.rs

+        remaining_ranges.into_iter().map(|(lo, hi)| {
+            let (lo, hi) = Interval::offset_sign(ty, (lo, hi), false);
+            ConstantRange(ty::Const::from_bits(cx.tcx, lo, ty),
+                          ty::Const::from_bits(cx.tcx, hi, ty),


Should ConstantRange use ty::Const? Seems expensive, and useless unless the ty::Const is Bits (cc @oli-obk).

We should definitely be using ConstValue instead of ty::Const. I put it on my TODO list

ConstantRange can only hold integers, so ConstValue shouldn't be needed, right?

Probably. My TODO already says "ConstValue or smaller" ;). That should not block this PR though, since it's a preexisting issue.

eddyb · 2018-05-21T11:11:05Z

src/librustc_mir/hair/pattern/_match.rs

+                // The pattern intersects the middle of the subrange,
+                // so we create two ranges either side of the intersection.)
+                remaining_ranges.push((subrange_lo, pat_interval.lo));
+                remaining_ranges.push((pat_interval.hi, subrange_hi));


I think you missed -1 and +1 on the fields of pat_interval, here.

eddyb · 2018-05-21T11:12:33Z

src/librustc_mir/hair/pattern/_match.rs

+        for (subrange_lo, subrange_hi) in ranges {
+            if pat_interval.lo > subrange_hi || pat_interval.hi < subrange_lo {
+                // The pattern doesn't intersect with the subrange at all,
+                // so the subrange remains untouched.


You could construct the intersection by i = a.start.max(b.start)..=a.end.min(b.end), and the two halves of the subtraction a - b by sl = a.start..=i.start-1 and sh = i.end+1..=a.end (be careful with overflow).

AFAIK, the empty condition is range.start > range.end (for inclusive ranges).

Since I don't think you need the intersection by itself, the two halves can be:
sl = a.start..=a.start.max(b.start)-1 and
sh = a.end.min(b.end)+1..=a.end
(when a.start.max(b.start) <= a.end.min(b.end), otherwise sl = sh = a)

Therefore:

if a.start > b.end || b.start > a.end { // No intersection. remaining_ranges.push(a); } else { // Overflow is now not a problem, because of the conditions. if b.start > a.start { remaining_ranges.push(a.start..=b.start-1); } if b.end < a.end { remaining_ranges.push(b.end+1..=a.end); } }

I've just checked and your code corresponds to this, but you have the 4 possibilities of my two small ifs expanded out - my main suggestion from this comment would be to orthogonalize those.

eddyb · 2018-05-21T11:13:28Z

src/librustc_mir/hair/pattern/_match.rs

+        // `missing_ctors` are those that should have appeared
+        // as patterns in the `match` expression, but did not.
+        let mut missing_ctors = vec![];
+        'req: for req_ctor in all_ctors.clone() {


Could this iterate over &all_ctors instead?

eddyb · 2018-05-21T11:26:53Z

I left some nit-picky comments about implementation details, but r? @nikomatsakis

eddyb · 2018-05-21T20:31:50Z

src/librustc_mir/hair/pattern/_match.rs

+                }
+            }
+            let (min, max, ty) = match int_ty {
+                Isize => min_max_ty!(isize, usize, cx.tcx.types.isize),


Oh, this is broken, because it's using the host's usize / isize bitwidth instead of the target's.

varkor · 2018-05-22T00:04:36Z

I've addressed most of the comments, but there's still the issue with usize/isize for offset_sign_for_ty, which I'll try to get to tomorrow.

varkor · 2018-05-22T00:09:02Z

src/test/ui/exhaustive_integer_patterns.stderr

+  --> $DIR/exhaustive_integer_patterns.rs:50:11
+   |
+LL |     match x { //~ ERROR non-exhaustive patterns
+   |           ^ patterns `-128i8...-6i8` and `122i8...127i8` not covered


These offsets do not look right now. I'll look into that.

eddyb · 2018-05-22T06:06:44Z

src/librustc_mir/hair/pattern/_match.rs

-                U128  => ( u128::MAX as u128, cx.tcx.types.u128),
-            });
+            let min_max_ty = |sty| {
+                let size = cx.tcx.layout_of(ty::ParamEnv::reveal_all().and(sty))


You should use cx.param_env or something. Also, you don't need to match on uint_ty at all! Pass pcx.ty instead.

Oh, that's so much nicer, of course!

eddyb · 2018-05-22T06:07:27Z

src/librustc_mir/hair/pattern/_match.rs

+                if pat_interval_hi < subrange_hi {
+                    // The pattern intersects a lower section of the
+                    // subrange, so an upper section will remain.
+                    remaining_ranges.push((pat_interval_hi + 1, subrange_hi));


remaining_ranges could be a Vec<RangeInclusive<u128>>, i.e. use ..= syntax.

eddyb · 2018-05-23T08:29:14Z

src/librustc_mir/hair/pattern/_match.rs

@@ -34,6 +34,9 @@ use arena::TypedArena;
 use std::cmp::{self, Ordering};
 use std::fmt;
 use std::iter::{FromIterator, IntoIterator, repeat};
+use std::{char, usize, u8, u16, u32, u64, u128, isize, i8, i16, i32, i64, i128};


These, especially usize, are a bad sign - you shouldn't need them.

eddyb · 2018-05-23T10:00:46Z

src/librustc_mir/hair/pattern/_match.rs

+            let size = cx.tcx.layout_of(ty::ParamEnv::reveal_all().and(pcx.ty))
+                             .unwrap().size.bits() as u32;
+            let shift = 1u128.overflowing_shl(size);
+            let max = shift.0.wrapping_sub(1 + (shift.1 as u128));


This can be implemented with just !0 >> (128 - bits).

eddyb · 2018-05-23T10:02:28Z

src/librustc_mir/hair/pattern/_match.rs

+        ty::TyInt(_) if exhaustive_integer_patterns => {
+            let size = cx.tcx.layout_of(ty::ParamEnv::reveal_all().and(pcx.ty))
+                             .unwrap().size.bits() as u128;
+            let min = (1u128 << (size - 1)).wrapping_neg();


Why do you need wrapping_neg here?

eddyb · 2018-05-23T10:02:52Z

src/librustc_mir/hair/pattern/_match.rs

+            let size = cx.tcx.layout_of(ty::ParamEnv::reveal_all().and(pcx.ty))
+                             .unwrap().size.bits() as u128;
+            let min = (1u128 << (size - 1)).wrapping_neg();
+            let max = (1u128 << (size - 1)).wrapping_sub(1);


Don't need wrapping_sub, can just - 1.

varkor · 2018-05-23T21:17:42Z

Everything seems to be working correctly now, as far as I can tell.

@eddyb had some suggestions for refactoring ConstantRange and PatternKind::Range to use u128 instead of ty::Const, but after looking into it, it looks like it's going to require touching a lot of unrelated areas, which I'd rather avoid doing for this PR. At the very least, any negative effects of creating extra Consts are going to be limited to #![feature(exhaustive_integer_patterns)], so following this PR up later shouldn't be an issue. (Though let me know if I've misunderstood anything, @eddyb.)

varkor · 2018-08-20T23:19:07Z

I've tried to improve the comments in split_grouped_constructors, but it's hard to know just how detailed it should be (as it's fresher in my head than it is to a reader), so let me know how it can be improved further!

arielb1 · 2018-08-21T17:29:07Z

src/librustc_mir/hair/pattern/_match.rs

+                    }
+                    let c = match a.1 {
+                        Endpoint::Start => a.0,
+                        Endpoint::End | Endpoint::Both => a.0 + 1,


How does this handle integer overflow (i.e. a.0 = uint128::MAX)? worth a test case, at least.

(Edited) a.0 < b.0 strictly, so overflow shouldn't be possible. There are already test cases for covering the entire range of u128 and i128, so I think that covers the bases already. I'll add a comment to this effect.

Other than that and the duplication, this looks good.

A: integer overflow can't occur, because only the last point can be uint128::MAX, and only the first point can be 0.

arielb1 · 2018-08-21T18:15:35Z

src/librustc_mir/hair/pattern/_match.rs

+                // endpoint a point corresponds to. Whenever a point corresponds to both a start
+                // and an end, then we create a unit range for it.
+                #[derive(PartialEq, Clone, Copy, Debug)]
+                enum Endpoint {


A potential simplification:
I just noticed that range boundaries are always "in between" numbers - an Endpoint::Start is a boundary between p-1 and p, and an Endpoint::End is between p and p+1 (that means that an Endpoint::End is equivalent to an Endpoint::Start that immediately follows it). You could do something based on that.

e.g., I think this is a correct implementation:

/// Represents a border between 2 integers. Because of the "fencepost error", /// there are be 2^128+1 such borders. #[derive(Copy, Clone, PartialOrd, Ord, PartialEq, Eq)] enum Border { JustBefore(u128), AfterU128Max } impl Border { fn before(n: u128) -> Self { Border::JustBefore(n) } fn after(n: u128) -> Self { match n.checked_add(1) { m => Border::JustBefore(m), None => Border::AfterU128Max } } fn range_borders(r: IntRange<'_>) -> [Self; 2] { let (r_lo, r_hi) = r.range.into_inner(); vec![Border::before(r_lo), Border::after(r_hi)] } // return the integers between `self` and `to` if `self < to`, `None` otherwise. fn range_to(self, to: Self, ty: Ty<'tcx>) -> Option<IntRange<'tcx>> { match (self, to) { (Border::JustBefore(n), Border::JustBefore(m)) => { IntRange::from_interval(ty, n, m, RangeEnd::Excluded) } (Border::JustBefore(n), Border::AfterU128Max) => { IntRange::from_interval(ty, n, u128::MAX, RangeEnd::Included) } (Border::AfterU128Max, _) => None } } } // `borders` is the set of borders between equivalence classes - each equivalence // class is between 2 borders. let row_borders = m.iter() .flat_map(|row| IntRange::from_pat(tcx, row[0])) .flat_map(|range| ctor_range.intersection(&r)) .flat_map(|range| Border::range_borders(range)); let range_borders = Border::range_borders(ctor_range); let mut borders: Vec<Border> = row_borders.chain(range_borders).collect(); borders.sort(); let ranges = borders.windows(2).map(|window| { window[0].range_to(window[1]); })

I do think this is cleaner :) The logic looks good to me too — I'm going to give it a go.

arielb1 · 2018-08-21T18:26:57Z

LGTM. If you think borders are cleaner enough than endpoints, you can use that, otherwise r=me. (Also, add a test that 0..u128::MAX-1 is not exhaustive and write my "why integer overflow can't occur" note in a comment).

…_constructors

arielb1 · 2018-08-21T22:58:23Z

@bors r+

bors · 2018-08-21T22:58:24Z

📌 Commit 6971c5d has been approved by arielb1

varkor · 2018-08-21T23:04:02Z

@arielb1: thank you for all your detailed comments and pointing me in the right direction!

bors · 2018-08-22T00:57:07Z

⌛ Testing commit 6971c5d with merge a79cffb...

Exhaustive integer matching This adds a new feature flag `exhaustive_integer_patterns` that enables exhaustive matching of integer types by their values. For example, the following is now accepted: ```rust #![feature(exhaustive_integer_patterns)] #![feature(exclusive_range_pattern)] fn matcher(x: u8) { match x { // ok 0 .. 32 => { /* foo */ } 32 => { /* bar */ } 33 ..= 255 => { /* baz */ } } } ``` This matching is permitted on all integer (signed/unsigned and char) types. Sensible error messages are also provided. For example: ```rust fn matcher(x: u8) { match x { //~ ERROR 0 .. 32 => { /* foo */ } } } ``` results in: ``` error[E0004]: non-exhaustive patterns: `32u8...255u8` not covered --> matches.rs:3:9 | 6 | match x { | ^ pattern `32u8...255u8` not covered ``` This implements rust-lang/rfcs#1550 for #50907. While there hasn't been a full RFC for this feature, it was suggested that this might be a feature that obviously complements the existing exhaustiveness checks (e.g. for `bool`) and so a feature gate would be sufficient for now.

bors · 2018-08-22T03:01:08Z

☀️ Test successful - status-appveyor, status-travis
Approved by: arielb1
Pushing a79cffb to master...

Match usize/isize exhaustively with half-open ranges The long-awaited finale to the saga of [exhaustiveness checking for integers](rust-lang#50912)! ```rust match 0usize { 0.. => {} // exhaustive! } match 0usize { 0..usize::MAX => {} // helpful error message! } ``` Features: - Half-open ranges behave as expected for `usize`/`isize`; - Trying to use `0..usize::MAX` will tell you that `usize::MAX..` is missing and explain why. No more unhelpful "`_` is missing"; - Everything else stays the same. This should unblock rust-lang#37854. Review-wise: - I recommend looking commit-by-commit; - This regresses perf because of the added complexity in `IntRange`; hopefully not too much; - I measured each `#[inline]`, they all help a bit with the perf regression (tho I don't get why); - I did not touch MIR building; I expect there's an easy PR there that would skip unnecessary comparisons when the range is half-open.

Match usize/isize exhaustively with half-open ranges The long-awaited finale to the saga of [exhaustiveness checking for integers](rust-lang/rust#50912)! ```rust match 0usize { 0.. => {} // exhaustive! } match 0usize { 0..usize::MAX => {} // helpful error message! } ``` Features: - Half-open ranges behave as expected for `usize`/`isize`; - Trying to use `0..usize::MAX` will tell you that `usize::MAX..` is missing and explain why. No more unhelpful "`_` is missing"; - Everything else stays the same. This should unblock rust-lang/rust#37854. Review-wise: - I recommend looking commit-by-commit; - This regresses perf because of the added complexity in `IntRange`; hopefully not too much; - I measured each `#[inline]`, they all help a bit with the perf regression (tho I don't get why); - I did not touch MIR building; I expect there's an easy PR there that would skip unnecessary comparisons when the range is half-open.

rust-highfive assigned michaelwoerister May 20, 2018

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label May 20, 2018

This was referenced May 20, 2018

Exhaustive integer patterns tracking issue #50907

Closed

Precise range exhaustiveness check for match on integer patterns rust-lang/rfcs#1550

Closed

kennytm added the T-lang Relevant to the language team label May 20, 2018

scottmcm reviewed May 21, 2018

View reviewed changes

rust-highfive assigned eddyb and unassigned michaelwoerister May 21, 2018

eddyb reviewed May 21, 2018

View reviewed changes

rust-highfive assigned nikomatsakis and unassigned eddyb May 21, 2018

eddyb reviewed May 21, 2018

View reviewed changes

varkor commented May 22, 2018

View reviewed changes

eddyb reviewed May 22, 2018

View reviewed changes

eddyb reviewed May 23, 2018

View reviewed changes

varkor force-pushed the exhaustive-integer-matching branch from b3f6184 to 61b6363 Compare August 20, 2018 23:17

arielb1 reviewed Aug 21, 2018

View reviewed changes

varkor added 3 commits August 21, 2018 21:04

Add a test case for u128::MAX - 1

6a957e1

Use a boundary method instead of an endpoint method for split_grouped…

dec5563

…_constructors

Add some extra edge case tests

6971c5d

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Aug 21, 2018

bors merged commit 6971c5d into rust-lang:master Aug 22, 2018

This was referenced Aug 22, 2018

CTFE engine refactor #53424

Merged

Rename TyVariants and variants #53581

Merged

varkor deleted the exhaustive-integer-matching branch August 22, 2018 10:26

eddyb mentioned this pull request Sep 20, 2018

RFC: Or patterns, i.e Foo(Bar(x) | Baz(x)) rust-lang/rfcs#2535

Merged

This was referenced Nov 10, 2018

Exhaustive integer pattern matching varkor/rfcs#3

Closed

Stabilise exhaustive integer pattern matching rust-lang/rfcs#2591

Merged

Centril added the F-exclusive_range_pattern `#![feature(exclusive_range_pattern)]` label Dec 12, 2019

Nadrieril mentioned this pull request Oct 13, 2023

Match usize/isize exhaustively with half-open ranges #116692

Merged

Exhaustive integer matching #50912

Exhaustive integer matching #50912

Uh oh!

Conversation

varkor commented May 20, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rust-highfive commented May 20, 2018

Uh oh!

scottmcm left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

michaelwoerister commented May 21, 2018

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

eddyb commented May 21, 2018

Uh oh!

Choose a reason for hiding this comment

Uh oh!

varkor commented May 22, 2018

Uh oh!

varkor May 22, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

varkor commented May 23, 2018

Uh oh!

varkor commented Aug 20, 2018

Uh oh!

Choose a reason for hiding this comment

Uh oh!

varkor Aug 21, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

varkor commented May 20, 2018 •

edited

Loading

scottmcm left a comment •

edited

Loading

varkor May 22, 2018 •

edited

Loading

varkor Aug 21, 2018 •

edited

Loading