-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Exhaustive integer matching #50912
Exhaustive integer matching #50912
Conversation
(rust_highfive has picked a reviewer for you, use r? to override) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm definitely fine with merging this without an RFC (feature-gated of course, as you did).
^ pattern
32u8...255u8
not covered
This is amazing even if all it did was give that error instead of just "pattern _
not covered".
|
||
// Let's test other types too! | ||
match '\u{0}' { | ||
'\u{0}' ..= char::MAX => {} // ok |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since char
is a USV, it doesn't actually need to cover that entire range. If it's feasible, could this handle the example in rust-lang/rfcs#1550 (comment) too? (Feel free to punt if hard.)
--> $DIR/exhaustive_integer_patterns.rs:50:11 | ||
| | ||
LL | match x { //~ ERROR non-exhaustive patterns | ||
| ^ patterns `-128i8...-5i8`, `120i8...121i8` and `121i8...127i8` not covered |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could adjacent ranges be merged, or at least not overlap? 121
is in two of the uncovered ranges, and it would be nice if the message was "patterns -128i8...-5i8
and 120i8...127i8
not covered" instead.
LL | match x { //~ ERROR non-exhaustive patterns | ||
| ^ patterns `-128i8...-5i8`, `120i8...121i8` and `121i8...127i8` not covered | ||
|
||
error: aborting due to 5 previous errors |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All the holes in the examples are non-singleton. Consider a UI test for something like -127i8..=127
or match 0 { i16::MIN..=-1 => {} 1..=i16::MAX => {} }
.
value_constructors = true; | ||
vec![ConstantRange(ty::Const::from_bits(cx.tcx, min, cx.tcx.types.char), | ||
ty::Const::from_bits(cx.tcx, max, cx.tcx.types.char), | ||
RangeEnd::Included)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hopefully the char
request is just to initialize this as a two-element vec instead...
r? @eddyb maybe? |
I16 => min_max_ty!(i16, u16, cx.tcx.types.i16), | ||
I32 => min_max_ty!(i32, u32, cx.tcx.types.i32), | ||
I64 => min_max_ty!(i64, u64, cx.tcx.types.i64), | ||
I128 => min_max_ty!(i128, u128, cx.tcx.types.i128), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @oli-obk This sort of thing can be done from the bit width (which you can get from the layout of these types).
struct Interval<'tcx> { | ||
pub lo: u128, | ||
pub hi: u128, | ||
pub ty: Ty<'tcx>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should use ty::layout::Integer
here or similar.
/// An inclusive interval, used for precise integer exhaustiveness checking. | ||
struct Interval<'tcx> { | ||
pub lo: u128, | ||
pub hi: u128, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe you can use RangeInclusive<u128>
instead?
remaining_ranges.into_iter().map(|(lo, hi)| { | ||
let (lo, hi) = Interval::offset_sign(ty, (lo, hi), false); | ||
ConstantRange(ty::Const::from_bits(cx.tcx, lo, ty), | ||
ty::Const::from_bits(cx.tcx, hi, ty), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should ConstantRange
use ty::Const
? Seems expensive, and useless unless the ty::Const
is Bits
(cc @oli-obk).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should definitely be using ConstValue
instead of ty::Const
. I put it on my TODO list
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ConstantRange
can only hold integers, so ConstValue
shouldn't be needed, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably. My TODO already says "ConstValue or smaller" ;). That should not block this PR though, since it's a preexisting issue.
// The pattern intersects the middle of the subrange, | ||
// so we create two ranges either side of the intersection.) | ||
remaining_ranges.push((subrange_lo, pat_interval.lo)); | ||
remaining_ranges.push((pat_interval.hi, subrange_hi)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you missed -1
and +1
on the fields of pat_interval
, here.
for (subrange_lo, subrange_hi) in ranges { | ||
if pat_interval.lo > subrange_hi || pat_interval.hi < subrange_lo { | ||
// The pattern doesn't intersect with the subrange at all, | ||
// so the subrange remains untouched. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could construct the intersection by i = a.start.max(b.start)..=a.end.min(b.end)
, and the two halves of the subtraction a - b
by sl = a.start..=i.start-1
and sh = i.end+1..=a.end
(be careful with overflow).
AFAIK, the empty condition is range.start > range.end
(for inclusive ranges).
Since I don't think you need the intersection by itself, the two halves can be:
sl = a.start..=a.start.max(b.start)-1
and
sh = a.end.min(b.end)+1..=a.end
(when a.start.max(b.start) <= a.end.min(b.end)
, otherwise sl = sh = a
)
Therefore:
if a.start > b.end || b.start > a.end {
// No intersection.
remaining_ranges.push(a);
} else {
// Overflow is now not a problem, because of the conditions.
if b.start > a.start {
remaining_ranges.push(a.start..=b.start-1);
}
if b.end < a.end {
remaining_ranges.push(b.end+1..=a.end);
}
}
I've just checked and your code corresponds to this, but you have the 4 possibilities of my two small if
s expanded out - my main suggestion from this comment would be to orthogonalize those.
// `missing_ctors` are those that should have appeared | ||
// as patterns in the `match` expression, but did not. | ||
let mut missing_ctors = vec![]; | ||
'req: for req_ctor in all_ctors.clone() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this iterate over &all_ctors
instead?
I left some nit-picky comments about implementation details, but r? @nikomatsakis |
} | ||
} | ||
let (min, max, ty) = match int_ty { | ||
Isize => min_max_ty!(isize, usize, cx.tcx.types.isize), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, this is broken, because it's using the host's usize
/ isize
bitwidth instead of the target's.
I've addressed most of the comments, but there's still the issue with |
--> $DIR/exhaustive_integer_patterns.rs:50:11 | ||
| | ||
LL | match x { //~ ERROR non-exhaustive patterns | ||
| ^ patterns `-128i8...-6i8` and `122i8...127i8` not covered |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These offsets do not look right now. I'll look into that.
U128 => ( u128::MAX as u128, cx.tcx.types.u128), | ||
}); | ||
let min_max_ty = |sty| { | ||
let size = cx.tcx.layout_of(ty::ParamEnv::reveal_all().and(sty)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should use cx.param_env
or something. Also, you don't need to match on uint_ty
at all! Pass pcx.ty
instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, that's so much nicer, of course!
if pat_interval_hi < subrange_hi { | ||
// The pattern intersects a lower section of the | ||
// subrange, so an upper section will remain. | ||
remaining_ranges.push((pat_interval_hi + 1, subrange_hi)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remaining_ranges
could be a Vec<RangeInclusive<u128>>
, i.e. use ..=
syntax.
@@ -34,6 +34,9 @@ use arena::TypedArena; | |||
use std::cmp::{self, Ordering}; | |||
use std::fmt; | |||
use std::iter::{FromIterator, IntoIterator, repeat}; | |||
use std::{char, usize, u8, u16, u32, u64, u128, isize, i8, i16, i32, i64, i128}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These, especially usize
, are a bad sign - you shouldn't need them.
let size = cx.tcx.layout_of(ty::ParamEnv::reveal_all().and(pcx.ty)) | ||
.unwrap().size.bits() as u32; | ||
let shift = 1u128.overflowing_shl(size); | ||
let max = shift.0.wrapping_sub(1 + (shift.1 as u128)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be implemented with just !0 >> (128 - bits)
.
ty::TyInt(_) if exhaustive_integer_patterns => { | ||
let size = cx.tcx.layout_of(ty::ParamEnv::reveal_all().and(pcx.ty)) | ||
.unwrap().size.bits() as u128; | ||
let min = (1u128 << (size - 1)).wrapping_neg(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you need wrapping_neg
here?
let size = cx.tcx.layout_of(ty::ParamEnv::reveal_all().and(pcx.ty)) | ||
.unwrap().size.bits() as u128; | ||
let min = (1u128 << (size - 1)).wrapping_neg(); | ||
let max = (1u128 << (size - 1)).wrapping_sub(1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't need wrapping_sub
, can just - 1
.
Everything seems to be working correctly now, as far as I can tell. @eddyb had some suggestions for refactoring |
b3f6184
to
61b6363
Compare
I've tried to improve the comments in |
} | ||
let c = match a.1 { | ||
Endpoint::Start => a.0, | ||
Endpoint::End | Endpoint::Both => a.0 + 1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does this handle integer overflow (i.e. a.0 = uint128::MAX
)? worth a test case, at least.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Edited) a.0 < b.0
strictly, so overflow shouldn't be possible. There are already test cases for covering the entire range of u128
and i128
, so I think that covers the bases already. I'll add a comment to this effect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other than that and the duplication, this looks good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A: integer overflow can't occur, because only the last point can be uint128::MAX
, and only the first point can be 0
.
// endpoint a point corresponds to. Whenever a point corresponds to both a start | ||
// and an end, then we create a unit range for it. | ||
#[derive(PartialEq, Clone, Copy, Debug)] | ||
enum Endpoint { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A potential simplification:
I just noticed that range boundaries are always "in between" numbers - an Endpoint::Start
is a boundary between p-1
and p
, and an Endpoint::End
is between p
and p+1
(that means that an Endpoint::End
is equivalent to an Endpoint::Start
that immediately follows it). You could do something based on that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
e.g., I think this is a correct implementation:
/// Represents a border between 2 integers. Because of the "fencepost error",
/// there are be 2^128+1 such borders.
#[derive(Copy, Clone, PartialOrd, Ord, PartialEq, Eq)]
enum Border {
JustBefore(u128),
AfterU128Max
}
impl Border {
fn before(n: u128) -> Self {
Border::JustBefore(n)
}
fn after(n: u128) -> Self {
match n.checked_add(1) {
m => Border::JustBefore(m),
None => Border::AfterU128Max
}
}
fn range_borders(r: IntRange<'_>) -> [Self; 2] {
let (r_lo, r_hi) = r.range.into_inner();
vec![Border::before(r_lo), Border::after(r_hi)]
}
// return the integers between `self` and `to` if `self < to`, `None` otherwise.
fn range_to(self, to: Self, ty: Ty<'tcx>) -> Option<IntRange<'tcx>> {
match (self, to) {
(Border::JustBefore(n), Border::JustBefore(m)) => {
IntRange::from_interval(ty, n, m, RangeEnd::Excluded)
}
(Border::JustBefore(n), Border::AfterU128Max) => {
IntRange::from_interval(ty, n, u128::MAX, RangeEnd::Included)
}
(Border::AfterU128Max, _) => None
}
}
}
// `borders` is the set of borders between equivalence classes - each equivalence
// class is between 2 borders.
let row_borders = m.iter()
.flat_map(|row| IntRange::from_pat(tcx, row[0]))
.flat_map(|range| ctor_range.intersection(&r))
.flat_map(|range| Border::range_borders(range));
let range_borders = Border::range_borders(ctor_range);
let mut borders: Vec<Border> = row_borders.chain(range_borders).collect();
borders.sort();
let ranges = borders.windows(2).map(|window| {
window[0].range_to(window[1]);
})
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do think this is cleaner :) The logic looks good to me too — I'm going to give it a go.
LGTM. If you think borders are cleaner enough than endpoints, you can use that, otherwise r=me. (Also, add a test that |
@bors r+ |
📌 Commit 6971c5d has been approved by |
@arielb1: thank you for all your detailed comments and pointing me in the right direction! |
Exhaustive integer matching This adds a new feature flag `exhaustive_integer_patterns` that enables exhaustive matching of integer types by their values. For example, the following is now accepted: ```rust #![feature(exhaustive_integer_patterns)] #![feature(exclusive_range_pattern)] fn matcher(x: u8) { match x { // ok 0 .. 32 => { /* foo */ } 32 => { /* bar */ } 33 ..= 255 => { /* baz */ } } } ``` This matching is permitted on all integer (signed/unsigned and char) types. Sensible error messages are also provided. For example: ```rust fn matcher(x: u8) { match x { //~ ERROR 0 .. 32 => { /* foo */ } } } ``` results in: ``` error[E0004]: non-exhaustive patterns: `32u8...255u8` not covered --> matches.rs:3:9 | 6 | match x { | ^ pattern `32u8...255u8` not covered ``` This implements rust-lang/rfcs#1550 for #50907. While there hasn't been a full RFC for this feature, it was suggested that this might be a feature that obviously complements the existing exhaustiveness checks (e.g. for `bool`) and so a feature gate would be sufficient for now.
☀️ Test successful - status-appveyor, status-travis |
Match usize/isize exhaustively with half-open ranges The long-awaited finale to the saga of [exhaustiveness checking for integers](rust-lang#50912)! ```rust match 0usize { 0.. => {} // exhaustive! } match 0usize { 0..usize::MAX => {} // helpful error message! } ``` Features: - Half-open ranges behave as expected for `usize`/`isize`; - Trying to use `0..usize::MAX` will tell you that `usize::MAX..` is missing and explain why. No more unhelpful "`_` is missing"; - Everything else stays the same. This should unblock rust-lang#37854. Review-wise: - I recommend looking commit-by-commit; - This regresses perf because of the added complexity in `IntRange`; hopefully not too much; - I measured each `#[inline]`, they all help a bit with the perf regression (tho I don't get why); - I did not touch MIR building; I expect there's an easy PR there that would skip unnecessary comparisons when the range is half-open.
Match usize/isize exhaustively with half-open ranges The long-awaited finale to the saga of [exhaustiveness checking for integers](rust-lang#50912)! ```rust match 0usize { 0.. => {} // exhaustive! } match 0usize { 0..usize::MAX => {} // helpful error message! } ``` Features: - Half-open ranges behave as expected for `usize`/`isize`; - Trying to use `0..usize::MAX` will tell you that `usize::MAX..` is missing and explain why. No more unhelpful "`_` is missing"; - Everything else stays the same. This should unblock rust-lang#37854. Review-wise: - I recommend looking commit-by-commit; - This regresses perf because of the added complexity in `IntRange`; hopefully not too much; - I measured each `#[inline]`, they all help a bit with the perf regression (tho I don't get why); - I did not touch MIR building; I expect there's an easy PR there that would skip unnecessary comparisons when the range is half-open.
Match usize/isize exhaustively with half-open ranges The long-awaited finale to the saga of [exhaustiveness checking for integers](rust-lang#50912)! ```rust match 0usize { 0.. => {} // exhaustive! } match 0usize { 0..usize::MAX => {} // helpful error message! } ``` Features: - Half-open ranges behave as expected for `usize`/`isize`; - Trying to use `0..usize::MAX` will tell you that `usize::MAX..` is missing and explain why. No more unhelpful "`_` is missing"; - Everything else stays the same. This should unblock rust-lang#37854. Review-wise: - I recommend looking commit-by-commit; - This regresses perf because of the added complexity in `IntRange`; hopefully not too much; - I measured each `#[inline]`, they all help a bit with the perf regression (tho I don't get why); - I did not touch MIR building; I expect there's an easy PR there that would skip unnecessary comparisons when the range is half-open.
Match usize/isize exhaustively with half-open ranges The long-awaited finale to the saga of [exhaustiveness checking for integers](rust-lang#50912)! ```rust match 0usize { 0.. => {} // exhaustive! } match 0usize { 0..usize::MAX => {} // helpful error message! } ``` Features: - Half-open ranges behave as expected for `usize`/`isize`; - Trying to use `0..usize::MAX` will tell you that `usize::MAX..` is missing and explain why. No more unhelpful "`_` is missing"; - Everything else stays the same. This should unblock rust-lang#37854. Review-wise: - I recommend looking commit-by-commit; - This regresses perf because of the added complexity in `IntRange`; hopefully not too much; - I measured each `#[inline]`, they all help a bit with the perf regression (tho I don't get why); - I did not touch MIR building; I expect there's an easy PR there that would skip unnecessary comparisons when the range is half-open.
Match usize/isize exhaustively with half-open ranges The long-awaited finale to the saga of [exhaustiveness checking for integers](rust-lang/rust#50912)! ```rust match 0usize { 0.. => {} // exhaustive! } match 0usize { 0..usize::MAX => {} // helpful error message! } ``` Features: - Half-open ranges behave as expected for `usize`/`isize`; - Trying to use `0..usize::MAX` will tell you that `usize::MAX..` is missing and explain why. No more unhelpful "`_` is missing"; - Everything else stays the same. This should unblock rust-lang/rust#37854. Review-wise: - I recommend looking commit-by-commit; - This regresses perf because of the added complexity in `IntRange`; hopefully not too much; - I measured each `#[inline]`, they all help a bit with the perf regression (tho I don't get why); - I did not touch MIR building; I expect there's an easy PR there that would skip unnecessary comparisons when the range is half-open.
This adds a new feature flag
exhaustive_integer_patterns
that enables exhaustive matching of integer types by their values. For example, the following is now accepted:This matching is permitted on all integer (signed/unsigned and char) types. Sensible error messages are also provided. For example:
results in:
This implements rust-lang/rfcs#1550 for #50907. While there hasn't been a full RFC for this feature, it was suggested that this might be a feature that obviously complements the existing exhaustiveness checks (e.g. for
bool
) and so a feature gate would be sufficient for now.