Add edge bias feature to bias for testing edge cases. #515

Open, wants to merge 1 commit into base: main
7 changes: 5 additions & 2 deletions proptest/src/arbitrary/_std/time.rs
@@ -17,8 +17,11 @@ use crate::num;
 use crate::strategy::statics::{self, static_map};

 arbitrary!(Duration, SMapped<(u64, u32), Self>;
-    static_map(any::<(u64, u32)>(), |(a, b)| Duration::new(a, b))
-);
+    static_map(any::<(u64, u32)>(), |(a, b)|
+        // Duration::new panics if nanoseconds are over one billion (one second)
+        // and overflowing into the seconds would overflow the second counter.
+        Duration::new(a, b % 1_000_000_000)
Contributor


When the distribution is uniform, I believe we can do a kind of rejection sampling here: since the distribution is uniform, we can just select for the values we want in our range.

let mut nanos = b;
while nanos >= 1_000_000_000 {
    nanos = runner.rng().gen();
}

Duration::new(a, nanos)

however, given we may have edge bias, that sort of throws this sampling off. maybe one option: when we detect an edge value for b that is > 1_000_000_000, we pin it to 1_000_000_000, since that is our edge. otherwise we perform uniform rejection sampling, on the assumption that any non-edge value was chosen randomly and uniformly and that any edge value has negligible probability of appearing by comparison. thoughts?

side note - you wrote "Duration::new panics if nanoseconds are over one billion" but you have 1_000_000_000 % 1_000_000_000 = 0 which does not include 1_000_000_000, i'm assuming you wanted to include 1_000_000_000 however.
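
The proposal above can be sketched without proptest. This is only a minimal sketch under stated assumptions: `XorShift32` is a hypothetical stand-in for `runner.rng()`, and `is_edge` is a hypothetical check for values an edge-biased `u32` sampler might emit; neither is part of proptest's API. It also pins edge values to `NANOS_PER_SEC - 1` rather than to 1_000_000_000, on the assumption that the upper bound should be exclusive.

```rust
/// Nanoseconds in one second; valid sub-second values are strictly below this.
const NANOS_PER_SEC: u32 = 1_000_000_000;

/// Hypothetical stand-in for `runner.rng()`: a tiny xorshift generator.
struct XorShift32(u32);

impl XorShift32 {
    /// Classic xorshift32 step. The seed must be non-zero.
    fn next_u32(&mut self) -> u32 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 17;
        x ^= x << 5;
        self.0 = x;
        x
    }
}

/// Assumption: the high edge values for `u32` with an epsilon of 1.
fn is_edge(b: u32) -> bool {
    b == u32::MAX || b == u32::MAX - 1
}

/// Maps an arbitrary `u32` to a valid nanosecond count.
fn sample_nanos(b: u32, rng: &mut XorShift32) -> u32 {
    if b < NANOS_PER_SEC {
        // Already in range: keep it unchanged.
        return b;
    }
    if is_edge(b) {
        // An out-of-range edge value is pinned to the edge of the valid range.
        return NANOS_PER_SEC - 1;
    }
    // Otherwise treat `b` as a uniform draw and reject until in range.
    let mut nanos = b;
    while nanos >= NANOS_PER_SEC {
        nanos = rng.next_u32();
    }
    nanos
}
```

With a `u32` input, roughly 23% of uniform draws fall below one billion, so the loop is expected to finish after a handful of iterations.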


This one I remember something about. This change is more about fixing a broken test that the edge bias feature exposed than about testing the edge bias feature itself. The test was always broken, but it would only cause an issue if a happened to randomly end up as u64::MAX, which is very unlikely without an edge bias feature. That is also a bit of a justification for the feature's existence.

I do know % 1_000_000_000 will have a slight bias towards lower numbers since the nanoseconds are u32, but in reality it shouldn't really matter. And yes, of course it will test a lot of Duration::new(a, 294967295) (where 294967295 is u32::MAX % 1_000_000_000), which is not very useful... but not very harmful either. Also, internally in Duration the nanoseconds are u32 right now, but ::from_nanos, for example, takes a u64, so I was worried that if someone changed the type to u64 without thinking, the tests would freeze pretty badly if rejection sampling was used.
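
The size of that bias can be worked out exactly. The following is a minimal sketch (the helper name `preimage_count` is made up for illustration) that counts how many `u32` inputs map to a given residue under `% 1_000_000_000`:

```rust
/// Nanoseconds in one second, widened so the arithmetic below cannot overflow.
const NANOS_PER_SEC: u64 = 1_000_000_000;

/// Number of `u32` values `b` for which `b % 1_000_000_000 == residue`.
fn preimage_count(residue: u32) -> u64 {
    assert!((residue as u64) < NANOS_PER_SEC);
    // There are 2^32 = 4_294_967_296 possible u32 inputs.
    let total = u32::MAX as u64 + 1;
    // Every residue is hit once per full cycle of one billion inputs.
    let full_cycles = total / NANOS_PER_SEC;
    // The remaining inputs only reach the lowest residues.
    let leftover = total % NANOS_PER_SEC;
    if (residue as u64) < leftover {
        full_cycles + 1
    } else {
        full_cycles
    }
}
```

Residues up to 294_967_295 are produced by five inputs each and all higher residues by four, so for uniform input the lower values are favoured by a ratio of 5 to 4.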

Actually, I think it already handles ranges well enough, too. Just that the original test didn't use them either. And actually I don't think it should use the ranges for the nanoseconds, since overflowing nanos into seconds is valid, as long as seconds are not u64::MAX (or close). Although right now I'm not exactly sure what the deeper meaning of the test's existence is...

> you wrote "Duration::new panics if nanoseconds are over one billion" but you have 1_000_000_000 % 1_000_000_000 = 0 which does not include 1_000_000_000, i'm assuming you wanted to include 1_000_000_000 however.

Actually, you are correct, although in this case the bug is also in the Rust documentation: https://doc.rust-lang.org/std/time/struct.Duration.html#method.new
It states:

> If the number of nanoseconds is greater than 1 billion (the number of nanoseconds in a second), then it will carry over into the seconds provided.
>
> Panics
> This constructor will panic if the carry from the nanoseconds overflows the seconds counter.

It should say: "If the number of nanoseconds is greater OR EQUAL TO 1 billion", which you can see with

use std::time::Duration;

fn main() {
    println!("{:?}", Duration::new(u64::MAX, 1_000_000_000 - 1));
    println!("{:?}", Duration::new(u64::MAX, 1_000_000_000));
}

So using % 1_000_000_000 is correct, but the comment is incorrect.
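
The same carry rule can also be checked without triggering the panic. This is a minimal sketch, assuming a hypothetical helper `checked_duration` (not part of std) that mirrors the carry with checked arithmetic:

```rust
use std::time::Duration;

/// Nanoseconds in one second.
const NANOS_PER_SEC: u32 = 1_000_000_000;

/// Returns `None` in the cases where `Duration::new(secs, nanos)` would panic.
fn checked_duration(secs: u64, nanos: u32) -> Option<Duration> {
    // Whole seconds contained in `nanos`; this is non-zero from exactly one billion.
    let carry = (nanos / NANOS_PER_SEC) as u64;
    // The carry is added to the seconds; overflow here is the panic case.
    let secs = secs.checked_add(carry)?;
    Some(Duration::new(secs, nanos % NANOS_PER_SEC))
}
```

For example, `checked_duration(5, 1_000_000_000)` yields the same value as `Duration::new(6, 0)`, while `checked_duration(u64::MAX, 1_000_000_000)` yields `None`.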

Contributor

@rexmas rexmas Dec 15, 2024


> So using % 1_000_000_000 is correct, but the comment is incorrect.

nice find!

> And actually I don't think it should use the ranges for the nanoseconds, since overflowing nanos into seconds is valid, as long as seconds are not u64::MAX (or close).

ah i see. one option is to use modulo only when the value for seconds is within some epsilon of u64::MAX. another option, not sure why i didn't think of this sooner, is making the range explicit: (any::<u64>(), 0..1_000_000_001u32), where 1_000_000_001 is exclusive.

would either of those make sense?


> And actually I don't think it should use the ranges for the nanoseconds, since overflowing nanos into seconds is valid, as long as seconds are not u64::MAX (or close).

Reading back what I wrote in the last comment, it doesn't make any sense. I was thinking it should be OK to test the overflowing thing, but of course it's not overflowing now, so it might as well use the range...

Or do something silly like

arbitrary!(Duration, SMapped<(u64, u32), Self>;
    static_map(any::<(u64, u32)>(), |(a, b)| {
        // Only reduce the nanoseconds when carrying them into the seconds
        // would overflow the second counter.
        let c = if b as u64 / 1_000_000_000 > u64::MAX - a {
            b % 1_000_000_000
        } else {
            b
        };
        // Duration::new panics if nanoseconds are one billion (one second) or over
        // and overflowing into the seconds would overflow the second counter.
        Duration::new(a, c)
    })
);

I think in the end that test is not super important anyway... Personally I would just leave it as a range or modulo.
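
For reference, the conditional variant can be exercised at the boundary outside the `arbitrary!` macro. A minimal sketch, assuming a hypothetical free function `duration_from_parts` in place of the closure:

```rust
use std::time::Duration;

/// Nanoseconds in one second.
const NANOS_PER_SEC: u32 = 1_000_000_000;

/// Builds a `Duration` from arbitrary parts without panicking.
fn duration_from_parts(a: u64, b: u32) -> Duration {
    // Whole seconds that `b` would carry into the seconds counter.
    let carry = (b / NANOS_PER_SEC) as u64;
    // Drop the carry only when adding it to `a` would overflow.
    let nanos = if carry > u64::MAX - a {
        b % NANOS_PER_SEC
    } else {
        b
    };
    Duration::new(a, nanos)
}
```

Inputs far from `u64::MAX` keep their carry, so the overflow-into-seconds path of `Duration::new` is still exercised, while inputs near the limit are reduced first.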

+));

// Instant::now() "never" returns the same Instant, so no shrinking may occur!
arbitrary!(Instant; Self::now());
238 changes: 238 additions & 0 deletions proptest/src/num.rs
@@ -38,6 +38,193 @@ pub fn sample_uniform_incl<X: SampleUniform>(
     Uniform::new_inclusive(start, end).sample(run.rng())
 }

+trait SampleEdgeCase<T> {
+    fn sample_edge_case(runner: &mut TestRunner, epsilon: T) -> Self;
+}

+macro_rules! impl_sample_edge_case_signed_impl {
+    ($typ: ty) => {
+        impl SampleEdgeCase<$typ> for $typ {
+            fn sample_edge_case(
+                runner: &mut TestRunner,
+                epsilon: $typ,
+            ) -> $typ {
+                match sample_uniform(runner, 0, 7) {
+                    0 => 0,
+                    1 => epsilon,
+                    2 => -epsilon,
+                    3 => <$typ>::MIN,
+                    4 => <$typ>::MAX,
+                    5 => <$typ>::MIN + epsilon,
+                    6 => <$typ>::MAX - epsilon,
+                    _ => unreachable!(),
+                }
+            }
+        }
+    };
+}
+
+macro_rules! impl_sample_edge_case_signed {
+    ($($typ: ty),*) => {
+        $(impl_sample_edge_case_signed_impl!($typ);)*
+    };
+}

+macro_rules! impl_sample_edge_case_unsigned_impl {
+    ($typ: ty) => {
+        impl SampleEdgeCase<$typ> for $typ {
+            fn sample_edge_case(
+                runner: &mut TestRunner,
+                epsilon: $typ,
+            ) -> $typ {
+                match sample_uniform(runner, 0, 4) {
+                    0 => 0,
+                    1 => epsilon,
+                    2 => <$typ>::MAX,
+                    3 => <$typ>::MAX - epsilon,
+                    _ => unreachable!(),
+                }
+            }
+        }
+    };
+}
+
+macro_rules! impl_sample_edge_case_unsigned {
+    ($($typ: ty),*) => {
+        $(impl_sample_edge_case_unsigned_impl!($typ);)*
+    };
+}
+
+impl_sample_edge_case_signed!(i8, i16, i32, i64, i128, isize);
+impl_sample_edge_case_unsigned!(u8, u16, u32, u64, u128, usize);

+trait SampleEdgeCaseRangeExclusive<T> {
+    fn sample_edge_case_range_exclusive(
+        runner: &mut TestRunner,
+        start: T,
+        end: T,
+        epsilon: T,
+    ) -> Self;
+}
+
+macro_rules! impl_sample_edge_case_range_exclusive_impl {
+    ($typ: ty) => {
+        impl SampleEdgeCaseRangeExclusive<$typ> for $typ {
+            fn sample_edge_case_range_exclusive(
+                runner: &mut TestRunner,
+                start: $typ,
+                end: $typ,
+                epsilon: $typ,
+            ) -> $typ {
+                match sample_uniform(runner, 0, 4) {
+                    0 => start,
+                    1 => {
+                        num_traits::clamp(start + epsilon, start, end - epsilon)
+                    }
+                    2 => {
+                        if start < end - epsilon {
+                            end - epsilon * 2 as $typ
+                        } else {
+                            start
+                        }
+                    }
+                    3 => end - epsilon,
+                    _ => unreachable!(),
+                }
+            }
+        }
+    };
+}
+
+macro_rules! impl_sample_edge_case_range_exclusive {
+    ($($typ: ty),*) => {
+        $(impl_sample_edge_case_range_exclusive_impl!($typ);)*
+    };
+}
+
+impl_sample_edge_case_range_exclusive!(
+    i8, i16, i32, i64, i128, isize, u8, u16, u32, u64, u128, usize, f32, f64
+);

+trait SampleEdgeCaseRangeInclusive<T> {
+    fn sample_edge_case_range_inclusive(
+        runner: &mut TestRunner,
+        start: T,
+        end: T,
+        epsilon: T,
+    ) -> Self;
+}
+
+macro_rules! impl_sample_edge_case_range_inclusive_impl {
+    ($typ: ty) => {
+        impl SampleEdgeCaseRangeInclusive<$typ> for $typ {
+            fn sample_edge_case_range_inclusive(
+                runner: &mut TestRunner,
+                start: $typ,
+                end: $typ,
+                epsilon: $typ,
+            ) -> $typ {
+                match sample_uniform(runner, 0, 4) {
+                    0 => start,
+                    1 => num_traits::clamp(start + epsilon, start, end),
+                    2 => num_traits::clamp(end - epsilon, start, end),
+                    3 => end,
+                    _ => unreachable!(),
+                }
+            }
+        }
+    };
+}
+
+macro_rules! impl_sample_edge_case_range_inclusive {
+    ($($typ: ty),*) => {
+        $(impl_sample_edge_case_range_inclusive_impl!($typ);)*
+    };
+}
+
+impl_sample_edge_case_range_inclusive!(
+    i8, i16, i32, i64, i128, isize, u8, u16, u32, u64, u128, usize, f32, f64
+);

+trait SampleEdgeCaseFloat<T> {
+    fn sample_edge_case_float(runner: &mut TestRunner) -> Self;
+}
+
+macro_rules! impl_sample_edge_case_float_impl {
+    ($typ: ty) => {
+        impl SampleEdgeCaseFloat<$typ> for $typ {
+            fn sample_edge_case_float(runner: &mut TestRunner) -> $typ {
+                match sample_uniform(runner, 0, 11) {
+                    0 => 0.0,
+                    1 => -0.0,
+                    2 => 1.0,
+                    3 => -1.0,
+                    4 => <$typ>::MIN,
+                    5 => <$typ>::MAX,
+                    6 => <$typ>::MIN_POSITIVE,
+                    7 => -<$typ>::MIN_POSITIVE,
+                    8 => <$typ>::EPSILON,
+                    // One ULP from MIN and MAX
+                    9 => <$typ>::from_bits(<$typ>::MIN.to_bits() - 1),
+                    10 => <$typ>::from_bits(<$typ>::MAX.to_bits() - 1),
+                    // 11 => <$typ>::NAN,
+                    // 12 => <$typ>::NEG_INFINITY,
+                    // 13 => <$typ>::INFINITY,
+                    _ => unreachable!(),
+                }
+            }
+        }
+    };
+}
+
+macro_rules! impl_sample_edge_case_float {
+    ($($typ: ty),*) => {
+        $(impl_sample_edge_case_float_impl!($typ);)*
+    };
+}
+
+impl_sample_edge_case_float!(f32, f64);

macro_rules! int_any {
($typ: ident) => {
/// Type of the `ANY` constant.
@@ -53,6 +240,14 @@ macro_rules! int_any {
             type Value = $typ;

             fn new_tree(&self, runner: &mut TestRunner) -> NewTree<Self> {
+                if runner.eval_edge_bias() {
+                    return Ok(BinarySearch::new(
+                        $crate::num::SampleEdgeCase::sample_edge_case(
+                            runner, 1,
+                        ),
+                    ));
+                }
+
                 Ok(BinarySearch::new(runner.rng().gen()))
             }
         }
@@ -76,6 +271,14 @@ macro_rules! numeric_api {
                     );
                 }

+                if runner.eval_edge_bias() {
+                    return Ok(BinarySearch::new(
+                        $crate::num::SampleEdgeCaseRangeExclusive::sample_edge_case_range_exclusive(
+                            runner, self.start, self.end, $epsilon,
+                        ),
+                    ));
+                }
+
                 Ok(BinarySearch::new_clamped(
                     self.start,
                     $crate::num::sample_uniform::<$sample_typ>(
@@ -102,6 +305,14 @@ macro_rules! numeric_api {
                     );
                 }

+                if runner.eval_edge_bias() {
+                    return Ok(BinarySearch::new(
+                        $crate::num::SampleEdgeCaseRangeInclusive::sample_edge_case_range_inclusive(
+                            runner, (*self.start()), (*self.end()), $epsilon,
+                        ),
+                    ));
+                }
+
                 Ok(BinarySearch::new_clamped(
                     *self.start(),
                     $crate::num::sample_uniform_incl::<$sample_typ>(
@@ -120,6 +331,13 @@ macro_rules! numeric_api {
             type Value = $typ;

             fn new_tree(&self, runner: &mut TestRunner) -> NewTree<Self> {
+                if runner.eval_edge_bias() {
+                    return Ok(BinarySearch::new(
+                        $crate::num::SampleEdgeCaseRangeInclusive::sample_edge_case_range_inclusive(
+                            runner, self.start, ::core::$typ::MAX, $epsilon,
+                        ),
+                    ));
+                }
                 Ok(BinarySearch::new_clamped(
                     self.start,
                     $crate::num::sample_uniform_incl::<$sample_typ>(
@@ -138,6 +356,13 @@ macro_rules! numeric_api {
             type Value = $typ;

             fn new_tree(&self, runner: &mut TestRunner) -> NewTree<Self> {
+                if runner.eval_edge_bias() {
+                    return Ok(BinarySearch::new(
+                        $crate::num::SampleEdgeCaseRangeExclusive::sample_edge_case_range_exclusive(
+                            runner, ::core::$typ::MIN, self.end, $epsilon,
+                        ),
+                    ));
+                }
                 Ok(BinarySearch::new_clamped(
                     ::core::$typ::MIN,
                     $crate::num::sample_uniform::<$sample_typ>(
@@ -156,6 +381,13 @@ macro_rules! numeric_api {
             type Value = $typ;

             fn new_tree(&self, runner: &mut TestRunner) -> NewTree<Self> {
+                if runner.eval_edge_bias() {
+                    return Ok(BinarySearch::new(
+                        $crate::num::SampleEdgeCaseRangeInclusive::sample_edge_case_range_inclusive(
+                            runner, ::core::$typ::MIN, self.end, $epsilon,
+                        ),
+                    ));
+                }
                 Ok(BinarySearch::new_clamped(
                     ::core::$typ::MIN,
                     $crate::num::sample_uniform_incl::<$sample_typ>(
@@ -627,6 +859,12 @@ macro_rules! float_any {

             fn new_tree(&self, runner: &mut TestRunner) -> NewTree<Self> {
                 let flags = self.0.normalise();
+
+                if runner.eval_edge_bias() {
+                    return Ok(BinarySearch::new_with_types(
+                        $crate::num::SampleEdgeCaseFloat::sample_edge_case_float(runner), flags))
+                }
+
                 let sign_mask = if flags.contains(FloatTypes::NEGATIVE) {
                     $typ::SIGN_MASK
                 } else {
37 changes: 23 additions & 14 deletions proptest/src/num/float_samplers.rs
@@ -438,20 +438,29 @@ macro_rules! float_sampler {
                     let intervals = split_interval([low, high]);
                     let size = (intervals.count - 1) as usize;

-                    let interval = intervals.get(index.index(size) as $int_typ);
-                    let small_intervals = split_interval(interval);
-
-                    let start = small_intervals.get(0);
-                    let end = small_intervals.get(small_intervals.count - 1);
-                    let (low_interval, high_interval) = if start[0] < end[0] {
-                        (start, end)
-                    } else {
-                        (end, start)
-                    };
-
-                    prop_assert!(
-                        interval[0] == low_interval[0] &&
-                        interval[1] == high_interval[1]);
+                    if size <= 0
+                    {
+                        prop_assert!((intervals.start == high && intervals.step < 0.0) ||
+                            (intervals.start == low && intervals.step > 0.0));
+                        prop_assert!(intervals.count == 1);
+                    }
+                    else
+                    {
+                        let interval = intervals.get(index.index(size) as $int_typ);
+                        let small_intervals = split_interval(interval);
+
+                        let start = small_intervals.get(0);
+                        let end = small_intervals.get(small_intervals.count - 1);
+                        let (low_interval, high_interval) = if start[0] < end[0] {
+                            (start, end)
+                        } else {
+                            (end, start)
+                        };
+
+                        prop_assert!(
+                            interval[0] == low_interval[0] &&
+                            interval[1] == high_interval[1]);
+                    }
                 }
             }
         }