Add edge bias feature to bias for testing edge cases. #515
base: main
Conversation
* Works by randomly changing some generated values to be one of the random edge cases. The first tests are done with 100% edge cases, and then the chance goes down over the number of test cases until it's 0%.
* The average amount of edge-case tests can be controlled by the edge_bias configuration option. It defaults to 0.25.
* On failure, the edge bias is persisted to the persistence file along with the seed, so it should be reproducible.
* Also fix the time::Duration test, because the edge bias system found a problem in it. (Might need more modifications.)
* Fix the float_samplers test subsequent_splits_always_match_bounds.
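To make the mixing mechanism concrete, here is a minimal sketch of the idea described above; the function and parameter names are illustrative assumptions, not the PR's actual code:

use rand::Rng;

// Illustrative sketch only: with probability `current_edge_bias`, substitute
// a randomly chosen edge case for the ordinary uniform draw.
fn next_value<R: Rng>(rng: &mut R, current_edge_bias: f64, edge_cases: &[u32]) -> u32 {
    if !edge_cases.is_empty() && rng.gen::<f64>() < current_edge_bias {
        edge_cases[rng.gen_range(0..edge_cases.len())] // e.g. 0, 1, u32::MAX
    } else {
        rng.gen() // plain uniform generation
    }
}

Early test cases would call this with current_edge_bias near 1.0; later cases use a value decaying toward 0.0.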
seed = line[0..comma_position].parse::<PersistedSeed>().ok();
// The result is safe to ignore to not spam the log in case old
// cases exist, in which case edge_bias will be 0 to match old
// behaviour.
a little confused here, wouldn't the `,` already be the token dictating whether or not there's an edge bias? also, `,` is a fairly arbitrary delimiter. if we're extending the seed parsing, maybe we can choose something more explicit so it doesn't conflict with future metadata someone may wish to add? how about `@e=` as the token: `@` is not base-16, so it's a clear delimiter that this is new metadata, and `e=` means "edge equals". thoughts?
It might take me a while to remember what I wrote over a year ago, so I'm sorry if I make any mistakes here.
I think the idea wasn't to indicate that edge bias was used, but to handle and store the seed for the edge bias separately, so that the old persistence files would be compatible. Right now I'm not entirely sure if that's necessary; it might need some more investigation/thinking.
pub(crate) fn from_base16(dst: &mut [u8], src: &str) -> Option<()> {
this function looks a bit awkward because the return value is easy to ignore given `dst`. what if we returned a result that contained the dst value instead? maybe something like:
pub(crate) fn from_base16<const N: usize>(
    src: &str,
) -> Result<[u8; N], &'static str> {
    if src.len() != N * 2 {
        return Err("Input length does not match expected buffer size.");
    }
    let mut buffer = [0u8; N];
    for (i, chunk) in src.as_bytes().chunks(2).enumerate() {
        let hex_str = std::str::from_utf8(chunk)
            .map_err(|_| "Input contains invalid UTF-8.")?;
        buffer[i] = u8::from_str_radix(hex_str, 16)
            .map_err(|_| "Failed to parse hex chunk as hexadecimal.")?;
    }
    Ok(buffer)
}
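If adopted, a call site would pin the array size through the const parameter, and since `Result` is `#[must_use]`, silently dropping a failure now triggers a compiler warning. A hypothetical usage (the variable names are illustrative):

let seed: [u8; 16] = from_base16::<16>(hex_str)
    .expect("persistence file contains a malformed seed");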
Honestly, I don't remember what I was thinking there... Sounds like a better approach.
if x < p {
    self.current_edge_bias = 1f32 + x * (p - 1f32) / p;
} else {
    self.current_edge_bias = (x - 1f32) * p / (p - 1f32);
}
I think we should add a bit more documentation. maybe something like:
"""
Separate the curve into two linear parts whose total area
is p. Cases of p <= 0 and p >= 1 are handled above.
- Before x reaches p: Bias decreases from 1.0 toward p.
- After x reaches p: Bias decreases from p toward 0.0.
The first region ensures that edge cases are mixed into tests. The second region ensures we are testing all cases uniformly.
"""
Sounds like a pretty good description.
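As a quick sanity check on that description, numerically integrating the two-piece curve shows its average over x in [0, 1) is p, i.e. edge_bias really is the expected fraction of edge-case tests. A standalone sketch, not the PR's test code:

fn bias(x: f32, p: f32) -> f32 {
    if x < p {
        1f32 + x * (p - 1f32) / p
    } else {
        (x - 1f32) * p / (p - 1f32)
    }
}

fn main() {
    let p = 0.25f32;
    let n = 1_000_000;
    let avg = (0..n)
        .map(|i| bias(i as f32 / n as f32, p) as f64)
        .sum::<f64>()
        / n as f64;
    println!("average bias = {avg:.4}"); // ≈ 0.2500, matching p
}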
static_map(any::<(u64, u32)>(), |(a, b)|
    // Duration::new panics if nanoseconds are over one billion (one second)
    // and overflowing into the seconds would overflow the second counter.
    Duration::new(a, b % 1_000_000_000)
I believe we can do a kind of rejection sampling here: since the distribution is uniform, we can just select for the values we want in our range.
let mut nanos = b;
while nanos >= 1_000_000_000 {
    nanos = runner.rng().gen();
}
Duration::new(a, nanos)
however, given we may have edge bias, that sort of throws this sampling off. maybe one option is: when we detect an edge value for `b` that is > 1_000_000_000, we pin to 1_000_000_000, since that is our edge. otherwise we perform uniform rejection sampling, with the assumption that any non-edge value was chosen randomly and uniformly, and any edge value has negligible probability of appearing by comparison. thoughts?

side note - you wrote "Duration::new panics if nanoseconds are over one billion", but you have 1_000_000_000 % 1_000_000_000 = 0, which does not include 1_000_000_000; i'm assuming you wanted to include 1_000_000_000, however.
This one I remember something about. This change is more to fix a broken test that was detected because of the edge bias feature than to actually test the edge bias feature. The test was always broken, but it would only cause an issue if `a` happened to randomly end up as u64::MAX, which is very unlikely without an edge bias feature. And it's a bit of a justification for the feature's existence.
I do know % 1_000_000_000 will have a slight bias towards lower numbers since the nanoseconds are u32, but in reality it shouldn't really matter. And yes, of course it will test a lot of Duration::new(a, 294967295) (where 294967295 is u32::MAX % 1_000_000_000), which is not very useful... but not very harmful either. Also, internally in Duration nanoseconds are u32 right now, but ::from_nanos for example returns u64, so I was worried about things like if someone changes the type to u64 without thinking, it would freeze the tests pretty badly if rejection sampling was used.
Actually, I think it already handles ranges well enough, too. Just that the original test didn't use them either. And actually I don't think it should use the ranges for the nanoseconds, since overflowing nanos into seconds is valid, as long as seconds are not u64::MAX (or close). Although right now I'm not exactly sure what the deeper meaning of the test's existence is...
you wrote "Duration::new panics if nanoseconds are over one billion" but you have 1_000_000_000 % 1_000_000_000 = 0 which does not include 1_000_000_000, i'm assuming you wanted to include 1_000_000_000 however.
Actually, you are correct. Although in this case the bug is also in the Rust documentation: https://doc.rust-lang.org/std/time/struct.Duration.html#method.new
It states:
If the number of nanoseconds is greater than 1 billion (the number of nanoseconds in a second), then it will carry over into the seconds provided.
§Panics
This constructor will panic if the carry from the nanoseconds overflows the seconds counter.
It should say "If the number of nanoseconds is greater than OR EQUAL TO 1 billion", which you can see with:
use std::time::Duration;

fn main() {
    println!("{:?}", Duration::new(u64::MAX, 1_000_000_000 - 1));
    println!("{:?}", Duration::new(u64::MAX, 1_000_000_000));
}
So using % 1_000_000_000 is correct, but the comment is incorrect.
So using % 1_000_000_000 is correct, but the comment is incorrect.
nice find!
And actually I don't think it should use the ranges for the nanoseconds, since overflowing nanos into seconds is valid, as long as seconds are not u64::MAX (or close).
ah i see, one option is we could simply use modulo if we have a value for seconds within some epsilon of u64::MAX. another option, not sure why i didn't think of this sooner, is making the range explicit: (any::<u64>(), 0..1_000_000_001u32), where 1_000_000_001 is exclusive. would either of those make sense?
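For what it's worth, the second option might look roughly like this as a proptest strategy; the guard clamping seconds away from u64::MAX is my addition to dodge the one remaining panicking combination, not something proposed above:

use proptest::prelude::*;
use std::time::Duration;

// Sketch: explicit range for the nanosecond component. 1_000_000_001 is
// exclusive, so 1_000_000_000 itself (the interesting edge) still appears.
fn arb_duration() -> impl Strategy<Value = Duration> {
    (any::<u64>(), 0u32..1_000_000_001u32).prop_map(|(secs, nanos)| {
        // Duration::new only panics when the nanosecond carry overflows the
        // seconds counter, i.e. secs == u64::MAX && nanos == 1_000_000_000.
        let secs = if nanos == 1_000_000_000 {
            secs.min(u64::MAX - 1)
        } else {
            secs
        };
        Duration::new(secs, nanos)
    })
}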
And actually I don't think it should use the ranges for the nanoseconds, since overflowing nanos into seconds is valid, as long as seconds are not u64::MAX (or close).
Reading back, what I wrote in the last comment doesn't make any sense. I was thinking it should be OK to test the overflowing thing, but of course it's not overflowing now, so it might as well use the range...
Or do something silly like
arbitrary!(Duration, SMapped<(u64, u32), Self>;
    static_map(any::<(u64, u32)>(), |(a, b)| {
        let c = if b as u64 / 1_000_000_000 > u64::MAX - a {
            b % 1_000_000_000
        } else {
            b
        };
        // Duration::new panics if nanoseconds are one billion (one second) or over
        // and overflowing into the seconds would overflow the second counter.
        Duration::new(a, c)
    }));
I think in the end that test is not super important anyway... Personally I would just leave it as a range or modulo.
I like the approach @ajantti, left some comments.
replaces #369