Proptests
Property testing is a fascinating subject, and our crate lends itself really well to proptests. We use the proptest crate for them.
In informal terms, property testing is a more abstract way of testing: instead of explicitly defining an input and the expected output, and then asserting actual == expected, we define rules for what the input should look like and what properties the output for that input should satisfy. The testing framework then takes care of generating inputs that cover a wide range of the possible cases.
More formally, we test an algorithm
This is an extremely potent testing technique that can fish out bugs that would be difficult to test for otherwise. To test against a given case classically, the developer has to think of that case and create the input by hand.
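To make the idea concrete, here is a minimal sketch of a property test written by hand, with no framework at all. Everything in it is an assumption made for illustration: `sorted_unique` is a made-up function under test, and `XorShift` is a toy generator standing in for what the proptest crate does for us (proptest additionally shrinks failing inputs, which this sketch does not attempt).

```rust
/// Toy deterministic generator (xorshift64), here only so the sketch needs no crates.
/// The seed must be non-zero.
struct XorShift(u64);

impl XorShift {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Hypothetical function under test: distinct values of `input`, ascending.
fn sorted_unique(input: &[u8]) -> Vec<u8> {
    let mut out = input.to_vec();
    out.sort_unstable();
    out.dedup();
    out
}

/// The properties any output must satisfy, stated without fixing the input.
fn check_properties(input: &[u8]) -> Result<(), String> {
    let out = sorted_unique(input);
    if !out.windows(2).all(|pair| pair[0] < pair[1]) {
        return Err(format!("output not strictly increasing for {:?}", input));
    }
    if !input.iter().all(|elem| out.contains(elem)) {
        return Err(format!("output lost an element of {:?}", input));
    }
    if !out.iter().all(|elem| input.contains(elem)) {
        return Err(format!("output invented an element for {:?}", input));
    }
    Ok(())
}

/// Generates `cases` pseudo-random inputs and checks the properties on each.
fn run_cases(cases: usize, seed: u64) -> Result<(), String> {
    let mut rng = XorShift(seed);
    for _ in 0..cases {
        let len = (rng.next_u64() % 32) as usize;
        let input: Vec<u8> = (0..len).map(|_| rng.next_u64() as u8).collect();
        check_properties(&input)?;
    }
    Ok(())
}
```

Note that nowhere do we write down an expected output; we only say what must be true of it. Calling `run_cases(200, 42)` checks the three properties on 200 generated inputs.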
First, read the proptest book up to a point where you get bored.
A good case study is the small_test proptest suite we have. SmallSet256 is a heavily optimised set that holds u8 values in two 128-bit wide bitmasks.
Let's consider this test:
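proptest! {
    #[test]
    fn contains(btree_set in collection::btree_set(any_elem(), 0..=MAX_SET_SIZE)) {
        // Build the set under test from the same elements as the reference BTreeSet.
        let vec: Vec<u8> = btree_set.iter().copied().collect();
        let slice: &[u8] = &vec;
        let small_set: SmallSet256 = slice.into();
        // Both sets must agree on the membership of every possible element.
        for elem in 0..=MAX_ELEM {
            assert_eq!(btree_set.contains(&elem), small_set.contains(elem));
        }
    }
}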
The any_elem function is a generation strategy that returns any u8 value, so in this case the set of inputs is all BTreeSet<u8> instances with at most MAX_SET_SIZE elements. The function we test here is the From<&[u8]> impl for SmallSet256. The property is that, after calling into, each u8 value is in the resulting set if and only if it was in the source.
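For readers who want to play with this property outside of the crate, here is a rough, dependency-free sketch. `TinySet256` is a hypothetical stand-in written for this page, not the real SmallSet256; it only mirrors the "two 128-bit bitmasks" layout described above. `agrees_with_btree_set` is likewise an assumed helper that states the property for a single input.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for a bitmask set of u8 values (not rsonpath's SmallSet256).
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
struct TinySet256 {
    /// Bit `i` is set iff value `i` (0..=127) is in the set.
    low: u128,
    /// Bit `i` is set iff value `128 + i` (128..=255) is in the set.
    high: u128,
}

impl TinySet256 {
    fn insert(&mut self, elem: u8) {
        if elem < 128 {
            self.low |= 1u128 << elem;
        } else {
            self.high |= 1u128 << (elem - 128);
        }
    }

    fn contains(&self, elem: u8) -> bool {
        if elem < 128 {
            (self.low & (1u128 << elem)) != 0
        } else {
            (self.high & (1u128 << (elem - 128))) != 0
        }
    }
}

impl From<&[u8]> for TinySet256 {
    fn from(slice: &[u8]) -> Self {
        let mut set = Self::default();
        for &elem in slice {
            set.insert(elem);
        }
        set
    }
}

/// The property from the test above, for one input: after conversion, every
/// u8 value is in the bitmask set iff it was in the source BTreeSet.
fn agrees_with_btree_set(source: &BTreeSet<u8>) -> bool {
    let vec: Vec<u8> = source.iter().copied().collect();
    let set: TinySet256 = vec.as_slice().into();
    (0..=u8::MAX).all(|elem| source.contains(&elem) == set.contains(elem))
}
```

The BTreeSet plays the role of a trusted reference here: it is simple, comes from the standard library, and shares no code with the bitmask implementation.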
While testing doesn't usually sound fun to developers, designing proptests is actually a great little challenge.
Let us take classification proptests as an example, located in classifier_correctness_test.rs. The goal here is to proptest the overall classifier pipeline. The domain of all the valid inputs here is the domain of all possible strings – the classifiers shouldn't care about correctness of the JSON, but we do require valid UTF-8.
Consider designing a proptest for a result of Opening(7): we can check that input[7] == b'{' or input[7] == b'[', but that's not enough. The byte could be within a string, and then it wouldn't have been correct to classify it as an opening! So we'd need to see if it is delimited by double quotes, which is a very non-local property: we need to count how many quotes there are from the start of the input; moreover, quotes can be escaped with backslashes, which can themselves be escaped with backslashes...
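To show just how non-local this check is, here is a small sketch of it. The function name `is_inside_string` is made up for this page, and the sketch assumes that a backslash escapes the following byte wherever it appears, which may not match how the real classifiers treat backslashes outside of strings.

```rust
/// Returns true if the byte at `idx` lies inside a double-quoted string.
///
/// Assumption of this sketch: a backslash escapes the next byte anywhere in
/// the input. Panics if `idx > input.len()`.
fn is_inside_string(input: &[u8], idx: usize) -> bool {
    let mut in_string = false;
    let mut escaped = false;
    // Every byte before `idx` can influence the answer, so we scan them all.
    for &byte in &input[..idx] {
        if escaped {
            // This byte is escaped by the previous backslash; it changes nothing.
            escaped = false;
        } else if byte == b'\\' {
            escaped = true;
        } else if byte == b'"' {
            in_string = !in_string;
        }
    }
    in_string
}
```

Even this tiny function is already a partial reimplementation of what the classifiers do, which is exactly the trouble with using it as the oracle in a test.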
We quickly run into an issue where we need an equivalent of rsonpath to test rsonpath. Testing one rsonpath implementation with another rsonpath implementation is much more precarious: the prior probability of "we at rsonpath screwed the same thing twice" is much higher.
To feel less overwhelmed let's just simplify the domain. Instead of considering all the strings we'll exclude quotes and backslashes from our inputs completely. On this limited domain the family of properties "if the classifier returned Structural(idx), then input[idx] is a byte representing that structural" is enough to test correctness.
This has the obvious downside of not testing on the entire domain but is a start and is much easier than getting the entire proptest suite right the first time around. Next iterations of design can introduce more complex inputs to the domain, e.g. quotes but without escapes, and expand the suite.
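As a rough sketch of what checking that family of properties could look like: the `Structural` enum below is a simplified, hypothetical version defined only for this example (the crate's own type may differ), and `in_restricted_domain`, `matches_input` and `check_reported` are assumed helper names.

```rust
/// Simplified, hypothetical classifier output: the kind of structural and its index.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Structural {
    Opening(usize),
    Closing(usize),
    Colon(usize),
    Comma(usize),
}

/// The restricted domain: inputs with no quotes and no backslashes.
fn in_restricted_domain(input: &[u8]) -> bool {
    !input.iter().any(|&byte| byte == b'"' || byte == b'\\')
}

/// The property for one reported structural: the byte at its index is a
/// character representing that structural. Out-of-bounds indices fail.
fn matches_input(input: &[u8], structural: Structural) -> bool {
    match structural {
        Structural::Opening(idx) => {
            matches!(input.get(idx).copied(), Some(b'{') | Some(b'['))
        }
        Structural::Closing(idx) => {
            matches!(input.get(idx).copied(), Some(b'}') | Some(b']'))
        }
        Structural::Colon(idx) => input.get(idx).copied() == Some(b':'),
        Structural::Comma(idx) => input.get(idx).copied() == Some(b','),
    }
}

/// Checks the property for everything the classifier reported.
fn check_reported(input: &[u8], reported: &[Structural]) -> Result<(), String> {
    for &structural in reported {
        if !matches_input(input, structural) {
            return Err(format!("{:?} does not match the input", structural));
        }
    }
    Ok(())
}
```

In a proptest, `reported` would come from running the real classifier on a generated input from the restricted domain. The check is purely local, one byte per reported structural, which is what the simplification buys us.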
We can decrease the prior probability of writing the same
As an example consider
In our classifier example, instead of generating random strings, we generate random streams of tokens: either one of the structural characters, or "garbage", which is any string that doesn't contain any structural characters, quotes, or escapes. On such input there is an obvious
Now we can expand the suite by considering a larger domain. Using technique 2. we can add a token of "doubly-quoted string" that won't contain any quotes or backslashes inside, and is expected to be ignored. Such small iterative improvements can now be used to arrive at a comprehensive test suite. This is an open issue, #20, and a good starter for learning proptests!
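The token-stream idea, including the quoted-string token, can be sketched without any framework as follows. All names here (`Token`, `STRUCTURAL_BYTES`, `is_valid`, `build_case`) are hypothetical and chosen for this page; in the real suite the token list would be produced by a proptest strategy rather than written by hand.

```rust
/// The bytes treated as structural characters in this sketch.
const STRUCTURAL_BYTES: [u8; 6] = [b'{', b'}', b'[', b']', b':', b','];

/// Hypothetical token type: the unit of generation instead of raw strings.
#[derive(Clone, Debug, PartialEq, Eq)]
enum Token {
    /// One structural character.
    Structural(u8),
    /// Text with no structural characters, quotes, or backslashes.
    Garbage(String),
    /// Contents of a double-quoted string, with no quotes or backslashes inside.
    Quoted(String),
}

/// Checks that a token respects the restrictions of its kind.
fn is_valid(token: &Token) -> bool {
    let is_quote_or_escape = |byte: u8| byte == b'"' || byte == b'\\';
    match token {
        Token::Structural(byte) => STRUCTURAL_BYTES.contains(byte),
        Token::Garbage(text) => !text
            .bytes()
            .any(|byte| STRUCTURAL_BYTES.contains(&byte) || is_quote_or_escape(byte)),
        Token::Quoted(text) => !text.bytes().any(is_quote_or_escape),
    }
}

/// Builds the classifier input together with the expected output:
/// a list of (index, byte) for every structural character outside of strings.
fn build_case(tokens: &[Token]) -> (Vec<u8>, Vec<(usize, u8)>) {
    let mut input = Vec::new();
    let mut expected = Vec::new();
    for token in tokens {
        match token {
            Token::Structural(byte) => {
                // The index is known at construction time; no parsing needed.
                expected.push((input.len(), *byte));
                input.push(*byte);
            }
            Token::Garbage(text) => input.extend_from_slice(text.as_bytes()),
            Token::Quoted(text) => {
                // Anything inside the quotes is expected to be ignored.
                input.push(b'"');
                input.extend_from_slice(text.as_bytes());
                input.push(b'"');
            }
        }
    }
    (input, expected)
}
```

The design choice worth noticing is that the expected output falls out of the generation step itself, so the test never has to re-derive "is this byte inside a string?" from the raw input.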
rsonpath wiki, curated by Mateusz Gienieczko (V0ldek) (mat@gienieczko.com)