Idea from QC & the paper: classification of input to properties #34

Open · Centril opened this issue Jan 25, 2018 · 7 comments
Labels: feature-request (this issue is requesting new functionality)

Comments

Centril (Collaborator) commented Jan 25, 2018

The original QC paper: https://www.cs.tufts.edu/~nr/cs257/archive/john-hughes/quick.pdf

Under 2.4. Monitoring test data, the paper talks about classification of trivial inputs which are then shown to the user. These mechanisms are exposed here today: https://hackage.haskell.org/package/QuickCheck-2.11.3/docs/Test-QuickCheck.html#g:22

I think the most useful function here is classify.
I believe these primitives can be offered either at the strategy level, by calling methods of TestRunner, or via the success/failure mechanism (we can overload the meaning of success so that it can be annotated with a classification).

However, cargo test does not currently expose extra info very nicely on success, so at best I think we can print out the classification with a println!(..).
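For concreteness, a minimal sketch of what an annotated success could look like (TestPass here is hypothetical, not an existing proptest type):

// Hypothetical richer success type; the runner could tally the labels
// across all passing cases and print the frequency of each at the end.
enum TestPass {
    Unclassified,
    Classified(&'static str),
}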

Centril (Collaborator, Author) commented Jan 25, 2018

I thought about this some more, specifically about the advantages of doing it via Strategy vs. doing it via the property that the user specifies... I think it is possible to use both systems together, so they are not mutually exclusive, but the classification pools might then differ.

Pro for Strategy:

  • There is less of a burden on users to classify inputs themselves, since we can classify for them inside the strategies we offer in proptest.

Con for Strategy:

  • The overhead of doing classification inside the strategies we offer in proptest might not be worth it.
  • The classifications wouldn't necessarily be mutually exclusive.

Pro for in-property:

  • The user can provide finer-tuned, predicate-based classification that is more closely in line with their domain.

Con for in-property:

  • Many users are unlikely to do classification at all, as experience with QC in Haskell shows.

AltSysrq (Collaborator) commented

I don't see how this would work at the strategy level; it doesn't really compose well. The QC example uses "non-trivial" for lists. If you had a strategy producing the same kind of classification, what happens when someone makes a list of lists?

The property level seems simple enough, though as you mention it seems like not many people would use it. (On the other hand, it would not be much work to implement.) Here's an example of what comes to mind:

proptest! {
    #![proptest_config(Config {
        require_classification: &[("non-trivial", 0.5)],
        .. Config::default()
    })]
    #[test]
    fn test_something(ref s in ".*") {
        // test stuff...
        Ok(if !s.is_empty() {
            // TODO Maybe add some sugar
            TestPass::Classified("non-trivial")
        } else {
            TestPass::Unclassified
        })
    }
}

Honestly it seems of fairly limited value, since in proptest you can just make strategies for each class you're interested in and use a weighted prop_oneof! if you want that much control over the distribution.
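For example, a sketch of that approach with a placeholder byte-vector strategy (the weights in prop_oneof! are relative):

use proptest::prelude::*;

proptest! {
    #[test]
    fn test_something(ref v in prop_oneof![
        // one part trivial (empty), four parts non-trivial
        1 => Just(Vec::new()),
        4 => prop::collection::vec(any::<u8>(), 1..16),
    ]) {
        // test stuff, knowing roughly 80% of inputs are non-empty...
    }
}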

Centril (Collaborator, Author) commented Jan 26, 2018

I don't see how this would work at the strategy level; it doesn't really compose well. If you had a strategy producing the same kind of classification, what happens when someone makes a list of lists?

Right, that's the tricky part I think.

Though, not all strategies would be obliged to classify their outputs. The classification also does not have to be purely string-based: you could perhaps use marker types (possibly with generic parameters to handle lists of lists and varying element types) and some Classification trait, storing the classification results in a map in the test runner using trait objects. The classification labels would then have to have some way of describing themselves as a String or similar.

We could also have:

struct Classifier<S: Strategy, C: Classification, P: Fn(&ValueOf<S>) -> Option<C>> {
    strat: S,
    pred: Arc<P>,
    klass: PhantomData<C>,
}

// String labels, with the predicate deciding whether to classify at all:
classify(a_strategy, |val| if val.is_empty() { Some("empty list..") } else { None });

// Or perhaps a  P: Fn(&ValueOf<S>) -> bool  predicate plus an explicit
// label for better ergonomics:
struct ReallyTrivial;
impl Classification for ReallyTrivial { /* .. */ }
classify(a_strategy, |val| val.is_empty(), ReallyTrivial);

// use 'classified' as an adjective instead of a verb?

Perhaps this is not workable on a more fundamental level and we can forget about it. My worry is that even with this setup there is a global-ish set of classifications, so collisions will occur. But I'd like to experiment with it and see if it leads anywhere. In any case, I think trait-object, free-form labeling can be better than plain strings.

Here's an example of what comes to mind:

Regarding sugar... how about this?

proptest! {
    #![proptest_config(Config {
        require_classification: &[("non-trivial", 0.5)],
        .. Config::default()
    })]
    #[test]
    fn test_something(ref s in ".*") {
        classify_if!(!s.is_empty(), "non-trivial", {  // <-- this expr is moved above
            // test stuff...
        });
    }
}

Honestly it seems of fairly limited value, since in proptest you can just make strategies for each class you're interested in and use a weighted prop_oneof! if you want that much control over the distribution.

I think that is the preferred solution once you are more certain which distribution is appropriate, but classification can help you detect inappropriate distributions while you're fine tuning your tests. So classification is more of a monitoring tool that aids in the fine tuning of distributions.

@AltSysrq added the feature-request label on Jan 27, 2018
AltSysrq (Collaborator) commented

help you detect inappropriate distributions while you're fine tuning your tests

I would suggest just making a unit test for those cases, as exists many times in proptest itself (example). It might be worth a convenience function for what is currently the for loop in that code. Any extra classification step would be just adding a prop_map to a string. But I think that approach is a lot easier for users since there's nothing extra to learn (particularly no new macros), and other than the for loop itself, it's already fairly close to minimally compact. I think another benefit is that you don't end up with a test that tests two different things, which risks someone accidentally deleting both if the test becomes obsolete later.
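A sketch of what such a unit test can look like, drawing values straight from the strategy via TestRunner (the strategy and the 64-of-256 threshold are placeholders):

use proptest::prelude::*;
use proptest::strategy::ValueTree;
use proptest::test_runner::TestRunner;

#[test]
fn strategy_generates_nontrivial_cases() {
    let strategy = prop::collection::vec(any::<u8>(), 0..16);
    let mut runner = TestRunner::default();
    let mut non_trivial = 0;
    for _ in 0..256 {
        // Generate a value exactly as proptest would during a test run.
        let value = strategy.new_tree(&mut runner).unwrap().current();
        if !value.is_empty() {
            non_trivial += 1;
        }
    }
    assert!(non_trivial >= 64, "only {} of 256 cases were non-trivial", non_trivial);
}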

Centril (Collaborator, Author) commented Jan 27, 2018

It might be worth a convenience function for what is currently the for loop in that code.

I think we should do that in any case =)

What could also be useful is something like QuickCheck's sample', but as an infinite iterator, so that you can .take(n) as many elements as you want with sample(strategy).take(n). You can then further group and count those.
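Just a sketch of that idea against the current Strategy API (sample here is our hypothetical helper, not an existing proptest function):

use proptest::strategy::{Strategy, ValueTree};
use proptest::test_runner::TestRunner;

// Hypothetical: an infinite iterator of values drawn from a strategy.
fn sample<S: Strategy>(strategy: S) -> impl Iterator<Item = S::Value> {
    let mut runner = TestRunner::default();
    std::iter::from_fn(move || {
        Some(strategy.new_tree(&mut runner).expect("strategy failed").current())
    })
}

// e.g. count how many of the first 1000 samples are empty:
// let empties = sample(my_strategy()).take(1000).filter(|v| v.is_empty()).count();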

If we recommend the unit test route, it might be worthwhile to add some "sugar" for that. The idea sounds interesting.

I think another benefit is that you don't end up with a test that tests two different things, which risks someone accidentally deleting both if the test becomes obsolete later.

I would just note here that classify in QC is not a test; it just labels and tallies frequencies. The benefit of this is that you can see the frequencies for the specific test run you just executed.

I'll have to think more about what we've discussed and try out some code =)

Eh2406 commented Sep 19, 2018

@Centril recommended I describe my experience here.

I wrote a complicated strategy. I had a simple test asserting that all generated cases are trivial. That test panicked, so I knew I had a functional strategy. So I removed that panicking test and added a real property, which turned up a bug. Then I did some refactoring of the strategy and almost did not notice that it was now generating only trivial cases.

I think the proptest docs (which are excellent) should mention the importance of testing strategies to ensure a good distribution of cases, and describe best practices for how to do so.

turion (Contributor) commented Feb 5, 2021

In particular, it would be great to have a kind of coverage check for strategies. I.e., given a list strategy, I want to ensure that it will generate an empty list in at least n% of cases, and a list of length greater than some bound in at least m% of cases.
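Absent built-in support, a sketch of such a check as a plain unit test (the strategy, the sample count, and the n = 2 / m = 20 thresholds are all placeholders):

use proptest::prelude::*;
use proptest::strategy::ValueTree;
use proptest::test_runner::TestRunner;

#[test]
fn list_strategy_coverage() {
    let strategy = prop::collection::vec(any::<u8>(), 0..20); // placeholder list strategy
    let mut runner = TestRunner::default();
    let values: Vec<Vec<u8>> = (0..1000)
        .map(|_| strategy.new_tree(&mut runner).unwrap().current())
        .collect();
    let empty = values.iter().filter(|v| v.is_empty()).count();
    let long = values.iter().filter(|v| v.len() > 10).count();
    assert!(empty * 100 >= values.len() * 2, "under 2% empty lists");
    assert!(long * 100 >= values.len() * 20, "under 20% lists of length > 10");
}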
