Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature: add Arbitrary. Allow proptest!(..) to omit strategy when T : Arbitrary #7

Closed
Centril opened this issue Sep 17, 2017 · 9 comments
Labels
feature-request This issue is requesting new functionality

Comments

@Centril
Copy link
Collaborator

Centril commented Sep 17, 2017

Notes

This issue is a more long-term issue and may, to be really useful, depend on landing procedural macros in Rust stable, or provide a nightly only side-crate for some parts (3.).

Edit: Apparently procedural macros already allow for custom derive in stable

Motivation

In a lot of cases, there's a canonical way to generate an object of type T.
In such cases, you could simply encode the canonical strategy for such a T in the standard quickcheck Arbitrary trait.

The benefits of doing this are:

1.

Increased familiarity for quickcheck users by bridging the gap.

2.

The creation of the Strategy in proptest! can be omitted, and

proptest! {
    #[test]
    fn i64_abs_is_never_negative(a in prop::num::i64::ANY) {
        assert!(a.abs() >= 0);
    }
}

can be turned into:

proptest! {
    #[test]
    fn i64_abs_is_never_negative(a: i64) {
        assert!(a.abs() >= 0);
    }
}

or even the more standard in quickcheck:

proptest! {
    #[test]
    fn i64_abs_is_never_negative(a: i64) -> bool { a.abs() >= 0 }
}

3.

Custom "deriving" Arbitrary, by macro (to decouple testing from main logic), or by standalone derive if Rust ever gets that, becomes possible in a lot of cases.

Rules for deriving by induction:

  • All unit types are Arbitrary by simply always returning the unit.
  • All product types are Arbitrary if their factors are Arbitrary.
  • All sum types are Arbitrary if each product type in each variant in a sum type is Arbitrary. Variants with no inner product are just unit types.

Ergonomics

It is considerably more ergonomic to use a macro to generate strategies compared to simply re-iterating the variants in the enum.

The trait

The trait could look like:

/// `Arbitrary` determines a canonical `Strategy` for the implementing type.
///
/// It provides one function to, yield strategies for generating `Self`.
pub trait Arbitrary
where
    Self: Sized + Debug
{
    /// The type of `ValueTree` used for `Self`'s `Strategy`.
    type ValueTree: ValueTree<Value = Self>;

    /// The type of `Strategy` used to generate values of type `Self`.
    type Strategy: Strategy<Value = Self::ValueTree>;

    /// Yields a `Strategy` for generating values of type `Self`.
    fn arbitrary() -> Self::Strategy;
}

Or, in a future Rust version:

/// `Arbitrary` determines a canonical `Strategy` for the implementing type.
///
/// It provides one function to, yield strategies for generating `Self`.
pub trait Arbitrary
where
    Self: Sized + Debug
{
    /// Yields a `Strategy` for generating values of type `Self`.
    fn arbitrary() -> impl Strategy<Value = impl ValueTree<Value = Self>>;
}
@Centril
Copy link
Collaborator Author

Centril commented Sep 18, 2017

I will start work on the Arbitrary trait and a procedural macro for custom derive of it for now. So, let's see how that goes first and I'll report on my progress.

@AltSysrq
Copy link
Collaborator

There's a lot of facets here; hopefully I cover all of them.

In a lot of cases, there's a canonical way to generate an object of type T.

I don't by any means disagree with this, but it would be nice to have some data for how often this applies in real code using things like Hypothesis/proptest.

For what it's worth, here's some info for what I've worked on:

  • ensync: Generally holds, though arb_fs() almost certainly wouldn't be covered by the proposal here since it's recursive. It's worth noting though that this was originally adapted from quickcheck, so it's strongly biased in that direction.

  • hexeline: I don't think it ever applies, but admittedly hexeline is unusual in a lot of ways.

  • Some proprietary code I worked on used fj's quickcheck and in the majority of cases had exactly one Arbitrary per type, but part of this is due to how excruciating making new Arbitrarys in Java 7 was.

Arbitrary

Maybe DefaultStrategy since the implementation essentially just provides a Strategy?

I'll keep calling it Arbitrary below for now for consistency.

a in prop::num::i64::ANY … can be turned into … a: i64

I can't say I particularly care for this; I think maintaining the clarity that a Strategy is just a normal value that can be passed around is important. Having two forms like that would also give an awkward speed-bump when needing to transition between them when there are multiple arguments.

I'd also question how useful it is to do this for things like integers; I'm not sure how many cases there are where testing with literally arbitrary integers is useful. IIRC, quickcheck defaults to the range 0..100.

Custom "deriving" Arbitrary

This sounds like a very good idea, and would particularly eliminate a lot of boilerplate in ensync.

Rules for deriving by induction

The rules don't take into account recursive types (e.g., En), which ideally would be handled in a way more graceful than aborting at runtime by overflowing the stack.

I think any custom derive would also need a way to override the strategy for a particular field, since the type will often have its own constraints on internal state. Something akin to #[serde(serialize_with = "..")] might work. This would also relieve that field of needing to be Arbitrary.

re-iterating the variants in the enum

IMO this is the single biggest weakness in proptest right now; the compiler can't help you if you forget a variant when defining the strategy.

The trait

I don't have anything to say about this as it pertains to the original proposal that I haven't already voiced.


Overall, this is something I feel reasonably positively about. My biggest concern is that Arbitrary is all-or-nothing, so the moment you need even a single parameter for constructing the strategy, you're back to what we have today. It would be nice if there were some way to pass parameters through and still use the code generation.

I'm not sure how practical this would be, but to illustrate:

pub trait Arbitrary where Self : Sized + Debug {
    // Parameters that can be passed to the implementation. The custom derive
    // could default this to `()`.
    type Parameters : Default;
    // As in the issue description.
    type ValueTree: ValueTree<Value = Self>;
    type Strategy: Strategy<Value = Self::ValueTree>;

    fn arbitrary(parameters: Self::Parameters) -> Self::Strategy;
}

// Simple use of auto-derive
#[derive(Debug, Arbitrary)]
struct SimpleStruct {
    string: String,
    int: i64,
}

// Generated impl
impl Arbitrary for SimpleStruct {
    type Parameters = ();
    type ValueTree = /* big type */;
    type Strategy = /* big type */;

    fn arbitrary(parameters: ()) -> Self::Strategy {
        // NB This won't compile because the type is not nameable
        (String::arbitrary(String::Parameters::default()),
         i64::arbitrary(i64::Parameters::default()))
            .prop_map(|string, int| SimpleStruct { string, int })
    }
}

// Use custom parameters
#[derive(Debug, Arbitrary)]
#[proptest(paremeters = ComplexParms)]
struct ComplexStruct {
    // Embedding Rust code in string literals is probably a bad idea, but
    // doing it here for conciseness.
    #[proptest(strategy = "parameters.string")]
    string: String,
    #[proptest(strategy = "0..parameters.int_max")]
    int: i64,
}

struct ComplexParms {
    string: &'static str,
    int_max: i64,
}

impl Default for ComplexParms {
    fn default() -> Self {
        ComplexParms {
            string: "[0-9]{4}",
            int_max: 42,
        }
    }
}

// Generated impl
impl Arbitrary for ComplexStruct {
    type Parameters = ComplexParms;
    type ValueTree = /* big type */;
    type Strategy = /* big type */;

    fn arbitrary(parameters: ()) -> Self::Strategy {
        (parameters.string, 0..parameters.int_max)
            .prop_map(|string, int| ComplexStruct { string, int })
    }
}

In any case, thanks for being willing to take a stab at this yourself. I look forward to seeing what you come up with!

@Centril
Copy link
Collaborator Author

Centril commented Sep 19, 2017

I don't by any means disagree with this, but it would be nice to have some data for how often this applies in real code using things like Hypothesis/proptest.

I think it's not an all or nothing proposition - some types will be canonical, and some won't. For simpler types, I think this is more true, and since they are a pain point, it's nice to get rid of them.

In my use case, I have a lexer and parser I'm testing for read/show identity.

Edit: In any case where you have an AST (lots of enums) you will probably have a canonical strategy.

Maybe DefaultStrategy since the implementation essentially just provides a Strategy?

I think Arbitrary is better since it will be familiar to QuickCheck uses. In fact, proptests Strategy is more of a Gen of the original Haskell QuickCheck than the Gen of the quickcheck since it is also a monad and functor. So the potential Arbitrary of proptest is much more like arbitrary :: Arbitrary x => Gen x.

I can't say I particularly care for this; I think maintaining the clarity that a Strategy is just a normal value that can be passed around is important. Having two forms like that would also give an awkward speed-bump when needing to transition between them when there are multiple arguments.

I think stylistic choice is good here - I've so far added a function any which you can use as: any::<bool>() to be explicit about what values you want.

Edit: Some prefer type based approach with newtypes and some prefer the value based approach.

I'd also question how useful it is to do this for things like integers; I'm not sure how many cases there are where testing with literally arbitrary integers is useful. IIRC, quickcheck defaults to the range 0..100.

You can always use newtypes which is really a good idea if your data type really only handles a specific range - but sure, some types in the middle of deriving you'd want to customize it to use a different Strategy.

The rules don't take into account recursive types (e.g., En), which ideally would be handled in a way more graceful than aborting at runtime by overflowing the stack.

You could use sane defaults and allow depth and such to be customized with fields. Perhaps a LeafArbitrary type could be used when you only want an arbitrary value of the type but no recursion.

I think any custom derive would also need a way to override the strategy for a particular field, since the type will often have its own constraints on internal state. Something akin to #[serde(serialize_with = "..")] might work. This would also relieve that field of needing to be Arbitrary.

I agree completely. I thought of this a few hours ago but forgot to update the issue =) Tho, I'm not sure if it is currently possible to custom derive and have custom attributes as well in stable Rust - if not, perhaps it should be done when possible...?

IMO this is the single biggest weakness in proptest right now; the compiler can't help you if you forget a variant when defining the strategy.

Right =) This was the main impetus of this issue and the drive for the trait and custom derive proc macro.

Overall, this is something I feel reasonably positively about. My biggest concern is that Arbitrary is all-or-nothing, so the moment you need even a single parameter for constructing the strategy, you're back to what we have today. It would be nice if there were some way to pass parameters through and still use the code generation. [...]

That's a nice idea! I'll try this idea out and see how it works with type inference, etc. I think it should work out fine.

@matklad
Copy link

matklad commented Dec 16, 2017

I think stylistic choice is good here - I've so far added a function any which you can use as: any::() to be explicit about what values you want.

That would be really useful! Currently I don't have a solid understanding of where to place strategies for my types, especially because I can't mirror prop::type::ANY pattern because my strategies are boxed. If, on the call site, I could write any::<u32> the same as any::<MyType>, then this question would become irrelevant, because the actual strategy would not be a part of public API.

@Centril
Copy link
Collaborator Author

Centril commented Dec 16, 2017

@matklad =) I've done some initial work at the temporary crate proptest_arbitrary with implementations for most of the standard library (save for non-owned types which proptest doesn't support (yet)), but it's not tested yet or ready for prime time. But I'm working diligently on it atm ;)

@Centril
Copy link
Collaborator Author

Centril commented Jan 14, 2018

So... most of the work is done I guess and lives here: https://github.com/Centril/proptest-arbitrary/tree/master/src

How do you want to go about merging this into proptest? Should I make one large PR or several small ones? And where should Arbitrary live? In its own module, or proptest::strategy perhaps?

I've done a bunch of work on deriving too, and it works - I haven't fixed the last bit for recursive types, but I have a good idea of how to do it. I'll also have to update proptest_derive given the new syn 0.12 update.

@AltSysrq
Copy link
Collaborator

Wow, that's.. quite exhaustive coverage. I'd like to read through everything over the course of the week or so to get the full picture, though my initial thought is to put it in proptest::arbitrary.

If you want to go ahead and put this in one large PR, that'd be fine too.

@Centril
Copy link
Collaborator Author

Centril commented Jan 15, 2018

Cool =) I'll put this in a large PR under proptest::arbitrary mostly then (some bits are just helpers needed on the way, so they belong elsewhere..) and you can take your time digesting bits.

@AltSysrq AltSysrq added the feature-request This issue is requesting new functionality label Jan 27, 2018
@AltSysrq
Copy link
Collaborator

AltSysrq commented Feb 5, 2019

Looks like I forgot to close this. This feature became available in 0.5.0.

@AltSysrq AltSysrq closed this as completed Feb 5, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature-request This issue is requesting new functionality
Projects
None yet
Development

No branches or pull requests

3 participants