Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Strategies from type hints, and inference of missing arguments to builds() and @given() #643

Merged
merged 16 commits into from
Jul 23, 2017

Conversation

Zac-HD
Copy link
Member

@Zac-HD Zac-HD commented May 24, 2017

Closes #293. This pull:

  • adds a new function from_type to look up a strategy that can generate instances of the given type
  • upgrades builds() to infer missing arguments based on type hints
  • upgrades @given to infer missing arguments from type hints
  • adds a new function register_type_strategy to register custom types that can't be automatically derived (based on a known child class or type hints, using builds)

It's been a long time coming, but I think I'm done. (again 😉)

@mulkieran
Copy link
Contributor

I feel this is one of those instances where writing the docs first might be a helpful strategy. I can't yet tell whether the ultimate effect of this would be:

  • wonderful
  • deeply confusing

@mjsir911
Copy link
Contributor

mjsir911 commented May 24, 2017

I've worked on some tests here. Unfortunately most tests fail as it tries to test types such as typing.List[int], which doesn't seem to be implemented in the latest version of zac's.

I have a few question about implementation. When I was designing something similar to this (I didn't see the work already done), I quickly jumped to a custom mapping object. It was it made sense to start off with, the simple types mapping from resolve[int] = st.integers(). As I dove deeper into the typing module though, there were special types like typing.Union or typing.Tuple, which take in arguments as a slice(?), in the form of typing.Union[int, float].

Is there any particular reason you opted to functions instead of a general mapping object with a __getitem__ method handling the complicated types?

It seems the effort to go from what is currently implemented on zac's branch to a more object-oriented approach is extremely minimal, as I've been messing around with the module and commit mjsir911/hypothesis@f75fa8a is the point where I just moved around some code to fit the class methods.

I do like the current implementation, what does what isn't clear at first but after a bit of time I got it. I think this would integrate nicely with the hypothesis library, the only problems I really foresee is how confusing the typing module seems to be and working around some restrictions inside of it.

@mjsir911
Copy link
Contributor

Oh yea and right now the only way to use anything is through backend functions. How do you expect to implement for using. I opted for a @infer decorator, but alternatively it could just be additional functionality onto the @given decorator, although that seems a bit dangerous.

@DRMacIver
Copy link
Member

I opted for a @infer decorator, but alternatively it could just be additional functionality onto the @given decorator, although that seems a bit dangerous.

This is what I meant when I said on IRC that I was making @Zac-HD's life difficult by making the bar for this feature very high. :-)

The @infer decorator is right out as a solution I'm afraid. The problem is that types are not actually a very good fit to the problem of data generation, because often what you want to generate is much more specific than a type. This means that I'm not willing to accept any solution that makes it hard to swap out a type and replace it with a strategy (which included anything that forces you to swap out all your types for strategies, which is why the @infer solution won't work).

The best design is probably going to be one which adds an additional decorator or two that specify in some manner which arguments to infer strategies for before you apply given to the problem.

...but this pull request doesn't need to handle any of that. It's just working on the actual conversion logic, with the goal of providing a function that can take a type and return a strategy.

@Zac-HD
Copy link
Member Author

Zac-HD commented May 25, 2017

@mjsir911, I have actually handled the parametrised types - the lookup filter was just a little too severe. Using functions instead of classes in this situation seems to me mostly a matter of style, and I find it easier to experiment by composing functions than editing classes; it also helps encapsulate things into bite-sized pieces of logic.

@Zac-HD Zac-HD force-pushed the from_type branch 4 times, most recently from dee1bd6 to abe9890 Compare May 26, 2017 14:19
@Zac-HD Zac-HD force-pushed the from_type branch 5 times, most recently from 4ccaed6 to bc71d64 Compare June 9, 2017 12:20
@Zac-HD Zac-HD force-pushed the from_type branch 2 times, most recently from e80dccb to a37b570 Compare June 9, 2017 14:44
@Zac-HD Zac-HD changed the title WIP: lookup function to find a strategy for a type A type -> strategy lookup function Jun 9, 2017
@DRMacIver
Copy link
Member

Sorry, I've been ill and am now at an academic summer school, so I haven't had time to review this properly.

My initial impression is that this needs more work though, sorry. :-(

I'm really not comfortable making this public API without the following two features:

  • Allowing users to extend this to new types without poking around internals.
  • A smoother transition from types to strategies. (Elaborating on this below)

I'm very very keen to make sure we don't end up with a sort of "two worlds" approach to strategies, where people mostly just use one or the other way of specifying data. Right now this seems to encourage them by making it quite awkward to switch. if I have e.g some large named tuple which I'm getting with from_type(my_named_tuple), if I just want to replace one of the coordinates with a specific strategy I have to specify from_type as the strategy for all the other coordinates. I'd like to be able to just override individual coordinates. Another example (which I think matters less) is it would be nice if I could e.g. do from_type(Set[int]) in a way which specifies a more specific elements strategy.

One way to do this might be to pass arbitrary keywords from from_type to the relevant strategy function, but I'm not wed to that approach and it might create problems with how to document it.

I'll try to do a detailed review, but it probably won't be possible before next week, sorry.

@DRMacIver
Copy link
Member

Sorry, I see this is partly what your lookup argument is designed to do, but I don't actually think it works as a solution:

  • If I have Tuple[int, int] I might want to override the first coordinate and not the second
  • It makes externally defined types a second class citizen that are much more awkward to use because they need to be specified each time.

@Zac-HD
Copy link
Member Author

Zac-HD commented Jun 12, 2017

No need to apologise - I was warned very clearly before I started, and if we end up deciding not to merge anything at all I'll still have learned a lot!

After thinking some more, from_type could be an internal/private interface without much of a loss. Instead of a public lookup, it would then be used internally in builds() and @given. This is actually a better fit for my goal of providing subtle magic a useful internal building block which lets users omit anything that can be inferred from type hints.

  • Create a new public constant hypothesis.infer which signals "infer a strategy for this argument based on type hints" - usable in @given and builds(), to override a default argument.

  • In builds(), missing arguments are inferred from type hints if possible. b=infer can be used to activate inference for b even when it has a default argument. Ensure that error messages mention and distinguish absence of type hints from unresolvable type hints, and how to fix each.

  • In @given, missing arguments are 'lazily inferred' - they remain in the argspec as they do now. When the decorated function is called, if strategies can be inferred for all required arguments they are; otherwise the usual TypeError is raised - with more info if there was an unresolvable type hint.

  • A new function somewhere to register a default strategy for custom types into a global mapping, like for custom Django model fields.

Does this sound sensible and address your concerns?

@Zac-HD Zac-HD changed the title A type -> strategy lookup function [WIP again] use type annotations to select missing arguments to builds() and @given() Jun 14, 2017
@Zac-HD Zac-HD force-pushed the from_type branch 8 times, most recently from 86bd8fe to 1917c76 Compare June 15, 2017 12:01
@DRMacIver
Copy link
Member

BTW I've referenced this pull request in a recent article (the article isn't really about this pull request, but it was natural to use the features you're working on in the examples).

@Zac-HD
Copy link
Member Author

Zac-HD commented Jul 18, 2017

recent article

Nice! Ironically I have almost no experience with type-driven property-based testing, but the builds() upgrade is still pretty exciting 😄

Re-finalised changes, all on top of the changelog commit, are also ready for review again:

In fact, from_type will now use builds() much more often - it understands that self (actually the first posarg to __init__ by any name) isn't actually a required argument to construct a class, and I noticed that if there are no required arguments it works for things without type annotations too 😆 The juggling of target and target.__init__ looks pretty strange, but it's the result of many experiments to get 2/3.5/3.6-compatible code and it works now.

I wrote a nifty decorator that makes from_type either eager or lazy - eager success, for a good repr and interactive use, but lazily raised errors so you get failing tests instead of collection errors. Strange but actually pretty easy to do!

The big one is converting to a single, global mapping from types to strategies - it's constructed at import time and there's no distinction between user-registered and default-registered types. Purely for ease of rebasing, it also includes the new (non-) handling for typevars and several fixes for robustness against future changes in the typing module and consequential fixes for 2.7/3.5/3.6 compatibility. Finally, it adds nice big comments to explain what's going on.

Last, adding tests and moving a function to ensure this pull doesn't slow down calls to builds().

@DRMacIver
Copy link
Member

DRMacIver commented Jul 22, 2017

Confession: I have not done a final review in anything resembling depth.

However, other than the things we discussed, I was pretty happy with this last time we looked at it. I'm now on holiday and didn't manage to get a proper review done before I left, and I know you're keen to get this merged, so I am prepared to leave the final merge up to your judgement. The public API is one I'm entirely happy with us committing to, the implementation is looking good, and I generally think we might as well get this out there and see how it goes.

Good job, @Zac-HD. 🎉

@Zac-HD Zac-HD force-pushed the from_type branch 2 times, most recently from e81a24b to a27a2de Compare July 23, 2017 11:06
@HypothesisWorks HypothesisWorks deleted a comment from DRMacIver Jul 23, 2017
@Zac-HD Zac-HD merged commit 834f020 into HypothesisWorks:master Jul 23, 2017
@Zac-HD
Copy link
Member Author

Zac-HD commented Jul 23, 2017

I tidied up the history a bit more, so the commit-by-commit view is as useful as possible in case of future debugging. And - after three months of work on this - I'm delighted to see everything go green and the release happen 😄 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants