Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better internal implementation for 'filtered choice' #1885

Closed
Zac-HD opened this issue Mar 20, 2019 · 1 comment
Closed

Better internal implementation for 'filtered choice' #1885

Zac-HD opened this issue Mar 20, 2019 · 1 comment
Assignees
Labels
enhancement it's not broken, but we want it to be better test-case-reduction about efficiently finding smaller failing examples

Comments

@Zac-HD
Copy link
Member

Zac-HD commented Mar 20, 2019

As discussed in the shrinking guide, we have an internal trick that can make sampling from a filtered list much more efficient. It would be great to translate sampled_from(...).filter(...) into this form internally!


On the implementation side, the most elegant way to do this is probably to extend FilteredStrategy.do_draw with special cases for strategies that are choosing from a fixed number of elements, i.e.:

  1. Choose a random element, and if it satisfies the filter return it. Try this three times, because rejection sampling has low overhead.
  2. [new bit] If the wrapped strategy is a sampled_from of reasonable size (e.g. <1k). This could extend to booleans() and just() in future
    • We're going to calculate a list of allowed values, pick of of them, and write that element's index in the full list to the buffer (see shrinking guide for details)
    • Start by choosing an index k into the filtered list. We don't know how many there are, so just assume that it's unfiltered-1 (as we've failed a random draw).
    • Create the filtered list until we get to the kth element. This is up to twice as fast on average as creating the full list - less if very few elements are allowed, but probably more when we're shrinking.
    • If there are less than k allowed elements, pick a new k which is a valid index. If there are no valid elements, fall out of this condition - it'll be noted and marked invalid as if rejection sampling failed and we didn't try this special case.
    • Write the index of the chosen element in the unfiltered list to the buffer, and return the element.
  3. Give up, because if there is a valid element we can't find it.

Where this gets a bit hairy is in the interaction with LaxyStrategy and delayed validation, so we'll probably need to fiddle with the delegation of the do_draw or filter methods a bit to have it work. That's why we have this write-up of the discussion from #1862.

@Zac-HD Zac-HD added enhancement it's not broken, but we want it to be better test-case-reduction about efficiently finding smaller failing examples labels Mar 20, 2019
@Zac-HD
Copy link
Member Author

Zac-HD commented Mar 21, 2019

Per discussion on #1887, we should ensure that this is also implemented for unique collections via lists(..., unique=True).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement it's not broken, but we want it to be better test-case-reduction about efficiently finding smaller failing examples
Projects
None yet
Development

No branches or pull requests

1 participant