===================================
Designing strategies to shrink well
===================================

Reducing test cases to a minimal example is a great feature of Hypothesis,
the implementation of which depends on both the shrinking engine and the
structure of the strategy (or combination of strategies) which created the
example to reduce.

This document is organised into three parts:

1. How to tell if you need to think about shrinking (you probably don't!)
2. Designing for shrinking 'above' the Hypothesis public API
3. Implementation tricks used in our internals, for interested contributors

It is written for people implementing complex third-party strategies (such
as `hypothesis-networkx `__),
current or potential contributors to Hypothesis itself, and anyone interested
in how this works under the hood.


------------------------------------
Do you need to design for shrinking?
------------------------------------
You should only attempt to tune custom strategies for better shrinking
behaviour if more time would otherwise be spent reducing examples by hand
or debugging more complex examples. It *may* be worthwhile if:

- Your custom strategy will be used by many people, so that spending
  the same effort tuning the strategy has much larger benefits, or
- You have personally spent time debugging failures which better example
  shrinking could have avoided, and think this might happen again.

If neither of these applies to you, relax! Hypothesis' test-case reduction
is among the best in the world, and our built-in strategies are carefully
designed to work well with it, as discussed below.
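To make "reduction" concrete, here is a toy greedy reducer in the same spirit: it repeatedly tries deleting elements and replacing values with smaller ones, keeping only changes that still trigger the bug. This is an illustrative sketch we wrote for this guide - the names and the algorithm are ours, and Hypothesis' real engine is far more sophisticated - but it shows the kind of search that reduction automates.

```python
def shrink(xs, fails):
    """Greedily simplify the list `xs` while `fails(xs)` stays true.

    A toy model of test-case reduction: first try deleting elements,
    then try replacing individual elements with smaller values.
    """
    assert fails(xs), "the initial example must trigger the bug"
    improved = True
    while improved:
        improved = False
        # Try deleting each element in turn.
        for i in range(len(xs)):
            candidate = xs[:i] + xs[i + 1 :]
            if fails(candidate):
                xs, improved = candidate, True
                break
        else:
            # No deletion worked; try shrinking individual values instead.
            for i, x in enumerate(xs):
                for smaller in (0, x // 2, x - 1):
                    candidate = xs[:i] + [smaller] + xs[i + 1 :]
                    if 0 <= smaller < x and fails(candidate):
                        xs, improved = candidate, True
                        break
                if improved:
                    break
    return xs


# A "bug" triggered by any list summing to at least 10 reduces to [10]:
assert shrink([3, 7, 12, 5], lambda xs: sum(xs) >= 10) == [10]
```

Even this crude greedy search finds the minimal failing example here; the interesting design questions in the rest of this guide are about making strategies whose structure keeps such searches fast and effective.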


------------------------------------
Shrinking for third-party strategies
------------------------------------

That is, strategies built out of other strategies until you get down to
Hypothesis' public API. These often, but not always, use ``@composite``.


Composition of shrinking
~~~~~~~~~~~~~~~~~~~~~~~~
The first and most important rule is that Hypothesis shrinks from the
'bottom up'. If any component of your strategy is replaced with a simpler
example, the end result should also become simpler. We usually try to define
"simpler" here to match a reasonable intuition about the strategy, and avoid
weird edge cases when it's combined with another strategy or predicate.

`Issue #1076 `_,
where magnitude constraints were added to the ``complex_numbers`` strategy,
makes a nice case study. We wanted to continue shrinking the real and
imaginary parts like ``builds(complex, floats(), floats())``.

In a worst-case scenario, the performance of filtering could be arbitrarily
bad, while a 'generate and scale' approach would mean that simple inputs
could lead to irrational outputs. Instead, we choose an imaginary part
between +/- max_magnitude, then calculate the resulting bounds on the real
part and draw it from a strategy that will always be valid. This ensures
that the imaginary part shrinks to zero first, as we think real-valued
complex numbers are simpler than imaginary-valued complex numbers.


Let generation be lucky
~~~~~~~~~~~~~~~~~~~~~~~
Sometimes, it's worth searching for a particularly nasty value to try.
This trick should be used sparingly, and always behind a branch that the
shrinker can decide not to take, such as ``if draw(booleans()):``, but it
may occasionally be worth trying. Measure the results before you keep it!

`Issue #69 `_ provides
a nice case study: when generating tz-aware datetimes, we would like to generate
instants that are skipped or repeated due to a daylight-savings transition more
often than by chance.
Of course, there may or may not be any such moments
allowed by the bounds and tz strategy!

Eliding much of the detail, a key part is to find such a moment between two
endpoints, when we can only check whether one or more exists. The traditional
approach would be to use a binary search, but this would be relatively expensive
to shrink, as we would pay the log-n cost on every attempted shrink.

Instead of choosing the midpoint, we draw a *random* point between our known
endpoints, and repeat this until we find a satisfactory moment. This allows
the shrinker to delete all the intermediate draws - and appear lucky enough
to find the moment we were looking for on the first guess!


Keep things local
~~~~~~~~~~~~~~~~~
Hypothesis' shrinking engine sees every example as a labelled tree of choices,
with possible reductions represented as operations on the tree. An attempted
shrink succeeds if the new tree can be converted into an example, and the
resulting example triggers the same bug in the test function.

The most common way we see users breaking data locality is by drawing a size,
then drawing a collection of that size. This is tempting because it's simple
and it *works*, but it's often much slower than the alternatives.

.. code:: python

    # Both of these strategies can generate exactly the same kind of examples,
    # but the second has better performance as well as style.
    integers(0, 10).flatmap(lambda n: lists(..., min_size=n, max_size=n))
    lists(..., min_size=0, max_size=10)

Another easy way to keep things local is to ensure that any ``.filter(...)``
or ``assume(...)`` calls you use are as close as possible to the relevant part
of the strategy. That way, Hypothesis can retry just the part that failed
instead of the entire strategy, which might be much slower.

For efficient shrinking, local operations on the tree should correspond with
valid (and preferably local) shrinks to the final example. For example:

.. code:: python

    # This form of loop is hard to shrink, because we'd have to reduce `n` and
    # delete something in the loop simultaneously. It's equivalent to the
    # `.flatmap` example above. We _do_ shrink this, but much more slowly.
    n = draw(integers(0, 10))
    for _ in range(n):
        ...
        draw(...)
        ...

    # In this form, the shrinker can see a repeated structure of labels
    # and delete one loop iteration without touching anything else.
    # We use a variant of this trick to generate collections internally!
    while draw(integers(0, x)) > threshold:
        ...
        draw(...)
        ...

Similarly, it's better to draw all the attributes or inputs you need for an
object at the same time, again so they can be modified or deleted together.

The exact behaviour of the shrinking is a topic of active research and
development, so if you are interested in the details we recommend reading
the "internals.rst" guide and the well-commented source code in
``hypothesis.internal.conjecture``. An earlier (mid-2018) version is
illustrated in David's draft paper *Test-Case Reduction for Free*,
along with an extensive evaluation. Contact him if you would like a copy.


-------------------------------------
Shrinking in the Hypothesis internals
-------------------------------------
The last section is for current or prospective Hypothesis contributors only.

These tricks rely on implementation details that are not available to
third-party libraries or users, **and can change in any patch release**.
Occasionally they are also indispensable to get good performance in underlying
primitives, so please contact us if the public API is not enough and we may
be able to work something out.


What do internals get you?
~~~~~~~~~~~~~~~~~~~~~~~~~~
Using the low-level, internal APIs complements, rather than changing, the
principles above.
The bytestream-level view has some important advantages:

Because we operate at the level of bits, the relationship between a value and
the corresponding buffer is much more obvious. If we're careful, that means
we can calculate the value we want and then write the corresponding buffer
to recreate it when the test case is shrunk or replayed.

A small step up from bits, we can also see the spans that indicate a subset
of the buffer to consider for various transformations such as transposition
or deletion.

Sometimes these features are the only way to maintain acceptable performance
in very rare or even pathological cases - consider shrinking a complex number
with a single allowed magnitude - but it's almost certain that someone will
need the core strategies to do just that.
However, using low-level APIs also comes at a cost - they are verbose and
generally more difficult to use, and can violate key invariants of the engine
if misused.

Internally, our strategies mostly use the public API or something that looks
a lot like ``@composite``, so it's fairly easy to follow along. There are
just a few tricks enabled by those low-level advantages that we wanted to
name and document, so we can recognise them, discuss them, and invent more.


Make your own luck
~~~~~~~~~~~~~~~~~~
This is the simplest trick that uses our ability to write choices to the
buffer. We use it in stateful testing, where there may be many rules but only
a few of them allowed by their preconditions, and "lucky generation" would
work but be very inefficient.

1. Draw an index into the unfiltered list of rules. Return the corresponding
   rule if it's allowed - we got lucky! (or someone set us up...)
2. Create a list of allowed rules, and choose one from that shortlist instead.
3. Find the index of the chosen rule *in the unfiltered list*, and write that
   index to the buffer. Finally, return the chosen rule.
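The three steps above can be sketched with a tiny stand-in for the recorded choice sequence. Everything here (``Recorder``, ``choose_rule``, the replay mechanism) is a hypothetical model invented for this guide, not Hypothesis' actual API, but it demonstrates why writing the final index back makes the shrunk replay "lucky":

```python
import random

RULES = ["rule_a", "rule_b", "rule_c", "rule_d"]


class Recorder:
    """Records each draw; optionally replays a previous choice sequence."""

    def __init__(self, replay=None):
        self.log, self.replay = [], list(replay or [])

    def __call__(self, n):
        # Take the next replayed choice if there is one, else draw fresh.
        value = self.replay.pop(0) if self.replay else random.randrange(n)
        self.log.append(value)
        return value

    def write(self, value):
        # Append a choice to the log as if it had been drawn at random.
        self.log.append(value)


def choose_rule(draw, allowed):
    # Step 1: draw an index into the *unfiltered* list of rules.
    i = draw(len(RULES))
    if RULES[i] in allowed:
        return RULES[i]  # We got lucky!
    # Step 2: choose from the shortlist of allowed rules instead.
    rule = allowed[draw(len(allowed))]
    # Step 3: write the chosen rule's unfiltered index to the buffer.
    draw.write(RULES.index(rule))
    return rule


rec = Recorder(replay=[0])           # force an unlucky first draw
rule = choose_rule(rec, ["rule_c"])  # log is now [0, 0, 2]
# Deleting the first two choices leaves [2], which succeeds at step 1:
assert choose_rule(Recorder(replay=[2]), ["rule_c"]) == rule == "rule_c"
```

The shrunk choice sequence is both shorter and produces the same rule, so the shrinker's deletion attempt succeeds without any retries.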

When the shrinker tries to delete the first two draws, the resulting buffer
will lead to the same rule being chosen at step *one* instead. We've made
our own luck!

This trick is especially useful when we want to avoid rejection sampling
(the ``.filter`` method, ``assume``) for performance reasons, but also
need to give the shrinker the same low-level representation for each instance
of a repeated choice.


Flags "shrink open"
~~~~~~~~~~~~~~~~~~~
An important insight from `Swarm Testing (PDF) `__
is that randomly disabling some features can actually reduce the expected time
before finding a bug, because some bugs may be suppressed by otherwise common
features or attributes of the data.

As discussed on `issue #1401 `__,
there are a few points to keep in mind when implementing shrinkable swarm testing:

- You need swarm flags to "shrink open", so that once the shrinker has run to
  completion, all flags are enabled. For example, you could do this by
  generating a set of banned flags.
- You need to use rejection sampling rather than anything more clever, or at
  least look like it to the shrinker (see e.g. *Make your own luck*, above).

Taking Unicode as an example, we'd like to use our knowledge of Unicode
categories to generate more complex examples, but shrink the generated string
without reference to categories. While we haven't actually implemented this
yet - it's pretty hairy - the simple version of the idea goes like this:

1. Generate a set of banned categories.
2. Use ``characters().filter(category_is_not_banned)``.

When shrinking, we start by removing categories from the banned set, after
which characters in the string can be reduced as usual. In a serious version,
the make-your-own-luck approach would be essential to make the filter
reasonably efficient, but that's not a problem internally.

In more complicated structures, it would be nice to generate the flags on first
use rather than up front, before we know if we need them.
The trick there is
to write each flag to the buffer every time we check it, in such a way that if
we delete the first use, the second turns into an initialisation.


Explicit example boundaries
~~~~~~~~~~~~~~~~~~~~~~~~~~~
This is almost always handled implicitly, e.g. by ``cu.many``, but *sometimes*
it can be useful to explicitly insert boundaries around draws that should be
deleted simultaneously, using ``data.start_example``. This is used to group
the value and sign of floating-point numbers, for example, which we split up
in order to provide a more natural shrinking order.

Explicit example management can also be useful to delineate variably-sized
draws, such as our internal helper ``cu.biased_coin``, which makes eliminating
dead bytes much cheaper. Finally, labelling otherwise indistinguishable draws
means the shrinker can attempt to swap only the like values.
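As a toy illustration of why explicit boundaries help, here is a flat choice log with start/stop markers grouping related draws. This is a simplified model made up for this guide - the real ``data.start_example`` machinery is considerably more involved - but it shows how labelled spans let the shrinker delete a whole group of draws in one move:

```python
class ChoiceLog:
    """A flat log of draws, with start/stop markers grouping related draws."""

    def __init__(self):
        self.entries = []

    def draw(self, value):
        self.entries.append(("draw", value))
        return value

    def start(self, label):
        self.entries.append(("start", label))

    def stop(self):
        self.entries.append(("stop", None))


def spans(log, label):
    """Yield (start, end) index ranges covering each span with `label`."""
    stack = []
    for idx, (kind, payload) in enumerate(log.entries):
        if kind == "start":
            stack.append((idx, payload))
        elif kind == "stop":
            begin, span_label = stack.pop()
            if span_label == label:
                yield (begin, idx + 1)


log = ChoiceLog()
log.start("float")
log.draw(1)   # sign, drawn separately from the magnitude...
log.draw(37)  # ...but grouped so both can be deleted together
log.stop()
log.start("float")
log.draw(0)
log.draw(5)
log.stop()

# The shrinker can delete one whole float without touching the other:
begin, end = next(spans(log, "float"))
remaining = log.entries[:begin] + log.entries[end:]
assert [v for kind, v in remaining if kind == "draw"] == [0, 5]
```

Without the markers, a deletion pass would have to discover by trial and error which draws belong together; with them, the sign and magnitude of each float disappear (or are swapped with a like-labelled span) as a unit.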