Skip to content
This repository has been archived by the owner on Oct 9, 2019. It is now read-only.

Improve list shrinking #199

Merged
merged 9 commits into from
Aug 24, 2017
Merged

Improve list shrinking #199

merged 9 commits into from
Aug 24, 2017

Conversation

jwoudenberg
Copy link
Collaborator

@jwoudenberg jwoudenberg commented Jul 27, 2017

This PR aims to improve the performance of shrinking large lists. I've noticed fuzz tests occasionally hang when shrinking lists containing large fuzzed items. Especially when you're fuzzing nested list structures shrinking can take a while because it shrinks one item at a time.

This PR contains two changes:

In the benchmarks (which I made marginally more realistic) these changes result in an over 100x performance improvement for shrinking a list of ints.

It is debatable whether I'm picking 'the right halves' to try and throw away. Other possible strategies include removing all even or odd elements or removing at random. My feeling, unsupported by evidence, is that when a test fails on a large list, 9 out of 10 times the problem will be relatively simple. That means there's decent odds the problem lies entirely within either half of the list. Further optimisations might be possible for the case where you're discovering a complex problem arising from many elements in the list working in concert, but waiting a bit longer for the shrinker in those scenario's seems ok to me for now.

The same optimisation can be made for string shrinking, but this PR does not contain it because the string shrinker lives in elm-shrink.

Resolves #187.

Because of the way the list shrinker was set up, it would try to shrink
into an empty list every other step. This PR changes the behaviour to
try shrinking into an empty list once at the start of the shrinking.
The list shrinker would only ever shrink a single item per step. For
long lists this can take a while.

This adds an additional shrinking strategy: throwing away the first or
last half of the list. For long randomly generated lists there's decent
odds one of these two attempts will succeed.
Lazy.List.fromList [ dropFirstHalf listOfTrees, dropSecondHalf listOfTrees ]
|> Lazy.force
else
Lazy.List.empty
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So what happens when we have a list of 1-4 elements? Say I claim that all list of ints are sorted. Will it find [1, 0] as a minimal counterexample?

Copy link
Collaborator Author

@jwoudenberg jwoudenberg Jul 28, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic comes in addition to and not instead of the existing 'remove one element' and 'shrink one element' strategies. It adds an extra high-reward strategy as the first thing to try, but always falls back to the more fine-grained strategies if that doesn't work. So the minimal counterexample you mention is still found (I checked it to be sure).

This conditional merely says shrinking by halving should not be attempted for lists smaller than 4 elements, shrinking by a single element is still attempted like before. The reason I added this conditional: Once we're at lists sizes smaller than 4 'halving' might sometimes simply remove a single element, which is something the algorithm was already going to try in the 'remove one' phase anyway. To prevent us from trying the same shrink twice this conditional was introduced.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment to clarify this better in the code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Can you a test to make sure that minimal counterexample is still found? There should be some shrinking tests you can copy.

Copy link
Collaborator

@drathier drathier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice optimization. I've only got a few minor comments :)

Expect.pass

_ ->
Expect.fail "Failed"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Expect.notEquals []?
I like the comment.

src/Fuzz.elm Outdated
|> Lazy.force
else
-- For lists of three elements or less, halving and removing a single element is often going to be the same thing.
-- To prevent us from attempting the same shrunken value twice, we're disabling the halving strategy for these small lists.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we move this comment up to the first line of the if-statement, so that you have it right there when you wonder what that magic number is doing?

I'd probably replace 4 with a slightly larger number (8 maybe) and put a comment like "the list halving shortcut is only useful for large lists" instead of the current comment, since it reads more easily. Alternatively, we could remove the if-statement all together to make the code even easier to read. How much performance impact does the duplicate work have? Is it noticeable?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the idea is that if you halve a list of at most three elements, you could be left with an empty or singleton list. 4 is the smallest number that guarantees each half will not be one of those degenerate cases.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, sure, but that's fundamentally an optimization; the fuzzer will do (more) duplicate work if you leave that check out, but it'll still be correct. My reasoning is that if it's an performance optimization, it should look like one, so you can glance over it quickly if you just want to know what this function does.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this argument!

@mgold mgold mentioned this pull request Jul 29, 2017
11 tasks
@jwoudenberg
Copy link
Collaborator Author

Thanks for all the feedback and comments! I'm enjoying some holiday this week, will address everything next week.

@jwoudenberg
Copy link
Collaborator Author

Thanks for your reviews @drathier and @mgold, it's really appreciated! I followed up on your feedback and would love a second review.

Copy link
Member

@mgold mgold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good either than one stylistic change.

src/Fuzz.elm Outdated
{- This extends listShrinkRecurse algorithm with an attempt to shrink directly to the empty list. -}
case listShrinkRecurse listOfTrees of
Rose root children ->
Rose root (Lazy.List.cons (RoseTree.singleton []) children)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like to avoid case expressions with only one case. I think this should work instead: let (Rose root children) = listShrinkRecurse listOfTrees in ...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feels like there should be a mapping function for the children field here.

listShrinkHelp : List (RoseTree a) -> RoseTree (List a)
listShrinkRecurse listOfTrees |> mapChildren (Lazy.List.cons (RoseTree.singleton []))

withChildren newChildren (Rose root oldChildren) = Rose root newChildren

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea! I made the change.

Copy link
Collaborator

@drathier drathier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot to press submit yesterday :( Ah well, just a minor comment on listShrinkHelp.

src/Fuzz.elm Outdated
{- This extends listShrinkRecurse algorithm with an attempt to shrink directly to the empty list. -}
case listShrinkRecurse listOfTrees of
Rose root children ->
Rose root (Lazy.List.cons (RoseTree.singleton []) children)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feels like there should be a mapping function for the children field here.

listShrinkHelp : List (RoseTree a) -> RoseTree (List a)
listShrinkRecurse listOfTrees |> mapChildren (Lazy.List.cons (RoseTree.singleton []))

withChildren newChildren (Rose root oldChildren) = Rose root newChildren

@jwoudenberg
Copy link
Collaborator Author

Thanks for the feedback @drathier, I made the change.

I kept mapChildren as a helper function in the src/Fuzz.elm module rather than a library-esque function in src/RoseTree.elm, because it has a bit of a weird signature that exposes the internals of the RoseTree implementation (specifically the lazy list used to contain its children).

@jwoudenberg
Copy link
Collaborator Author

Cool, merging. Thanks for the reviews!

@jwoudenberg jwoudenberg merged commit c5a87c1 into elm-community:master Aug 24, 2017
@jwoudenberg jwoudenberg deleted the improve-list-shrinking branch August 24, 2017 18:14
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Shrinker repeats values a lot / Max's new idea for shrinker implementation
3 participants