Increase variation in Unicode character generation #1621

Zac-HD · 2018-10-05T14:35:19Z

This patch substantially increases the variety of examples from the characters() strategy. Instead of generating directly from the range of code points, characters() now selects a Unicode category and then a code point within that category. However, the shrink target is unchanged, because the choice of category 'shrinks open' to allow codepoint-wise minimisation during shrinking.
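A minimal, stdlib-only sketch of the two-step draw described above, under stated assumptions: the choice sequence is modelled as a plain list of integers, the code point range is cut down to U+0000..U+02FF so it runs quickly, and "shrinks open" is interpreted as pool choice 0 meaning "any code point". All names here (`category_pools`, `draw_character`, `random_choices`, `shrink`) are hypothetical and are not Hypothesis internals.

```python
import random
import unicodedata
from collections import defaultdict

# Assumption: a small code point range keeps the sketch fast.
MAX_CODEPOINT = 0x2FF


def category_pools(max_codepoint=MAX_CODEPOINT):
    """Group code points by Unicode general category, e.g. 'Lu', 'Nd', 'Sc'."""
    pools = defaultdict(list)
    for cp in range(max_codepoint + 1):
        pools[unicodedata.category(chr(cp))].append(cp)
    # Sort by category name so pool numbering is deterministic.
    return [pools[name] for name in sorted(pools)]


POOLS = category_pools()
ALL_CODEPOINTS = list(range(MAX_CODEPOINT + 1))


def draw_character(choices):
    """Turn two integers from a choice sequence into one character.

    choices[0] picks a pool: 0 is the 'open' pool of every code point (the
    value a shrinker minimises towards), i > 0 is the (i-1)th category.
    choices[1] indexes into the chosen pool.
    """
    pool_choice, index = choices
    if pool_choice == 0:
        pool = ALL_CODEPOINTS
    else:
        pool = POOLS[(pool_choice - 1) % len(POOLS)]
    return chr(pool[index % len(pool)])


def random_choices(rng):
    """Generation: always pick a category first, so rare categories are common."""
    return [rng.randint(1, len(POOLS)), rng.randrange(MAX_CODEPOINT + 1)]


def shrink(choices, predicate):
    """Toy greedy shrinker: lower each integer while predicate still holds."""
    choices = list(choices)
    improved = True
    while improved:
        improved = False
        for i in range(len(choices)):
            current = choices[i]
            for candidate in (0, current // 2, current - 1):
                if 0 <= candidate < current:
                    trial = choices[:i] + [candidate] + choices[i + 1:]
                    if predicate(draw_character(trial)):
                        choices = trial
                        improved = True
                        break
    return choices


if __name__ == "__main__":
    rng = random.Random(0)
    samples = [draw_character(random_choices(rng)) for _ in range(200)]
    print(sorted({unicodedata.category(c) for c in samples}))
    # With an always-true predicate the pool choice shrinks to 0 (the open
    # pool) and the index to 0, i.e. the lowest code point.
    print(shrink(random_choices(rng), lambda c: True))  # [0, 0]
```

Generation spreads draws across categories, while the shrinker can still reach the open pool and minimise by code point, which is the property the patch aims to preserve. How Hypothesis actually encodes this may differ.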

I've extracted the unrelated cleanups and refactoring into #1660; it would be great if that could be reviewed and merged soon, so I can rebase on it.

Closes #1401 and closes #341.

Zac-HD · 2018-10-11T13:29:22Z

OK! I've tidied this up and ensured that the increased variety in generation does not change shrinking. Ready for another review 😄

Zac-HD · 2018-10-27T06:49:15Z

Hmm. After digging into this, it looks like it's not fully broken; instead, there are some tests that don't reliably shrink to a fixpoint within the 500-shrink limit. @DRMacIver, any idea what I should do about that?
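To make the difference between reaching a fixpoint and hitting a shrink limit concrete, here is a toy model (not Hypothesis's shrinker): a greedy integer shrinker that stops either when no smaller candidate is accepted (a fixpoint) or when `max_shrinks` successful steps have been used. `shrink_with_budget`, its candidate order, and the `>= 1000` predicate are hypothetical; only the default budget of 500 mirrors the limit mentioned above.

```python
def _smaller_candidate(value, predicate):
    """Return the first accepted smaller value, or None at a fixpoint."""
    for candidate in (0, value // 2, value - 1):
        if 0 <= candidate < value and predicate(candidate):
            return candidate
    return None


def shrink_with_budget(value, predicate, max_shrinks=500):
    """Greedily shrink a non-negative int that satisfies `predicate`.

    Returns (result, reached_fixpoint). Each accepted smaller value uses one
    unit of the budget; we stop when no candidate is accepted (a fixpoint)
    or when the budget is exhausted while progress was still possible.
    """
    shrinks = 0
    while True:
        smaller = _smaller_candidate(value, predicate)
        if smaller is None:
            return value, True
        if shrinks == max_shrinks:
            return value, False
        value = smaller
        shrinks += 1


if __name__ == "__main__":
    def at_least_1000(n):
        return n >= 1000

    # Halving overshoots below 1000 once we reach 1953, so the shrinker falls
    # back to steps of -1 and needs 962 accepted shrinks to reach 1000.
    print(shrink_with_budget(10**6, at_least_1000))                      # (1462, False)
    print(shrink_with_budget(10**6, at_least_1000, max_shrinks=10_000))  # (1000, True)
```

When the generation step produces values whose cheap shrinks (like halving) keep overshooting, the shrinker degrades to small steps and can run out of budget before reaching the fixpoint, which is one way a test can shrink correctly in principle but not reliably within a limit.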

Zac-HD · 2018-12-08T14:15:44Z

I'm going to close this pull request for now: while the approach is promising, there is also something of a clash between the generation and shrinking steps, and I simply don't want to work on the frustrating last ~5% of the problem at the moment.

Zac-HD · 2019-07-09T17:01:09Z

Rebased and (somewhat) updated version: master...Zac-HD:weird-text. Shrinking isn't quite right at the moment, but everything else seems to be working.

Zac-HD added the enhancement label (it's not broken, but we want it to be better) Oct 5, 2018

Zac-HD force-pushed the character-generation branch 7 times, most recently from b678f58 to f4f4635 on October 6, 2018 at 05:12

Zalathar reviewed Oct 6, 2018

hypothesis-python/src/hypothesis/strategies.py (outdated)

Zac-HD force-pushed the character-generation branch from f4f4635 to fb7b08b on October 6, 2018 at 10:49

DRMacIver reviewed Oct 9, 2018

hypothesis-python/src/hypothesis/searchstrategy/strings.py (outdated)

Zac-HD mentioned this pull request Oct 11, 2018

Support automatic 'swarm testing' for example selection #1637

Closed

Zac-HD force-pushed the character-generation branch 2 times, most recently from dad21c7 to 33c6223 on October 11, 2018 at 12:43

Zac-HD changed the title ~~Increase variation in generated characters and change shrink order~~ Increase variation in Unicode character generation Oct 11, 2018

Zac-HD force-pushed the character-generation branch from 33c6223 to 0ed071f on October 23, 2018 at 09:23

DRMacIver reviewed Oct 24, 2018

hypothesis-python/src/hypothesis/searchstrategy/strings.py (outdated)

hypothesis-python/src/hypothesis/internal/charmap.py (outdated)

Zac-HD force-pushed the character-generation branch 2 times, most recently from b03bfaa to 79f85fa on October 25, 2018 at 01:50

DRMacIver reviewed Oct 25, 2018

hypothesis-python/src/hypothesis/searchstrategy/strings.py (outdated)

Zac-HD force-pushed the character-generation branch 4 times, most recently from 38284e6 to aa894b5 on October 25, 2018 at 12:21

Zac-HD mentioned this pull request Oct 27, 2018

Clarify the API and docs for st.text() #1660

Merged

Zac-HD force-pushed the character-generation branch from aa894b5 to b0b97b9 on October 27, 2018 at 11:20

Zac-HD force-pushed the character-generation branch from b0b97b9 to 396e56c on December 3, 2018 at 07:37

Improve shrinking order for text()

477d746

When passed an alphabet of characters (not a strategy), we can do much better for shrinking than simply sampling from it, by delegating to characters() if possible or sorting before sampling otherwise.
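A minimal sketch of that idea, assuming that "delegating to characters() is possible" means the alphabet forms one contiguous code point range (our assumption; the commit message does not say). `plan_alphabet` and `draw_from_plan` are hypothetical names, and the plan tuples stand in for real strategies.

```python
def plan_alphabet(alphabet):
    """Decide how to generate from a fixed alphabet of characters.

    Deduplicating and sorting means index 0, the value a shrinker minimises
    towards, is always the lowest code point, whatever order the caller used.
    """
    chars = sorted(set(alphabet))
    if not chars:
        raise ValueError("alphabet must contain at least one character")
    lo, hi = ord(chars[0]), ord(chars[-1])
    if hi - lo + 1 == len(chars):
        # Contiguous range: stands in for delegating to a codepoint-based
        # generator such as characters(min_codepoint=lo, max_codepoint=hi).
        return ("codepoint_range", lo, hi)
    # Otherwise sample from the sorted characters.
    return ("sampled_from", tuple(chars))


def draw_from_plan(plan, index):
    """Map a non-negative integer choice to a character under the plan."""
    if plan[0] == "codepoint_range":
        _, lo, hi = plan
        return chr(lo + index % (hi - lo + 1))
    _, chars = plan
    return chars[index % len(chars)]


if __name__ == "__main__":
    print(plan_alphabet("cab"))   # ('codepoint_range', 97, 99)
    print(plan_alphabet("zx a"))  # ('sampled_from', (' ', 'a', 'x', 'z'))
    # Naive sampling shrinks to whatever the caller wrote first ('z');
    # the sorted plan shrinks to the lowest code point ('x').
    print("zyx"[0], draw_from_plan(plan_alphabet("zyx"), 0))  # z x
```

The design choice is that the shrink target no longer depends on how the user happened to order the alphabet string.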

Increase variety in characters() generation

f1f951d

Zac-HD force-pushed the character-generation branch from 396e56c to f1f951d on December 8, 2018 at 11:24

Zac-HD closed this Dec 8, 2018

This was referenced Dec 18, 2018

st.floats() with bounds do not shrink correctly #1704

Closed

We should add a guide to writing shrinker-friendly strategies #1705

Closed

Zac-HD mentioned this pull request Jul 9, 2019

Pass alphabet through to characters() #2044

Merged

Zac-HD mentioned this pull request Jul 10, 2020

Improve shrink ordering of text #2482

Merged
