-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Feature] Add extra support to JSON string schema #927
[Feature] Add extra support to JSON string schema #927
Conversation
Thanks for getting this started! Just a few comments for you as you're working on this...
|
For (1), I'd absolutely say that it's "at your own risk" functionality. For (2), I'm quite confident that we cannot simultaneously enforce both a pattern and a min/maxLength. Imagine something like Of more immediate concern is that my negative test cases are failing really weirdly :-/ |
Ah, the |
That was on my "to investigate" list; was suspecting something like that. Of more immediate concern: if you look closely at the failing tests, I believe you'll find that they ought to be passing. |
guidance/library/_json.py
Outdated
from ._at_most_n_repeats import at_most_n_repeats | ||
from ._exactly_n_repeats import exactly_n_repeats |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could be nice to use/reuse these to reduce redundant code in the regex implementation :)
guidance/library/_json.py
Outdated
lm += exactly_n_repeats(value=select(STRING_CHARS), n_repeats=min_length) | ||
lm += at_most_n_repeats(value=select(STRING_CHARS), n_repeats=(max_length - min_length)) | ||
else: | ||
lm += select( | ||
STRING_CHARS, | ||
recurse=True, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This functionality feels reusable! I.e. repeat(value, min, max)
with sane behaviors when min and max are unset. This logic happens in the regex code too, so you should be able to consolidate
Codecov ReportAttention: Patch coverage is
❗ Your organization needs to install the Codecov GitHub app to enable full functionality. Additional details and impacted files@@ Coverage Diff @@
## main #927 +/- ##
==========================================
+ Coverage 57.18% 59.18% +1.99%
==========================================
Files 64 63 -1
Lines 4711 4733 +22
==========================================
+ Hits 2694 2801 +107
+ Misses 2017 1932 -85 ☔ View full report in Codecov by Sentry. |
@pytest.mark.parametrize( | ||
["bad_string", "good_bytes", "failure_byte", "allowed_bytes"], | ||
[ | ||
('""', b'"', b'"', None), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had hoped to be able to use Select(STRING_CHARS)._values
for the 'allowed_bytes' in this 'too short' case, but that wasn't quite getting it right. So I'm allowing None
to be passed, to turn off that check
model += at_most_n_repeats(value=value, n_repeats=(max_length - min_length)) | ||
elif min_length is not None: | ||
model += exactly_n_repeats(value=value, n_repeats=min_length) | ||
model += select([optional(value)], recurse=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you reuse zero_or_more here to reduce code duplication?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had actually been planning on reimplementing those two functions in terms of sequence()
. Of course, something isn't quite right, and the errors I'm getting are very Pythonic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh fair enough. Sorry about your python 😂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is one test involving the mocked model which is interfering with zero_or_more()
being a synonym for sequence()
. It's related to the max_tokens argument on gen, so not a core part of this PR, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you open an issue if it feels worthy? :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. It could also be something strange in the way the mock model is working. But it would be good to run down.
Looking good! One ask -- could you maybe reuse the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks @riedgar-ms
Looking to add support for
pattern
,minLength
andmaxLength
to strings in JSON schema. This should address half of #925This has lead to the creation of a few new library functions to help out, and consolidation into a new
_sequences.py
library file. It is also hooked into_regex.py
.