Avoid slow regex_meet in _joinString
Some callers already avoid building `.*` and pass `None` instead.
`_joinString` did not: for an unbounded length range it produced the
trivial pattern `.{0,}`, which caused slow calls to `regex_meet`.
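
As a small illustration of why `.{0,}` is trivial, the sketch below checks that it accepts the same strings as `.*`. It uses Python's standard `re` module purely for demonstration; the regex engine behind `regex_meet` may differ, and `re.DOTALL` is an assumption made here so that `.` also matches newlines.

```python
import re

# Both patterns place no constraint on the string: zero or more of any character.
# re.DOTALL is an assumption for this demo so that "." also matches "\n".
TRIVIAL_RANGE = re.compile(r".{0,}", re.DOTALL)
MATCH_ALL = re.compile(r".*", re.DOTALL)

samples = ["", "a", "aaaaaaaaaa", "line1\nline2"]

# True only if every sample is fully matched by both patterns.
both_accept_all = all(
    TRIVIAL_RANGE.fullmatch(s) is not None and MATCH_ALL.fullmatch(s) is not None
    for s in samples
)
```

Since such a pattern rules nothing out, intersecting it with another pattern cannot change the result, which is why replacing it with `None` is safe.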

A sample schema that is made faster by this change is:
```
{
    'anyOf': [
       {
           'title': 'MyEnum',
           'enum': [
               'aaaaaaaaaa',
               'bbbbbbbbbb',
               'cccccccccc',
               'dddddddddd',
               'eeeeeeeeee',
               'ffffffffff',
               'gggggggggg',
               'hhhhhhhhhh',
               'iiiiiiiiii',
               'kkkkkkkkkk'
           ]
       },
       {'type': 'string'}
    ]
}
```

Comparing this schema with itself using `isSubset` takes ~6 s before this change
and ~0.05 s after it.
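
One way to reproduce a measurement like this is sketched below. `time_call` and `time_is_subset` are hypothetical helpers written for this example, not part of the library; the sketch assumes `isSubset` can be imported from the top-level `jsonsubschema` package and called with two schema dicts. Actual timings will vary by machine.

```python
import time

def time_call(fn, *args, repeat=3):
    """Call fn(*args) `repeat` times; return (best wall-clock seconds, last result)."""
    best = float("inf")
    result = None
    for _ in range(repeat):
        start = time.perf_counter()
        result = fn(*args)
        best = min(best, time.perf_counter() - start)
    return best, result

# The sample schema from the commit message, copied as-is.
schema = {
    'anyOf': [
        {
            'title': 'MyEnum',
            'enum': [
                'aaaaaaaaaa', 'bbbbbbbbbb', 'cccccccccc', 'dddddddddd',
                'eeeeeeeeee', 'ffffffffff', 'gggggggggg', 'hhhhhhhhhh',
                'iiiiiiiiii', 'kkkkkkkkkk',
            ],
        },
        {'type': 'string'},
    ]
}

def time_is_subset(s):
    """Time isSubset(s, s). Assumes the jsonsubschema package is installed."""
    from jsonsubschema import isSubset  # imported lazily so the helpers above stand alone
    return time_call(isSubset, s, s)
```

Calling `time_is_subset(schema)` on checkouts before and after the commit would give the two numbers to compare.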
kmichel-aiven authored and shinnar committed Jun 5, 2024
1 parent 8e65354 commit 9c566f3
Showing 1 changed file with 10 additions and 4 deletions.
14 changes: 10 additions & 4 deletions jsonsubschema/_checkers.py
@@ -344,10 +344,16 @@ def _joinString(s1, s2):
     mx = max(s1.maxLength, s2.maxLength)
     if utils.is_num(mx):
         ret["maxLength"] = mx
-    s1_range = utils.string_range_to_regex(
-        s1.minLength, s1.maxLength)
-    s2_range = utils.string_range_to_regex(
-        s2.minLength, s2.maxLength)
+    if s1.minLength == 0 and s1.maxLength == I.inf:
+        s1_range = None
+    else:
+        s1_range = utils.string_range_to_regex(
+            s1.minLength, s1.maxLength)
+    if s2.minLength == 0 and s2.maxLength == I.inf:
+        s2_range = None
+    else:
+        s2_range = utils.string_range_to_regex(
+            s2.minLength, s2.maxLength)
     s1_new_pattern = utils.regex_meet(s1_range, s1.pattern)
     s2_new_pattern = utils.regex_meet(s2_range, s2.pattern)
     if s1_new_pattern and s2_new_pattern:
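
The guard in this hunk can be sketched in isolation as follows. This is a minimal standalone illustration, not the library's code: `length_range_to_regex` is a hypothetical stand-in for `utils.string_range_to_regex` (whose exact output format is assumed here), `length_range_or_none` is a made-up helper name, and `math.inf` stands in for `I.inf`.

```python
import math

def length_range_to_regex(min_len, max_len):
    """Hypothetical stand-in for utils.string_range_to_regex (output format assumed)."""
    upper = "" if max_len == math.inf else str(max_len)
    return ".{%d,%s}" % (min_len, upper)

def length_range_or_none(min_len, max_len):
    """Return None for the unconstrained range [0, inf), else a length regex.

    Returning None instead of the trivial ".{0,}" lets the caller skip
    a regex intersection that could not change the result.
    """
    if min_len == 0 and max_len == math.inf:
        return None
    return length_range_to_regex(min_len, max_len)
```

A helper of this shape would also avoid repeating the same `if`/`else` for `s1` and `s2`, though the commit itself inlines the check twice.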