Avoid slow regex_meet in _joinString

Some call sites already avoid building a `.*` pattern and pass `None` instead. `_joinString` did not do this, so it produced a trivial `.{0,}` pattern, which caused slow calls to `regex_meet`.

A sample schema that is made faster by this change is:

```
{
    'anyOf': [
        {
            'title': 'MyEnum',
            'enum': [
                'aaaaaaaaaa',
                'bbbbbbbbbb',
                'cccccccccc',
                'dddddddddd',
                'eeeeeeeeee',
                'ffffffffff',
                'gggggggggg',
                'hhhhhhhhhh',
                'iiiiiiiiii',
                'kkkkkkkkkk'
            ]
        },
        {'type': 'string'}
    ]
}
```

Comparing this schema with itself using `isSubset` takes ~6 sec before this change and ~0.05 sec after it.
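The guard added for `s1` and `s2` can be sketched in isolation as follows. This is a minimal illustration, not the library's code: `length_range_or_none` and `string_range_to_regex_sketch` are hypothetical names, and `math.inf` stands in for the `I.inf` sentinel used in `_checkers.py`.

```python
import math


def string_range_to_regex_sketch(min_len, max_len):
    # Hypothetical stand-in for utils.string_range_to_regex; the real
    # helper may format its output differently.
    if max_len == math.inf:
        return f".{{{min_len},}}"
    return f".{{{min_len},{max_len}}}"


def length_range_or_none(min_len, max_len):
    # An unconstrained length (0..inf) matches every string, so return
    # None instead of a trivial ".{0,}" pattern. The caller can then skip
    # the regex intersection for this operand.
    if min_len == 0 and max_len == math.inf:
        return None
    return string_range_to_regex_sketch(min_len, max_len)


print(length_range_or_none(0, math.inf))  # unconstrained length
print(length_range_or_none(2, 5))         # bounded length
```

Only the fully unconstrained case is special-cased; any other bound still yields a length regex, so the result of the join should be unchanged.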
IBM · Jun 5, 2024 · 9c566f3
1 parent 8e65354
Showing 1 changed file with 10 additions and 4 deletions.
diff --git a/jsonsubschema/_checkers.py b/jsonsubschema/_checkers.py
@@ -344,10 +344,16 @@ def _joinString(s1, s2):
                 mx = max(s1.maxLength, s2.maxLength)
                 if utils.is_num(mx):
                     ret["maxLength"] = mx
-                s1_range = utils.string_range_to_regex(
-                    s1.minLength, s1.maxLength)
-                s2_range = utils.string_range_to_regex(
-                    s2.minLength, s2.maxLength)
+                if s1.minLength == 0 and s1.maxLength == I.inf:
+                    s1_range = None
+                else:
+                    s1_range = utils.string_range_to_regex(
+                        s1.minLength, s1.maxLength)
+                if s2.minLength == 0 and s2.maxLength == I.inf:
+                    s2_range = None
+                else:
+                    s2_range = utils.string_range_to_regex(
+                        s2.minLength, s2.maxLength)
                 s1_new_pattern = utils.regex_meet(s1_range, s1.pattern)
                 s2_new_pattern = utils.regex_meet(s2_range, s2.pattern)
                 if s1_new_pattern and s2_new_pattern: