⚡️ Don't memoize SequenceSet#string on normalized sets
#554
+107
−29
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Not duplicating the data in
@tuplesand@stringsaves memory. For large sequence sets, this memory savings can be substantial.But this is a tradeoff: it saves time when the string is not used, but uses more time when the string is used more than once. Working with set operations can create many ephemeral sets, so avoiding unintentional string generation can save a lot of time.
Also, by quickly scanning the entries after a string is parsed, we can bypass the merge algorithm for normalized strings. But this does cause a small penalty for non-normalized strings.
Please note: It is still possible to create a memoized string on a normalized SequenceSet with
#append. For example: create a monotonically sorted SequenceSet with non-normal final entry, then call#appendwith an adjacently following entry.#appendcoalesces the final entry and converts it into normal form, but doesn't check whether the preceding entries of the SequenceSet are normalized.Benchmarks
Results from benchmarks/sequence_set-normalize.yml
There is still room for improvement here, because
#normalizegenerates the normalized string for comparison rather than just reparse the string.Results from benchmarks/sequence_set-new.yml
Note that this benchmark doesn't use
SequenceSet::new; it usesSequenceSet::[], which freezes the result. In this case, the benchmark result differences are mostly driven by improved performance of#freeze.The new code is ~1-6% slower for shuffled strings, but ~30-40% faster for sorted sets (note that unsorted non-string inputs create a sorted set).