Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pat: Correct number matching (again) #330

Merged
merged 3 commits into from
Jul 9, 2024
Merged

pat: Correct number matching (again) #330

merged 3 commits into from
Jul 9, 2024

Conversation

timbray
Copy link
Owner

@timbray timbray commented Jul 8, 2024

Thanks to @jsmorph for concurrency_test.go, whose surprising result revealed that the work for this PR was not nearly finished. More work was required on (a) detecting whether a number in an incoming event could be made comparable, (b) communicating that fact to the matching code, (c) marking NFAs which contained comparable numbers, leading to the following statement in value_matcher.go:

		if vmFields.hasQNumbers && eventField.IsQNumber {
			qNumber, err := qNumFromBytes(val)

Also, (d) the unit tests needed work to cover the cases that were previously issued.

Now, there is still a performance cost, particularly in benchmarks that do a lot of matching of numbers; correctness isn't free. But at the moment, I think the trade-off is acceptable.

Note that having previously called these things comparableNumbers and canonicalNumbers and disliking both and getting long ugly variable names, I renamed them to Q numbers, where Q is for Quamina.

timbray added 3 commits July 2, 2024 21:07
addresses #27

Signed-off-by: Tim Bray <tbray@textuality.com>
Signed-off-by: Tim Bray <tbray@textuality.com>
Signed-off-by: Tim Bray <tbray@textuality.com>
@codecov-commenter
Copy link

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

Attention: Patch coverage is 99.31034% with 1 line in your changes missing coverage. Please review.

Project coverage is 96.73%. Comparing base (b1c32f5) to head (c1b627e).

Files Patch % Lines
numbers.go 97.87% 1 Missing ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #330      +/-   ##
==========================================
+ Coverage   96.48%   96.73%   +0.24%     
==========================================
  Files          18       18              
  Lines        1764     1837      +73     
==========================================
+ Hits         1702     1777      +75     
+ Misses         35       34       -1     
+ Partials       27       26       -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@timbray
Copy link
Owner Author

timbray commented Jul 8, 2024

tl:dr: I'm OK with the slowdown reported here.

As remarked, the numeric processing cost is real. This benchmark that shows this is using the pattern:

{ "context": { "user_id": [9034], "friends_count": [158] } }

Which is purely numeric. So, instead of matching 3- and 4-byte numbers, we're matching 14-byte canonical representations. Interestingly, when I measure matches/second metrics, the slowdown is dramatically smaller than that reported by the GitHub CI benchmark runner. (Not sure why.)

Finally, for the particular type of NFA produced for number atching, there are several obvious optimizations in the way the NFA-matcher runs, which will further reduce the impact of numeric accuracy. I think accuracy is important, so that an event like the following would match the pattern above.

{ "context": { "user_id": 9034, "friends_count": 1.58e2 }} 

(or 158.0 or any other variant)

@timbray timbray merged commit 20480ea into main Jul 9, 2024
6 of 7 checks passed
@timbray timbray deleted the numbers branch July 9, 2024 17:01
@timbray timbray mentioned this pull request Jul 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants