improve performance of regexp_count #13364
Thanks @Dimchikkk for your contribution.
Do you mean the Entry API misbehaved, returning Vacant every time and forcing the regexp pattern to be recompiled?
Hi @comphead,
Please rebase onto the latest main to avoid the CI failure. Personally, I like the numbers.
@comphead I actually found the root cause: cloning the compiled regex was the expensive part. Now the numbers are even more sexy:
Thanks @Dimchikkk, it's a great first PR 👍
Since this is a first PR, I'll wait for another member to approve, and then we can merge it.
@Dandandan, would you mind approving?
Yeah, that makes much more sense.
Nice find!
Thank you guys. Now I am wondering why the other regexp functions are slower than regexp_count :)
That would be a great thing to check. One thing I saw in the arrow-rs kernels is that some (string) cloning is happening. Would be great to check and improve!
I'm wondering if other
It would be nice if you could quickly check whether the other regexp functions can be optimized the same way.
* improve performance of regexp_count
* fix clippy
* collect with Int64Array to eliminate one temp Vec

---------

Co-authored-by: Dima <dima.rets@ballys.com>
Which issue does this PR close?
Closes #13011
Rationale for this change
regexp_count becomes the fastest of the regexp functions :)
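The root cause named in the conversation was that cloning the compiled regex is expensive. A minimal sketch of the idea, not the actual DataFusion code: keep compiled patterns in a cache and hand out a borrowed reference instead of a clone. `CompiledPattern`, `get_or_compile`, and `count_all` are hypothetical names, and `CompiledPattern` is a std-only stand-in that counts literal substrings so the sketch stays self-contained; a real implementation would hold a compiled regex there.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for a compiled regex (assumption: the real code
// stores a compiled regex here). It only counts literal occurrences.
struct CompiledPattern {
    needle: String,
}

impl CompiledPattern {
    // Pretend this is the expensive compilation step.
    fn compile(pattern: &str) -> Self {
        CompiledPattern {
            needle: pattern.to_string(),
        }
    }

    // Count non-overlapping occurrences of the pattern in `haystack`.
    fn count_matches(&self, haystack: &str) -> i64 {
        if self.needle.is_empty() {
            return 0;
        }
        haystack.matches(self.needle.as_str()).count() as i64
    }
}

// Return a reference into the cache. Nothing is cloned on a cache hit,
// and each distinct pattern is compiled at most once.
// (The key String is still allocated per lookup; the sketch accepts that.)
fn get_or_compile<'a>(
    cache: &'a mut HashMap<String, CompiledPattern>,
    pattern: &str,
) -> &'a CompiledPattern {
    cache
        .entry(pattern.to_string())
        .or_insert_with(|| CompiledPattern::compile(pattern))
}

// Count matches row by row, reusing cached patterns by reference.
fn count_all(values: &[&str], patterns: &[&str]) -> Vec<i64> {
    let mut cache = HashMap::new();
    values
        .iter()
        .zip(patterns.iter())
        .map(|(value, pattern)| get_or_compile(&mut cache, pattern).count_matches(value))
        .collect()
}
```

The design choice is in the return type of `get_or_compile`: returning `&CompiledPattern` rather than an owned `CompiledPattern` means the per-row cost is a hash lookup, not a copy of the compiled pattern.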
Are these changes tested?
Are there any user-facing changes?
No