feat(query): optimize stringsearch like #8720

Merged
merged 1 commit into from
Nov 9, 2022

Conversation

TCeason
Collaborator

@TCeason TCeason commented Nov 9, 2022

I hereby agree to the terms of the CLA available at: https://databend.rs/dev/policies/cla/

Summary

Optimize LIKE pattern matching.

Query time for the example query: 0.15s -> 0.10s

-- before optimization:
mysql> select count() from orders where o_clerk like '%lerk#00000177%';
+---------+
| count() |
+---------+
|   15086 |
+---------+
1 row in set (0.15 sec)
Read 15000000 rows, 329.02 MiB in 0.145 sec., 103.29 million rows/sec., 2.21 GiB/sec.

-- after optimization:

mysql> select count() from orders where o_clerk like '%lerk#00000177%';
+---------+
| count() |
+---------+
|   15086 |
+---------+
1 row in set (0.10 sec)
Read 15000000 rows, 329.02 MiB in 0.101 sec., 147.92 million rows/sec., 3.17 GiB/sec.
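For readers unfamiliar with why a pattern like '%lerk#00000177%' can be made cheap: a LIKE pattern with no inner wildcards is equivalent to a plain substring, prefix, suffix, or equality check, so the generic matcher can be skipped. The sketch below only illustrates that general idea. It is not taken from this PR's diff, and the names classify_like, like_to_regex and like_match are hypothetical. It assumes backslash is the escape character and sends anything with "_" or an escape to a regex fallback.

```python
import re


def classify_like(pattern: str):
    """Classify a LIKE pattern; returns (kind, literal). Hypothetical helper."""
    # Assumption: escapes and "_" are left to the generic matcher in this sketch.
    if "\\" in pattern or "_" in pattern:
        return ("generic", pattern)
    literal = pattern.strip("%")
    if "%" in literal:  # wildcard in the middle, e.g. 'a%b'
        return ("generic", pattern)
    leading = pattern.startswith("%")
    trailing = pattern.endswith("%")
    if leading and trailing:
        return ("contains", literal)
    if leading:
        return ("ends_with", literal)
    if trailing:
        return ("starts_with", literal)
    return ("equals", literal)


def like_to_regex(pattern: str):
    """Generic fallback: translate a LIKE pattern into a compiled regex."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # Escaped character: match it literally.
            parts.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
        i += 1
    return re.compile("".join(parts), re.DOTALL)


def like_match(value: str, pattern: str) -> bool:
    """Evaluate value LIKE pattern, using a fast path when one applies."""
    kind, literal = classify_like(pattern)
    if kind == "contains":
        return literal in value  # plain substring search
    if kind == "starts_with":
        return value.startswith(literal)
    if kind == "ends_with":
        return value.endswith(literal)
    if kind == "equals":
        return value == literal
    return like_to_regex(pattern).fullmatch(value) is not None
```

In a columnar engine the classification would normally be done once per query rather than once per row, so that each row pays only for the substring search.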

Closes #8458

@mergify mergify bot added the pr-feature this PR introduces a new feature to the codebase label Nov 9, 2022
@TCeason
Collaborator Author

TCeason commented Nov 9, 2022

In ClickHouse, the string != comparison takes 0.055 sec; in Databend, the same comparison takes 0.250 sec. I also retested the int comparison and updated the result there.

For both the string != and int comparisons, ClickHouse is about 4x faster than Databend.

With this PR, the ClickHouse like function is also about 4x faster than Databend's.

databend-arch :) select version();

SELECT version()

Query id: 9e68b4d6-c26c-40b5-9aee-646da7b573b0

┌─version()─┐
│ 22.7.1.1  │
└───────────┘

1 row in set. Elapsed: 0.004 sec. 

databend-arch :) select count() from orders where o_comment != '%s%p%';

SELECT count()
FROM orders
WHERE o_comment != '%s%p%'

Query id: 41b001f2-bf71-4a46-9815-bb3f73460d99

┌──count()─┐
│ 15000000 │
└──────────┘

1 row in set. Elapsed: 0.055 sec. Processed 15.00 million rows, 858.40 MB (274.07 million rows/s., 15.68 GB/s.)

mysql> select count() from orders where o_comment != '%s%p%';
+----------+
| count()  |
+----------+
| 15000000 |
+----------+
1 row in set (0.25 sec)
Read 15000000 rows, 808.13 MiB in 0.250 sec., 60 million rows/sec., 3.16 GiB/sec.
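
The ratios quoted above can be checked from the reported timings. The snippet below is only a small arithmetic sketch that uses the elapsed times and row counts printed in this thread; the variable and function names are illustrative. Small differences from the printed rows/sec figures are expected because the displayed elapsed times are rounded.

```python
ROWS = 15_000_000  # rows read in every query shown in this thread

# Elapsed seconds as printed in the query output above.
like_before = 0.145       # Databend LIKE, before this PR
like_after = 0.101        # Databend LIKE, after this PR
neq_clickhouse = 0.055    # ClickHouse string !=
neq_databend = 0.250      # Databend string !=


def rows_per_sec(rows: int, seconds: float) -> float:
    """Throughput in rows per second."""
    return rows / seconds


def speedup(slow: float, fast: float) -> float:
    """How many times faster the 'fast' run is than the 'slow' run."""
    return slow / fast


like_speedup = speedup(like_before, like_after)     # roughly 1.4x
neq_ratio = speedup(neq_databend, neq_clickhouse)   # roughly 4.5x

print(f"LIKE speedup from this PR: {like_speedup:.2f}x")
print(f"ClickHouse vs Databend on string !=: {neq_ratio:.2f}x")
print(f"Databend != throughput: {rows_per_sec(ROWS, neq_databend) / 1e6:.0f} million rows/sec")
```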

@TCeason TCeason requested a review from sundy-li November 9, 2022 11:51
@sundy-li sundy-li requested a review from RinChanNOWWW November 9, 2022 12:17
@BohuTANG BohuTANG merged commit 199a370 into databendlabs:main Nov 9, 2022
Development

Successfully merging this pull request may close these issues.

Feature: Optimize StringLikeSearch faster
3 participants