Optimize mem utils functions #5658

solotzg · 2022-08-19T08:40:53Z

What problem does this PR solve?

Issue Number: ref #5294

What is changed and how it works?

Add the functions avx2_strstr, avx2_mem_equal and avx2_memchr:
- avx2_strstr is equivalent to std::string_view::find
- avx2_mem_equal is equivalent to bcmp, i.e. std::memcmp(p1, p2, n) == 0
- avx2_memchr is equivalent to std::memchr
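To illustrate the kind of AVX2 technique an equality helper like avx2_mem_equal relies on, here is a minimal sketch. It is not the PR's implementation: `mem_equal` and `mem_equal_avx2` are hypothetical names, and it assumes x86-64 with GCC or Clang (for `__attribute__((target("avx2")))` and `__builtin_cpu_supports`). It compares 32 bytes per iteration and falls back to std::memcmp for the tail and on CPUs without AVX2.

```cpp
#include <immintrin.h>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical sketch, not TiFlash's avx2_mem_equal.
// Compiled for AVX2 via the target attribute; only called after a CPU check.
__attribute__((target("avx2")))
static bool mem_equal_avx2(const char * p1, const char * p2, std::size_t n)
{
    std::size_t i = 0;
    for (; i + 32 <= n; i += 32)
    {
        const __m256i a = _mm256_loadu_si256(reinterpret_cast<const __m256i *>(p1 + i));
        const __m256i b = _mm256_loadu_si256(reinterpret_cast<const __m256i *>(p2 + i));
        // cmpeq sets a byte to 0xFF where a and b are equal; movemask collects
        // one bit per byte, so the block is equal iff all 32 bits are set.
        const unsigned mask = static_cast<unsigned>(_mm256_movemask_epi8(_mm256_cmpeq_epi8(a, b)));
        if (mask != 0xFFFFFFFFu)
            return false;
    }
    // Fewer than 32 bytes left: finish with memcmp.
    return std::memcmp(p1 + i, p2 + i, n - i) == 0;
}

// Dispatcher: take the AVX2 path only if the running CPU supports it.
bool mem_equal(const char * p1, const char * p2, std::size_t n)
{
    static const bool has_avx2 = __builtin_cpu_supports("avx2");
    return has_avx2 ? mem_equal_avx2(p1, p2, n) : std::memcmp(p1, p2, n) == 0;
}
```

Because the AVX2 path is only entered after the runtime check, the same binary still runs on CPUs without AVX2; executing AVX2 instructions unconditionally on such a CPU leads to the kind of crash reported in #6075.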
Optimize string equality comparison.

In tiflash/libs/libcommon/include/common/StringRef.h (line 93 at eaf1a4c), the call

return mem_utils::memoryEqual(lhs.data, rhs.data, lhs.size);

is much slower than std::memcmp when the strings are not very large.
- According to the test results, AVX-512 instructions may only begin to give better results once the string size exceeds 1000000.
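One plausible reason a wide-vector loop loses to std::memcmp on short strings is its fixed per-call overhead (setup and tail handling). A common way to handle short inputs is a pair of possibly overlapping scalar loads. The sketch below covers n <= 16 this way; `small_mem_equal`, `load_u64` and `load_u32` are hypothetical names, and this illustrates the general technique rather than code from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helpers: unaligned loads via memcpy (optimizing compilers
// typically turn each into a single load instruction).
static inline std::uint64_t load_u64(const char * p)
{
    std::uint64_t v;
    std::memcpy(&v, p, sizeof(v));
    return v;
}

static inline std::uint32_t load_u32(const char * p)
{
    std::uint32_t v;
    std::memcpy(&v, p, sizeof(v));
    return v;
}

// Equality for short inputs (n <= 16) with at most two loads per side.
// For 8 <= n <= 16 the head and tail 8-byte windows overlap when n < 16,
// so every byte is covered without a per-byte loop.
bool small_mem_equal(const char * a, const char * b, std::size_t n)
{
    assert(n <= 16);
    if (n >= 8)
        return load_u64(a) == load_u64(b) && load_u64(a + n - 8) == load_u64(b + n - 8);
    if (n >= 4)
        return load_u32(a) == load_u32(b) && load_u32(a + n - 4) == load_u32(b + n - 4);
    for (std::size_t i = 0; i < n; ++i) // 0..3 bytes: compare byte by byte
        if (a[i] != b[i])
            return false;
    return true;
}
```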
optimize expression like
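For a binary collation, a LIKE pattern that starts and ends with '%' and otherwise contains only literal segments (such as '%pending%deposits%' in the SQL benchmark) can be evaluated with one substring search per segment. A minimal sketch, assuming no '_' wildcards or escape characters; `like_percent_segments` is a hypothetical name, and std::string_view::find stands in for avx2_strstr, which the PR describes as equivalent.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical helper: evaluates  str LIKE '%seg1%seg2%...%segN%'  for a pattern
// that starts and ends with '%' and has no '_' or escapes (binary comparison).
bool like_percent_segments(std::string_view str, const std::vector<std::string_view> & segments)
{
    std::size_t pos = 0;
    for (std::string_view seg : segments)
    {
        // Taking the leftmost occurrence is safe: it leaves the longest possible
        // suffix in which the remaining segments can still match.
        const std::size_t found = str.find(seg, pos); // a SIMD strstr could be used here
        if (found == std::string_view::npos)
            return false;
        pos = found + seg.size();
    }
    return true;
}
```

With this shape, the cost of the predicate is dominated by the substring searches, which is where a faster strstr pays off.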

Benchmark

ENV

tpch-100
1 TiFlash instance
CPU limited to 200%
x86-64/amd64
original commit: 8404e65

tiflash/dbms/src/Functions/tests/bench_collation.cpp

Lines 129 to 147 in eef9e22

    
	BENCH_EQ_COLLATOR(UTF8MB4_BIN);
	BENCH_EQ_COLLATOR(UTF8MB4_GENERAL_CI);
	BENCH_EQ_COLLATOR(UTF8MB4_UNICODE_CI);
	BENCH_EQ_COLLATOR(UTF8_BIN);
	BENCH_EQ_COLLATOR(UTF8_GENERAL_CI);
	BENCH_EQ_COLLATOR(UTF8_UNICODE_CI);
	BENCH_EQ_COLLATOR(ASCII_BIN);
	BENCH_EQ_COLLATOR(BINARY);
	BENCH_EQ_COLLATOR(LATIN1_BIN);

	BENCH_LIKE_COLLATOR(UTF8MB4_BIN);
	BENCH_LIKE_COLLATOR(UTF8MB4_GENERAL_CI);
	BENCH_LIKE_COLLATOR(UTF8MB4_UNICODE_CI);
	BENCH_LIKE_COLLATOR(UTF8_BIN);
	BENCH_LIKE_COLLATOR(UTF8_GENERAL_CI);
	BENCH_LIKE_COLLATOR(UTF8_UNICODE_CI);
	BENCH_LIKE_COLLATOR(ASCII_BIN);
	BENCH_LIKE_COLLATOR(BINARY);
	BENCH_LIKE_COLLATOR(LATIN1_BIN);

Time(ns)	Original	Optimized	Improvement: (Original) / (Optimized) - 1.0
CollationEqBench/UTF8MB4_BIN	12428711	6228798	99.54%
CollationEqBench/UTF8_BIN	12956705	6141843	110.96%
CollationEqBench/ASCII_BIN	12625723	6229335	102.68%
CollationEqBench/BINARY	11870078	5837615	103.34%
CollationEqBench/LATIN1_BIN	13768201	6732640	104.50%
CollationLikeBench/UTF8MB4_BIN	37940667	20185747	87.96%
CollationLikeBench/UTF8_BIN	37803575	19914106	89.83%
CollationLikeBench/ASCII_BIN	36860160	17999743	104.78%
CollationLikeBench/BINARY	37449881	17599053	112.79%
CollationLikeBench/LATIN1_BIN	37503432	17675036	112.18%

Compare bcmp, mem_utils::memoryEqual (uses AVX-512) and avx2_mem_equal.
Compare std::string_view::find and avx2_strstr.
MemUtilsEqual_xxx: the compared strings have size xxx.
MemUtilsStrStr_xxx_yyy: the source string has size xxx and the needle has size yyy.

Time(ns)	STL	Original-avx512	Optimized-avx2	Improvement: (STL) / (Optimized) - 1.0	Improvement: (Original) / (Optimized) - 1.0
Memory equality: MemUtilsEqual_${str-size}
MemUtilsEqual_13	4.46	7.22	4.15	7.47%	73.98%
MemUtilsEqual_65	4.88	8.69	4.31	13.23%	101.62%
MemUtilsEqual_100	9.9	13.3	5.65	75.22%	135.40%
MemUtilsEqual_10000	268	323	162	65.43%	99.38%
MemUtilsEqual_100000	3939	4353	3462	13.78%	25.74%
MemUtilsEqual_1000000	62265	53157	52600	18.37%	1.06%

Substring search: MemUtilsStrStr_${src-str-size}_${needle-str-size}
MemUtilsStrStr_1024_1	30882	-	21275	45.16%	-
MemUtilsStrStr_1024_7	34927	-	21279	64.14%	-
MemUtilsStrStr_1024_15	39364	-	23161	69.96%	-
MemUtilsStrStr_1024_31	40628	-	29435	38.03%	-
MemUtilsStrStr_1024_63	37381	-	26141	43.00%	-
MemUtilsStrStr_80_1	6130	-	3977	54.14%	-
MemUtilsStrStr_80_7	11720	-	6278	86.68%	-
MemUtilsStrStr_80_15	11585	-	5423	113.63%	-
MemUtilsStrStr_80_31	11467	-	9530	20.33%	-
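A widely used way to vectorize substring search is the first/last-byte filter described by Wojciech Muła: compare the needle's first and last bytes against 32 candidate start positions at once and run a full comparison only where both match. The sketch below applies it with AVX2. Whether avx2_strstr uses exactly this scheme is an assumption; `simd_find` and `find_avx2` are hypothetical names, and the code assumes x86-64 with GCC or Clang. It keeps std::string_view::find semantics (leftmost position, or npos) and falls back to the standard library for the tail, short inputs and CPUs without AVX2.

```cpp
#include <immintrin.h>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string_view>

// Hypothetical sketch of the first/last-byte SIMD filter; not TiFlash's avx2_strstr.
// Precondition: 1 <= needle.size() <= hay.size().
__attribute__((target("avx2")))
static std::size_t find_avx2(std::string_view hay, std::string_view needle)
{
    const std::size_t n = hay.size();
    const std::size_t k = needle.size();
    const __m256i first = _mm256_set1_epi8(needle[0]);
    const __m256i last = _mm256_set1_epi8(needle[k - 1]);
    std::size_t i = 0;
    // Each iteration tests start positions i..i+31; the condition keeps the
    // second load (ending at byte i+k+30) inside the haystack.
    for (; i + k + 31 <= n; i += 32)
    {
        const __m256i block_first = _mm256_loadu_si256(reinterpret_cast<const __m256i *>(hay.data() + i));
        const __m256i block_last = _mm256_loadu_si256(reinterpret_cast<const __m256i *>(hay.data() + i + k - 1));
        // Bit j is set iff hay[i+j] == needle[0] and hay[i+j+k-1] == needle[k-1].
        std::uint32_t mask = static_cast<std::uint32_t>(_mm256_movemask_epi8(
            _mm256_and_si256(_mm256_cmpeq_epi8(block_first, first), _mm256_cmpeq_epi8(block_last, last))));
        while (mask != 0)
        {
            const unsigned j = static_cast<unsigned>(__builtin_ctz(mask)); // lowest candidate
            if (std::memcmp(hay.data() + i + j, needle.data(), k) == 0)
                return i + j;
            mask &= mask - 1; // clear that candidate and try the next one
        }
    }
    // Remaining start positions: let the standard library finish.
    const std::size_t r = hay.substr(i).find(needle);
    return r == std::string_view::npos ? r : i + r;
}

std::size_t simd_find(std::string_view hay, std::string_view needle)
{
    static const bool has_avx2 = __builtin_cpu_supports("avx2");
    if (!has_avx2 || needle.empty() || needle.size() > hay.size())
        return hay.find(needle);
    return find_avx2(hay, needle);
}
```

The filter makes the expensive memcmp rare when the needle's first and last bytes are uncommon in the haystack; for very short haystacks the fallback dominates, which is one reason results vary with the source and needle sizes.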

SQL

select count(1) from orders where o_comment like '%pending%deposits%';

Time (s)	Original	Optimized
Run 1	10.75	8.72
Run 2	10.92	8.87
Run 3	10.98	9.05
Run 4	10.7	8.64
Run 5	10.77	8.78
AVG	10.824	8.812
Improvement: AVG(Original) / AVG(Optimized) - 1.0 = 22.83%
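The improvement figure for the SQL query can be reproduced from the five measured runs. A small helper, with hypothetical names `average` and `improvement`:

```cpp
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

// Mean of the measured runs (seconds).
double average(const std::vector<double> & runs)
{
    return std::accumulate(runs.begin(), runs.end(), 0.0) / static_cast<double>(runs.size());
}

// Improvement as defined in the table: AVG(Original) / AVG(Optimized) - 1.0
double improvement(const std::vector<double> & original, const std::vector<double> & optimized)
{
    return average(original) / average(optimized) - 1.0;
}
```

With the runs above, 10.824 / 8.812 - 1.0 is about 0.2283, i.e. 22.83%.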

Check List

Tests

Unit test
Integration test
Manual test (add detailed scripts or steps below)
No code

Side effects

Performance regression: Consumes more CPU
Performance regression: Consumes more Memory
Breaking backward compatibility

Documentation

Release note

None

ti-chi-bot · 2022-08-19T08:40:54Z

[REVIEW NOTIFICATION]

This pull request has been approved by:

windtalker
zanmato1984

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

solotzg · 2022-08-29T09:29:01Z

/run-all-tests

zanmato1984

LGTM

solotzg · 2022-08-29T13:39:49Z

/hold

solotzg · 2022-08-29T13:39:54Z

/merge

ti-chi-bot · 2022-08-29T13:39:55Z

@solotzg: It seems you want to merge this PR, I will help you trigger all the tests:

/run-all-tests

You only need to trigger /merge once, and if the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.

If you have any questions about the PR merge process, please refer to pr process.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

ti-chi-bot · 2022-08-29T13:39:57Z

This pull request has been accepted and is ready to merge.

Commit hash: 8c85675

solotzg · 2022-08-30T01:26:56Z

/merge

ti-chi-bot · 2022-08-30T01:26:57Z

@solotzg: It seems you want to merge this PR, I will help you trigger all the tests:

/run-all-tests

You only need to trigger /merge once, and if the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.

If you have any questions about the PR merge process, please refer to pr process.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

solotzg · 2022-08-30T01:35:46Z

/unhold

sre-bot · 2022-08-30T01:57:17Z

Coverage for changed files

Filename                                                        Regions    Missed Regions     Cover   Functions  Missed Functions  Executed       Lines      Missed Lines     Cover    Branches   Missed Branches     Cover
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
dbms/src/Columns/ColumnString.cpp                                   172                81    52.91%          26                12    53.85%         402               204    49.25%         118                66    44.07%
dbms/src/Functions/CollationOperatorOptimized.h                     130                26    80.00%          11                 0   100.00%         201                 7    96.52%          78                 9    88.46%
dbms/src/Functions/CollationStringOptimized.cpp                       3                 0   100.00%           3                 0   100.00%           9                 0   100.00%           0                 0         -
dbms/src/Functions/CollationStringSearchOptimized.h                 141                 3    97.87%          20                 0   100.00%         302                 5    98.34%          84                 2    97.62%
dbms/src/Functions/FunctionsComparison.cpp                            8                 7    12.50%           8                 7    12.50%          42                29    30.95%           0                 0         -
dbms/src/Functions/FunctionsComparison.h                            604               304    49.67%          63                28    55.56%         949               505    46.79%         476               283    40.55%
dbms/src/Functions/FunctionsStringSearch.cpp                        645               344    46.67%          57                30    47.37%        1315               691    47.45%         410               223    45.61%
dbms/src/Functions/tests/gtest_strings_cmp.cpp                      112                14    87.50%           2                 0   100.00%          96                 0   100.00%          14                 8    42.86%
dbms/src/Storages/Transaction/CollatorUtils.h                        34                 4    88.24%          11                 1    90.91%          68                 8    88.24%          14                 1    92.86%
dbms/src/Storages/Transaction/tests/gtest_tidb_collator.cpp          23                 0   100.00%           6                 0   100.00%          91                 0   100.00%          14                 1    92.86%
libs/libcommon/include/common/StringRef.h                            49                12    75.51%          21                 5    76.19%          92                22    76.09%          26                12    53.85%
libs/libcommon/include/common/avx2_mem_utils.h                      215                21    90.23%          20                 3    85.00%         326                58    82.21%         118                 0   100.00%
libs/libcommon/include/common/avx2_strstr.h                         149                17    88.59%          14                 0   100.00%         229                31    86.46%          82                 3    96.34%
libs/libcommon/include/common/mem_utils.h                           127                34    73.23%           7                 0   100.00%         130                19    85.38%         116                37    68.10%
libs/libcommon/include/common/mem_utils_opt.h                         6                 1    83.33%           3                 1    66.67%          45                33    26.67%           2                 0   100.00%
libs/libcommon/src/avx2_mem_utils_impl.cpp                            5                 1    80.00%           5                 1    80.00%          15                 3    80.00%           0                 0         -
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
TOTAL                                                              2423               869    64.14%         277                88    68.23%        4312              1615    62.55%        1552               645    58.44%

Coverage summary

Functions  MissedFunctions  Executed  Lines   MissedLines  Cover
18479      8325             54.95%    213806  85973        59.79%

full coverage report (for internal network access only)

solotzg · 2022-08-30T03:46:21Z

/run-sanitizer-test asan

solotzg added the type/enhancement (the issue or PR belongs to an enhancement) and type/performance labels Aug 19, 2022

ti-chi-bot added the release-note-none (the PR doesn't merit a release note) and size/XXL (the PR changes 1000+ lines, ignoring generated files) labels Aug 19, 2022

Optimize mem utils functions

0ce3688

- add `avx2_strstr` to accelerate substr search
- add `avx2_mem_equal` to accelerate mem equal cmp

solotzg force-pushed the optimize-mem-utils branch from f4d60f7 to 0ce3688 August 19, 2022 08:48

solotzg mentioned this pull request Aug 19, 2022

Improve the performance of new collation releated function/executor #5294

Closed

9 tasks

solotzg added 9 commits August 23, 2022 23:47

Fix bug & optimize like

e438abe

add bench

b828b89

fix scripts

69cc34f

remove like

7669928

rename modules

f7bd15c

rename val

d31e215

rename namespace

f9de754

fix compile

aa8da71

more comment

87d553c

solotzg requested review from zanmato1984 and windtalker August 24, 2022 09:40

fix typo

7067e4b

pingcap deleted a comment from sre-bot Aug 25, 2022

solotzg added 2 commits August 25, 2022 17:49

try optimize comparison

fea0d46

fix compile

32d89f4

solotzg force-pushed the optimize-mem-utils branch from ecc60e0 to 32d89f4 August 25, 2022 10:05

solotzg added 3 commits August 25, 2022 18:32

clang-tidy

dbdb138

add more tests

9d14b5e

re fix compile

060a932

zanmato1984 approved these changes Aug 29, 2022

ti-chi-bot added the status/LGT2 label and removed the status/LGT1 label Aug 29, 2022

pingcap deleted a comment from sre-bot Aug 29, 2022

Merge branch 'master' into optimize-mem-utils

8c85675

ti-chi-bot added the do-not-merge/hold label (the PR should not merge because someone issued a /hold command) Aug 29, 2022

ti-chi-bot added the status/can-merge label (the PR has been approved by a committer) Aug 29, 2022

pingcap deleted a comment from sre-bot Aug 29, 2022

solotzg added 2 commits August 29, 2022 23:12

Update avx2_strstr.h

13ec3fa

Merge branch 'master' into optimize-mem-utils

c8247f1

ti-chi-bot removed the do-not-merge/hold label Aug 30, 2022

ti-chi-bot merged commit a8c8cb1 into pingcap:master Aug 30, 2022

solotzg deleted the optimize-mem-utils branch August 30, 2022 02:03

pingcap deleted a comment from sre-bot Aug 30, 2022

solotzg mentioned this pull request Aug 31, 2022

*: fix asan false positive case #5742

Merged

12 tasks

solotzg added a commit to solotzg/tiflash that referenced this pull request Sep 19, 2022

Revert "Optimize mem utils functions (pingcap#5658)"

48f1992

This reverts commit a8c8cb1.

JaySon-Huang mentioned this pull request Oct 2, 2022

v6.3.0 tiflash coredump when running in a CPU without AVX support #6075

Closed
