Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix](like_func) incorrect result of like with 'NO_BACKSLASH_ESCAPES' mode #27842

Merged
merged 1 commit into from
Dec 1, 2023

Conversation

mrhhsg
Copy link
Member

@mrhhsg mrhhsg commented Nov 30, 2023

Proposed changes

Issue Number: close #xxx

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

@mrhhsg
Copy link
Member Author

mrhhsg commented Nov 30, 2023

run buildall

Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot
Copy link

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
Tpch sf100 test result on commit ba78782cbd0599c7e7f81fe9c1d3b2fd15755b2c, data reload: false

run tpch-sf100 query with default conf and session variables
q1	4919	4657	4661	4657
q2	357	158	159	158
q3	1519	1386	1274	1274
q4	1154	945	977	945
q5	3261	3247	3220	3220
q6	259	133	135	133
q7	1030	530	593	530
q8	2229	2240	2240	2240
q9	6964	6966	6958	6958
q10	3313	3362	3409	3362
q11	347	218	219	218
q12	356	217	214	214
q13	4655	3932	3917	3917
q14	257	219	216	216
q15	584	529	524	524
q16	432	418	384	384
q17	1029	682	610	610
q18	8624	8437	7310	7310
q19	1587	1554	1547	1547
q20	554	352	588	352
q21	3478	3072	3089	3072
q22	381	301	314	301
Total cold run time: 47289 ms
Total hot run time: 42142 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
q1	4613	4591	4587	4587
q2	332	213	263	213
q3	3754	3773	3743	3743
q4	2524	2523	2509	2509
q5	6211	6176	6204	6176
q6	252	126	128	126
q7	2638	1954	1966	1954
q8	3770	3783	3864	3783
q9	9418	9301	9406	9301
q10	4070	4152	4165	4152
q11	692	521	531	521
q12	794	635	611	611
q13	4390	3687	3646	3646
q14	280	246	244	244
q15	611	522	532	522
q16	537	525	488	488
q17	2236	2192	2175	2175
q18	9459	8858	9106	8858
q19	1790	1773	1797	1773
q20	2321	2025	1962	1962
q21	7459	6946	7397	6946
q22	603	570	594	570
Total cold run time: 68754 ms
Total hot run time: 64860 ms

@doris-robot
Copy link

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 45.24 seconds
stream load tsv: 564 seconds loaded 74807831229 Bytes, about 126 MB/s
stream load json: 18 seconds loaded 2358488459 Bytes, about 124 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 29.3 seconds inserted 10000000 Rows, about 341K ops/s
storage size: 17167330218 Bytes

@mrhhsg
Copy link
Member Author

mrhhsg commented Dec 1, 2023

run buildall

Copy link
Contributor

github-actions bot commented Dec 1, 2023

clang-tidy review says "All clean, LGTM! 👍"

@@ -317,6 +317,8 @@ Status FunctionLikeBase::hs_prepare(FunctionContext* context, const char* expres
return Status::RuntimeError("hs_alloc_scratch allocate scratch space error");
}

LOG(INFO) << "****** expression: " << expression;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe too many logs

@@ -544,6 +552,7 @@ Status FunctionLike::open(FunctionContext* context, FunctionContext::FunctionSta
const auto& pattern = pattern_col->get_data_at(0);

std::string pattern_str = pattern.to_string();
LOG(INFO) << "****** >>>>> pattern_str: " << pattern_str << ".";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

too many logs

@mrhhsg
Copy link
Member Author

mrhhsg commented Dec 1, 2023

run buildall

Copy link
Contributor

github-actions bot commented Dec 1, 2023

clang-tidy review says "All clean, LGTM! 👍"

Copy link
Contributor

@yiguolei yiguolei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

github-actions bot commented Dec 1, 2023

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot
Copy link

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 44.91 seconds
stream load tsv: 563 seconds loaded 74807831229 Bytes, about 126 MB/s
stream load json: 18 seconds loaded 2358488459 Bytes, about 124 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 28.8 seconds inserted 10000000 Rows, about 347K ops/s
storage size: 17164436394 Bytes

@yiguolei
Copy link
Contributor

yiguolei commented Dec 1, 2023

run buildall

yiguolei
yiguolei previously approved these changes Dec 1, 2023
Copy link
Contributor

@yiguolei yiguolei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

github-actions bot commented Dec 1, 2023

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added approved Indicates a PR has been approved by one committer. reviewed labels Dec 1, 2023
Copy link
Contributor

github-actions bot commented Dec 1, 2023

PR approved by anyone and no changes requested.

Copy link
Contributor

github-actions bot commented Dec 1, 2023

clang-tidy review says "All clean, LGTM! 👍"

@mrhhsg
Copy link
Member Author

mrhhsg commented Dec 1, 2023

run buildall

@doris-robot
Copy link

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 43.94 seconds
stream load tsv: 558 seconds loaded 74807831229 Bytes, about 127 MB/s
stream load json: 18 seconds loaded 2358488459 Bytes, about 124 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 28.7 seconds inserted 10000000 Rows, about 348K ops/s
storage size: 17167344429 Bytes

mrhhsg added a commit to mrhhsg/doris that referenced this pull request Dec 1, 2023
mrhhsg added a commit to mrhhsg/doris that referenced this pull request Dec 1, 2023
Copy link
Contributor

@HappenLee HappenLee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@yiguolei yiguolei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@yiguolei yiguolei merged commit 007506c into apache:master Dec 1, 2023
26 of 29 checks passed
eldenmoon pushed a commit to eldenmoon/incubator-doris that referenced this pull request Dec 3, 2023
eldenmoon pushed a commit to eldenmoon/incubator-doris that referenced this pull request Dec 3, 2023
eldenmoon added a commit that referenced this pull request Dec 4, 2023
* [chore](case) Use correct insert stmt for cold heat separation case #27546 (#27585)

Co-authored-by: AlexYue <yj976240184@gmail.com>

* [enhance](S3) Print the error detail for every s3 operation (#27572) (#27615)

* [nereids] fix stats error when using dateTime type filter #27571 (#27577)

* [fix](planner)sort node should materialized required slots for itself #27605 (#27620)

* [fix](Nereids) non-deterministic expression should not be constant (#27606) (#27631)

* [enhancement](stats) Add process for aggstate type #27640 (#27642)

* [Fix](statistics)Fix bug and improve auto analyze. (#27626) (#27657)

1. Implement needReAnalyzeTable for ExternalTable. For now, external table will not be reanalyzed in 10 days.
2. For HiveMetastoreCache.loadPartitions, handle the empty iterator case to avoid Index out of boundary exception.
3. Wrap handle show analyze loop with try catch, so that when one table failed (for example, catalog dropped so the table couldn't be found anymore), we can still show the other tables.
4. For now, only OlapTable and Hive HMSExternalTable support sample analyze, throw exception for other types of table.
5. In StatisticsCollector, call constructJob after createTableLevelTaskForExternalTable to avoid NPE.

* [profile](bugfix) should not cache profile content because the profile may not be a full profile (#27635)

---------

Co-authored-by: yiguolei <yiguolei@gmail.com>

* [Enhance](fe) Support setting initial root password when FE firstly launch (#27438) (#27603)

* [opt](plan) only lock olap table when query plan #27639 (#27656)

bp #27639

* select coordinator node from user's tag when exec streaming load (#27106) (#27677)

* [fix](statistics)Need to recalculate health value when table row count become 0  #27673 (#27674)

backport #27673

* [fix](statistics)Fix sample min max npe bug  #27702 (#27707)

backport #27702

* [Bug](join) try fix wrong _has_null_in_build_side setted (#27684) (#27710)

* [Fix](show-load)Show load npe(userinfo is null) (#27698) (#27719)

* [pick](nereids)temporary partition is always pruned #27636 (#27722)

* [enhancement](stats) limit bq cap size for analyze task #27685 (#27687)

* [improvement](statistics) Add config for the threshold of column count for auto analyze #27713 (#27723)

* [doc](fix) k8s operator docs fix to 2.0 (#27476)

* [Improvement](planner)support select tablets with nereids optimize #23164 #23365 (#27740)

#23164
#23365

* [FIX](complextype)fix complex type hash equals (#27743)

* [fix](statistics) Fix show auto analyze missing jobs bug (#27761)

* [bugfix](topn) fix coredump in copy_column_data_to_block when nullable mismatch

return RuntimeError if copy_column_data_to_block nullable mismatch to avoid coredump in input_col_ptr->filter_by_selector(sel_rowid_idx, select_size, raw_res_ptr) .

The problem is reported by a doris user but I can not reproduce it, so there is no testcase added currently.

* [opt](stats) Use escape rather than base64 for min/max value #27746 (#27748)

* [refactor](http) disable snapshot and get_log_file api (#27724) (#27770)

* [branch-2.0](pick 27738) Warning log to trace send fragment #27738 (#27760)

* [branch-2.0](pick #27771) Add more detail msg for waitRPC exception (#27773)

* [Bug](pipeline) prevent PipelineFragmentContext destruct early (#27790)

* [deps](compression) Opt gzip decompress by libdeflate on X86 and X86_64 platforms: 1. Add libdeflate lib.  (#27542) (#27711)

Backport from #27542.

* [FIX](case)fix case truncate table first #27792

* [doc](stats) add auto_analyze_table_width_threshold description. (#27818) (#27832)

* [fix](bdbje) Fix bdbje logging level not work (#27597) (#27788)

* `EnvironmentConfig.FILE_LOGGING_LEVEL` only set FileHandlerLevel, we should
   set logger level firstly, otherwise it will not take effect.

* [Opt](compression) Opt gzip decompress by libdeflate on X86 and X86_64 platforms: 2. Opt gzip decompression by libdeflate lib. (#27669) (#27801)

Backport from #27669.

* [branch-2.0](fix) Fix broken exception message #27836

* [Bug](func) coredump in equal for null in function (#27843)

* [minor](stats) Update olap table row count after analyze (#27858)

pick from master #27814

* [fix](stats)min and max return NaN when table is empty (#27863)

fix analyze empty table and min/max null value bug:
1. Skip empty analyze task for sample analyze task. (Full analyze task already skipped).
2. Check sample rows is not 0 before calculate the scale factor.
3. Remove ' in sql template after remove base64 encoding for min/max value.

backport #27862

* [minor](stats) Throw error when sync analyze failed (#27846)

pick from master #27845

* [fix](stats) Don't save colToPartitions anymore to save mem (#27880)

pick from master #27879

* [fix](nereids) set operation's result type is wrong if decimal overflows (#27872)

pick from master #27870

* [Config] Modify the default value of tablet_schema_cache_recycle_interval (#27877)

* [fix](like_func) incorrect result of like with 'NO_BACKSLASH_ESCAPES' mode(#27842) (#27851)

* [fix](fe) Fix show frontends npt in some situations (#27295) (#27789)

```
java.lang.NullPointerException: null
    at com.sleepycat.je.rep.util.ReplicationGroupAdmin.getMasterSocket(ReplicationGroupAdmin.java:191)
    at com.sleepycat.je.rep.util.ReplicationGroupAdmin.doMessageExchange(ReplicationGroupAdmin.java:607)
    at com.sleepycat.je.rep.util.ReplicationGroupAdmin.getGroup(ReplicationGroupAdmin.java:406)
    at org.apache.doris.ha.BDBHA.getElectableNodes(BDBHA.java:132)
    at org.apache.doris.common.proc.FrontendsProcNode.getFrontendsInfo(FrontendsProcNode.java:84)
    at org.apache.doris.qe.ShowExecutor.handleShowFrontends(ShowExecutor.java:1923)
    at org.apache.doris.qe.ShowExecutor.execute(ShowExecutor.java:355)
    at org.apache.doris.qe.StmtExecutor.handleShow(StmtExecutor.java:2113)
    ...
```

* [branch-2.0](fix) Fix extremely high CPU usage caused by rf merge #27894 (#27895)

* [fix](stacktrace) ignore stacktrace for error code INVALID_ARGUMENT INVERTED_INDEX_NOT_IMPLEMENTED (#27898)

* ignore stacktrace for error INVALID_ARGUMENT INVERTED_INDEX_NOT_IMPLEMENTED

* AndBlockColumnPredicate::evaluate

* [opt](nereids) Branch-2.0: remove partition & histogram from col stats to reduce memory usage #27885 (#27896)

* [pick](Nereids) temporary partition is selected only if user manually specified: Branch-2.0 #27893 (#27905)

* [fix](multi-catalog)support the max compute partition prune (#27154) (#27902)

backport #27154

* [fix](Nereids) should not push down project to the nullable side of outer join #27912 (#27913)

* fix compile

---------

Co-authored-by: Dongyang Li <hello_stephen@qq.com>
Co-authored-by: AlexYue <yj976240184@gmail.com>
Co-authored-by: xzj7019 <131111794+xzj7019@users.noreply.github.com>
Co-authored-by: starocean999 <40539150+starocean999@users.noreply.github.com>
Co-authored-by: morrySnow <101034200+morrySnow@users.noreply.github.com>
Co-authored-by: AKIRA <33112463+Kikyou1997@users.noreply.github.com>
Co-authored-by: Jibing-Li <64681310+Jibing-Li@users.noreply.github.com>
Co-authored-by: yiguolei <676222867@qq.com>
Co-authored-by: yiguolei <yiguolei@gmail.com>
Co-authored-by: DuRipeng <453243496@qq.com>
Co-authored-by: Mingyu Chen <morningman@163.com>
Co-authored-by: wangbo <wangbo@apache.org>
Co-authored-by: Pxl <pxl290@qq.com>
Co-authored-by: Calvin Kirs <acm_master@163.com>
Co-authored-by: minghong <englefly@gmail.com>
Co-authored-by: catpineapple <42031973+catpineapple@users.noreply.github.com>
Co-authored-by: amory <wangqiannan@selectdb.com>
Co-authored-by: Kang <kxiao.tiger@gmail.com>
Co-authored-by: zhiqiang <seuhezhiqiang@163.com>
Co-authored-by: Qi Chen <kaka11.chen@gmail.com>
Co-authored-by: Lei Zhang <27994433+SWJTU-ZhangLei@users.noreply.github.com>
Co-authored-by: HappenLee <happenlee@hotmail.com>
Co-authored-by: Lightman <31928846+Lchangliang@users.noreply.github.com>
Co-authored-by: Jerry Hu <mrhhsg@gmail.com>
Co-authored-by: slothever <18522955+wsjz@users.noreply.github.com>
gnehil pushed a commit to gnehil/doris that referenced this pull request Dec 4, 2023
XuJianxu pushed a commit to XuJianxu/doris that referenced this pull request Dec 14, 2023
} else {
// regular character or escaped special character
if (pattern[i] == '.' || pattern[i] == '[' || pattern[i] == ']' || pattern[i] == '{' ||
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these characters have special meaning in re2 so they must be escaped

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by one committer. dev/2.0.3-merged reviewed usercase Important user case type label
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants