[fix](S3FileWriter) Fix boundary issue when multipart upload #43037

Merged

Conversation

gavinchou
Contributor

When the file data size is an exact multiple of config::s3_write_buffer_size, the recorded number of parts may exceed the number of parts that actually need to be uploaded. This is because the part number is incremented by 1 in advance within the S3FileWriter::appendv method.
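
The boundary condition described above can be illustrated with a minimal sketch. This is not the Doris code: `actual_parts` and `eager_part_counter` are hypothetical helpers, and the assumption is only that a writer starts its part counter at 1 and advances it as soon as a buffer fills.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helpers for illustration only; not the S3FileWriter implementation.

// Number of parts that actually carry data: ceil(file_size / buffer_size).
std::size_t actual_parts(std::size_t file_size, std::size_t buffer_size) {
    return (file_size + buffer_size - 1) / buffer_size;
}

// Models a counter that starts at 1 and is advanced every time a buffer
// fills, i.e. "incremented in advance" for a part that may never get data.
std::size_t eager_part_counter(std::size_t file_size, std::size_t buffer_size) {
    std::size_t part_num = 1;   // part the next byte would be written to
    std::size_t buffered = 0;   // bytes currently held in the buffer
    std::size_t remaining = file_size;
    while (remaining > 0) {
        const std::size_t n = std::min(buffer_size - buffered, remaining);
        buffered += n;
        remaining -= n;
        if (buffered == buffer_size) {
            ++part_num;         // advanced before any data exists for that part
            buffered = 0;
        }
    }
    return part_num;
}
```

With a buffer size of 5 and file sizes 4, 5, 6 and 10, `actual_parts` gives 1, 1, 2, 2 while `eager_part_counter` gives 1, 2, 2, 3. The two disagree only when the size is an exact multiple of the buffer size, which is the case this PR targets.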

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@gavinchou force-pushed the gavin-fix-s3-file-writer-5m-boundary branch from d307e67 to 6266e36 on October 31, 2024 18:17
@gavinchou
Contributor Author

run buildall

Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TPC-H: Total hot run time: 41684 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 6266e36a06d1cbf375ee55f4cf94c2e51a86bc87, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17715	7434	7348	7348
q2	2044	165	180	165
q3	10698	1111	1138	1111
q4	10433	817	881	817
q5	7781	3088	3138	3088
q6	239	149	149	149
q7	1029	621	605	605
q8	9376	1972	2067	1972
q9	6696	6513	6526	6513
q10	7097	2455	2412	2412
q11	473	258	262	258
q12	416	220	208	208
q13	17787	3036	3037	3036
q14	235	213	210	210
q15	591	530	516	516
q16	669	604	599	599
q17	990	530	539	530
q18	7414	6787	6899	6787
q19	1377	1012	1039	1012
q20	477	180	176	176
q21	4000	3174	3359	3174
q22	1102	998	1010	998
Total cold run time: 108639 ms
Total hot run time: 41684 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	7348	7284	7286	7284
q2	327	229	225	225
q3	3065	2961	2956	2956
q4	2130	1874	1735	1735
q5	5753	5786	5877	5786
q6	219	141	135	135
q7	2293	1830	1774	1774
q8	3402	3530	3477	3477
q9	9000	8977	8947	8947
q10	3593	3547	3561	3547
q11	605	492	500	492
q12	828	665	637	637
q13	9554	3230	3218	3218
q14	301	271	278	271
q15	576	522	522	522
q16	705	645	645	645
q17	1847	1631	1593	1593
q18	8420	7875	7663	7663
q19	1729	1594	1490	1490
q20	2129	1850	1861	1850
q21	5622	5452	5490	5452
q22	1169	1035	1058	1035
Total cold run time: 70615 ms
Total hot run time: 60734 ms

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 37.82% (9821/25969)
Line Coverage: 28.99% (81587/281458)
Region Coverage: 28.28% (42148/149038)
Branch Coverage: 24.84% (21380/86062)
Coverage Report: http://coverage.selectdb-in.cc/coverage/6266e36a06d1cbf375ee55f4cf94c2e51a86bc87_6266e36a06d1cbf375ee55f4cf94c2e51a86bc87/report/index.html

@doris-robot

TPC-DS: Total hot run time: 196305 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 6266e36a06d1cbf375ee55f4cf94c2e51a86bc87, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query1	1226	927	916	916
query2	6224	2023	2046	2023
query3	10783	3923	4013	3923
query4	67728	29183	23555	23555
query5	4921	449	424	424
query6	409	168	172	168
query7	5538	300	294	294
query8	302	224	222	222
query9	8683	2742	2760	2742
query10	443	251	252	251
query11	17421	15202	15898	15202
query12	159	97	101	97
query13	1480	416	438	416
query14	10893	7454	7173	7173
query15	218	189	195	189
query16	7068	438	421	421
query17	1013	560	553	553
query18	1773	292	297	292
query19	189	160	147	147
query20	113	108	112	108
query21	201	100	102	100
query22	4431	4385	4402	4385
query23	34643	33932	34204	33932
query24	6020	2776	2766	2766
query25	506	399	393	393
query26	646	163	159	159
query27	1659	284	288	284
query28	4227	2499	2466	2466
query29	688	432	426	426
query30	236	160	148	148
query31	955	784	803	784
query32	65	55	57	55
query33	413	273	265	265
query34	907	507	523	507
query35	842	728	733	728
query36	1085	979	965	965
query37	120	77	72	72
query38	4488	4332	4358	4332
query39	1472	1454	1432	1432
query40	199	98	97	97
query41	48	45	46	45
query42	111	99	94	94
query43	539	507	505	505
query44	1168	822	825	822
query45	187	170	176	170
query46	1138	710	699	699
query47	1965	1826	1887	1826
query48	427	312	320	312
query49	737	414	399	399
query50	819	401	400	400
query51	7316	7199	7147	7147
query52	102	90	89	89
query53	258	181	174	174
query54	528	422	405	405
query55	80	76	74	74
query56	265	250	255	250
query57	1329	1170	1134	1134
query58	220	229	205	205
query59	3240	3055	2957	2957
query60	271	247	240	240
query61	99	106	95	95
query62	778	695	688	688
query63	222	192	182	182
query64	1325	645	603	603
query65	3267	3216	3263	3216
query66	705	297	289	289
query67	15982	15796	15692	15692
query68	3612	594	586	586
query69	418	263	251	251
query70	1159	1157	1120	1120
query71	365	249	277	249
query72	6154	4050	4024	4024
query73	760	368	365	365
query74	10180	9023	9134	9023
query75	3366	2664	2683	2664
query76	1781	1027	960	960
query77	495	286	272	272
query78	10384	9463	9431	9431
query79	1475	596	618	596
query80	850	422	440	422
query81	506	237	250	237
query82	1267	114	114	114
query83	157	138	144	138
query84	281	72	66	66
query85	845	288	279	279
query86	350	295	300	295
query87	4977	4789	4685	4685
query88	3811	2197	2185	2185
query89	413	293	285	285
query90	2017	184	188	184
query91	131	97	99	97
query92	64	46	47	46
query93	1917	546	555	546
query94	786	295	286	286
query95	332	250	251	250
query96	628	287	282	282
query97	2892	2743	2733	2733
query98	213	199	194	194
query99	1564	1316	1327	1316
Total cold run time: 317663 ms
Total hot run time: 196305 ms

@doris-robot

ClickBench: Total hot run time: 32.59 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 6266e36a06d1cbf375ee55f4cf94c2e51a86bc87, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.04	0.04	0.03
query2	0.07	0.03	0.03
query3	0.24	0.06	0.06
query4	1.62	0.10	0.11
query5	0.42	0.40	0.41
query6	1.13	0.65	0.65
query7	0.02	0.02	0.01
query8	0.04	0.03	0.04
query9	0.57	0.51	0.50
query10	0.54	0.54	0.57
query11	0.15	0.13	0.11
query12	0.14	0.10	0.10
query13	0.60	0.60	0.59
query14	2.71	2.77	2.84
query15	0.89	0.84	0.83
query16	0.40	0.39	0.40
query17	1.07	1.05	1.06
query18	0.20	0.20	0.19
query19	1.88	1.86	1.99
query20	0.02	0.01	0.01
query21	15.37	0.62	0.62
query22	2.54	1.87	1.52
query23	16.96	1.00	0.98
query24	2.89	2.15	1.05
query25	0.28	0.11	0.12
query26	0.50	0.12	0.13
query27	0.04	0.04	0.04
query28	9.91	1.09	1.08
query29	12.56	3.24	3.23
query30	0.25	0.07	0.05
query31	2.88	0.37	0.38
query32	3.27	0.46	0.46
query33	2.96	3.00	3.08
query34	17.16	4.45	4.47
query35	4.46	4.53	4.49
query36	0.67	0.50	0.47
query37	0.08	0.06	0.06
query38	0.04	0.03	0.03
query39	0.03	0.02	0.02
query40	0.15	0.13	0.13
query41	0.08	0.02	0.02
query42	0.03	0.02	0.02
query43	0.03	0.03	0.02
Total cold run time: 105.89 s
Total hot run time: 32.59 s

Contributor

@dataroaring dataroaring left a comment

LGTM

@github-actions bot added the approved label on Nov 1, 2024
Contributor

github-actions bot commented Nov 1, 2024

PR approved by at least one committer and no changes requested.

Contributor

github-actions bot commented Nov 1, 2024

PR approved by anyone and no changes requested.

@gavinchou gavinchou merged commit be8b828 into apache:master Nov 5, 2024
24 of 27 checks passed
github-actions bot pushed a commit that referenced this pull request Nov 5, 2024
dataroaring pushed a commit that referenced this pull request Nov 7, 2024
…oad (#43254)

Cherry-picked from #43037

Co-authored-by: Gavin Chou <gavineaglechou@gmail.com>
Labels
approved Indicates a PR has been approved by one committer. dev/3.0.3-merged p0_b reviewed
4 participants