[Feature](orc-reader) Implement new merge io facility for orc reader. #45966
TPC-H: Total hot run time: 32432 ms |
TPC-DS: Total hot run time: 190964 ms |
ClickBench: Total hot run time: 30.76 s |
This pull request introduces enhancements to ORC file reading, covering updates to the ORC reader implementation and profiling improvements.
These changes aim to optimize the ORC file reading process by merging small I/O operations, improving profiling, and handling large I/O operations more efficiently. |
TPC-H: Total hot run time: 33560 ms |
TPC-DS: Total hot run time: 194809 ms |
ClickBench: Total hot run time: 30.93 s |
TPC-H: Total hot run time: 31560 ms |
TPC-DS: Total hot run time: 190810 ms |
ClickBench: Total hot run time: 31.07 s |
|
|
morningman
left a comment
LGTM
|
PR approved by at least one committer and no changes requested. |
|
PR approved by anyone and no changes requested. |
hubgeter
left a comment
LGTM
…49718)
### What problem does this PR solve?
Related PR: #45966
### Release note
[opt] (orc-reader) Turn on late materialization of ORC complex types. The new merge IO facility implemented in #45966 supports the read-back (backtracking) access pattern that late materialization of complex types requires, so late materialization of ORC complex types is now turned on in the ORC reader.
… of orc-reader. (#51102)
### What problem does this PR solve?
Related PR: #45966
Fix merge ranges not being sorted in the new merge IO facility of orc-reader. Because the ranges taken from `std::unordered_map<orc::StreamId, io::PrefetchRange>& ranges` are not sorted, merging adjacent ranges is very ineffective.
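To illustrate why the ordering matters, here is a minimal sketch, not the code from #51102. `Range`, `sorted_ranges`, and `merge_neighbours` are hypothetical names; the map is keyed by a plain string rather than `orc::StreamId`, and `Range` only stands in for `io::PrefetchRange`. A single forward merge pass compares each range only with the previously emitted one, so it needs its input sorted by start offset.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for io::PrefetchRange: a half-open byte range [start, end).
struct Range {
    uint64_t start;
    uint64_t end;
};

// Copy the ranges out of the unordered map and sort them by start offset.
// Iteration order of std::unordered_map is unspecified, so sorting is required.
std::vector<Range> sorted_ranges(const std::unordered_map<std::string, Range>& ranges) {
    std::vector<Range> out;
    out.reserve(ranges.size());
    for (const auto& kv : ranges) {
        out.push_back(kv.second);
    }
    std::sort(out.begin(), out.end(),
              [](const Range& a, const Range& b) { return a.start < b.start; });
    return out;
}

// Single forward pass: fold a range into the previously emitted one when it
// starts inside it or at most max_gap bytes after its end. Only neighbours in
// input order are compared, which is why unsorted input merges poorly.
std::vector<Range> merge_neighbours(const std::vector<Range>& in, uint64_t max_gap) {
    std::vector<Range> out;
    for (const Range& r : in) {
        assert(r.start <= r.end);
        if (!out.empty() && r.start >= out.back().start &&
            r.start <= out.back().end + max_gap) {
            out.back().end = std::max(out.back().end, r.end);
        } else {
            out.push_back(r);
        }
    }
    return out;
}
```

With three back-to-back ranges given in the order `[100,200)`, `[0,100)`, `[200,300)`, the pass leaves three separate ranges. After sorting, the same pass collapses them into the single range `[0,300)`.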
What problem does this PR solve?
related: apache/doris-thirdparty#270
Problem Summary:
The original merge IO mechanism, `MergeRangeFileReader`, requires ranges to be read in order. The ranges requested by the ORC reader can be out of order, so a range cannot be read back once it has been passed.
Turning on late (delayed) materialization of ORC complex types creates such a read-back scenario for present streams, for example:
`select struct_element(info, 'age'), id from test_orc_struct where struct_element(info, 'name') = 'Alice'`
When late materialization is turned on, the present stream of the parent node `info` is read first, when `name` is read. When `age` is read afterwards, the present stream of the parent node `info` needs to be read back. For this reason, late materialization of ORC complex types could not be turned on before this change.
Release note
The new merge IO mechanism classifies the ranges read by the streams of an ORC stripe into small ranges and large ranges according to `orc_once_max_read_bytes`: ranges smaller than `orc_once_max_read_bytes` are small ranges, and ranges exceeding it are large ranges.
Adjacent small ranges are then merged. The maximum length of a merged range is `orc_once_max_read_bytes`, and the maximum distance allowed between two ranges being merged is `orc_max_merge_distance_bytes`. Each merged range is held in an in-memory cache behind its own reader, and a corresponding input stream is built on it for the lower-layer ORC reader to read. Large ranges are read directly through the underlying file reader. With the current implementation, data inside a merged range can be read in any order.
Future Work
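As an illustration of the release note above, the small/large classification and the merging of nearby small ranges can be sketched roughly as follows. This is a sketch under assumptions, not the PR's implementation: `ByteRange`, `ReadPlan`, and `plan_reads` are hypothetical names, the two limits are passed as plain parameters, and treating a range of exactly `orc_once_max_read_bytes` as small is an assumption, because the description does not say which side the boundary falls on.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical half-open byte range [start, end) inside an ORC stripe.
struct ByteRange {
    uint64_t start;
    uint64_t end;
    uint64_t size() const { return end - start; }
};

// Hypothetical result: merged small ranges would each be cached in memory,
// large ranges would be read directly from the underlying file reader.
struct ReadPlan {
    std::vector<ByteRange> merged_small;
    std::vector<ByteRange> large;
};

ReadPlan plan_reads(std::vector<ByteRange> ranges,
                    uint64_t once_max_read_bytes,        // orc_once_max_read_bytes
                    uint64_t max_merge_distance_bytes) { // orc_max_merge_distance_bytes
    // Sort by start offset so that mergeable ranges become neighbours.
    std::sort(ranges.begin(), ranges.end(),
              [](const ByteRange& a, const ByteRange& b) { return a.start < b.start; });

    ReadPlan plan;
    for (const ByteRange& r : ranges) {
        assert(r.start <= r.end);
        // Assumption: a range of exactly once_max_read_bytes counts as small.
        if (r.size() > once_max_read_bytes) {
            plan.large.push_back(r);
            continue;
        }
        if (!plan.merged_small.empty()) {
            ByteRange& last = plan.merged_small.back();
            const bool close_enough = r.start <= last.end + max_merge_distance_bytes;
            const uint64_t new_end = std::max(last.end, r.end);
            const bool fits = new_end - last.start <= once_max_read_bytes;
            if (close_enough && fits) {
                last.end = new_end; // extend the current merged range
                continue;
            }
        }
        plan.merged_small.push_back(r); // start a new merged range
    }
    return plan;
}
```

For example, with a maximum read size of 100 bytes and a maximum merge distance of 10 bytes, the ranges `[0,30)`, `[35,60)`, `[80,90)`, `[95,130)`, and `[200,500)` yield the merged small ranges `[0,60)` and `[80,130)` plus the large range `[200,500)`. The length check also keeps a merge from swallowing a large range that sits between two small ones.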
Currently, implementations such as `OrcMergeRangeFileReader` and `RangeCacheFileReader` must `memcpy` from the cache into the result slice because of the limitations of the `FileReader` interface. In theory, the copy can be avoided by making the slice point directly at the cache location. This can be refactored and optimized in the future.
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)