[support](orc)support orc file meta cache. #54591
run buildall
TPC-H: Total hot run time: 33897 ms
TPC-DS: Total hot run time: 184447 ms
ClickBench: Total hot run time: 32.46 s
    return buf.str();
}

void FieldDescriptor::iceberg_sanitize(const std::vector<std::string>& read_columns) {
This code is currently unused.
Reason:
Prior to pull request #27108, the Iceberg Parquet reader parsed the field ID and file name from the iceberg.schema property of the Parquet file.
Because iceberg.schema stores the table name, which is not in a valid Avro format, PR #27108 had to convert it (sanitize_avro_name).
In the current master implementation, the field ID comes from the schema.
run buildall
TPC-H: Total hot run time: 33906 ms
TPC-DS: Total hot run time: 184961 ms
ClickBench: Total hot run time: 32.91 s
run buildall
TPC-H: Total hot run time: 33751 ms
TPC-DS: Total hot run time: 184313 ms
ClickBench: Total hot run time: 32.26 s
BE UT Coverage Report
Increment line coverage
Increment coverage report
BE Regression && UT Coverage Report
Increment line coverage
Increment coverage report
            : _file_reader;
    }
    if (_file_metadata) {
        std::cout << "_file_metadata not null\n";
Remove this
Force-pushed from cbc1456 to b6aea5b.
run buildall
TPC-H: Total hot run time: 33785 ms
TPC-DS: Total hot run time: 184179 ms
ClickBench: Total hot run time: 32.7 s
run buildall
TPC-H: Total hot run time: 34002 ms
TPC-DS: Total hot run time: 185159 ms
ClickBench: Total hot run time: 32.41 s
BE UT Coverage Report
Increment line coverage
Increment coverage report
dataroaring left a comment
LGTM
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
run check_coverage
Problem Summary:
This PR includes four changes:
1. Support for file meta cache for ORC files.
2. Changed the file meta cache key from `file name + modification time`
to `file name + modification time / file size`, to reduce the chance of
reading stale metadata.
3. Removed some unused code in the Parquet meta.
4. Users can check the profile to see whether the cache was hit:
`FileFooterHitCache`: the cache was hit.
`FileFooterReadCalls`: the cache was missed or is disabled.
Note: to disable the cache, set the BE config `max_external_file_meta_cache_num` to a value <= 0.
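For readers who want to see how the key change and the two profile counters fit together, here is a minimal sketch. It is not the code in this PR. `FileFooter`, `FileDescription`, `make_meta_cache_key` and `FooterCache` are hypothetical names, the eviction policy is a placeholder, and the key layout reads the `/` in the summary as "modification time when it is known, otherwise file size", which is an assumption. The `max_entries` argument plays the role of `max_external_file_meta_cache_num`.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a parsed ORC/Parquet footer.
struct FileFooter {
    std::string raw;
};

// Hypothetical description of an external file, known before it is opened.
struct FileDescription {
    std::string path;
    int64_t mtime = 0;       // assumption: 0 means "modification time unknown"
    int64_t file_size = -1;  // assumption: -1 means "size unknown"
};

// Build the cache key from the file name plus the modification time,
// falling back to the file size when no modification time is available.
// The tags keep an mtime and a size with the same value from colliding.
inline std::string make_meta_cache_key(const FileDescription& file) {
    if (file.mtime > 0) {
        return file.path + "|mtime=" + std::to_string(file.mtime);
    }
    return file.path + "|size=" + std::to_string(file.file_size);
}

class FooterCache {
public:
    // max_entries <= 0 disables the cache entirely.
    explicit FooterCache(int64_t max_entries) : _max_entries(max_entries) {}

    bool enabled() const { return _max_entries > 0; }

    // Return the cached footer, or call read_footer(file) and cache the result.
    template <typename ReadFooterFn>
    std::shared_ptr<const FileFooter> get_or_read(const FileDescription& file,
                                                  ReadFooterFn read_footer) {
        if (enabled()) {
            auto it = _entries.find(make_meta_cache_key(file));
            if (it != _entries.end()) {
                ++footer_hit_cache;  // mirrors FileFooterHitCache
                return it->second;
            }
        }
        ++footer_read_calls;  // mirrors FileFooterReadCalls: miss or cache disabled
        std::shared_ptr<const FileFooter> footer =
                std::make_shared<FileFooter>(read_footer(file));
        assert(footer != nullptr);
        if (enabled()) {
            // Placeholder eviction: drop everything when full. A real cache
            // would more likely evict in LRU order.
            if (static_cast<int64_t>(_entries.size()) >= _max_entries) {
                _entries.clear();
            }
            _entries[make_meta_cache_key(file)] = footer;
        }
        return footer;
    }

    int64_t footer_hit_cache = 0;
    int64_t footer_read_calls = 0;

private:
    int64_t _max_entries;
    std::unordered_map<std::string, std::shared_ptr<const FileFooter>> _entries;
};
```

In this sketch, a file rewritten in place under the same name gets a different key as soon as its modification time (or, failing that, its size) changes, so the old footer is never returned for it. With `max_entries <= 0` every lookup counts as a read call and nothing is stored.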
What problem does this PR solve?
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)