[feature](hive)Support reading renamed Parquet Hive and Orc Hive tables. (#38432) #41825
Merged
morningman
merged 1 commit into
apache:branch-3.0
from
hubgeter:pick_30_feature_read_hive_rename_table_parquet_orc
Oct 17, 2024
Conversation
|
Thank you for your contribution to Apache Doris. Since 2024-03-18, the Document has been moved to doris-website. |
Contributor
Author
|
run buildall |
TPC-H: Total hot run time: 40304 ms |
TPC-DS: Total hot run time: 188362 ms |
…es. (apache#38432)

Add `hive_parquet_use_column_names` and `hive_orc_use_column_names` session variables to support reading a `Hive` table after a column has been renamed. These two session variables are modeled on `parquet_use_column_names` and `orc_use_column_names` in the `Trino` hive connector.

Both session variables default to true. When they are set to false, the orc/parquet reader accesses columns by their ordinal position in the Hive table definition.

For example:

```mysql
-- in Hive:
hive> create table tmp (a int , b string) stored as parquet;
hive> insert into table tmp values(1,"2");
hive> alter table tmp change column a new_a int;
hive> insert into table tmp values(2,"4");

-- in Doris:
mysql> set hive_parquet_use_column_names=true;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from tmp;
+-------+------+
| new_a | b    |
+-------+------+
|  NULL | 2    |
|     2 | 4    |
+-------+------+
2 rows in set (0.02 sec)

mysql> set hive_parquet_use_column_names=false;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from tmp;
+-------+------+
| new_a | b    |
+-------+------+
|     1 | 2    |
|     2 | 4    |
+-------+------+
2 rows in set (0.02 sec)
```

In hive 3, you can use `set parquet.column.index.access/orc.force.positional.evolution = true/false` to control the results of reading the table, much like these two session variables. However, for a parquet table in which a field inside a struct column has been renamed, hive and doris behave differently.
27231b3 to
4f8ea73
Compare
Contributor
Author
|
run buildall |
TPC-H: Total hot run time: 40271 ms |
TPC-DS: Total hot run time: 192148 ms |
bp #38432
Proposed changes
Add `hive_parquet_use_column_names` and `hive_orc_use_column_names` session variables to support reading a `Hive` table after a column has been renamed. These two session variables are modeled on `parquet_use_column_names` and `orc_use_column_names` in the `Trino` hive connector.

Both session variables default to true. When they are set to false, the orc/parquet reader accesses columns by their ordinal position in the Hive table definition.

For example, see the `tmp` table session in the commit message above.

In hive 3, you can use `set parquet.column.index.access/orc.force.positional.evolution = true/false` to control the results of reading the table, much like these two session variables. However, for a parquet table in which a field inside a struct column has been renamed, hive and doris behave differently.
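
The difference between the two modes can be illustrated with a small Python model of the `tmp` table from the commit message. This is only a sketch of the by-name versus by-position semantics described above, not the Doris reader code; `read_rows`, `scan`, and the in-memory "files" are hypothetical names invented for illustration.

```python
from typing import Any, List, Optional, Tuple

def read_rows(file_schema: List[str],
              file_rows: List[Tuple[Any, ...]],
              table_schema: List[str],
              use_column_names: bool) -> List[Tuple[Any, ...]]:
    """Project rows stored in one data file onto the current table schema.

    Toy model (assumption): a file is just a list of column names plus rows.
    """
    result = []
    for row in file_rows:
        out = []
        for pos, col in enumerate(table_schema):
            idx: Optional[int]
            if use_column_names:
                # Match by name: a renamed column is not found in old files.
                idx = file_schema.index(col) if col in file_schema else None
            else:
                # Match by ordinal position in the table definition.
                idx = pos if pos < len(file_schema) else None
            out.append(row[idx] if idx is not None else None)
        result.append(tuple(out))
    return result

# File written before `alter table tmp change column a new_a int`.
old_file = (["a", "b"], [(1, "2")])
# File written after the rename.
new_file = (["new_a", "b"], [(2, "4")])
# Current table definition.
table_schema = ["new_a", "b"]

def scan(use_column_names: bool) -> List[Tuple[Any, ...]]:
    """Read both files under the given column-matching mode."""
    rows: List[Tuple[Any, ...]] = []
    for schema, data in (old_file, new_file):
        rows += read_rows(schema, data, table_schema, use_column_names)
    return rows

by_name = scan(True)       # models hive_parquet_use_column_names=true
by_position = scan(False)  # models hive_parquet_use_column_names=false
print(by_name)      # [(None, '2'), (2, '4')]
print(by_position)  # [(1, '2'), (2, '4')]
```

In this model, matching by name yields NULL for `new_a` in the row written before the rename, while matching by position recovers the value 1, which mirrors the two query results shown in the commit message.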