@AshinGau (Member) commented Feb 23, 2023

Proposed changes

Support parsing the map and struct types in the Parquet and ORC readers.

Remaining Problems

  1. Doris uses the array type to build the key and value columns of a map but does not fill the offsets in the value column, so the value column's offsets are wasted.
  2. Parquet supports reading only the key column or only the value column of a map; this PR does not support that yet.
  3. Parquet supports reading a subset of the columns in a struct; this PR does not support that yet.
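
To illustrate item 1, here is a minimal sketch of a map column whose flattened keys and values share one offsets array. This is not the Doris implementation: `MapColumnSketch`, its members, and the string/int32 element types are hypothetical names chosen for the example. It only shows why a second offsets array on the value side would carry no extra information.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical layout, not the Doris classes: a single offsets array marks
// where each row ends in both the flattened key and value columns.
struct MapColumnSketch {
    std::vector<std::string> keys;     // flattened keys of all rows
    std::vector<int32_t> values;       // flattened values, parallel to keys
    std::vector<std::size_t> offsets;  // offsets[i] = exclusive end of row i

    // Appends one map row; keys and values advance together, so one
    // offsets entry describes both columns.
    void append_row(const std::vector<std::pair<std::string, int32_t>>& row) {
        for (const auto& kv : row) {
            keys.push_back(kv.first);
            values.push_back(kv.second);
        }
        offsets.push_back(keys.size());
    }

    // Number of entries in the given row.
    std::size_t row_size(std::size_t row) const {
        std::size_t begin = (row == 0) ? 0 : offsets[row - 1];
        return offsets[row] - begin;
    }

    // Linear search within one row; returns true and sets *out on a match.
    bool lookup(std::size_t row, const std::string& key, int32_t* out) const {
        std::size_t begin = (row == 0) ? 0 : offsets[row - 1];
        for (std::size_t i = begin; i < offsets[row]; ++i) {
            if (keys[i] == key) {
                *out = values[i];
                return true;
            }
        }
        return false;
    }
};
```

Because every entry contributes exactly one key and one value, the row boundaries of the two columns always coincide, which is why offsets stored separately for the value column would be redundant.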

Fix bugs in nested arrays

The number of ancestor nulls was counted incorrectly. The counter update changes from ancestor_nulls++ to ancestor_nulls += loop_read.
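
The difference between the two updates can be sketched as follows. This is an illustration under assumptions, not the Doris reader code: `count_ancestor_nulls` and its parameters are hypothetical, and `loop_read` is assumed to be the length of a run of identical definition levels consumed in one loop iteration. An entry counts as an ancestor null when its definition level is below an assumed `ancestor_level` threshold.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not Doris code: counts entries whose definition level
// is below `ancestor_level`, consuming the levels in runs of equal values.
inline std::size_t count_ancestor_nulls(const std::vector<int16_t>& def_levels,
                                        int16_t ancestor_level,
                                        bool buggy_increment = false) {
    std::size_t ancestor_nulls = 0;
    std::size_t i = 0;
    while (i < def_levels.size()) {
        // Assumed meaning of loop_read: length of the current run of
        // identical definition levels.
        std::size_t loop_read = 1;
        while (i + loop_read < def_levels.size() &&
               def_levels[i + loop_read] == def_levels[i]) {
            ++loop_read;
        }
        if (def_levels[i] < ancestor_level) {
            if (buggy_increment) {
                ancestor_nulls++;             // counts the whole run as one entry
            } else {
                ancestor_nulls += loop_read;  // counts every entry in the run
            }
        }
        i += loop_read;
    }
    return ancestor_nulls;
}
```

With the levels `{2, 0, 0, 0, 2, 1, 0}` and `ancestor_level = 1`, four entries fall below the threshold, but they form only two runs, so the per-run increment undercounts whenever a run is longer than one entry.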

Checklist (Required)

  • Does it affect the original behavior?
  • Have unit tests been added?
  • Has documentation been added or modified?
  • Does it need to update dependencies?
  • Does this PR support rollback? (If NO, please explain WHY)

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

@AshinGau AshinGau marked this pull request as ready for review February 23, 2023 14:06
@github-actions (Contributor)

clang-tidy review says "All clean, LGTM! 👍"


@AshinGau (Member Author)

run buildall

@AshinGau (Member Author)

run buildall


@hello-stephen (Contributor) commented Feb 24, 2023

TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 35.24 seconds
stream load tsv: 475 seconds loaded 74807831229 Bytes, about 150 MB/s
stream load json: 38 seconds loaded 2358488459 Bytes, about 59 MB/s
stream load orc: 69 seconds loaded 1101869774 Bytes, about 15 MB/s
stream load parquet: 28 seconds loaded 861443392 Bytes, about 29 MB/s
https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20230225153516_clickbench_pr_104407.html

@AshinGau (Member Author)

run buildall


@morningman (Contributor) left a comment

LGTM

@morningman morningman merged commit c43e521 into apache:master Feb 26, 2023
Yulei-Yang pushed a commit to Yulei-Yang/doris that referenced this pull request Mar 5, 2023
apache#17087)

yagagagaga pushed a commit to yagagagaga/doris that referenced this pull request Mar 9, 2023
apache#17087)

@AshinGau AshinGau deleted the complex branch August 10, 2023 08:48