Fix orc/parquet column matching issue #393

Closed

@taiyang-li taiyang-li commented Mar 30, 2023

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in official stable or prestable release)

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Changes:

  • Fix the ORC column matching issue.
  • Add the option ENABLE_LOCAL_FORMATS to choose which ORC/Parquet input format implementation to use.
  • Fix code style, add the necessary macros in code, and remove unused header includes.

Note: the ORC/Parquet input formats under util/local-engine/ have bugs in case-insensitive column matching, so I replaced them with the newest ORC/Parquet input formats under src/.
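
For readers unfamiliar with the problem, here is a minimal sketch of what case-insensitive column matching means. It is illustrative only and is not the code from this patch or from src/: the function name `match_columns` and its error handling are assumptions made for the example. It maps each requested column name to the index of the file column that equals it when case is ignored, and rejects files in which two columns differ only by case.

```python
from typing import Dict, List


def match_columns(requested: List[str], file_columns: List[str]) -> Dict[str, int]:
    """Map each requested name to the index of the matching file column, ignoring case.

    Hypothetical helper for illustration; not part of ClickHouse or Gluten.
    """
    # Build a lookup from lower-cased file column name to its position.
    index_by_lower: Dict[str, int] = {}
    for idx, name in enumerate(file_columns):
        key = name.lower()
        if key in index_by_lower:
            # Two file columns collide once case is ignored, so a match would be ambiguous.
            raise ValueError(f"ambiguous column name ignoring case: {name!r}")
        index_by_lower[key] = idx

    # Resolve every requested name through the lower-cased lookup.
    result: Dict[str, int] = {}
    for name in requested:
        key = name.lower()
        if key not in index_by_lower:
            raise KeyError(f"no column matching {name!r}")
        result[name] = index_by_lower[key]
    return result
```

With this sketch, `match_columns(["A", "b"], ["a", "B", "c"])` resolves `A` to index 0 and `b` to index 1, whereas an exact-case lookup would fail for both names.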

@kyligence-git
Collaborator

Can one of the admins verify this patch?

@taiyang-li taiyang-li changed the title Fix orc column match issue Fix orc column matching issue Mar 30, 2023
@zzcclp
Collaborator

zzcclp commented Mar 30, 2023

test this please with 1249

@zzcclp
Collaborator

zzcclp commented Mar 30, 2023

test this please with 1249

@zzcclp
Collaborator

zzcclp commented Mar 30, 2023

Two test cases failed:

  1. GlutenComplexTypesSuite
- Gluten - types bool/byte/short/float/double/decimal/binary/map/array/struct *** FAILED ***
  org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 22.0 failed 1 times, most recent failure: Lost task 0.0 in stage 22.0 (TID 36) (gluten-ci.c.kyligence-212803.internal executor driver): java.lang.RuntimeException: ./contrib/arrow/cpp/src/arrow/result.cc:28: ValueOrDie called on an error: Invalid: tried to rename a table of 10 columns but only 12 names were provided
  2. GlutenDynamicPartitionPruningV1SuiteAEOff
- avoid reordering broadcast join keys to match input hash partitioning *** FAILED ***
  org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 221.0 failed 1 times, most recent failure: Lost task 0.0 in stage 221.0 (TID 1918) (gluten-ci.c.kyligence-212803.internal executor driver): java.lang.RuntimeException: No such name in Block::erase(): 'A'

@zzcclp
Collaborator

zzcclp commented Mar 30, 2023

I will ignore the related ORC test case for now. Once the issues above are fixed, please raise another PR so we can test it.

@taiyang-li taiyang-li changed the title Fix orc column matching issue Fix orc/parquet column matching issue Apr 3, 2023
@taiyang-li
Author

test this please

@liuneng1994
Collaborator

test this please

@taiyang-li
Author

test this please

@taiyang-li
Author

test this please

@taiyang-li
Author

Duplicate of apache/incubator-gluten#1390; closing now.

@taiyang-li taiyang-li closed this Apr 18, 2023