velox parquet schema is incorrect #9242

Closed
qqibrow opened this issue Mar 25, 2024 · 4 comments
Labels
bug (Something isn't working), parquet, triage (Newly created issue that needs attention)

Comments

@qqibrow
Collaborator

qqibrow commented Mar 25, 2024

Bug description

For the attached file, the schema Velox reads does not match the schema Presto reports:
Actual schema: ROW<test:ARRAY>
Expected schema from Presto: test:array(row(test varchar))

System information

Velox System Info v0.0.2
Commit: 1e186e548833750cdee4b95d829711ddad78aba1
CMake Version: 3.16.3
System: Linux-5.4.0-1063-aws
Arch: x86_64
C++ Compiler: /usr/bin/c++
C++ Compiler Version: 9.4.0
C Compiler: /usr/bin/cc
C Compiler Version: 9.4.0
CMake Prefix Path: /usr/local;/usr;/;/usr;/usr/local;/usr/X11R6;/usr/pkg;/opt

Relevant logs

https://www.dropbox.com/scl/fi/0y1i7hzvddg128bavq2l8/test2430573504112849669parquet?rlkey=r3czlx8c4cl86u6ig9a3jh8gu&dl=0
qqibrow added the bug and triage labels on Mar 25, 2024
@qqibrow
Collaborator Author

qqibrow commented Mar 25, 2024

First rows and schema of the attached file (https://www.dropbox.com/scl/fi/0y1i7hzvddg128bavq2l8/test2430573504112849669parquet?rlkey=r3czlx8c4cl86u6ig9a3jh8gu&dl=0):

lniu@lniu-FXGFKFV Downloads % parquet head test2430573504112849669parquet
{"test": []}
{"test": [{"test": "0"}, {"test": "1"}]}
{"test": [{"test": "2"}, {"test": "3"}]}
{"test": [{"test": "4"}]}
{"test": []}
{"test": []}
{"test": [{"test": "5"}, {"test": "6"}, {"test": "7"}, {"test": "8"}]}
{"test": [{"test": "9"}]}
{"test": [{"test": "10"}]}
{"test": [{"test": "11"}]}
{
  "type" : "record",
  "name" : "hive_schema",
  "fields" : [ {
    "name" : "test",
    "type" : [ "null", {
      "type" : "array",
      "items" : {
        "type" : "record",
        "name" : "array",
        "fields" : [ {
          "name" : "test",
          "type" : [ "null", "string" ],
          "default" : null
        } ]
      }
    } ],
    "default" : null
  } ]
}
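
The expected Presto type can be read off the schema printed above. The following is a minimal Python sketch of that reading, not Velox or Presto code: the helper names to_presto and top_level_columns and the small primitive mapping are hypothetical and only cover what this schema needs. A union with "null" collapses to its non-null branch, an array becomes array(...), and a record becomes row(...).

```python
import json

# Schema as printed above for the attached file.
AVRO_SCHEMA = json.loads("""
{
  "type": "record",
  "name": "hive_schema",
  "fields": [{
    "name": "test",
    "type": ["null", {
      "type": "array",
      "items": {
        "type": "record",
        "name": "array",
        "fields": [{"name": "test", "type": ["null", "string"], "default": null}]
      }
    }],
    "default": null
  }]
}
""")

# Illustrative, incomplete mapping from Avro primitives to Presto type names.
PRIMITIVES = {"string": "varchar", "int": "integer", "long": "bigint",
              "boolean": "boolean", "double": "double"}


def to_presto(avro_type):
    """Render an Avro type as a Presto-style type string (hypothetical helper)."""
    if isinstance(avro_type, list):
        # Nullable union such as ["null", X]: keep only the non-null branch.
        non_null = [t for t in avro_type if t != "null"]
        if len(non_null) != 1:
            raise ValueError(f"unsupported union: {avro_type}")
        return to_presto(non_null[0])
    if isinstance(avro_type, str):
        return PRIMITIVES[avro_type]
    kind = avro_type["type"]
    if kind == "array":
        return f"array({to_presto(avro_type['items'])})"
    if kind == "record":
        members = ", ".join(
            f"{f['name']} {to_presto(f['type'])}" for f in avro_type["fields"]
        )
        return f"row({members})"
    # Wrapped form such as {"type": "string"}.
    return to_presto(kind)


def top_level_columns(schema):
    """List the top-level columns as name:type strings."""
    return [f"{f['name']}:{to_presto(f['type'])}" for f in schema["fields"]]


print(top_level_columns(AVRO_SCHEMA))  # ['test:array(row(test varchar))']
```

The printed column matches the expected schema quoted in the bug description: the array element is a row with a single varchar field named test.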

qqibrow pushed a commit to qqibrow/velox that referenced this issue Mar 28, 2024
@8dukongjian
Contributor

For more complex cases, such as array(array(array(array(integer)))), the parsed schema is still incorrect. The test testSingleLevelSchemaNestedArrays reproduces the error.

actual schema: ROW<test:ARRAY>
expected schema from presto: array(array(array(array(integer))))
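
For reference, here is a rough Python sketch of how the Parquet format's backward-compatibility rules for LIST-annotated groups resolve both shapes discussed in this issue. It is an illustration under assumptions, not Velox's reader: the dict-based node model, the helpers prim, group, to_type and list_element_type, and the physical layouts built below are all hypothetical, and the actual layout of the files behind this issue has not been verified here.

```python
# "utf8" stands for a binary column annotated as a string; mapping is illustrative.
PRIMITIVES = {"int32": "integer", "utf8": "varchar"}


def prim(name, ptype, repetition="optional"):
    """Hypothetical primitive schema node."""
    return {"name": name, "type": ptype, "repetition": repetition}


def group(name, fields, repetition="optional", logical=None):
    """Hypothetical group schema node; logical is "LIST" or None."""
    return {"name": name, "fields": fields,
            "repetition": repetition, "logical": logical}


def to_type(node):
    """Render a node as a Presto-style type string, ignoring its own repetition."""
    if "fields" not in node:
        return PRIMITIVES[node["type"]]
    if node["logical"] == "LIST":
        return f"array({list_element_type(node)})"
    members = ", ".join(f"{c['name']} {to_type(c)}" for c in node["fields"])
    return f"row({members})"


def list_element_type(list_node):
    """Apply the LIST backward-compatibility rules to find the element type."""
    (repeated,) = list_node["fields"]  # a LIST group has exactly one child
    if repeated["repetition"] != "repeated":
        raise ValueError("the child of a LIST group must be repeated")
    if "fields" not in repeated:
        return to_type(repeated)  # rule 1: repeated primitive is the element
    if len(repeated["fields"]) > 1:
        return to_type(repeated)  # rule 2: multi-field group is the element
    if repeated["name"] in ("array", list_node["name"] + "_tuple"):
        return to_type(repeated)  # rule 3: legacy names, group is the element
    return to_type(repeated["fields"][0])  # rule 4: standard 3-level layout


# Assumed two-level layout for the array-of-row case.
rows = group("test", [group("array", [prim("test", "utf8")], "repeated")],
             logical="LIST")

# Assumed layout for four nested arrays of integers.
nested = prim("array", "int32", "repeated")
for _ in range(3):
    nested = group("array", [nested], "repeated", "LIST")
nested = group("test", [nested], logical="LIST")

print(to_type(rows))    # array(row(test varchar))
print(to_type(nested))  # array(array(array(array(integer))))
```

Under these assumptions, a repeated group named array with a single field is itself the element (rule 3), which yields a row element in the first case and one extra array level per LIST group in the second.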

@yingsu00
Collaborator

Fix #9533

@yingsu00
Collaborator

Closing it for now
