Properly handle nested fields when computing stats / stats-schema #2572

Open
roeap opened this issue Jun 4, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@roeap
Collaborator

roeap commented Jun 4, 2024

Bug

Right now our logic to compute stats, specifically using the `` property, only considers root-level fields and does not traverse into nested (struct) fields. We also need to parse field names properly, since they may be escaped and contain special characters.

https://github.com/delta-io/delta/blob/4b102d34a2ce881b2a851b4c6cfbf2ab3ab5534f/spark/src/main/scala/org/apache/spark/sql/delta/DeltaConfig.scala#L549-L561
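A minimal sketch of what "traversing into fields" could look like, using a hypothetical, simplified schema model (`DataType`, `Field`, `leaf_paths` and `collect_leaves` are illustrative names, not delta-rs APIs). Instead of stopping at the root level, it walks struct fields recursively and records the full path of every leaf column, so nested columns can also be considered for stats.

```rust
// Hypothetical, simplified schema model for illustration only;
// these are not the delta-rs / delta-kernel types.
#[derive(Debug, Clone)]
enum DataType {
    Primitive,
    Struct(Vec<Field>),
}

#[derive(Debug, Clone)]
struct Field {
    name: String,
    data_type: DataType,
}

impl Field {
    fn new(name: &str, data_type: DataType) -> Self {
        Field { name: name.to_string(), data_type }
    }
}

/// Collect the full path (one entry per nesting level) of every leaf column,
/// descending into struct fields instead of stopping at the root level.
fn leaf_paths(fields: &[Field]) -> Vec<Vec<String>> {
    let mut out = Vec::new();
    collect_leaves(fields, &mut Vec::new(), &mut out);
    out
}

fn collect_leaves(fields: &[Field], prefix: &mut Vec<String>, out: &mut Vec<Vec<String>>) {
    for field in fields {
        prefix.push(field.name.clone());
        match &field.data_type {
            // Recurse into nested structs, carrying the path built so far.
            DataType::Struct(children) => collect_leaves(children, prefix, out),
            // A leaf column: record its complete path.
            DataType::Primitive => out.push(prefix.clone()),
        }
        prefix.pop();
    }
}

fn main() {
    let schema = vec![
        Field::new("id", DataType::Primitive),
        Field::new(
            "top",
            DataType::Struct(vec![Field::new(
                "middle",
                DataType::Struct(vec![Field::new("bottom", DataType::Primitive)]),
            )]),
        ),
    ];
    for path in leaf_paths(&schema) {
        // Joined with '.' for display only; the Vec keeps the parts unambiguous.
        println!("{}", path.join("."));
    }
}
```

Keeping each path as a `Vec<String>` rather than a dotted string matters here: a field name may itself contain a `.`, so joining and re-splitting would be lossy.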

What you expected to happen:

Field names, including nested ones, are parsed properly when generating stats and the stats schema.
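A sketch of escape-aware name parsing, assuming a Spark-style quoting convention: backticks quote a name part so it may contain `.`, and a doubled backtick inside a quoted part stands for one literal backtick. `parse_column_name` is a hypothetical helper, not an existing delta-rs function, and it is deliberately lenient (for example, it does not reject text directly adjacent to a quoted part).

```rust
/// Split a possibly-quoted column reference into its name parts, e.g.
/// "a.`b.c`.d" -> ["a", "b.c", "d"].
/// Assumed convention (Spark-style): backticks quote a name part so it may
/// contain '.', and "``" inside quotes is one literal backtick.
fn parse_column_name(input: &str) -> Result<Vec<String>, String> {
    let mut parts = Vec::new();
    let mut current = String::new();
    let mut in_quotes = false;
    let mut chars = input.chars().peekable();

    while let Some(c) = chars.next() {
        if in_quotes {
            if c == '`' {
                if chars.peek() == Some(&'`') {
                    // "``" inside quotes is an escaped literal backtick.
                    chars.next();
                    current.push('`');
                } else {
                    // A single backtick closes the quoted part.
                    in_quotes = false;
                }
            } else {
                // Everything else, including '.', belongs to the name.
                current.push(c);
            }
        } else {
            match c {
                '`' => in_quotes = true,
                '.' => {
                    if current.is_empty() {
                        return Err(format!("empty name part in {input:?}"));
                    }
                    parts.push(std::mem::take(&mut current));
                }
                _ => current.push(c),
            }
        }
    }

    if in_quotes {
        return Err(format!("unclosed backtick in {input:?}"));
    }
    if current.is_empty() {
        return Err(format!("empty name part in {input:?}"));
    }
    parts.push(current);
    Ok(parts)
}

fn main() {
    for name in ["top.middle.bottom", "a.`b.c`.d", "`odd``name`"] {
        println!("{name} -> {:?}", parse_column_name(name));
    }
}
```

The result is a list of path parts rather than a single string, so a quoted name such as `` `b.c` `` survives as one part instead of being split at its dot.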

More details:

roeap added the bug (Something isn't working) label on Jun 4, 2024
@ion-elgreco
Collaborator

@roeap FYI, stats parsing also does not seem to work entirely for checkpoints: #2571

@alexwilcoxson-rel
Contributor

alexwilcoxson-rel commented Jul 31, 2024

My team will need to address this to adopt 0.18.x.

My proposal: when looking up a stats column in the Delta schema, treat its dotted name as a path into nested struct fields.

For example, `top.middle.bottom` would find the stats for:

top struct([
    middle struct([
        bottom primitive
    ])
])
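A minimal sketch of the lookup proposed above, again with a hypothetical, simplified schema model (`DataType`, `Field` and `resolve` are illustrative names, not delta-rs APIs). It walks one struct level per path part and returns `None` if a part is missing or the path tries to descend into a non-struct field. Exact, case-sensitive name matching is an assumption of this sketch.

```rust
// Hypothetical, simplified schema model (not the delta-rs types).
#[derive(Debug)]
enum DataType {
    Primitive(&'static str),
    Struct(Vec<Field>),
}

#[derive(Debug)]
struct Field {
    name: String,
    data_type: DataType,
}

impl Field {
    fn new(name: &str, data_type: DataType) -> Self {
        Field { name: name.to_string(), data_type }
    }
}

/// Resolve a path such as ["top", "middle", "bottom"] by walking nested
/// struct fields one level per path part. Matching is exact (an assumption).
fn resolve<'a>(fields: &'a [Field], path: &[&str]) -> Option<&'a Field> {
    let (first, rest) = path.split_first()?;
    let field = fields.iter().find(|f| f.name == *first)?;
    if rest.is_empty() {
        return Some(field);
    }
    match &field.data_type {
        // Descend one level and resolve the remaining parts.
        DataType::Struct(children) => resolve(children, rest),
        // More path parts remain, but this field has no children.
        DataType::Primitive(_) => None,
    }
}

fn main() {
    let schema = vec![Field::new(
        "top",
        DataType::Struct(vec![Field::new(
            "middle",
            DataType::Struct(vec![Field::new("bottom", DataType::Primitive("integer"))]),
        )]),
    )];
    match resolve(&schema, &["top", "middle", "bottom"]) {
        Some(Field { name, data_type: DataType::Primitive(ty) }) => println!("leaf {name}: {ty}"),
        Some(field) => println!("struct {}", field.name),
        None => println!("not found"),
    }
}
```

In practice the path parts would come from a name parser that honours escaping rather than a plain split on `.`, since a quoted field name may itself contain a dot. If the path resolves to a struct rather than a leaf, one option would be to include all of that struct's leaves, but that is a design choice this sketch does not make.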
