Contributor

@sungwy sungwy commented Mar 1, 2023

fixes #6978

I will backport the fix to Spark 3.2 and 3.1 if this is approved.

@github-actions github-actions bot added the spark label Mar 1, 2023
Member

@szehon-ho szehon-ho left a comment

Thanks for finding the issue. I think I've seen this before but didn't dig deep and didn't realize it's related to schema evolution.

I think the root cause is #1508, and it may have been around for several versions.

I do think we should implement that PR's feature for metadata tables (time travel using the right schema). It will matter for tables like files_table, where the readable_metrics column could be different based on different schemas.

That being said, I don't oppose fixing the bug now, but I would like us to raise an issue to track it.

Added my comments in the code.

@szehon-ho
Member

Also FYI @aokolnychyi, @RussellSpitzer

@nastra nastra self-requested a review March 2, 2023 06:35
@sungwy sungwy changed the title from "Spark-3.3: Bug Fix for Reading Metadata tables with Snapshot" to "Spark-3.3: Bug Fix for Reading Metadata tables with Snapshot ID" Mar 2, 2023
Member

@szehon-ho szehon-ho left a comment

This actually looks good to me; I left one more comment on the test. Thanks for the changes!

Contributor Author

sungwy commented Mar 2, 2023

@szehon-ho I think that should cover all of the requested changes. Please let me know if this is good to approve! Since this is a pretty important bug fix, I'm hoping we can slot it in for the next release.
I can create the backport PRs for Spark-3.2 and Spark-3.1 once you sign off on this PR.

Member

szehon-ho commented Mar 2, 2023

Makes sense, I added it to the Iceberg 1.2 milestone. I made a follow-up issue, #6991, to implement this feature for the tables that will be affected, like the files table.

Since other reviewers also commented, I will leave a chance for them to take another look.

List<Record> expectedFiles =
expectedEntries(table, FileContent.DATA, entriesTableSchema, expectedDataManifests, null);

Assert.assertEquals("actualFiles size should be 1", 2, actualFiles.size());
Member

Minor: One more fix here for the assert message

Contributor Author

Appreciate all your help with these PRs, @szehon-ho. Looking forward to the 1.2.0 release and being able to work more with metadata tables :)

@szehon-ho szehon-ho merged commit 12bcffb into apache:master Mar 6, 2023
@szehon-ho
Member

Merged, thanks @syun64. We can track any other discussion in a follow-up, if any.

krvikash pushed a commit to krvikash/iceberg that referenced this pull request Mar 16, 2023
Development

Successfully merging this pull request may close these issues.

Reading as of Snapshot ID fails on Metadata Tables after Iceberg Table Schema Update

2 participants