Core: Schema for a branch should return table schema #9131
Conversation
Force-pushed 02b9447 to 8da34e8
        .containsExactly(
            new GenericRowWithSchema(new Object[] {1}, null),
            new GenericRowWithSchema(new Object[] {2}, null),
            new GenericRowWithSchema(new Object[] {3}, null));
Can you use SimpleRecord like the rest of the tests do instead?
Unfortunately that doesn't work, because SimpleRecord expects the data field to be populated. The particular error is: [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column or function parameter with name data cannot be resolved. Did you mean one of the following? [id].
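For context, here is a minimal sketch of the shape of the SimpleRecord test bean (an assumption for illustration; the actual class lives in the Iceberg Spark test sources). Spark's bean encoder resolves one dataset column per bean property, so a result whose schema contains only an id column has nothing to bind to the data property, which is exactly the UNRESOLVED_COLUMN error above.

```java
// Hypothetical mirror of the SimpleRecord test bean, shown only to illustrate
// why encoding a one-column (id) result into it fails with UNRESOLVED_COLUMN.
public class SimpleRecord {
  private Integer id;
  private String data; // the bean encoder needs a "data" column to bind here;
                       // a snapshot schema of just (id) has none

  public SimpleRecord() {}

  public SimpleRecord(Integer id, String data) {
    this.id = id;
    this.data = data;
  }

  public Integer getId() {
    return id;
  }

  public String getData() {
    return data;
  }
}
```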
spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/source/TestSnapshotSelection.java
Force-pushed d1a7ff8 to 761fdae
@@ -171,7 +171,7 @@ public Table loadTable(Identifier ident, String version) throws NoSuchTableExcep
     SparkTable sparkTable = (SparkTable) table;

     Preconditions.checkArgument(
-        sparkTable.snapshotId() == null,
+        sparkTable.snapshotId() == null && sparkTable.branch() == null,
I'm not sure whether we actually want to fix this as part of this PR or a separate PR, but in the Iceberg sync we briefly talked about making sure that SELECT * FROM ns.table.branch_x VERSION AS OF ... shouldn't be supported and should throw an error, which is what this check is doing.
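A self-contained sketch of the guard's intent (Guava's Preconditions is replaced with a plain check, and the method name and error message are illustrative, not the real SparkCatalog code): when the table identifier already pins a branch or snapshot, combining it with VERSION AS OF would mean two competing time-travel mechanisms, so the load must fail fast.

```java
// Illustrative stand-in for the Preconditions.checkArgument call in the diff;
// names and message are hypothetical.
public class TimeTravelGuard {
  static void checkNoConflictingTimeTravel(Long snapshotId, String branch) {
    if (snapshotId != null || branch != null) {
      // e.g. SELECT * FROM ns.table.branch_x VERSION AS OF ... reaches here:
      // the identifier already selected a branch/snapshot, so AS OF is ambiguous
      throw new IllegalArgumentException(
          "Cannot do time-travel based on both table identifier and AS OF");
    }
  }
}
```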
Probably a separate PR.
I've applied this and moved this to #9219
@@ -173,6 +173,10 @@ public Long snapshotId() {
     return snapshotId;
   }

+  public String branch() {
+    return branch;
+  }
This wasn't introduced by this commit, but branch should be final, right?
It is effectively final as it's only set once. However, it's not marked as final due to the way the different constructors in SparkTable are called.
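As background, a hypothetical illustration of how constructor delegation can rule out final (invented names; SparkTable's actual constructors differ): a final field must be definitely assigned exactly once on every constructor path, so once a this(...) delegation has assigned the field, any later assignment in the delegating constructor makes final impossible.

```java
// Hypothetical example, not SparkTable's code: 'branch' is assigned by the
// delegated no-arg constructor and then reassigned, so it cannot be final.
public class ChainedTable {
  private String branch; // declaring this 'final' would not compile

  public ChainedTable() {
    this.branch = null;
  }

  public ChainedTable(String branch) {
    this();               // delegation assigns branch (to null)...
    this.branch = branch; // ...and this second assignment forbids 'final'
  }

  public String branch() {
    return branch;
  }
}
```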
        .containsExactly(
            new SimpleRecord(1, null), new SimpleRecord(2, null), new SimpleRecord(3, null));

    // writing new records into the branch should work with the re-introduced column
I don't think this is an appropriate place for the write test. It should be a new test case because this case tests the schema that is used when reading.
In addition, the test case should test writing when the current snapshot for a branch has a different schema than the table schema. With the column added back, the schemas are the same.
I've moved this to a separate test and also used a different schema
spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/sql/TestSelect.java
When retrieving the schema for a branch, we should always return the table schema instead of the snapshot schema, because the table schema is the schema that will be used for new snapshots created on the branch. We should only return the snapshot's schema when we have a tag.
Below is an example that shows the weird schema behavior when describing a table.