Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(tables): add basic table implementation #11

Merged
merged 6 commits into from
Oct 13, 2023

Conversation

zeroshade
Copy link
Member

@zeroshade zeroshade commented Sep 26, 2023

Basic table/metadata/snapshots implementation

CC @rdblue @nastra @Fokko @coded9 @bitsondatadev

table/metadata.go Outdated Show resolved Hide resolved
// https://iceberg.apache.org/spec/#iceberg-table-spec
type Metadata interface {
// Version indicates the version of this metadata, 1 for V1, 2 for V2, etc.
Version() int
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe better to name this FormatVersion()?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are already using Version() as the method for the manifest files, so I was maintaining consistency here. If we want to rename this as FormatVersion() we should do the same for manifest files and anywhere else we are exposing the format version via method, thoughts?

table/metadata.go Outdated Show resolved Hide resolved
table/metadata.go Outdated Show resolved Hide resolved
table/snapshots.go Outdated Show resolved Hide resolved
table/snapshots.go Outdated Show resolved Hide resolved
table/metadata.go Outdated Show resolved Hide resolved
table/metadata.go Outdated Show resolved Hide resolved
SnapshotByID(int64) *Snapshot
// SnapshotByName searches the list of snapshots for a snapshot with a given
// ref name. Returns nil if there's no ref with this name for a snapshot.
SnapshotByName(name string) *Snapshot
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here, I think this should be plural

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above, it's singular because it only returns 1 snapshot, not multiple. Just to clarify my understanding of iceberg a bit, is it possible / legal for there to be multiple snapshots with the same ref name? or will it always be exactly 1 or 0 snapshots?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it should be either 0 or 1 snapshot that is returned. In my head I was thinking about snapshotsById that we use in the Java impl, so having singular here is fine I think

// Metadata for an iceberg table as specified in the Iceberg spec
//
// https://iceberg.apache.org/spec/#iceberg-table-spec
type Metadata interface {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the main thing that's currently missing is the handling of fresh IDs for table metadata, similarly to how it's handled in Java: https://github.com/apache/iceberg/blob/2e291c2b67a643ffe93e139483df55f3639cc39d/core/src/main/java/org/apache/iceberg/TableMetadata.java#L105-L149. However, it's probably ok to handle this in a follow-up

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, I didn't include a NewMetadata / MetadataBuilder here, right now it only parses/unmarshals JSON metadata. I planned for a follow-up to include that. If that's okay

table/snapshots_test.go Outdated Show resolved Hide resolved
table/table.go Show resolved Hide resolved
func (t Table) Equals(other Table) bool {
return slices.Equal(t.identifier, other.identifier) &&
t.metadataLocation == other.metadataLocation &&
reflect.DeepEqual(t.metadata, other.metadata)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so comparing metadata directly might be tricky due to Schema equality. You might want to look at the Java impl on how Schema comparison is performed (basically the schema ID is not taken into account, because a schema ID can be re-assigned)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can go ahead and merge this PR, but this is something to keep in mind for a follow-up

Copy link
Contributor

@nastra nastra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @zeroshade

@nastra nastra merged commit aa389ff into apache:main Oct 13, 2023
8 checks passed
@zeroshade zeroshade deleted the feat/tables branch October 18, 2023 19:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants