Impl rest catalog + table updates & requirements #146

Open
wants to merge 18 commits into
base: main

@jwtryg jwtryg commented Sep 9, 2024

Hi @zeroshade

I think it's really cool that you are working on a golang-iceberg implementation, and I would like to contribute if I can.
I have tried to finish the rest catalog implementation, and then I have added table updates and requirements, using a new MetadataBuilder struct that can simplify updating metadata. Like the existing implementation, I have tried to stay close to the pyIceberg implementation.

I thought this could be a good starting point for also implementing transactions and table updates. I would love to get your input and change/adjust the implementation accordingly.

Member

@zeroshade zeroshade left a comment

Thanks for this! I like the idea in general. I've done a first pass to list a bunch of requested changes.

catalog/catalog.go (outdated)
  // DropTable tells the catalog to drop the table entirely
- DropTable(ctx context.Context, identifier table.Identifier) error
+ DropTable(ctx context.Context, identifier table.Identifier, purge bool) error
Member

should add explanation of what the purge argument does. is that something that was recently added to the REST spec?

delete:
  tags:
    - Catalog API
  summary: Drop a table from the catalog
  operationId: dropTable
  description: Remove a table from the catalog
  parameters:
    - name: purgeRequested
      in: query
      required: false
      description: Whether the user requested to purge the underlying table's data and metadata
      schema:
        type: boolean
        default: false
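
To make the connection between the new purge argument and the spec excerpt concrete, here is a minimal sketch of how a client might forward the flag as the purgeRequested query parameter. The helper name dropTablePath is hypothetical and not part of this PR; the path layout, the omission of the optional catalog prefix, and the 0x1F separator for multi-level namespaces are assumptions to check against the REST spec.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
	"strings"
)

// dropTablePath is a hypothetical helper. It builds the request path for a
// DELETE on a table. When purge is true, the catalog is asked to also delete
// the table's underlying data and metadata, not just the catalog entry.
func dropTablePath(namespace []string, name string, purge bool) string {
	// Assumption: multi-level namespaces are joined with the 0x1F unit separator.
	ns := url.PathEscape(strings.Join(namespace, "\x1f"))

	q := url.Values{}
	q.Set("purgeRequested", strconv.FormatBool(purge))

	// Assumption: no catalog prefix segment is configured.
	return "/v1/namespaces/" + ns + "/tables/" + url.PathEscape(name) + "?" + q.Encode()
}

func main() {
	fmt.Println(dropTablePath([]string{"db"}, "events", true))
}
```

A doc comment along the lines of the one on dropTablePath could also serve as the explanation requested for DropTable.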

partitions.go (outdated)
table/metadata.go (outdated)

func (b *MetadataBuilder) AddPartitionSpec(spec *iceberg.PartitionSpec, initial bool) (*MetadataBuilder, error) {
    for _, s := range b.specs {
        if s.ID() == spec.ID() && !initial {
Member

I'm confused about the semantics for initial = true. Why do we allow adding a partition spec with an existing id if initial is true, but not if it is false?

Author

I apologize, the support for the initial flag was not fully implemented. I have updated it.

The semantics are that if this initial flag is specified, this partition spec should be treated as the initial spec, and any existing specs are ignored.
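
As a rough illustration of those semantics (not the PR's actual code), the sketch below uses hypothetical stand-in types partitionSpec and metadataBuilder. When initial is true the existing specs are dropped first, so an id clash with an earlier spec cannot occur; otherwise a duplicate id is rejected.

```go
package main

import (
	"errors"
	"fmt"
)

// partitionSpec is a hypothetical stand-in for iceberg.PartitionSpec.
type partitionSpec struct{ id int }

// metadataBuilder is a hypothetical, pared-down builder that tracks only specs.
type metadataBuilder struct{ specs []partitionSpec }

var errDuplicateSpecID = errors.New("partition spec with this id already exists")

// addPartitionSpec appends spec to the builder. If initial is true, any
// existing specs are discarded first and spec becomes the only one.
func (b *metadataBuilder) addPartitionSpec(spec partitionSpec, initial bool) error {
	if initial {
		b.specs = nil
	}
	for _, s := range b.specs {
		if s.id == spec.id {
			return errDuplicateSpecID
		}
	}
	b.specs = append(b.specs, spec)
	return nil
}

func main() {
	b := &metadataBuilder{}
	fmt.Println(b.addPartitionSpec(partitionSpec{id: 0}, false)) // first spec is accepted
	fmt.Println(b.addPartitionSpec(partitionSpec{id: 0}, false)) // same id is rejected
	fmt.Println(b.addPartitionSpec(partitionSpec{id: 0}, true))  // initial replaces existing specs
}
```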

- type MetadataV1 struct {
-     Schema iceberg.Schema `json:"schema"`
+ type metadataV1 struct {
+     Schema *iceberg.Schema `json:"schema"`
Member

doesn't this change mean we'll have to perform nil checks everywhere we try to use the schema now?

Author

Yes, it does, but metadataV1 is now unexported, so this should be relatively few places. However, if we assign schemas by value, go vet will report an error because we would be copying a lock.
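
The trade-off can be sketched with a hypothetical schema type that contains a sync.Mutex (an assumption made for illustration; the real iceberg.Schema may hold a different kind of lock). Storing the schema by pointer means copying the enclosing struct never copies the lock, at the cost of a nil check in accessors.

```go
package main

import (
	"fmt"
	"sync"
)

// schema is a hypothetical stand-in for a type that contains a lock.
type schema struct {
	mu     sync.Mutex
	fields []string
}

// metadata holds the schema by pointer, so copying a metadata value copies
// only the pointer and never the mutex inside schema.
type metadata struct {
	schema *schema
}

// fieldNames returns a copy of the field names, or nil if no schema is set.
// This is the kind of nil check the pointer field makes necessary.
func (m *metadata) fieldNames() []string {
	if m.schema == nil {
		return nil
	}
	m.schema.mu.Lock()
	defer m.schema.mu.Unlock()
	return append([]string(nil), m.schema.fields...)
}

func main() {
	m := metadata{schema: &schema{fields: []string{"id", "ts"}}}
	cp := m // copies the pointer only, so the copylocks check has nothing to flag
	fmt.Println(cp.fieldNames())
	fmt.Println((&metadata{}).fieldNames() == nil)
}
```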

table/requirements.go (outdated)
table/requirements.go (outdated)
table/updates.go
table/updates.go (outdated)
Member

@zeroshade zeroshade left a comment

next round of comments

Comment on lines +87 to +90
type Identifier struct {
    Namespace []string `json:"namespace"`
    Name      string   `json:"name"`
}
Member

if we're going to export this type, we should probably name it RestIdentifier or something equivalent to separate it from other catalog identifier types.

Comment on lines -549 to +643
- Identifiers []struct {
-     Namespace []string `json:"namespace"`
-     Name      string   `json:"name"`
- } `json:"identifiers"`
+ Identifiers []Identifier `json:"identifiers"`
Member

Is the identifier type used anywhere other than here? The reason I had done it inline here was that it was only used in this one spot and I didn't want it to get confused with table.Identifier.

Comment on lines +688 to +690
for k, v := range ret.Config {
    config[k] = v
}
Member

Why loop instead of just calling maps.Copy (which does the loop internally)?
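
For reference, a minimal sketch of the suggested replacement. maps.Copy(dst, src) is in the standard library since Go 1.21 and overwrites keys in dst that also exist in src; the mergeConfig wrapper and the sample keys are hypothetical.

```go
package main

import (
	"fmt"
	"maps"
)

// mergeConfig copies every key/value pair from overrides into base,
// overwriting keys that already exist in base.
func mergeConfig(base, overrides map[string]string) {
	maps.Copy(base, overrides)
}

func main() {
	config := map[string]string{"uri": "http://old", "prefix": "p"}
	mergeConfig(config, map[string]string{"uri": "http://new", "token": "t"})
	fmt.Println(len(config), config["uri"])
}
```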

Comment on lines +714 to 716
for k, v := range ret.Config {
    config[k] = v
}
Member

same question, why not maps.Copy(config, ret.Config)?

Comment on lines +120 to +124
// Fields returns a clone of the partition fields in this spec.
func (ps *PartitionSpec) Fields() []PartitionField {
    return slices.Clone(ps.fields)
}

Member

If we're okay with bumping to go1.23, we could use iter here and do slices.Values(ps.fields). This way we don't have to clone the entire slice, but we still prevent users from modifying it.

Thus this function would be:

func (ps *PartitionSpec) Fields() iter.Seq[PartitionField] {
    return slices.Values(ps.fields)
}

and a user would be able to iterate over the fields:

for f := range spec.Fields() {
    // do something
}

Alternately, you could use slices.All if you want to preserve the index, value nature of the range

Comment on lines +596 to +603
func containsBy[S []E, E any](elems S, found func(e E) bool) bool {
    for _, e := range elems {
        if found(e) {
            return true
        }
    }
    return false
}
Member

replace with slices.ContainsFunc
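
A short sketch of that replacement. slices.ContainsFunc is in the standard library since Go 1.21; the snapshot type and hasSnapshot helper are hypothetical.

```go
package main

import (
	"fmt"
	"slices"
)

// snapshot is a hypothetical stand-in for the element type being searched.
type snapshot struct{ id int64 }

// hasSnapshot reports whether any snapshot in snaps has the given id,
// using slices.ContainsFunc instead of a hand-written helper.
func hasSnapshot(snaps []snapshot, id int64) bool {
	return slices.ContainsFunc(snaps, func(s snapshot) bool { return s.id == id })
}

func main() {
	snaps := []snapshot{{id: 1}, {id: 2}}
	fmt.Println(hasSnapshot(snaps, 2), hasSnapshot(snaps, 3))
}
```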


// maxBy returns the maximum value of extract(e) for all e in elems.
// If elems is empty, returns 0.
func maxBy[S []E, E any](elems S, extract func(e E) int) int {
Member

use ~[]E for better coverage of generics and future proofing
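
To show what the tilde buys, here is a sketch of maxBy with the ~[]E constraint. The function body is an assumption, since the diff only shows the signature and doc comment. With S constrained to []E, a named slice type such as the hypothetical fieldList below would not satisfy the constraint; with ~[]E it does.

```go
package main

import "fmt"

// maxBy returns the maximum value of extract(e) for all e in elems.
// If elems is empty, returns 0.
func maxBy[S ~[]E, E any](elems S, extract func(e E) int) int {
	m := 0
	for i, e := range elems {
		if v := extract(e); i == 0 || v > m {
			m = v
		}
	}
	return m
}

// field and fieldList are hypothetical types; fieldList is a named slice type
// whose underlying type is []field.
type field struct{ id int }

type fieldList []field

func main() {
	ids := fieldList{{id: 3}, {id: 7}, {id: 5}}
	fmt.Println(maxBy(ids, func(f field) int { return f.id }))
}
```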

}

func (c *commonMetadata) Ref() SnapshotRef { return c.SnapshotRefs[MainBranch] }
func (c *commonMetadata) Refs() map[string]SnapshotRef { return maps.Clone(c.SnapshotRefs) }
Member

Let's use maps.All, like I mentioned before for slices.All/slices.Values, so that we can return an iterator without having to clone the whole map.

Comment on lines +689 to +690
func (c *commonMetadata) SnapshotLogs() []SnapshotLogEntry { return slices.Clone(c.SnapshotLog) }
func (c *commonMetadata) PreviousFiles() []MetadataLogEntry { return slices.Clone(c.MetadataLog) }
Member

same comment as before about using iterators and slices.Values or slices.All

Comment on lines +692 to +694
if c == nil || other == nil {
    return c == other
}
Member

in what scenario is c nil?

3 participants