-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Impl rest catalog + table updates & requirements #146
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this! I like the idea in general. I've done a first pass to list a bunch of requested changes.
// DropTable tells the catalog to drop the table entirely | ||
DropTable(ctx context.Context, identifier table.Identifier) error | ||
DropTable(ctx context.Context, identifier table.Identifier, purge bool) error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should add explanation of what the purge argument does. is that something that was recently added to the REST spec?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
delete:
tags:
- Catalog API
summary: Drop a table from the catalog
operationId: dropTable
description: Remove a table from the catalog
parameters:
- name: purgeRequested
in: query
required: false
description: Whether the user requested to purge the underlying table's data and metadata
schema:
type: boolean
default: false
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took this field directly from the OpenAPI specification:
https://github.com/apache/iceberg/blob/34cd01ba2e057866cdb13db8f9919bc98e11e638/open-api/rest-catalog-open-api.yaml#L1096
|
||
func (b *MetadataBuilder) AddPartitionSpec(spec *iceberg.PartitionSpec, initial bool) (*MetadataBuilder, error) { | ||
for _, s := range b.specs { | ||
if s.ID() == spec.ID() && !initial { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused about the semantics for inital = true
. Why do we allow adding a partition spec with an existing id if initial
is true? but not false?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I apologize, the support for the initial flag was not fully implemented. I have updated it.
The semantics are that if this initial
flag is specified, this partition spec should be treated as the initial spec, and any existing specs are ignored.
type MetadataV1 struct { | ||
Schema iceberg.Schema `json:"schema"` | ||
type metadataV1 struct { | ||
Schema *iceberg.Schema `json:"schema"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doesn't this change mean we'll have to perform nil checks everywhere we try to use the schema now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But the metadataV1 is now unexported, so this shoulod be relatively few places. But yes, it does. However, if we assign schemas directly, vet will throw an error as we are copiyng a lock.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
next round of comments
type Identifier struct { | ||
Namespace []string `json:"namespace"` | ||
Name string `json:"name"` | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we're going to export this type, we should probably name it RestIdentifier
or something equivalent to separate it from other catalog identifier types.
Identifiers []struct { | ||
Namespace []string `json:"namespace"` | ||
Name string `json:"name"` | ||
} `json:"identifiers"` | ||
Identifiers []Identifier `json:"identifiers"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the identifier type used anywhere other than here? The reason I had done it inline here was because it was only used in this one spot and i didn't want it to get confused with table.Identifier
for k, v := range ret.Config { | ||
config[k] = v | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why loop instead of just doing maps.Copy
(which does the loop internally)
for k, v := range ret.Config { | ||
config[k] = v | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same question, why not maps.Copy(config, ret.Config)
?
// Fields returns a clone of the partition fields in this spec. | ||
func (ps *PartitionSpec) Fields() []PartitionField { | ||
return slices.Clone(ps.fields) | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we're okay with bumping to go1.23
we could use iter
here and do slices.Values(ps.Fields)
this way we don't have clone the entire slice but also maintain that we disallow users from modifying the slice.
Thus this function would be:
func (ps *PartitionSpec) Fields() iter.Seq[PartitionField] {
return slices.Values(ps.fields)
}
and a user would be able to iterate over the fields:
for f := range spec.Fields() {
// do something
}
Alternately, you could use slices.All
if you want to preserve the index, value
nature of the range
func containsBy[S []E, E any](elems S, found func(e E) bool) bool { | ||
for _, e := range elems { | ||
if found(e) { | ||
return true | ||
} | ||
} | ||
return false | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
replace with slices.ContainsFunc
|
||
// maxBy returns the maximum value of extract(e) for all e in elems. | ||
// If elems is empty, returns 0. | ||
func maxBy[S []E, E any](elems S, extract func(e E) int) int { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use ~[]E
for better coverage of generics and future proofing
} | ||
|
||
func (c *commonMetadata) Ref() SnapshotRef { return c.SnapshotRefs[MainBranch] } | ||
func (c *commonMetadata) Refs() map[string]SnapshotRef { return maps.Clone(c.SnapshotRefs) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lets use maps.All
like i mentioned before for slices.All
/slices.Values
so that we can return an iterator without having to clone the whole map.
func (c *commonMetadata) SnapshotLogs() []SnapshotLogEntry { return slices.Clone(c.SnapshotLog) } | ||
func (c *commonMetadata) PreviousFiles() []MetadataLogEntry { return slices.Clone(c.MetadataLog) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comment as before about using iterators and slices.Values
or slices.All
if c == nil || other == nil { | ||
return c == other | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in what scenario is c nil?
Hi @zeroshade
I think it's really cool that you are working on a golang-iceberg implementation, and I would like to contribute if I can.
I have tried to finish the rest catalog implementation, and then I have added table updates and requirements, using a new MetadataBuilder struct that can simplify updating metadata. Like the existing implementation, I have tried to stay close to the pyIceberg implementation.
I thought this could be a good starting point for also implementing transactions and table updates. I would love to get your input and change/adjust the implementation accordingly.