Impl rest catalog + table updates & requirements #146

jwtryg · 2024-09-09T18:58:19Z

I think it's really cool that you are working on a golang-iceberg implementation, and I would like to contribute if I can.
I have tried to finish the REST catalog implementation and have added table updates and requirements, using a new MetadataBuilder struct that simplifies updating metadata. As with the existing implementation, I have tried to stay close to the PyIceberg implementation.

I thought this could be a good starting point for also implementing transactions and table updates. I would love to get your input and change/adjust the implementation accordingly.

zeroshade

Thanks for this! I like the idea in general. I've done a first pass to list a bunch of requested changes.

catalog/catalog.go

zeroshade · 2024-09-09T20:13:01Z

catalog/catalog.go

 // DropTable tells the catalog to drop the table entirely
- DropTable(ctx context.Context, identifier table.Identifier) error
+ DropTable(ctx context.Context, identifier table.Identifier, purge bool) error


We should add an explanation of what the purge argument does. Is that something that was recently added to the REST spec?

delete:
  tags:
    - Catalog API
  summary: Drop a table from the catalog
  operationId: dropTable
  description: Remove a table from the catalog
  parameters:
    - name: purgeRequested
      in: query
      required: false
      description: Whether the user requested to purge the underlying table's data and metadata
      schema:
        type: boolean
        default: false

I took this field directly from the OpenAPI specification:
https://github.com/apache/iceberg/blob/34cd01ba2e057866cdb13db8f9919bc98e11e638/open-api/rest-catalog-open-api.yaml#L1096
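For illustration, a minimal sketch of how the purge argument could map onto the purgeRequested query parameter quoted above. The helper name dropTableURL, the path layout, and the single-level namespace are assumptions made for this example, not the code in this PR.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// dropTableURL is a hypothetical helper. It builds the URL for a DELETE
// request and sets the purgeRequested query flag from the purge argument.
func dropTableURL(base, namespace, name string, purge bool) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	// Assumed path layout, for illustration only.
	u = u.JoinPath("namespaces", namespace, "tables", name)

	q := u.Query()
	q.Set("purgeRequested", strconv.FormatBool(purge))
	u.RawQuery = q.Encode()
	return u.String(), nil
}

func main() {
	s, err := dropTableURL("https://example.com/v1", "db", "events", true)
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```

Since the spec gives the parameter a default of false, a client could also omit it entirely when purge is false.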

partitions.go

table/metadata.go

zeroshade · 2024-09-09T20:21:06Z

table/metadata.go

+
+func (b *MetadataBuilder) AddPartitionSpec(spec *iceberg.PartitionSpec, initial bool) (*MetadataBuilder, error) {
+ for _, s := range b.specs {
+ if s.ID() == spec.ID() && !initial {


I'm confused about the semantics for initial = true. Why do we allow adding a partition spec with an existing ID if initial is true, but not if it is false?

I apologize, the support for the initial flag was not fully implemented. I have updated it.

The semantics are that if this initial flag is specified, this partition spec should be treated as the initial spec, and any existing specs are ignored.
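One possible reading of those semantics, as a sketch only: the types spec and builder below are simplified stand-ins, not the MetadataBuilder from this PR, and dropping the existing specs is an assumption about what "ignored" means.

```go
package main

import "fmt"

// spec and builder are hypothetical, simplified stand-ins.
type spec struct{ id int }

type builder struct{ specs []spec }

// addSpec rejects a duplicate ID unless initial is set. With initial set,
// the new spec replaces whatever specs were registered before.
func (b *builder) addSpec(s spec, initial bool) error {
	if initial {
		b.specs = nil // existing specs are ignored
	}
	for _, existing := range b.specs {
		if existing.id == s.id {
			return fmt.Errorf("partition spec with id %d already exists", s.id)
		}
	}
	b.specs = append(b.specs, s)
	return nil
}

func main() {
	b := &builder{}
	fmt.Println(b.addSpec(spec{id: 0}, false)) // first add succeeds
	fmt.Println(b.addSpec(spec{id: 0}, false)) // duplicate ID is rejected
	fmt.Println(b.addSpec(spec{id: 0}, true))  // initial replaces the old spec
}
```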

zeroshade · 2024-09-09T20:30:49Z

table/metadata.go

-type MetadataV1 struct {
- Schema iceberg.Schema  `json:"schema"`
+type metadataV1 struct {
+ Schema *iceberg.Schema `json:"schema"`


doesn't this change mean we'll have to perform nil checks everywhere we try to use the schema now?

Yes, it does, but metadataV1 is now unexported, so this should be limited to relatively few places. The alternative, assigning schemas by value, makes go vet report an error because we would be copying a lock.
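A small sketch of the trade-off being discussed. The schema type here is a stand-in that holds a sync.Mutex, which is the situation described above; the names are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// schema stands in for a type that contains a lock.
type schema struct {
	mu     sync.Mutex
	fields []string
}

// metadata holds the schema by pointer, so assigning it copies only the
// pointer. Holding it by value and assigning `m.Schema = *s` would copy
// the mutex, which go vet's copylocks check reports.
type metadata struct {
	Schema *schema
}

// fieldCount shows the nil check that the pointer field now requires.
func (m *metadata) fieldCount() int {
	if m.Schema == nil {
		return 0
	}
	m.Schema.mu.Lock()
	defer m.Schema.mu.Unlock()
	return len(m.Schema.fields)
}

func main() {
	m := &metadata{}
	fmt.Println(m.fieldCount())

	m.Schema = &schema{fields: []string{"id", "ts"}}
	fmt.Println(m.fieldCount())
}
```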

table/requirements.go

table/updates.go

zeroshade

next round of comments

zeroshade · 2024-09-15T17:48:27Z

catalog/rest.go

+type Identifier struct {
+ Namespace []string `json:"namespace"`
+ Name string `json:"name"`
+}


if we're going to export this type, we should probably name it RestIdentifier or something equivalent to separate it from other catalog identifier types.

zeroshade · 2024-09-15T17:50:30Z

catalog/rest.go

- Identifiers []struct {
- Namespace []string `json:"namespace"`
- Name string `json:"name"`
- } `json:"identifiers"`
+ Identifiers []Identifier `json:"identifiers"`


is the identifier type used anywhere other than here? The reason I had done it inline here was because it was only used in this one spot and i didn't want it to get confused with table.Identifier

zeroshade · 2024-09-15T17:54:30Z

catalog/rest.go

+ for k, v := range ret.Config {
+ config[k] = v
+ }


Why loop instead of just calling maps.Copy (which does the loop internally)?
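As a sketch of the suggested replacement, using the standard library maps package (Go 1.21+). The function name mergeConfig and its arguments are made up for this example.

```go
package main

import (
	"fmt"
	"maps"
)

// mergeConfig is a hypothetical helper. Keys in overrides win because
// maps.Copy overwrites existing keys in the destination.
func mergeConfig(defaults, overrides map[string]string) map[string]string {
	config := make(map[string]string, len(defaults)+len(overrides))
	maps.Copy(config, defaults)
	maps.Copy(config, overrides)
	return config
}

func main() {
	cfg := mergeConfig(
		map[string]string{"prefix": "a", "region": "us"},
		map[string]string{"prefix": "b"},
	)
	fmt.Println(cfg["prefix"], cfg["region"])
}
```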

zeroshade · 2024-09-15T17:54:47Z

catalog/rest.go

+ for k, v := range ret.Config {
+ config[k] = v
 }


same question, why not maps.Copy(config, ret.Config)?

zeroshade · 2024-09-15T17:59:44Z

partitions.go

+// Fields returns a clone of the partition fields in this spec.
+func (ps *PartitionSpec) Fields() []PartitionField {
+ return slices.Clone(ps.fields)
+}
+


If we're okay with bumping to go1.23, we could use iter here and do slices.Values(ps.fields). This way we don't have to clone the entire slice, but we still prevent users from modifying it.

Thus this function would be:

func (ps *PartitionSpec) Fields() iter.Seq[PartitionField] { return slices.Values(ps.fields) }

and a user would be able to iterate over the fields:

for f := range spec.Fields() {
    // do something
}

Alternately, you could use slices.All if you want to preserve the index, value nature of the range

zeroshade · 2024-09-15T18:12:20Z

table/metadata.go

+func containsBy[S []E, E any](elems S, found func(e E) bool) bool {
+ for _, e := range elems {
+ if found(e) {
+ return true
+ }
+ }
+ return false
+}


replace with slices.ContainsFunc
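For reference, a sketch of what a call site could look like after the swap (slices.ContainsFunc is in the standard library since Go 1.21). The spec type and hasSpecID are hypothetical.

```go
package main

import (
	"fmt"
	"slices"
)

// spec is a hypothetical element type.
type spec struct{ id int }

// hasSpecID does what containsBy did, using the standard library.
func hasSpecID(specs []spec, id int) bool {
	return slices.ContainsFunc(specs, func(s spec) bool { return s.id == id })
}

func main() {
	specs := []spec{{id: 0}, {id: 3}}
	fmt.Println(hasSpecID(specs, 3), hasSpecID(specs, 7))
}
```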

zeroshade · 2024-09-15T18:13:26Z

table/metadata.go

+
+// maxBy returns the maximum value of extract(e) for all e in elems.
+// If elems is empty, returns 0.
+func maxBy[S []E, E any](elems S, extract func(e E) int) int {


use ~[]E for better coverage of generics and future proofing
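To show what the tilde buys us: with S ~[]E the function also accepts named slice types, which a plain []E constraint rejects. The body below is a guess at an implementation matching the doc comment, and snapshot/snapshotList are hypothetical types.

```go
package main

import "fmt"

type snapshot struct{ seq int }

// snapshotList is a named slice type. It satisfies ~[]E, but would not
// satisfy a constraint of exactly []E.
type snapshotList []snapshot

// maxBy returns the maximum value of extract(e) for all e in elems.
// If elems is empty, it returns 0.
func maxBy[S ~[]E, E any](elems S, extract func(e E) int) int {
	m := 0
	for i, e := range elems {
		if v := extract(e); i == 0 || v > m {
			m = v
		}
	}
	return m
}

func main() {
	list := snapshotList{{seq: 2}, {seq: 9}, {seq: 4}}
	fmt.Println(maxBy(list, func(s snapshot) int { return s.seq }))
}
```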

zeroshade · 2024-09-15T18:14:26Z

table/metadata.go

 }

+func (c *commonMetadata) Ref() SnapshotRef { return c.SnapshotRefs[MainBranch] }
+func (c *commonMetadata) Refs() map[string]SnapshotRef { return maps.Clone(c.SnapshotRefs) }


Let's use maps.All, like I mentioned before for slices.All/slices.Values, so that we can return an iterator without having to clone the whole map.

zeroshade · 2024-09-15T18:14:46Z

table/metadata.go

+func (c *commonMetadata) SnapshotLogs() []SnapshotLogEntry { return slices.Clone(c.SnapshotLog) }
+func (c *commonMetadata) PreviousFiles() []MetadataLogEntry { return slices.Clone(c.MetadataLog) }


same comment as before about using iterators and slices.Values or slices.All

zeroshade · 2024-09-15T18:15:09Z

table/metadata.go

+ if c == nil || other == nil {
+ return c == other
+ }


in what scenario is c nil?

jwtryg added 4 commits September 9, 2024 16:44

table implementations

f9febaf

catalog implementation

3b3a10f

main & go files

3416129

add test

56c8458

zeroshade suggested changes Sep 9, 2024

jwtryg added 14 commits September 11, 2024 09:14

add props back to LoadTable()

4c49f96

clone fields

bded0e6

document purge flag

ca9796b

check for snapshot == nil

1f9edd7

update logic for initial updates

c48301c

update SetSnapshotRef logic

c48e9f7

reduce duplication

adcef8f

maintain immutability

ca88a69

handle pointers

bdef610

add apache licenses

fa6c5b5

naming

795a6c9

docstrings for updates

a952f2b

spelling

48b3f2d

fix tests

5b70ef0

zeroshade reviewed Sep 15, 2024
