Skip to content

Conversation

@gh-yzou
Copy link
Contributor

@gh-yzou gh-yzou commented Mar 10, 2025

This PR defines the open api spec for Generic Tables operations, corresponding definition defined in https://docs.google.com/document/d/1_R9jBIwoH3CV9G7gSoRJPQVEcOsROp4IEiGoeYeQE8A/edit?pli=1&tab=t.0

The set of APIs includes:
listGenericTables
createGenericTable
loadGenericTable
dropGenericTable

Another discussion is happening about whether we want to introduce a completely new service file for new prefix, we will do this as a refactor if we decided to.

@github-project-automation github-project-automation bot moved this from PRs In Progress to Ready to merge in Basic Kanban Board Mar 11, 2025
@HonahX
Copy link
Contributor

HonahX commented Mar 11, 2025

Could you please also run https://github.com/apache/polaris/tree/main/spec#generate-the-bundle to update the bundled-*.yaml, so that others can see new APIs on the website

parameters:
- $ref: '../iceberg-rest-catalog-open-api.yaml#/components/parameters/prefix'
- $ref: '../iceberg-rest-catalog-open-api.yaml#/components/parameters/namespace'
- $ref: '../iceberg-rest-catalog-open-api.yaml#/components/parameters/table'
Copy link
Contributor

@HonahX HonahX Mar 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[non-blocking] Should this parameter named genericTable to differentiate from Iceberg's table?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trying to reuse as much as possible from iceberg, but i agree it is probably better to be more specific on this, and that parameter definition is also simple. Updated to have new parameter generic-table ( this is the same naming style as iceberg parameter)

Copy link
Contributor

@flyrain flyrain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall. Left some minor comments.

# Polaris-native API #
######################
###################################
# Polaris-native notification API #
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: should we be consistent with the name, the following section(line 138) doesn't say Polaris-native

Suggested change
# Polaris-native notification API #
# Polaris notification API #

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this was changed to make it clear the notifications don't relate to generic tables (although I suppose they could). So in that way, you could say it's good to have the names mismatch.

However I still think Polaris Notification API & Polaris Generic Table API is fine. Polaris-native Generic Table API is a bit of a mouthful.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Polaris Notification API sounds good to me, updated

Comment on lines 54 to 55
419:
$ref: '../iceberg-rest-catalog-open-api.yaml#/components/responses/AuthenticationTimeoutResponse'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove it? Polaris doesn't implement it, more context is here #791. And Iceberg considers it optional and will deprecate it soon. apache/iceberg#12376

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i see, thanks for the information! Removed

Comment on lines 101 to 102
419:
$ref: '../iceberg-rest-catalog-open-api.yaml#/components/responses/AuthenticationTimeoutResponse'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

examples:
TableToLoadDoesNotExist:
$ref: '../iceberg-rest-catalog-open-api.yaml#/components/examples/NoSuchTableError'
419:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

Comment on lines 175 to 176
419:
$ref: '../iceberg-rest-catalog-open-api.yaml#/components/responses/AuthenticationTimeoutResponse'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

type: object
additionalProperties:
type: string
register-at:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be create-at?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on our design discussion, we settled on register-at to be more specific that this is when the entity is registered with the service.

To make things more clear, i added a description for the GenericTable to describe the meaning of each field

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should call the endpoint register... instead of create... then

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the operation methods should be create, since it is an operation called on table create and used to create table entities at the service, and this also aligns with how iceberg and other catalog services names the method also.

For that particular filed, the original proposal is create-at, but there is complain that the name kind of indicates it is when the actual table is created, and then we switched to use this name and hope this can make it clear that it is the time for the Polaris service, which I think should be fine as far as we can also make our doc description clear about this particular field.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indicates it is when the actual table is created

It is also possible that generic table is actually created in Polaris, right? Considering this use case, I'm leaning toward the name create-at, as it is a catalog property, not table one. Any users checking the real table creation time will have to check the real table's properties(e.g. metadata.json file for iceberg), instead of relying on a table property in catalog.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@flyrain would you mind give more details about "It is also possible that generic table is actually created in Polaris"? do you mean the generic table entity is created in Polaris, or do are you really talking about Polaris is creating a table, what is that use case?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example, Spark create a brand new Delta table under one namespace in Polaris. Is it possible or not?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, that is case we are supporting, we are going to support the replacement of Spark Catalog. I also prefer create-at, i think as far as we make the doc clear that is the server side time we should be good. Let me update to create-at

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, since this field is not really needed to achieve functionality, we can remove it for now, and added it back later if this information is needed for client side to perform monitoring capability

@gh-yzou gh-yzou requested a review from flyrain March 12, 2025 00:51
@gh-yzou gh-yzou requested a review from sfc-gh-yzou March 12, 2025 21:19
Copy link
Contributor

@collado-mike collado-mike left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments only

Comment on lines +140 to +144
/polaris/v1/{prefix}/namespaces/{namespace}/generic-tables:
$ref: './polaris-catalog-apis/generic-tables-api.yaml#/paths/~1polaris~1v1~1{prefix}~1namespaces~1{namespace}~1generic-tables'

/polaris/v1/{prefix}/namespaces/{namespace}/generic-tables/{generic-table}:
$ref: './polaris-catalog-apis/generic-tables-api.yaml#/paths/~1polaris~1v1~1{prefix}~1namespaces~1{namespace}~1generic-tables~1{generic-table}'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we keep these separate from the rest of the Iceberg APIs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a good question. In fact, @HonahX is going to carry out a discussion about this later about do we want to have an independent service file for the APIs under new prefix.

I think we can keep this discussion separate with this PR, and if we decided to do that, we can just have one pr to put all new definitions to a separate service file, and there is some other things like (auth definition) need to be take care of, so that we can keep the purpose of each pr more clear. Meanwhile @HonahX and I can carry this discussion independently soon. Would that work?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @gh-yzou for summarizing. We can always re-organize these spec definitions if needed without affecting the implementaion. The current way allow us to share common open api definitions (servers, security scheme definition), and we can still generate models and services in separate module, like how they are generated in this PR. But let's discuss this in a separate thread to decide whether we want to separate them into completely different files.

spec/README.md Outdated
- `polaris-catalog-service.yaml` - Defines the specification for the Polaris Catalog API, which encompasses both the Iceberg REST Catalog API
and Polaris-native API.
- `polaris-catalog-apis` - Contains the specifications of Polaris-native API
- `polaris-catalog-apis` - Contains the specification for Polaris-native Catalog APIs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alongside Yufei's comment, I kinda think that the text "Polaris-native", in reference to table types, should go away. Polaris doesn't have table types - unlike, for example, Polaris-native principals or Polaris-native roles. Maybe "Polaris-specific" or "Polaris proprietary" APIs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Polaris specific APIs make sense to me, let me update that.

@flyrain flyrain merged commit 9425a6e into apache:main Mar 13, 2025
5 checks passed
@github-project-automation github-project-automation bot moved this from Ready to merge to Done in Basic Kanban Board Mar 13, 2025
@flyrain
Copy link
Contributor

flyrain commented Mar 13, 2025

Thanks a lot @gh-yzou for adding new APIs for generic table! This is going to add a lot of value to Polaris users. Thanks @eric-maynard @collado-mike @HonahX for the review.

@snazy
Copy link
Member

snazy commented Mar 14, 2025

Related to the discussion on the dev-ML: Concerns about changing and/or depending on the Iceberg REST spec have been raised multiple times on PRs and on the dev-mailing-list. This PR introduces a hard dependency on the Iceberg REST spec.

The approach take in this PR and #808 should have discussed explicitly on the dev mailing list, as stated in the project's contributing guidelines, which explicitly say: "Change of public interface (or more generally speaking Polaris extension point) should be discussed and approved on the dev mailing list. The discussion on the dev mailing list should happen before having a “ready-for-review” Pull Request."

@gh-yzou
Copy link
Contributor Author

gh-yzou commented Mar 14, 2025

Hi @snazy, I see your point about the dependency on the Iceberg REST spec. However, I believe the position Polaris taken is IRC compatible, in that case, i think it make sense for us to reuse the Iceberg definitions, such as namespace, or errors. cc @RussellSpitzer @flyrain
If you still have concern about the dependency on Iceberg REST spec definition, please feel free to bring this up as a discussion on the dev mailing list, we will be happy to accommodate any further change needed to fit into how Polaris works.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants