-
Notifications
You must be signed in to change notification settings - Fork 338
Polaris Catalog Spec: Generic Table APIs #1150
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Could you please also run https://github.com/apache/polaris/tree/main/spec#generate-the-bundle to update the |
| parameters: | ||
| - $ref: '../iceberg-rest-catalog-open-api.yaml#/components/parameters/prefix' | ||
| - $ref: '../iceberg-rest-catalog-open-api.yaml#/components/parameters/namespace' | ||
| - $ref: '../iceberg-rest-catalog-open-api.yaml#/components/parameters/table' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[non-blocking] Should this parameter named genericTable to differentiate from Iceberg's table?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was trying to reuse as much as possible from iceberg, but i agree it is probably better to be more specific on this, and that parameter definition is also simple. Updated to have new parameter generic-table ( this is the same naming style as iceberg parameter)
flyrain
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM overall. Left some minor comments.
spec/polaris-catalog-service.yaml
Outdated
| # Polaris-native API # | ||
| ###################### | ||
| ################################### | ||
| # Polaris-native notification API # |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: should we be consistent with the name, the following section(line 138) doesn't say Polaris-native
| # Polaris-native notification API # | |
| # Polaris notification API # |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this was changed to make it clear the notifications don't relate to generic tables (although I suppose they could). So in that way, you could say it's good to have the names mismatch.
However I still think Polaris Notification API & Polaris Generic Table API is fine. Polaris-native Generic Table API is a bit of a mouthful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Polaris Notification API sounds good to me, updated
| 419: | ||
| $ref: '../iceberg-rest-catalog-open-api.yaml#/components/responses/AuthenticationTimeoutResponse' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we remove it? Polaris doesn't implement it, more context is here #791. And Iceberg considers it optional and will deprecate it soon. apache/iceberg#12376
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i see, thanks for the information! Removed
| 419: | ||
| $ref: '../iceberg-rest-catalog-open-api.yaml#/components/responses/AuthenticationTimeoutResponse' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
| examples: | ||
| TableToLoadDoesNotExist: | ||
| $ref: '../iceberg-rest-catalog-open-api.yaml#/components/examples/NoSuchTableError' | ||
| 419: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
| 419: | ||
| $ref: '../iceberg-rest-catalog-open-api.yaml#/components/responses/AuthenticationTimeoutResponse' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
| type: object | ||
| additionalProperties: | ||
| type: string | ||
| register-at: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be create-at?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on our design discussion, we settled on register-at to be more specific that this is when the entity is registered with the service.
To make things more clear, i added a description for the GenericTable to describe the meaning of each field
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should call the endpoint register... instead of create... then
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the operation methods should be create, since it is an operation called on table create and used to create table entities at the service, and this also aligns with how iceberg and other catalog services names the method also.
For that particular filed, the original proposal is create-at, but there is complain that the name kind of indicates it is when the actual table is created, and then we switched to use this name and hope this can make it clear that it is the time for the Polaris service, which I think should be fine as far as we can also make our doc description clear about this particular field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indicates it is when the actual table is created
It is also possible that generic table is actually created in Polaris, right? Considering this use case, I'm leaning toward the name create-at, as it is a catalog property, not table one. Any users checking the real table creation time will have to check the real table's properties(e.g. metadata.json file for iceberg), instead of relying on a table property in catalog.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@flyrain would you mind give more details about "It is also possible that generic table is actually created in Polaris"? do you mean the generic table entity is created in Polaris, or do are you really talking about Polaris is creating a table, what is that use case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For example, Spark create a brand new Delta table under one namespace in Polaris. Is it possible or not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, that is case we are supporting, we are going to support the replacement of Spark Catalog. I also prefer create-at, i think as far as we make the doc clear that is the server side time we should be good. Let me update to create-at
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, since this field is not really needed to achieve functionality, we can remove it for now, and added it back later if this information is needed for client side to perform monitoring capability
collado-mike
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor comments only
| /polaris/v1/{prefix}/namespaces/{namespace}/generic-tables: | ||
| $ref: './polaris-catalog-apis/generic-tables-api.yaml#/paths/~1polaris~1v1~1{prefix}~1namespaces~1{namespace}~1generic-tables' | ||
|
|
||
| /polaris/v1/{prefix}/namespaces/{namespace}/generic-tables/{generic-table}: | ||
| $ref: './polaris-catalog-apis/generic-tables-api.yaml#/paths/~1polaris~1v1~1{prefix}~1namespaces~1{namespace}~1generic-tables~1{generic-table}' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we keep these separate from the rest of the Iceberg APIs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a good question. In fact, @HonahX is going to carry out a discussion about this later about do we want to have an independent service file for the APIs under new prefix.
I think we can keep this discussion separate with this PR, and if we decided to do that, we can just have one pr to put all new definitions to a separate service file, and there is some other things like (auth definition) need to be take care of, so that we can keep the purpose of each pr more clear. Meanwhile @HonahX and I can carry this discussion independently soon. Would that work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @gh-yzou for summarizing. We can always re-organize these spec definitions if needed without affecting the implementaion. The current way allow us to share common open api definitions (servers, security scheme definition), and we can still generate models and services in separate module, like how they are generated in this PR. But let's discuss this in a separate thread to decide whether we want to separate them into completely different files.
spec/README.md
Outdated
| - `polaris-catalog-service.yaml` - Defines the specification for the Polaris Catalog API, which encompasses both the Iceberg REST Catalog API | ||
| and Polaris-native API. | ||
| - `polaris-catalog-apis` - Contains the specifications of Polaris-native API | ||
| - `polaris-catalog-apis` - Contains the specification for Polaris-native Catalog APIs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alongside Yufei's comment, I kinda think that the text "Polaris-native", in reference to table types, should go away. Polaris doesn't have table types - unlike, for example, Polaris-native principals or Polaris-native roles. Maybe "Polaris-specific" or "Polaris proprietary" APIs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Polaris specific APIs make sense to me, let me update that.
|
Thanks a lot @gh-yzou for adding new APIs for generic table! This is going to add a lot of value to Polaris users. Thanks @eric-maynard @collado-mike @HonahX for the review. |
|
Related to the discussion on the dev-ML: Concerns about changing and/or depending on the Iceberg REST spec have been raised multiple times on PRs and on the dev-mailing-list. This PR introduces a hard dependency on the Iceberg REST spec. The approach take in this PR and #808 should have discussed explicitly on the dev mailing list, as stated in the project's contributing guidelines, which explicitly say: "Change of public interface (or more generally speaking Polaris extension point) should be discussed and approved on the dev mailing list. The discussion on the dev mailing list should happen before having a “ready-for-review” Pull Request." |
|
Hi @snazy, I see your point about the dependency on the Iceberg REST spec. However, I believe the position Polaris taken is IRC compatible, in that case, i think it make sense for us to reuse the Iceberg definitions, such as namespace, or errors. cc @RussellSpitzer @flyrain |
This PR defines the open api spec for Generic Tables operations, corresponding definition defined in https://docs.google.com/document/d/1_R9jBIwoH3CV9G7gSoRJPQVEcOsROp4IEiGoeYeQE8A/edit?pli=1&tab=t.0
The set of APIs includes:
listGenericTables
createGenericTable
loadGenericTable
dropGenericTable
Another discussion is happening about whether we want to introduce a completely new service file for new prefix, we will do this as a refactor if we decided to.