[Feature]: Introduce the seamless integration of AMS Catalog Service and Flink Engine #1860
Comments
This is a good idea. We are designing such a catalog management system ourselves, but we also have to consider older Flink versions (<1.18).
@dpengpeng Thanks for your feedback. What tool do you use to submit Flink SQL tasks: the Flink SQL Client, the Flink SQL Gateway, or an internal submission tool?
@YesOrNo828 We are considering the Flink SQL Client, along with Java jobs written against the Table/SQL API.
@YesOrNo828 Are you also considering external data sources, such as MysqlCatalog, KafkaCatalog, etc.?
We have discussed using AMS as a stream metastore for registering Kafka stream tables, which would be valuable for Flink SQL, but it is not currently on the roadmap. Also, we haven't figured out a good way to support this feature in the Flink SQL Client/Gateway for older versions (<1.18).
@baiyangtx The Alibaba Cloud Flink VVP platform already supports a variety of catalogs, such as KafkaCatalog and MysqlCatalog, and users can query table data and metadata directly with Flink SQL. We wonder whether a custom Catalog could likewise be used to work with external data sources.
Expanding support for more data source types and enabling smoother usage of multiple catalogs in Flink SQL is a fairly broad topic. Here, I will break it down and discuss it further.
If you are interested in those features, you can participate in the development together. Step 3 of expanding data source types can begin after supporting |
@dpengpeng @baiyangtx Glad to see the active discussion for enhancing the seamless integration of AMS and compute engines. I'd like to share some thoughts on this:
How can this conflict be avoided and resolved if different catalogs have the same database or table name? No such conflict arises. Tables are addressed through their catalog, so identical database or table names in different catalogs stay distinct. Providing this namespace is a core capability of the Catalog.
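To make the point about name conflicts concrete, here is a minimal plain-Java sketch. It does not use any Flink or AMS classes; the catalog names and the `id` helper are made up for illustration. It only shows that once the catalog name is part of the identifier, two tables with the same database and table name remain separate entries.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class QualifiedNameSketch {

    // Hypothetical helper: a three-part identifier (catalog, database, table).
    static List<String> id(String catalog, String database, String table) {
        return List.of(catalog, database, table);
    }

    public static void main(String[] args) {
        Map<List<String>, String> tables = new HashMap<>();

        // Same database and table name, different catalogs.
        tables.put(id("iceberg_catalog", "db", "orders"), "Iceberg table");
        tables.put(id("paimon_catalog", "db", "orders"), "Paimon table");

        // Both entries are kept because the catalog name differs.
        System.out.println(tables.size());
        System.out.println(tables.get(id("paimon_catalog", "db", "orders")));
    }
}
```

In Flink SQL the same idea shows up as fully qualified names of the form `catalog.database.table`.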
This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible. |
This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale' |
Description
AMS offers a Catalog service that can manage multiple table formats such as Iceberg, Mixed Hive, and Paimon (see #1269). The goal is to let the Flink engine access AMS catalogs quickly and seamlessly, without first creating each catalog through Flink SQL DDL or Java.
Use case/motivation
A typical use case is a company that manages tables in several formats through AMS and processes them with Flink. With seamless integration, Flink jobs can use those tables directly, without the extra step of creating catalogs through Flink SQL or Java, which saves time and reduces setup effort.
Now creating catalogs in AMS:
To access the AMS metadata using the Flink engine, we must create Flink Catalogs and register them into Flink's CatalogManager individually.
Or through the Java language:
Expected:
I want to introduce a simple way for the Flink engine to access all AMS catalogs directly.
Users should not have to create and register AMS catalogs themselves.
Describe the solution
1. Based on FLIP-295, provide a CatalogStoreFactory that stores AMS catalogs.
Using Configuration:
AMSCatalogStore will fetch and save AMS catalogs through the configured AMS Thrift address.
Using Table API
Limitation: FLIP-295 is implemented in Flink 1.18, so this approach requires Flink 1.18 or later.
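As a rough illustration of the first approach, here is a plain-Java sketch of a store that resolves catalogs from AMS on first use and caches them. It is a simplified model only: `AmsClient`, `AmsCatalogStore`, and `fakeClient` are hypothetical names, the catalog is reduced to a map of properties, and nothing here implements Flink's actual `CatalogStore` interface or talks to a real AMS Thrift endpoint.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class AmsCatalogStoreSketch {

    /** Hypothetical stand-in for an AMS Thrift client; not a real AMS API. */
    interface AmsClient {
        Set<String> listCatalogNames();
        Optional<Map<String, String>> loadCatalogProperties(String name);
    }

    /** Simplified store: looks catalogs up in AMS lazily and caches what it finds. */
    static class AmsCatalogStore {
        private final AmsClient client;
        private final Map<String, Map<String, String>> cache = new ConcurrentHashMap<>();

        AmsCatalogStore(AmsClient client) {
            this.client = client;
        }

        Set<String> listCatalogs() {
            return client.listCatalogNames();
        }

        boolean contains(String name) {
            return getCatalog(name).isPresent();
        }

        Optional<Map<String, String>> getCatalog(String name) {
            Map<String, String> cached = cache.get(name);
            if (cached != null) {
                return Optional.of(cached);
            }
            // Not cached yet: ask AMS and remember the answer if the catalog exists.
            Optional<Map<String, String>> loaded = client.loadCatalogProperties(name);
            loaded.ifPresent(props -> cache.put(name, props));
            return loaded;
        }
    }

    /** In-memory fake used in place of a real AMS connection. */
    static AmsClient fakeClient(Map<String, Map<String, String>> catalogs) {
        return new AmsClient() {
            @Override
            public Set<String> listCatalogNames() {
                return catalogs.keySet();
            }

            @Override
            public Optional<Map<String, String>> loadCatalogProperties(String name) {
                return Optional.ofNullable(catalogs.get(name));
            }
        };
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> ams = new LinkedHashMap<>();
        ams.put("iceberg_catalog", Map.of("format", "iceberg"));
        ams.put("paimon_catalog", Map.of("format", "paimon"));

        AmsCatalogStore store = new AmsCatalogStore(fakeClient(ams));
        System.out.println(store.listCatalogs());
        System.out.println(store.getCatalog("paimon_catalog"));
        System.out.println(store.contains("missing"));
    }
}
```

The point of the lazy lookup is that a user never issues a CREATE CATALOG statement: the first reference to a catalog name triggers the fetch from AMS. A real implementation would also need to decide when cached entries are refreshed or invalidated.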
2. Provide a custom TableEnvironment with built-in AMS catalogs.
The two approaches above are only a simplified outline. A detailed design will be proposed later.
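The second approach could look roughly like the plain-Java sketch below. `ToyTableEnvironment` is a toy stand-in, not Flink's `TableEnvironment`, and `createWithAmsCatalogs` is a hypothetical factory. The catalogs passed in are assumed to have been fetched from AMS beforehand; here they are just maps of properties.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class AmsTableEnvironmentSketch {

    /** Toy environment holding a catalog registry; a stand-in for a real table environment. */
    static class ToyTableEnvironment {
        private final Map<String, Map<String, String>> catalogs = new LinkedHashMap<>();
        private String currentCatalog;

        void registerCatalog(String name, Map<String, String> properties) {
            catalogs.put(name, properties);
        }

        void useCatalog(String name) {
            if (!catalogs.containsKey(name)) {
                throw new IllegalArgumentException("Unknown catalog: " + name);
            }
            currentCatalog = name;
        }

        Set<String> listCatalogs() {
            return catalogs.keySet();
        }

        String getCurrentCatalog() {
            return currentCatalog;
        }
    }

    /** Hypothetical factory: registers every catalog known to AMS when the environment is created. */
    static ToyTableEnvironment createWithAmsCatalogs(Map<String, Map<String, String>> amsCatalogs) {
        ToyTableEnvironment env = new ToyTableEnvironment();
        amsCatalogs.forEach(env::registerCatalog);
        return env;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> ams = new LinkedHashMap<>();
        ams.put("iceberg_catalog", Map.of("format", "iceberg"));
        ams.put("mixed_hive_catalog", Map.of("format", "mixed-hive"));

        ToyTableEnvironment env = createWithAmsCatalogs(ams);
        env.useCatalog("iceberg_catalog"); // no manual registration step needed
        System.out.println(env.listCatalogs());
        System.out.println(env.getCurrentCatalog());
    }
}
```

In this sketch registration is eager, so catalogs added to AMS after the environment is created would not be visible until it is rebuilt or refreshed. Unlike the catalog store approach, this style does not depend on FLIP-295, which is why it may suit Flink versions below 1.18.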
Anyone who is interested can take part in the discussion.
Subtasks
No response
Related issues
No response
Are you willing to submit a PR?
Code of Conduct