
[Feature]: Enrich the storage type registry when creating a catalog. #2007

Closed
1 of 5 tasks
Tracked by #1847
baiyangtx opened this issue Sep 20, 2023 · 9 comments
Labels
type:feature Feature Requests

Comments

@baiyangtx
Contributor

baiyangtx commented Sep 20, 2023

Description

Currently, the default storage system registered in a catalog is Hadoop-based. Although the S3 protocol is now supported, Hadoop-related information still has to be filled in.

We now need to support registering more storage and authentication types, such as S3-type storage systems and authentication based on an Access Key/Secret Key (AK/SK) pair.

Use case/motivation

To register a catalog:

  • Select the S3/MinIO storage system, fill in the bucket name and endpoint, select the AK/SK authentication method, and fill in the AK/SK.
  • Select the Hadoop storage system and fill in the core-site/hdfs-site. Select the Kerberos authentication method and fill in the keytab information.
  • Create a catalog with a Hive Metastore that uses the S3/MinIO storage system and the Kerberos authentication method.
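Each use case implies a set of required form fields per storage type and authentication method. The sketch below checks a submitted form against such rules; the field names (`bucket`, `endpoint`, `access-key`, and so on) and the `missing_fields` helper are hypothetical illustrations, not Amoro's actual form keys or validation code.

```python
# Hypothetical required-field rules derived from the use cases above.
# Field names are illustrative only; they are not Amoro's real form keys.
REQUIRED_STORAGE_FIELDS = {
    "S3": ["bucket", "endpoint"],
    "HADOOP": ["core-site", "hdfs-site"],
}
REQUIRED_AUTH_FIELDS = {
    "AK/SK": ["access-key", "secret-key"],
    "KERBEROS": ["principal", "keytab"],
}


def missing_fields(storage_type, auth_type, form):
    """Return the required fields that are absent or empty in `form`."""
    required = (REQUIRED_STORAGE_FIELDS.get(storage_type, [])
                + REQUIRED_AUTH_FIELDS.get(auth_type, []))
    return [name for name in required if not form.get(name)]


# Use case 1: S3/MinIO storage with AK/SK authentication, secret key left blank.
s3_form = {
    "bucket": "warehouse",
    "endpoint": "http://minio:9000",
    "access-key": "AK",
    "secret-key": "",
}
print(missing_fields("S3", "AK/SK", s3_form))  # ['secret-key']
```

For the second use case, `missing_fields("HADOOP", "KERBEROS", form)` returns an empty list once both site files, the principal, and the keytab are present.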

Describe the solution

No response

Subtasks

Related issues

No response

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!

Code of Conduct

baiyangtx added the type:feature Feature Requests label Sep 20, 2023
baiyangtx mentioned this issue Sep 20, 2023
@XBaith
Contributor

XBaith commented Sep 21, 2023

I have a suggestion regarding the configuration of S3: It's probably best not to restrict the configuration to use Access Key (AK) and Secret Key (SK) on the web page. Some users may prefer to control permissions using a role ARN.

@wangtaohz
Contributor

wangtaohz commented Sep 26, 2023

I have summarized the current requirements that the product changes need to meet.

Part1
A Catalog's configuration can be divided into the following sections.

1. Metastore

  • Internal (AMS)
  • External
    • Hive Metastore
    • Hadoop
    • Glue
    • Custom

2. Storage (the type is selectable)

  • Hadoop: upload [core-site][,hdfs-site][,hive-site] (S3A is also supported this way)
  • S3: set [Endpoint][,Region]

3. Authentication

  • SIMPLE: set Hadoop Username
  • KERBEROS: set Principal, upload Keytab and Krb5
  • AK/SK: set Access Key, Secret Key
  • CUSTOM

4. Properties

  • dynamically related to the selected Metastore
  • hidden properties

5. Table Default Properties

  • ignore table.
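One way to read the five sections above is as a typed configuration object. This is a minimal sketch assuming Python dataclasses; the class and field names are hypothetical and only mirror the section names in the summary.

```python
from dataclasses import dataclass, field
from enum import Enum


class Metastore(Enum):
    INTERNAL_AMS = "Internal (AMS)"
    HIVE = "Hive Metastore"
    HADOOP = "Hadoop"
    GLUE = "Glue"
    CUSTOM = "Custom"


class StorageType(Enum):
    HADOOP = "Hadoop"
    S3 = "S3"


class AuthType(Enum):
    SIMPLE = "SIMPLE"
    KERBEROS = "KERBEROS"
    AK_SK = "AK/SK"
    CUSTOM = "CUSTOM"


@dataclass
class CatalogConfig:
    """Hypothetical model of the Catalog sections (not Amoro's real class)."""
    metastore: Metastore
    storage_type: StorageType
    auth_type: AuthType
    # Hadoop: uploaded site files; S3: Endpoint/Region.
    storage: dict = field(default_factory=dict)
    # SIMPLE: username; KERBEROS: principal/keytab/krb5; AK/SK: keys.
    auth: dict = field(default_factory=dict)
    properties: dict = field(default_factory=dict)
    table_properties: dict = field(default_factory=dict)


glue = CatalogConfig(
    Metastore.GLUE,
    StorageType.S3,
    AuthType.CUSTOM,
    storage={"Endpoint": "https://s3.us-east-1.amazonaws.com", "Region": "us-east-1"},
)
```

Using `default_factory` gives each catalog its own property dictionaries, so editing one catalog's properties cannot leak into another.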

Part2
All currently supported configuration combinations are as follows:

| Metastore      | Table Format                       | Storage | Authentication   |
|----------------|------------------------------------|---------|------------------|
| Internal (AMS) | Iceberg, Mixed Iceberg             | Hadoop  | SIMPLE, KERBEROS |
| Internal (AMS) | Iceberg                            | S3      | AK/SK, CUSTOM    |
| Hive Metastore | Iceberg, Mixed Iceberg, Mixed Hive | Hadoop  | SIMPLE, KERBEROS |
| Hadoop         | Iceberg, Mixed Iceberg             | Hadoop  | SIMPLE, KERBEROS |
| Glue           | Iceberg, Mixed Iceberg             | S3      | AK/SK, CUSTOM    |
| Custom         | Iceberg, Mixed Iceberg             | Hadoop  | SIMPLE, KERBEROS |
| Custom         | Iceberg                            | S3      | AK/SK, CUSTOM    |
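The supported combinations can also be encoded as a lookup, which a form could use to reject or disable invalid selections. This is a sketch under that assumption; `SUPPORTED` and `is_supported` are hypothetical names, and the data is transcribed directly from the table above.

```python
# Supported combinations transcribed from the table:
# metastore -> list of (table formats, storage, authentication methods).
SUPPORTED = {
    "Internal (AMS)": [
        ({"Iceberg", "Mixed Iceberg"}, "Hadoop", {"SIMPLE", "KERBEROS"}),
        ({"Iceberg"}, "S3", {"AK/SK", "CUSTOM"}),
    ],
    "Hive Metastore": [
        ({"Iceberg", "Mixed Iceberg", "Mixed Hive"}, "Hadoop", {"SIMPLE", "KERBEROS"}),
    ],
    "Hadoop": [
        ({"Iceberg", "Mixed Iceberg"}, "Hadoop", {"SIMPLE", "KERBEROS"}),
    ],
    "Glue": [
        ({"Iceberg", "Mixed Iceberg"}, "S3", {"AK/SK", "CUSTOM"}),
    ],
    "Custom": [
        ({"Iceberg", "Mixed Iceberg"}, "Hadoop", {"SIMPLE", "KERBEROS"}),
        ({"Iceberg"}, "S3", {"AK/SK", "CUSTOM"}),
    ],
}


def is_supported(metastore, table_format, storage, auth):
    """Return True if the combination appears in the table."""
    return any(
        table_format in formats and storage == st and auth in auths
        for formats, st, auths in SUPPORTED.get(metastore, [])
    )
```

Note that the third use case in the issue description (Hive Metastore with S3 storage and Kerberos authentication) is not in the table, so `is_supported("Hive Metastore", "Iceberg", "S3", "KERBEROS")` returns `False`.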

@wangtaohz
Contributor

wangtaohz commented Sep 27, 2023

Page design: add a Storage Type selection (Hadoop/S3).

@wangtaohz
Contributor

> I have a suggestion regarding the configuration of S3: It's probably best not to restrict the configuration to use Access Key (AK) and Secret Key (SK) on the web page. Some users may prefer to control permissions using a role ARN.

I think supporting ARNs is very worthwhile! To support them, I'm considering a Catalog page design that accommodates ARNs, but I'm a bit worried that the Endpoint/Region configuration may be redundant with ARNs.

Do you have any suggestions based on your usage experience? @XBaith


@wangtaohz
Contributor

The "End points" field for S3 should be labeled "Endpoint".

@kmozaid

kmozaid commented Oct 10, 2023

@wangtaohz @XBaith When Amoro and its optimizer are deployed in Amazon EKS, they can use IAM Roles for Service Accounts (IRSA) to authenticate with AWS services such as S3 and Glue.
To use GlueCatalog with IRSA, the following initial setup is required:

  1. Create an IAM role and an IAM policy with the required S3 and Glue permissions, then attach the policy to the role (this can be done via Terraform, the AWS console, or any other way).
  2. Create a Kubernetes service account in an EKS namespace and annotate it with the IAM role ARN. The annotation is eks.amazonaws.com/role-arn: <IAM Role ARN>
  3. Configure the Amoro and optimizer containers to use this service account.
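Step 2 amounts to a Kubernetes `ServiceAccount` carrying the `eks.amazonaws.com/role-arn` annotation. The sketch below builds that manifest as JSON, which `kubectl apply -f` accepts as well as YAML; the account name, namespace, and role ARN are placeholder assumptions, and `irsa_service_account` is a hypothetical helper.

```python
import json


def irsa_service_account(name, namespace, role_arn):
    """Build a ServiceAccount manifest annotated with an IAM role for IRSA."""
    return {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {
            "name": name,
            "namespace": namespace,
            # EKS injects web-identity credentials for this role into pods
            # that run under this service account.
            "annotations": {"eks.amazonaws.com/role-arn": role_arn},
        },
    }


# Placeholder values; substitute your own names and role ARN.
manifest = irsa_service_account(
    "amoro", "amoro-ns", "arn:aws:iam::111122223333:role/amoro-glue-s3"
)
print(json.dumps(manifest, indent=2))
```

Step 3 then corresponds to setting `serviceAccountName: amoro` in the pod template of the Amoro and optimizer workloads.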

Then, when configuring an Iceberg catalog in the Amoro UI with the IRSA (ARN) authentication type, set the following catalog property:
client.credentials-provider: software.amazon.awssdk.auth.credentials.WebIdentityTokenFileCredentialsProvider
(There is no need to provide the IAM role ARN in the UI.)

@wangtaohz
Copy link
Contributor

@kmozaid Thank you for sharing this valuable information about working with Amazon EKS.
If I understand correctly, there is no need to configure an ARN in the Web UI in this scenario. I would also like to provide an Authentication Type of None for S3, so that users can customize their authentication method through properties.

@zhoujinsong
Contributor

> @kmozaid Thank you for sharing this valuable information about working with Amazon EKS. If I understand correctly, there is no need to configure an ARN in the Web UI in this scenario. I would also like to provide an Authentication Type of None for S3, so that users can customize their authentication method through properties.

Is Custom more appropriate than None?

@wangtaohz
Contributor

> Is Custom more appropriate than None?

Yes, it makes sense to me.
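With CUSTOM as the name for the "no form credentials" option, the S3 section could translate into catalog properties roughly as follows. This is a sketch, assuming the UI fields are mapped onto Apache Iceberg's AWS property keys (`s3.endpoint`, `client.region`, `s3.access-key-id`, `s3.secret-access-key`, `client.credentials-provider`); the function name and the mapping itself are hypothetical, not Amoro's implementation.

```python
# Assumption: UI fields map onto Iceberg AWS property keys.
# This is an illustrative sketch, not Amoro's code.
def s3_catalog_properties(auth_type, endpoint=None, region=None,
                          access_key=None, secret_key=None, custom=None):
    """Translate hypothetical S3 form fields into catalog properties."""
    props = {}
    if endpoint:
        props["s3.endpoint"] = endpoint
    if region:
        props["client.region"] = region
    if auth_type == "AK/SK":
        if not (access_key and secret_key):
            raise ValueError("AK/SK authentication requires both keys")
        props["s3.access-key-id"] = access_key
        props["s3.secret-access-key"] = secret_key
    elif auth_type == "CUSTOM":
        # No credentials come from the form; user-supplied properties
        # (which may override endpoint/region) define authentication.
        props.update(custom or {})
    else:
        raise ValueError(f"unsupported S3 auth type: {auth_type}")
    return props


# IRSA on EKS expressed as CUSTOM authentication.
irsa = s3_catalog_properties(
    "CUSTOM",
    region="us-east-1",
    custom={"client.credentials-provider":
            "software.amazon.awssdk.auth.credentials.WebIdentityTokenFileCredentialsProvider"},
)
```

Under this sketch, the IRSA setup described earlier is simply CUSTOM authentication plus the `client.credentials-provider` property, which matches the point that no ARN needs to be entered in the UI.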


5 participants