Skip to content

[#3337] improvement(hadoop-catalog): Support user impersonation for Hadoop catalog.#3352

Merged
yuqi1129 merged 20 commits intoapache:mainfrom
yuqi1129:issue_3337
May 29, 2024
Merged

[#3337] improvement(hadoop-catalog): Support user impersonation for Hadoop catalog.#3352
yuqi1129 merged 20 commits intoapache:mainfrom
yuqi1129:issue_3337

Conversation

@yuqi1129
Copy link
Contributor

What changes were proposed in this pull request?

Add user impersonation for the Hadoop catalog.

Why are the changes needed?

We need authentication for the encrypted HDFS cluster.

Fix: #3337

Does this PR introduce any user-facing change?

N/A.

How was this patch tested?

UT(TO add).

@yuqi1129 yuqi1129 marked this pull request as draft May 11, 2024 14:00
@yuqi1129 yuqi1129 marked this pull request as ready for review May 15, 2024 12:20
@yuqi1129 yuqi1129 self-assigned this May 20, 2024
@jerryshao
Copy link
Contributor

Is it ready for review?

@yuqi1129
Copy link
Contributor Author

Is it ready for review?

I'm afraid we need to add some tests using the HDFS cluster, not just a mini cluster here. If this does not matter, I think it's ready for review.

@jerryshao
Copy link
Contributor

We could have a separate PR for e2e test, using mock test here to cover the logic should be enough.

@yuqi1129
Copy link
Contributor Author

yuqi1129 commented May 21, 2024

The following point needs clarification:

  • After introducing a common module that handles Kerberos authentication, the Hive catalog will depend on hadoop3(see below), is it acceptable? I think it's a bit risky to use a different release version for the Hive catalog.
image
  • If the first option is not desirable, I believe we may not be able to share code between modules that use Kerberos authentication, can we accept duplicated code of Kerberos code logic here?

@jerryshao
What's your opinion? I'm looking forward to your kind reply, thank you.

@jerryshao
Copy link
Contributor

jerryshao commented May 21, 2024

I would choose the option 2 as the bottom line. If we have a better solution to avoid code duplication while not changing Hadoop version, that would be better.

@yuqi1129
Copy link
Contributor Author

@qqqttt123 @jerryshao
The code has been updated, please help to review it if you have time, thanks.

@yuqi1129
Copy link
Contributor Author

@jerryshao
Please help to review this pr, The Hadoop e2e test depends on this PR.

@yuqi1129 yuqi1129 closed this May 23, 2024
@yuqi1129 yuqi1129 reopened this May 23, 2024
@yuqi1129 yuqi1129 closed this May 24, 2024
@yuqi1129 yuqi1129 reopened this May 24, 2024
@yuqi1129
Copy link
Contributor Author

All resolved, please help to double-check if it's okay now.
@qqqttt123

@yuqi1129 yuqi1129 closed this May 29, 2024
@yuqi1129 yuqi1129 reopened this May 29, 2024
@roryqi
Copy link
Contributor

roryqi commented May 29, 2024

You need to notice this apache/uniffle#824

@roryqi
Copy link
Contributor

roryqi commented May 29, 2024

What configurations do you need if you support IAM? Could you configure it with kerberos at the same time?
I also have concern the support of fileset Kerberos. We can't use keytabs of different KDC in the same catalog.

@yuqi1129
Copy link
Contributor Author

yuqi1129 commented May 29, 2024

What configurations do you need if you support IAM? Could you configure it with kerberos at the same time? I also have concern the support of fileset Kerberos. We can't use keytabs of different KDC in the same catalog.

I would prefer to use another PR solution to support multiple KDCs and multiple tables in the Hadoop catalogs, I think the Hive catalog also needs to handle this issue.

I may take some time to think about it and complete it. Currently, we don't have enough time to handle it in this version.

@roryqi
Copy link
Contributor

roryqi commented May 29, 2024

What configurations do you need if you support IAM? Could you configure it with kerberos at the same time? I also have concern the support of fileset Kerberos. We can't use keytabs of different KDC in the same catalog.

I would prefer to use another PR solution to support multiple KDCs and multiple key tables in the Hadoop catalogs, I think the Hive catalog also needs to handle this issue.

I may take some time to think about it and complete it. Currently, we don't have enough time to handle it in this version.

Hive catalog doesn't need to handle this.

@yuqi1129
Copy link
Contributor Author

What configurations do you need if you support IAM? Could you configure it with kerberos at the same time? I also have concern the support of fileset Kerberos. We can't use keytabs of different KDC in the same catalog.

I would prefer to use another PR solution to support multiple KDCs and multiple key tables in the Hadoop catalogs, I think the Hive catalog also needs to handle this issue.
I may take some time to think about it and complete it. Currently, we don't have enough time to handle it in this version.

Hive catalog doesn't need to handle this.

I have added the following keys and other according changes:

public static final String AUTH_TYPE_KEY = "hadoop.authentication.type";
public static final String ENABLE_AUTH_KEY = "hadoop.authentication.enable";

.put(
AUTH_TYPE_KEY,
PropertyEntry.stringImmutablePropertyEntry(
AUTH_TYPE_KEY, "The uri of key tab for the catalog", false, null, false, false))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment isn't right.

import java.util.Map;

public class AuthenticationConfig extends Config {
public static final String AUTH_TYPE_KEY = "hadoop.authentication.type";
Copy link
Contributor

@roryqi roryqi May 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should use authentication.type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the configuration of Hadoop catalog, why do we need to remove hadoop here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hadoop catalog properties don't start with hadoop. You can refer to location. hadoop is rebudant information.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me check it.

import org.apache.commons.lang3.StringUtils;

public class KerberosConfig extends AuthenticationConfig {
public static final String KEY_TAB_URI_KEY = "kerberos.keytab-uri";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

authentication.kerberos.keytab-uri?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense.

@roryqi
Copy link
Contributor

roryqi commented May 29, 2024

LGTM, @jerryshao Could you check whether the config options are ok?

@yuqi1129 yuqi1129 closed this May 29, 2024
@yuqi1129 yuqi1129 reopened this May 29, 2024
@yuqi1129 yuqi1129 merged commit 783839f into apache:main May 29, 2024
@yuqi1129 yuqi1129 deleted the issue_3337 branch May 29, 2024 13:39
diqiu50 pushed a commit to diqiu50/gravitino that referenced this pull request Jun 13, 2024
… for Hadoop catalog. (apache#3352)

### What changes were proposed in this pull request?

Add user impersonation for the Hadoop catalog.

### Why are the changes needed?

We need authentication for the encrypted HDFS cluster.

Fix: apache#3337

### Does this PR introduce _any_ user-facing change?

N/A. 

### How was this patch tested?

UT(TO add).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Improvement] Authentication between the Hadoop catalog and Gravitino.

5 participants