[Bug report] The Hive Catalog Bug In Multiple Kerberized HMS #3295

Closed
theoryxu opened this issue May 7, 2024 · 5 comments · Fixed by #3321
Labels
bug Something isn't working

Comments

@theoryxu
Contributor

theoryxu commented May 7, 2024

Version

main branch

Describe what's wrong

Precondition 1:

hive.metastore.kerberos.principal should have the form USER/_HOST@EXAMPLE.COM, where the instance part of the principal is the placeholder _HOST.

Precondition 2:

When Gravitino's HiveCatalogOperations initializes, the gravitinoConfig overwrites the byPassConfig wherever both define the same key, which means kerberos.principal overwrites hive.metastore.kerberos.principal.

Precondition 3:

The instance part of kerberos.principal cannot be a placeholder. Otherwise, the catalog cannot log in with Kerberos.

Conclusion:

If an HMS runs as a different instance (for example, on another host), the Gravitino catalog for that HMS does not work.
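The interaction of the three preconditions can be modelled in a few lines of Python. This is an illustrative sketch only, not Gravitino's code: `build_hive_conf`, `resolve_host`, and the dictionary-based merge are hypothetical names and simplifications, and the assumption that HMS2 runs as `hadoop/host2@EXAMPLE.COM` is made for the example.

```python
# Illustrative model only; function names are hypothetical, not Gravitino APIs.
BYPASS_PREFIX = "gravitino.bypass."

# Mapping as described in this report: kerberos.principal is copied onto
# hive.metastore.kerberos.principal.
GRAVITINO_CONFIG_TO_HIVE = {
    "metastore.uris": "hive.metastore.uris",
    "kerberos.principal": "hive.metastore.kerberos.principal",
}


def build_hive_conf(properties, mapping):
    """Merge catalog properties; mapped Gravitino keys win over bypass keys."""
    conf = {}
    for key, value in properties.items():
        if key.startswith(BYPASS_PREFIX):
            conf[key[len(BYPASS_PREFIX):]] = value
    for key, hive_key in mapping.items():
        if key in properties:
            conf[hive_key] = properties[key]  # overwrites any bypass value
    return conf


def resolve_host(principal, server_host):
    """Replace the _HOST instance placeholder with the target server's host."""
    return principal.replace("/_HOST@", "/" + server_host + "@")


catalog2 = {
    "metastore.uris": "thrift://host2:7004",
    "kerberos.principal": "hadoop/host1@EXAMPLE.COM",
    "gravitino.bypass.hive.metastore.kerberos.principal": "hadoop/_HOST@EXAMPLE.COM",
}

conf = build_hive_conf(catalog2, GRAVITINO_CONFIG_TO_HIVE)
expected_by_client = resolve_host(conf["hive.metastore.kerberos.principal"], "host2")

# The placeholder was overwritten, so the client expects the host1 principal
# even though it is talking to the metastore on host2.
print(expected_by_client)  # hadoop/host1@EXAMPLE.COM
```

In this model the bypass value with `_HOST` never reaches the final configuration, so the principal the client expects for the metastore on host2 still names host1.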

Possible solution

Raise the priority of the gravitino.bypass.* configuration for this key so that it is not overwritten, and set gravitino.bypass.hive.metastore.kerberos.principal in the correct form.

Error message and/or stacktrace

image

How to reproduce

step 1

Install Gravitino on host1, HMS1 on host1, and HMS2 on host2.

step 2

create catalog1 for HMS1:

curl -L -X POST 'http://host1:8090/api/metalakes/mk1/catalogs' \
  -H 'Content-Type: application/json' \
  -H 'Accept: application/vnd.gravitino.v1+json' \
  --data-raw '{
    "name": "catalog1",
    "type": "relational",
    "provider": "hive",
    "properties": {
      "metastore.uris": "thrift://host1:7004",
      "kerberos.principal": "hadoop/host1@EXAMPLE.COM",
      "kerberos.keytab-uri": "/var/krb5kdc/emr.keytab",
      "gravitino.bypass.hadoop.security.authentication": "kerberos",
      "gravitino.bypass.hive.metastore.sasl.enabled": true
    }
  }'

step 3

create catalog2 for HMS2:

curl -L -X POST 'http://host1:8090/api/metalakes/mk1/catalogs' \
  -H 'Content-Type: application/json' \
  -H 'Accept: application/vnd.gravitino.v1+json' \
  --data-raw '{
    "name": "catalog2",
    "type": "relational",
    "provider": "hive",
    "properties": {
      "metastore.uris": "thrift://host2:7004",
      "kerberos.principal": "hadoop/host1@EXAMPLE.COM",
      "kerberos.keytab-uri": "/var/krb5kdc/emr.keytab",
      "gravitino.bypass.hadoop.security.authentication": "kerberos",
      "gravitino.bypass.hive.metastore.sasl.enabled": true
    }
  }'

step 4

curl -L -X GET 'http://host1:8090/api/metalakes/mk1/catalogs/catalog1/schemas'
success

curl -L -X GET 'http://host1:8090/api/metalakes/mk1/catalogs/catalog2/schemas'
failed
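The two create-catalog requests in these steps differ only in the catalog name and the metastore host. A minimal sketch that builds both request bodies is shown here; `catalog_payload` is a hypothetical helper written for this illustration (not part of any Gravitino client), it only prints the JSON and sends nothing, and its optional `metastore_principal` argument is an assumption added for experimenting with the gravitino.bypass.hive.metastore.kerberos.principal key.

```python
import json


# Hypothetical helper for illustration; not part of any Gravitino client library.
def catalog_payload(name, metastore_host, metastore_principal=None):
    """Build the JSON body for the create-catalog request used in the steps above."""
    properties = {
        "metastore.uris": f"thrift://{metastore_host}:7004",
        # Principal and keytab that Gravitino itself logs in with (from the report).
        "kerberos.principal": "hadoop/host1@EXAMPLE.COM",
        "kerberos.keytab-uri": "/var/krb5kdc/emr.keytab",
        "gravitino.bypass.hadoop.security.authentication": "kerberos",
        "gravitino.bypass.hive.metastore.sasl.enabled": True,
    }
    if metastore_principal is not None:
        # Optional bypass key; only set when the caller supplies a value.
        properties["gravitino.bypass.hive.metastore.kerberos.principal"] = metastore_principal
    return {
        "name": name,
        "type": "relational",
        "provider": "hive",
        "properties": properties,
    }


payloads = [
    catalog_payload("catalog1", "host1"),
    catalog_payload("catalog2", "host2"),
]
for body in payloads:
    print(json.dumps(body, indent=2))
```

Each printed body can be passed to the curl command's --data-raw argument in place of the inline JSON.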

Additional context

No response

@theoryxu theoryxu added the bug Something isn't working label May 7, 2024
@jerryshao
Contributor

@qqqttt123 can you please take a look?

@qqqttt123
Contributor

I will take a look.

@qqqttt123
Contributor

qqqttt123 commented May 9, 2024

@theoryxu Did you ever try gravitino.bypass.hive.metastore.kerberos.principal?

@theoryxu
Contributor Author

> @theoryxu Did you ever try gravitino.bypass.hive.metastore.kerberos.principal?

I did try, but it did not work. The kerberos.principal value overwrites hive.metastore.kerberos.principal.

image image

@qqqttt123
Contributor

qqqttt123 commented May 10, 2024

I misunderstood the meaning of METASTORE_KERBEROS_PRINCIPAL. Maybe we should remove the PRINCIPAL -> METASTORE_KERBEROS_PRINCIPAL mapping from GRAVITINO_CONFIG_TO_HIVE.
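The effect of dropping that mapping entry can be sketched with a small Python model. It is a simplification under stated assumptions, not the actual Gravitino implementation: `build_hive_conf` and the two mapping dictionaries are hypothetical stand-ins for the merge of bypass properties with the GRAVITINO_CONFIG_TO_HIVE mapping.

```python
# Illustrative model only; names are hypothetical, not Gravitino's real code.
BYPASS_PREFIX = "gravitino.bypass."


def build_hive_conf(properties, mapping):
    """Apply bypass keys first, then let mapped Gravitino keys overwrite them."""
    conf = {}
    for key, value in properties.items():
        if key.startswith(BYPASS_PREFIX):
            conf[key[len(BYPASS_PREFIX):]] = value
    for key, hive_key in mapping.items():
        if key in properties:
            conf[hive_key] = properties[key]
    return conf


# Before the change: kerberos.principal is mapped onto the metastore principal.
mapping_before = {"kerberos.principal": "hive.metastore.kerberos.principal"}
# After the change: the mapping entry is gone, so the bypass value survives.
mapping_after = {}

props = {
    "kerberos.principal": "hadoop/host1@EXAMPLE.COM",
    "gravitino.bypass.hive.metastore.kerberos.principal": "hadoop/_HOST@EXAMPLE.COM",
}

before = build_hive_conf(props, mapping_before)
after = build_hive_conf(props, mapping_after)

print(before["hive.metastore.kerberos.principal"])  # hadoop/host1@EXAMPLE.COM
print(after["hive.metastore.kerberos.principal"])   # hadoop/_HOST@EXAMPLE.COM
```

With the entry removed, the `_HOST` placeholder supplied through the bypass key is kept, so it can be resolved per metastore host while kerberos.principal remains the login principal.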

theoryxu pushed a commit to theoryxu/gravitino that referenced this issue May 10, 2024
qqqttt123 pushed a commit that referenced this issue May 10, 2024
… HMS (#3321)

### What changes were proposed in this pull request?

Remove the PRINCIPAL -> METASTORE_KERBEROS_PRINCIPAL mapping from
GRAVITINO_CONFIG_TO_HIVE.

### Why are the changes needed?

The hive.metastore.kerberos.principal is not the same as
kerberos.principal functionally.

Fix: #3295 

### Does this PR introduce _any_ user-facing change?

Yes, the documentation is updated.

### How was this patch tested?

existing test (TestHiveCatalogOperations)

#### Tested in an internal environment as follows:

step 1
Install Gravitino on host1, HMS1 on host1, and HMS2 on host2.

step 2
create catalog1 for HMS1:

curl -L -X POST 'http://host1:8090/api/metalakes/mk1/catalogs' \
  -H 'Content-Type: application/json' \
  -H 'Accept: application/vnd.gravitino.v1+json' \
  --data-raw '{
    "name": "catalog1",
    "type": "relational",
    "provider": "hive",
    "properties": {
      "metastore.uris": "thrift://host1:7004",
      "kerberos.principal": "hadoop/host1@EXAMPLE.COM",
      "kerberos.keytab-uri": "/var/krb5kdc/emr.keytab",
      "gravitino.bypass.hadoop.security.authentication": "kerberos",
      "gravitino.bypass.hive.metastore.kerberos.principal": "hadoop/_HOST@EXAMPLE.COM",
      "gravitino.bypass.hive.metastore.sasl.enabled": true
    }
  }'

step 3
create catalog2 for HMS2:

curl -L -X POST 'http://host1:8090/api/metalakes/mk1/catalogs' \
  -H 'Content-Type: application/json' \
  -H 'Accept: application/vnd.gravitino.v1+json' \
  --data-raw '{
    "name": "catalog2",
    "type": "relational",
    "provider": "hive",
    "properties": {
      "metastore.uris": "thrift://host2:7004",
      "kerberos.principal": "hadoop/host1@EXAMPLE.COM",
      "kerberos.keytab-uri": "/var/krb5kdc/emr.keytab",
      "gravitino.bypass.hadoop.security.authentication": "kerberos",
      "gravitino.bypass.hive.metastore.kerberos.principal": "hadoop/_HOST@EXAMPLE.COM",
      "gravitino.bypass.hive.metastore.sasl.enabled": true
    }
  }'

step 4
curl -L -X GET 'http://host1:8090/api/metalakes/mk1/catalogs/catalog1/schemas'
success

curl -L -X GET 'http://host1:8090/api/metalakes/mk1/catalogs/catalog2/schemas'
success

Co-authored-by: theoryxu <theoryxu@tencent.com>
github-actions bot pushed a commit that referenced this issue May 10, 2024
… HMS (#3321)

jerryshao added a commit that referenced this issue May 10, 2024
… HMS (#3324)

Co-authored-by: theoryxu <xuxiaotheory@gmail.com>
Co-authored-by: theoryxu <theoryxu@tencent.com>
diqiu50 pushed a commit to diqiu50/gravitino that referenced this issue Jun 13, 2024
…erized HMS (apache#3321)
