[#3292] fix(spark-connector): passing Gravitino catalog properties to spark connector #3270
Conversation
@jerryshao @qqqttt123 @yuqi1129 @diqiu50 @caican00 please help to review this PR when you have time, thanks

KyuubiHiveConnectorDelegationTokenProvider needs to inject catalog properties into the Spark conf, but I'm not sure whether Gravitino provides a self-managed token provider or reuses the Kyuubi token provider. @qqqttt123
```diff
  static final String ICEBERG_CATALOG_BACKEND_HIVE = CatalogUtil.ICEBERG_CATALOG_TYPE_HIVE;

- static final String GRAVITINO_ICEBERG_CATALOG_BACKEND_JDBC = "jdbc";
+ static final String ICEBERG_CATALOG_BACKEND_JDBC = "jdbc";

  private IcebergPropertiesConstants() {}
}
```
How to distinguish which constants here are for the catalog and which are for the table?
they are all catalog properties
You suggest adding `CATALOG` to all property names?
If so, the class name `IcebergCatalogPropertiesConstants` would be better.
I prefer to keep the current file name, because it may support table property constants in the future.
That works; find a way to distinguish them.
> That works; find a way to distinguish them.
Added `CATALOG` to property names.
Prefer a self-managed tokenProvider, because the tokenProvider will be used for Iceberg, Hive, and Paimon.

Does the self-managed tokenProvider need catalog properties from the Spark conf? If not, I will keep the current implementation and not inject catalog properties into the Spark conf.
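For context, a minimal sketch (an assumption, not the PR's actual code) of what injecting catalog properties into the Spark conf for a token provider could look like; the prefix, class, and method names below are hypothetical.

```java
import java.util.Map;
import org.apache.spark.SparkConf;

// Hypothetical sketch only: a token provider that inspects the SparkConf could
// discover catalog properties if they were copied in under a dedicated prefix.
// The catalog name "demo" and this helper class are illustrative.
public class CatalogPropertyInjector {

  private static final String CATALOG_PROP_PREFIX = "spark.sql.catalog.demo.";

  public static void injectCatalogProperties(SparkConf conf, Map<String, String> catalogProps) {
    // Each catalog property becomes visible to anything that reads the SparkConf,
    // e.g. a delegation token provider.
    catalogProps.forEach((key, value) -> conf.set(CATALOG_PROP_PREFIX + key, value));
  }
}
```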
docs/apache-hive-catalog.md
Outdated
```diff
@@ -42,6 +42,9 @@ The Hive catalog supports creating, updating, and deleting databases and tables
 
 When you use Gravitino with Trino, you can pass the Trino Hive connector configuration using the prefix `trino.bypass.`. For example, use `trino.bypass.hive.config.resources` to pass `hive.config.resources` to the Gravitino Hive catalog in the Trino runtime.
 
+When you use Gravitino with Spark, you can pass the Spark Hive connector configuration using the prefix `spark.bypass.`. For example, use `spark.bypass.hive.config.resources` to pass `hive.config.resources` to the Spark Hive connector in the Spark runtime.
```
Are you sure these two configs `spark.bypass.hive.config.resources` and `hive.config.resources` work in Spark?
`hive.config.resources` is just an example config; should I change it to a real config like `hive.exec.dynamic.partition.mode`?
Why do you use a non-existent config as an example? This is really confusing. Please think as a user reading your document: would they really understand what you want to describe?
got it, fixed
docs/lakehouse-iceberg-catalog.md
Outdated
```diff
@@ -43,6 +43,9 @@ Any properties not defined by Gravitino with the `gravitino.bypass.` prefix will pass
 
 When you use Gravitino with Trino, you can pass the Trino Iceberg connector configuration using the prefix `trino.bypass.`. For example, use `trino.bypass.iceberg.table-statistics-enabled` to pass `iceberg.table-statistics-enabled` to the Gravitino Iceberg catalog in the Trino runtime.
 
+When you use Gravitino with Spark, you can pass the Spark Iceberg connector configuration using the prefix `spark.bypass.`. For example, use `spark.bypass.iceberg.config.resources` to pass `iceberg.config.resources` to the Spark Iceberg connector in the Spark runtime.
```
Also here.
done
| Gravitino catalog property name | Spark catalog property name | Description                | Since Version |
|---------------------------------|-----------------------------|----------------------------|---------------|
| `metastore.uris`                | `hive.metastore.uris`       | Hive metastore URI address | 0.5.0         |

Catalog properties with the prefix `spark.bypass.` are passed to the Spark Hive connector. For example, use `spark.bypass.config.resources` to pass `config.resources` to the Spark Hive connector.
So should we use "Gravitino catalog property name" or "Spark catalog property name"? You bring in new concepts without any explanation; how do users use them?
Replaced with Gravitino catalog property names.
Again, as I mentioned here, you add two tables here and in the Iceberg doc for a bunch of properties without explaining what they are and how users use them. What do you actually want to deliver to users?
Added some descriptions; please help to review again.
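For readers following along, here is a minimal sketch of the `spark.bypass.` semantics described in the docs above: the prefix is stripped and the remainder is handed to the Spark connector unchanged. The class and method names are illustrative, not the PR's actual code.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of the documented `spark.bypass.` behavior; names are
// hypothetical and may not match the PR's actual implementation.
public class BypassPropertySketch {

  private static final String SPARK_BYPASS_PREFIX = "spark.bypass.";

  // Keeps only `spark.bypass.`-prefixed entries, stripping the prefix, e.g.
  // "spark.bypass.hive.config.resources" -> "hive.config.resources".
  public static Map<String, String> extractBypassProperties(Map<String, String> catalogProps) {
    Map<String, String> sparkProps = new HashMap<>();
    catalogProps.forEach(
        (key, value) -> {
          if (key.startsWith(SPARK_BYPASS_PREFIX)) {
            sparkProps.put(key.substring(SPARK_BYPASS_PREFIX.length()), value);
          }
        });
    return sparkProps;
  }
}
```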
```java
 * @param properties Gravitino catalog properties.
 * @return properties for the Spark connector.
 */
Map<String, String> transformSparkCatalogProperties(Map<String, String> properties);
```
What is the meaning of `transform`? Why don't you use `toXXX` to align with the interfaces below, if the semantics are the same?
done
```diff
-    HashMap<String, String> all = new HashMap<>(options);
-    all.put(GravitinoSparkConfig.SPARK_HIVE_METASTORE_URI, metastoreUri);
+    Map<String, String> all =
+        HivePropertiesConverter.getInstance().toSparkCatalogProperties(options, properties);
```
Why don't you use `getPropertiesConverter().toXXX` here?
done
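For context, a hedged sketch of what a Hive-side `toSparkCatalogProperties` could look like, based only on the property table above (`metastore.uris` maps to `hive.metastore.uris`); the PR's real converter may differ.

```java
import java.util.HashMap;
import java.util.Map;

// Hedged sketch, not the PR's actual converter: translates the Gravitino
// catalog property name to the Spark catalog property name per the table above.
public class HivePropertiesConverterSketch {

  public static Map<String, String> toSparkCatalogProperties(
      Map<String, String> options, Map<String, String> gravitinoProperties) {
    Map<String, String> all = new HashMap<>(options);
    String metastoreUri = gravitinoProperties.get("metastore.uris");
    if (metastoreUri != null) {
      // "metastore.uris" (Gravitino) -> "hive.metastore.uris" (Spark Hive connector)
      all.put("hive.metastore.uris", metastoreUri);
    }
    return all;
  }
}
```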
@jerryshao @diqiu50, all comments are addressed; please help to review again.
```java
/**
 * Transform properties from Gravitino catalog properties to Spark connector properties.
 *
 * <p>This interface focus on the catalog specific transform logic, the common logic are
```
"This interface focuses..."
done
@FANNG1 just one minor issue, please fix it.

fixed

@jerryshao could you help to review again? thanks
### What changes were proposed in this pull request?

1. Pass Gravitino catalog properties with the `spark.bypass.` prefix to the Spark connector.
2. Refactor the catalog properties transform logic.

### Why are the changes needed?

Fix: #3292

### Does this PR introduce _any_ user-facing change?

Yes, documentation is added.

### How was this patch tested?

1. Added a UT.
2. Tested in a local environment.
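For illustration, a minimal sketch of the kind of UT described above, exercising the documented `spark.bypass.` stripping; the class, method, and property values below are hypothetical, not the PR's actual test code.

```java
import static org.junit.jupiter.api.Assertions.assertEquals;

import java.util.HashMap;
import java.util.Map;
import org.junit.jupiter.api.Test;

// Hypothetical test sketch: verifies that a `spark.bypass.`-prefixed catalog
// property reaches the connector with the prefix stripped, while unprefixed
// properties are not forwarded, matching the behavior documented above.
public class SparkBypassPropertiesTest {

  private static final String SPARK_BYPASS_PREFIX = "spark.bypass.";

  // Minimal stand-in for the connector-side property transform.
  private static Map<String, String> toSparkProperties(Map<String, String> catalogProps) {
    Map<String, String> sparkProps = new HashMap<>();
    catalogProps.forEach(
        (key, value) -> {
          if (key.startsWith(SPARK_BYPASS_PREFIX)) {
            sparkProps.put(key.substring(SPARK_BYPASS_PREFIX.length()), value);
          }
        });
    return sparkProps;
  }

  @Test
  void bypassPrefixIsStripped() {
    Map<String, String> catalogProps = new HashMap<>();
    catalogProps.put("spark.bypass.hive.exec.dynamic.partition.mode", "nonstrict");
    catalogProps.put("comment", "not forwarded to Spark");

    Map<String, String> sparkProps = toSparkProperties(catalogProps);

    assertEquals(1, sparkProps.size());
    assertEquals("nonstrict", sparkProps.get("hive.exec.dynamic.partition.mode"));
  }
}
```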