
Commit df681aa

Authored by snazy, adutra, XN137, poojanilangekar, eric-maynard
Dremio merge 2025 08 20 13 44 (apache#105)
* Refactor Authenticator and PolarisPrincipal (apache#2307). The main goal of this change is to facilitate future integration of federated principals:
  - `AuthenticatedPolarisPrincipal` becomes an interface, `PolarisPrincipal`, as the original class leaked implementation details (references to `PrincipalEntity` and thus to the storage layer). The new interface does not reference the storage layer. This is one step further towards easy pluggability of authentication in Polaris.
  - The `Authenticator.authenticate()` method no longer returns an `Optional`, as this was ambiguous (returning `Optional.empty()` vs. throwing `NotAuthorizedException`).
  - The `Authenticator` interface is also no longer generic. This was an artifact of a time when there were two kinds of `Authenticator` in Polaris (one for internal auth, the other for external) and is no longer necessary.
* Add PolarisDiagnostics field to TransactionalMetaStoreManagerImpl (apache#2361). The ultimate goal is removing the `PolarisCallContext` parameter from every `PolarisMetaStoreManager` interface method, so we first make steps towards reducing its usage.
* Support HMS Federation (apache#2355). Supports federating to `HiveCatalog` using the Iceberg REST library. All Hive dependencies are added in an independent module, i.e. `polaris-extensions-federation-hive`, and can be removed or converted to a compile-time flag if necessary. Similar to `HadoopCatalog`, HMS federation support is currently restricted to `IMPLICIT` auth. The underlying authentication can be any form that Hive supports; however, Polaris will not store or manage any of these credentials. Again, similar to `HadoopCatalog`, this version supports federating to a single Hive instance. This PR relies on Polaris discovering the `hive-site.xml` file on the classpath (including `HADOOP_CONF_DIR`) to get the configuration options. The spec change was discussed on the [dev mailing list](https://lists.apache.org/thread/5qktjv6rzd8pghcl6f4oohko798o2p2g), followed by a discussion in the Polaris community sync on Aug 7, 2025. Testing: modified the regression test to verify locally that Hive federation works as expected. The next step would be to add a regression test once the change is baked into the Polaris docker image (for CI builds). This PR primarily builds on apache#1305 and apache#1466. Thank you @dennishuo and @eric-maynard for helping out with this!
* Add PolarisDiagnostics field to TransactionWorkspaceMetaStoreManager (apache#2359). Same ultimate goal as above: removing the `PolarisCallContext` parameter from every `PolarisMetaStoreManager` interface method, so we first make steps towards reducing its usage.
* Rat-ignore user-settings for hugo-run-in-docker (apache#2376).
* Modularize generic table federation (apache#2379). In apache#2369, Iceberg table federation was refactored around the new `IcebergRESTExternalCatalogFactory` type based on discussion in the community sync. This has unblocked the ability to federate to more non-Iceberg catalogs, such as in apache#2355. This PR refactors generic table federation to go through the same mechanism. After this, we can implement generic table federation for the existing `IcebergRESTExternalCatalogFactory` implementations.
* Update community meeting dates (apache#2382).
* Reduce getRealmConfig calls (apache#2337). Classes with a `CallContext` field should call `getRealmConfig` once and store the result as a field as well. The idea is that, long term, we want to stop relying on the `CallContext` itself and instead inject its individual items. Thus we also add `RealmConfig` to `TestServices`.
* Python client: make S3 role-ARN optional and add missing endpoint-internal property (apache#2339).
* fix(deps): update dependency io.prometheus:prometheus-metrics-exporter-servlet-jakarta to v1.4.1 (apache#2377).
* chore(deps): bump s3mock from 3.11.0 to 4.7.0 (apache#2375). Updates the S3Mock testcontainer dependency from 3.11.0 to 4.7.0 and refactors usage into a centralized wrapper class in runtime/test-common: upgrades the S3Mock testcontainer to 4.7.0, creates an S3Mock wrapper class for consistent configuration, consolidates S3 config-properties generation, and updates integration tests to use the new wrapper. No functional changes to test behavior.
* Nit: extract getResolvedCatalogEntity method in IcebergCatalogHandler (apache#2387).
* Nit: remove transitive dependencies from runtime/server/build.gradle.kts (apache#2385).
* Nit: add methods isExternal and isStaticFacade to CatalogEntity (apache#2386).
* Minor refactor of integration test classes (apache#2384). This change promotes `CatalogConfig` and `RestCatalogConfig` to top-level, public annotations and introduces a few "hooks" in `PolarisRestCatalogIntegrationBase` that can be overridden by subclasses. This is preparatory work for apache#2280 (S3 remote signing).
* Remove BaseMetaStoreManager.serializeProperties (apache#2374). Similar to 7af85be, we should prefer the existing helper methods on the entity instead.
* fix: minor corrections of documentation (apache#2397). Fixed a dead link to the catalog definition in the Iceberg docs on the Entities page; removed single quotes from the credential parameter in the command-line example for connecting a local spark-sql: environment variables need to be resolved on the command line, as they will not be resolved by spark-sql itself.
* chore(deps): update azure/setup-helm action to v4.3.1 (apache#2402).
* Add 1.0.1 release to the website (apache#2400).
* Add PolarisDiagnostics field to AbstractTransactionalPersistence (apache#2372). The ultimate goal is removing the `PolarisCallContext` parameter from every `PolarisMetaStoreManager` interface method, so we first make steps towards reducing its usage.
* NoSQL: javadoc nit.
* Last merged commit fcd4777.

---------

Co-authored-by: Alexandre Dutra <adutra@apache.org>
Co-authored-by: Christopher Lambert <xn137@gmx.de>
Co-authored-by: Pooja Nilangekar <poojan@umd.edu>
Co-authored-by: Eric Maynard <eric.maynard+oss@snowflake.com>
Co-authored-by: JB Onofré <jbonofre@apache.org>
Co-authored-by: Mend Renovate <bot@renovateapp.com>
Co-authored-by: Artur Rakhmatulin <from_github@binaryc.at>
Co-authored-by: olsoloviov <40199597+olsoloviov@users.noreply.github.com>
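The `Authenticator` refactor described above (no more `Optional` return, no more generic type parameter) can be sketched in a minimal Python analogue. The class and exception names below mirror the Java ones but are purely illustrative; the real interfaces live in Polaris's Java codebase:

```python
# Illustrative sketch only: authenticate() either returns a principal or
# raises, instead of the old ambiguous Optional-or-throw contract.

class NotAuthorizedError(Exception):
    """Analogue of jakarta.ws.rs NotAuthorizedException."""

class PolarisPrincipal:
    """Storage-agnostic view of an authenticated principal (no entity refs)."""
    def __init__(self, name: str, roles: set):
        self.name = name
        self.roles = roles

class Authenticator:
    """Non-generic authenticator: one shape for all auth mechanisms."""
    def __init__(self, known_tokens: dict):
        self._known_tokens = known_tokens

    def authenticate(self, token: str) -> PolarisPrincipal:
        # No Optional: either a principal comes back, or we raise.
        principal = self._known_tokens.get(token)
        if principal is None:
            raise NotAuthorizedError(f"unknown token {token!r}")
        return principal
```

The point of the contract change is that callers no longer have to handle both an empty result and an exception for the same failure mode.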
1 parent da546bd commit df681aa

123 files changed: +1694, −1265 lines


.github/workflows/helm.yml

Lines changed: 1 addition & 1 deletion
```diff
@@ -50,7 +50,7 @@ jobs:
           distribution: 'temurin'

       - name: Set up Helm
-        uses: azure/setup-helm@b9e51907a09c216f16ebe8536097933489208112 # v4.3.0
+        uses: azure/setup-helm@1a275c3b69536ee54be43f2070a358922e12c8d4 # v4.3.1
         with:
           version: 'v3.16.0'
```

build.gradle.kts

Lines changed: 1 addition & 0 deletions
```diff
@@ -109,6 +109,7 @@ tasks.named<RatTask>("rat").configure {

   // Web site
   excludes.add("**/go.sum")
+  excludes.add("site/.user-settings")
   excludes.add("site/node_modules/**")
   excludes.add("site/layouts/robots.txt")
   // Ignore generated stuff, when the Hugo is run w/o Docker
```

client/python/cli/command/__init__.py

Lines changed: 1 addition & 0 deletions
```diff
@@ -66,6 +66,7 @@ def options_get(key, f=lambda x: x):
             iceberg_remote_catalog_name=options_get(Arguments.ICEBERG_REMOTE_CATALOG_NAME),
             remove_properties=[] if remove_properties is None else remove_properties,
             endpoint=options_get(Arguments.ENDPOINT),
+            endpoint_internal=options_get(Arguments.ENDPOINT_INTERNAL),
             sts_endpoint=options_get(Arguments.STS_ENDPOINT),
             path_style_access=options_get(Arguments.PATH_STYLE_ACCESS),
             catalog_connection_type=options_get(Arguments.CATALOG_CONNECTION_TYPE),
```

client/python/cli/command/catalogs.py

Lines changed: 12 additions & 11 deletions
```diff
@@ -65,6 +65,7 @@ class CatalogsCommand(Command):
     hadoop_warehouse: str
     iceberg_remote_catalog_name: str
     endpoint: str
+    endpoint_internal: str
     sts_endpoint: str
     path_style_access: bool
    catalog_connection_type: str
@@ -121,18 +122,17 @@ def validate(self):
                 f" {Argument.to_flag_name(Arguments.CATALOG_SERVICE_IDENTITY_IAM_ARN)}")

         if self.storage_type == StorageType.S3.value:
-            if not self.role_arn:
-                raise Exception(
-                    f"Missing required argument for storage type 's3':"
-                    f" {Argument.to_flag_name(Arguments.ROLE_ARN)}"
-                )
             if self._has_azure_storage_info() or self._has_gcs_storage_info():
                 raise Exception(
-                    f"Storage type 's3' supports the storage credentials"
+                    f"Storage type 's3' supports the options"
                     f" {Argument.to_flag_name(Arguments.ROLE_ARN)},"
                     f" {Argument.to_flag_name(Arguments.REGION)},"
-                    f" {Argument.to_flag_name(Arguments.EXTERNAL_ID)}, and"
-                    f" {Argument.to_flag_name(Arguments.USER_ARN)}"
+                    f" {Argument.to_flag_name(Arguments.EXTERNAL_ID)},"
+                    f" {Argument.to_flag_name(Arguments.USER_ARN)},"
+                    f" {Argument.to_flag_name(Arguments.ENDPOINT)},"
+                    f" {Argument.to_flag_name(Arguments.ENDPOINT_INTERNAL)},"
+                    f" {Argument.to_flag_name(Arguments.STS_ENDPOINT)}, and"
+                    f" {Argument.to_flag_name(Arguments.PATH_STYLE_ACCESS)}"
                 )
         elif self.storage_type == StorageType.AZURE.value:
             if not self.tenant_id:
@@ -142,7 +142,7 @@ def validate(self):
                 )
             if self._has_aws_storage_info() or self._has_gcs_storage_info():
                 raise Exception(
-                    "Storage type 'azure' supports the storage credentials"
+                    "Storage type 'azure' supports the options"
                     f" {Argument.to_flag_name(Arguments.TENANT_ID)},"
                     f" {Argument.to_flag_name(Arguments.MULTI_TENANT_APP_NAME)}, and"
                     f" {Argument.to_flag_name(Arguments.CONSENT_URL)}"
@@ -160,11 +160,11 @@ def validate(self):
             or self._has_gcs_storage_info()
         ):
             raise Exception(
-                "Storage type 'file' does not support any storage credentials"
+                "Storage type 'file' does not support any additional options"
             )

     def _has_aws_storage_info(self):
-        return self.role_arn or self.external_id or self.user_arn or self.region or self.endpoint or self.sts_endpoint or self.path_style_access
+        return self.role_arn or self.external_id or self.user_arn or self.region or self.endpoint or self.endpoint_internal or self.sts_endpoint or self.path_style_access

     def _has_azure_storage_info(self):
         return self.tenant_id or self.multi_tenant_app_name or self.consent_url
@@ -183,6 +183,7 @@ def _build_storage_config_info(self):
             user_arn=self.user_arn,
             region=self.region,
             endpoint=self.endpoint,
+            endpoint_internal=self.endpoint_internal,
             sts_endpoint=self.sts_endpoint,
             path_style_access=self.path_style_access,
         )
```
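The net effect of the `validate()` change above — `role_arn` no longer mandatory for S3, and the S3 error message now listing every S3-only option, including the new `endpoint_internal` — can be sketched standalone. The names below are simplified and hypothetical; the real code builds flag names via `Argument.to_flag_name`:

```python
# Hedged sketch of the new validation semantics, not the actual CLI code.
S3_ONLY = ("role_arn", "region", "external_id", "user_arn",
           "endpoint", "endpoint_internal", "sts_endpoint", "path_style_access")
AZURE_ONLY = ("tenant_id", "multi_tenant_app_name", "consent_url")

def validate_s3(options: dict) -> None:
    """Validate options for storage type 's3' under the post-apache#2339 rules."""
    # role_arn is no longer required; only cross-provider options are rejected.
    if any(options.get(k) for k in AZURE_ONLY):
        raise ValueError(
            "Storage type 's3' supports the options " + ", ".join(S3_ONLY))
```

Under the old behavior, a missing `role_arn` raised immediately; now an S3 catalog can be created with only, say, `endpoint_internal` set.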

client/python/cli/constants.py

Lines changed: 3 additions & 1 deletion
```diff
@@ -168,6 +168,7 @@ class Arguments:
     HADOOP_WAREHOUSE = "hadoop_warehouse"
     ICEBERG_REMOTE_CATALOG_NAME = "iceberg_remote_catalog_name"
     ENDPOINT = "endpoint"
+    ENDPOINT_INTERNAL = "endpoint_internal"
     STS_ENDPOINT = "sts_endpoint"
     PATH_STYLE_ACCESS = "path_style_access"
     CATALOG_CONNECTION_TYPE = "catalog_connection_type"
@@ -223,11 +224,12 @@ class Create:
         "Multiple locations can be provided by specifying this option more than once."
     )

-    ROLE_ARN = "(Required for S3) A role ARN to use when connecting to S3"
+    ROLE_ARN = "(Only for S3) A role ARN to use when connecting to S3"
     EXTERNAL_ID = "(Only for S3) The external ID to use when connecting to S3"
     REGION = "(Only for S3) The region to use when connecting to S3"
     USER_ARN = "(Only for S3) A user ARN to use when connecting to S3"
     ENDPOINT = "(Only for S3) The S3 endpoint to use when connecting to S3"
+    ENDPOINT_INTERNAL = "(Only for S3) The S3 endpoint used by Polaris to use when connecting to S3, if different from the one that clients use"
     STS_ENDPOINT = (
         "(Only for S3) The STS endpoint to use when connecting to STS"
     )
```

client/python/cli/options/option_tree.py

Lines changed: 1 addition & 0 deletions
```diff
@@ -117,6 +117,7 @@ def get_tree() -> List[Option]:
                      choices=[st.value for st in StorageType]),
             Argument(Arguments.DEFAULT_BASE_LOCATION, str, Hints.Catalogs.Create.DEFAULT_BASE_LOCATION),
             Argument(Arguments.ENDPOINT, str, Hints.Catalogs.Create.ENDPOINT),
+            Argument(Arguments.ENDPOINT_INTERNAL, str, Hints.Catalogs.Create.ENDPOINT_INTERNAL),
             Argument(Arguments.STS_ENDPOINT, str, Hints.Catalogs.Create.STS_ENDPOINT),
             Argument(Arguments.PATH_STYLE_ACCESS, bool, Hints.Catalogs.Create.PATH_STYLE_ACCESS),
             Argument(Arguments.ALLOWED_LOCATION, str, Hints.Catalogs.Create.ALLOWED_LOCATION,
```

extensions/federation/hadoop/src/main/java/org/apache/polaris/extensions/federation/hadoop/HadoopFederatedCatalogFactory.java

Lines changed: 9 additions & 0 deletions
```diff
@@ -24,6 +24,7 @@
 import org.apache.iceberg.catalog.Catalog;
 import org.apache.iceberg.hadoop.HadoopCatalog;
 import org.apache.polaris.core.catalog.ExternalCatalogFactory;
+import org.apache.polaris.core.catalog.GenericTableCatalog;
 import org.apache.polaris.core.connection.AuthenticationParametersDpo;
 import org.apache.polaris.core.connection.AuthenticationType;
 import org.apache.polaris.core.connection.ConnectionConfigInfoDpo;
@@ -58,4 +59,12 @@ public Catalog createCatalog(
         warehouse, connectionConfigInfoDpo.asIcebergCatalogProperties(userSecretsManager));
     return hadoopCatalog;
   }
+
+  @Override
+  public GenericTableCatalog createGenericCatalog(
+      ConnectionConfigInfoDpo connectionConfig, UserSecretsManager userSecretsManager) {
+    // TODO implement
+    throw new UnsupportedOperationException(
+        "Generic table federation to this catalog is not supported.");
+  }
 }
```
Lines changed: 31 additions & 0 deletions
New file (README for the Hive federation extension module):

````markdown
<!--
 Licensed to the Apache Software Foundation (ASF) under one
 or more contributor license agreements. See the NOTICE file
 distributed with this work for additional information
 regarding copyright ownership. The ASF licenses this file
 to you under the Apache License, Version 2.0 (the
 "License"); you may not use this file except in compliance
 with the License. You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing,
 software distributed under the License is distributed on an
 "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 KIND, either express or implied. See the License for the
 specific language governing permissions and limitations
 under the License.
-->

### Using the `HiveFederatedCatalogFactory`

This `HiveFederatedCatalogFactory` module is an independent compilation unit and will be built into the Polaris binary only when the following flag is set in the `gradle.properties` file:

```
NonRESTCatalogs=HIVE,<alternates>
```

The other option is to pass it as an argument to the Gradle JVM as follows:

```
./gradlew build -DNonRESTCatalogs=HIVE
```

Without this flag, the Hive factory won't be compiled into Polaris, and therefore Polaris will not load the class at runtime, throwing an unsupported-operation exception for federated catalog calls.
````
Lines changed: 65 additions & 0 deletions
New file (build script for the Hive federation extension module):

```kotlin
/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *   http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing,
 * software distributed under the License is distributed on an
 * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 * KIND, either express or implied.  See the License for the
 * specific language governing permissions and limitations
 * under the License.
 */

plugins {
  id("polaris-client")
  alias(libs.plugins.jandex)
}

dependencies {
  // Polaris dependencies
  implementation(project(":polaris-core"))

  implementation(platform(libs.iceberg.bom))
  implementation("org.apache.iceberg:iceberg-api")
  implementation("org.apache.iceberg:iceberg-core")
  implementation("org.apache.iceberg:iceberg-common")
  // Use iceberg-hive-metastore but exclude conflicting hive dependencies
  implementation("org.apache.iceberg:iceberg-hive-metastore") { exclude(group = "org.apache.hive") }
  // Add our own Hive 4.1.0 dependencies
  implementation(libs.hive.metastore) {
    exclude("org.slf4j", "slf4j-reload4j")
    exclude("org.slf4j", "slf4j-log4j12")
    exclude("ch.qos.reload4j", "reload4j")
    exclude("log4j", "log4j")
    exclude("org.apache.zookeeper", "zookeeper")
  }

  // Hadoop dependencies
  implementation(libs.hadoop.common) {
    exclude("org.slf4j", "slf4j-reload4j")
    exclude("org.slf4j", "slf4j-log4j12")
    exclude("ch.qos.reload4j", "reload4j")
    exclude("log4j", "log4j")
    exclude("org.apache.zookeeper", "zookeeper")
    exclude("org.apache.hadoop.thirdparty", "hadoop-shaded-protobuf_3_25")
    exclude("com.github.pjfanning", "jersey-json")
    exclude("com.sun.jersey", "jersey-core")
    exclude("com.sun.jersey", "jersey-server")
    exclude("com.sun.jersey", "jersey-servlet")
    exclude("io.dropwizard.metrics", "metrics-core")
  }

  // CDI dependencies for runtime discovery
  implementation(libs.jakarta.enterprise.cdi.api)
  implementation(libs.smallrye.common.annotation)

  // Logging
  implementation(libs.slf4j.api)
}
```
Lines changed: 83 additions & 0 deletions
New file (the Hive federated catalog factory):

```java
/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *   http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing,
 * software distributed under the License is distributed on an
 * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 * KIND, either express or implied.  See the License for the
 * specific language governing permissions and limitations
 * under the License.
 */
package org.apache.polaris.extensions.federation.hive;

import io.smallrye.common.annotation.Identifier;
import jakarta.enterprise.context.ApplicationScoped;
import org.apache.iceberg.catalog.Catalog;
import org.apache.iceberg.hive.HiveCatalog;
import org.apache.polaris.core.catalog.ExternalCatalogFactory;
import org.apache.polaris.core.catalog.GenericTableCatalog;
import org.apache.polaris.core.connection.AuthenticationParametersDpo;
import org.apache.polaris.core.connection.AuthenticationType;
import org.apache.polaris.core.connection.ConnectionConfigInfoDpo;
import org.apache.polaris.core.connection.ConnectionType;
import org.apache.polaris.core.connection.hive.HiveConnectionConfigInfoDpo;
import org.apache.polaris.core.secrets.UserSecretsManager;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/** Factory class for creating a Hive catalog handle based on connection configuration. */
@ApplicationScoped
@Identifier(ConnectionType.HIVE_FACTORY_IDENTIFIER)
public class HiveFederatedCatalogFactory implements ExternalCatalogFactory {
  private static final Logger LOGGER = LoggerFactory.getLogger(HiveFederatedCatalogFactory.class);

  @Override
  public Catalog createCatalog(
      ConnectionConfigInfoDpo connectionConfigInfoDpo, UserSecretsManager userSecretsManager) {
    // Currently, Polaris supports Hive federation only via IMPLICIT authentication.
    // Hence, prior to initializing the configuration, ensure that the catalog uses
    // IMPLICIT authentication.
    AuthenticationParametersDpo authenticationParametersDpo =
        connectionConfigInfoDpo.getAuthenticationParameters();
    if (authenticationParametersDpo.getAuthenticationTypeCode()
        != AuthenticationType.IMPLICIT.getCode()) {
      throw new IllegalStateException("Hive federation only supports IMPLICIT authentication.");
    }
    String warehouse = ((HiveConnectionConfigInfoDpo) connectionConfigInfoDpo).getWarehouse();
    // Unlike Hadoop, HiveCatalog does not require us to create a Configuration object; the
    // Iceberg REST library finds the default configuration by reading hive-site.xml on the
    // classpath (including the HADOOP_CONF_DIR classpath).

    // TODO: In the future, we could support multiple HiveCatalog instances based on
    // polaris/catalog properties. A brief set of steps involved (and the options):
    // 1. Create a configuration without default properties:
    //    `Configuration conf = new Configuration(/* loadDefaults= */ false);`
    // 2a. Specify the hive-site.xml file path in the configuration:
    //    `conf.addResource(new Path(hiveSiteXmlPath));`
    // 2b. Or specify individual properties in the configuration:
    //    `conf.set(property, value);`
    // Polaris could support federating to multiple LDAP-based Hive metastores. Multiple
    // Kerberos instances are not suitable because Kerberos ties a single identity to the server.
    HiveCatalog hiveCatalog = new HiveCatalog();
    hiveCatalog.initialize(
        warehouse, connectionConfigInfoDpo.asIcebergCatalogProperties(userSecretsManager));
    return hiveCatalog;
  }

  @Override
  public GenericTableCatalog createGenericCatalog(
      ConnectionConfigInfoDpo connectionConfig, UserSecretsManager userSecretsManager) {
    // TODO implement
    throw new UnsupportedOperationException(
        "Generic table federation to this catalog is not supported.");
  }
}
```
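The CDI wiring in the factory above (`@ApplicationScoped` plus `@Identifier(ConnectionType.HIVE_FACTORY_IDENTIFIER)`) amounts to looking up a factory bean by a connection-type key at runtime. A rough, hypothetical Python analogue of that registry pattern, including the IMPLICIT-auth guard (all names below are illustrative, not the Polaris API):

```python
# Hypothetical sketch of the factory-by-identifier pattern that CDI's
# @Identifier annotation provides in Polaris: factories register under a
# connection-type key, and unknown types fail fast.

_FACTORIES = {}

def register(identifier: str):
    """Class decorator standing in for @Identifier(...)."""
    def wrap(cls):
        _FACTORIES[identifier] = cls
        return cls
    return wrap

@register("HIVE")
class HiveFactory:
    def create_catalog(self, config: dict) -> dict:
        # Mirrors the Java check: only IMPLICIT authentication is supported.
        if config.get("auth") != "IMPLICIT":
            raise ValueError("Hive federation only supports IMPLICIT authentication.")
        return {"type": "hive", "warehouse": config.get("warehouse")}

def factory_for(identifier: str):
    try:
        return _FACTORIES[identifier]()
    except KeyError:
        raise NotImplementedError(f"No factory for connection type {identifier}")
```

In the real code the lookup is performed by the CDI container, and `createGenericCatalog` still throws `UnsupportedOperationException` until generic table federation is implemented for Hive.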
