Allow impl for LocationProvider to be created during runtime #1531

Merged
Changes from 25 commits
26 commits
db7f3e5
new table properties
mickjermsurawong-stripe Sep 29, 2020
4074a94
expose helper method to get class from partition field
mickjermsurawong-stripe Sep 29, 2020
9412875
localized location provider
mickjermsurawong-stripe Sep 29, 2020
ad8b9c1
test dataframe writes
mickjermsurawong-stripe Sep 29, 2020
9b5f96f
formatting
mickjermsurawong-stripe Sep 30, 2020
51ced5f
format tests
mickjermsurawong-stripe Sep 30, 2020
c0abcf6
dynamically load location provider
mickjermsurawong-stripe Oct 1, 2020
bd2d8af
revert refactoring
mickjermsurawong-stripe Oct 1, 2020
a5eff74
remove extra line
mickjermsurawong-stripe Oct 1, 2020
706cfcb
rename test
mickjermsurawong-stripe Oct 1, 2020
2c51cd7
address feedback: test location provider
mickjermsurawong-stripe Oct 3, 2020
34192da
revert old test
mickjermsurawong-stripe Oct 3, 2020
77210d3
improve error message
mickjermsurawong-stripe Oct 3, 2020
73fe85d
format test
mickjermsurawong-stripe Oct 3, 2020
9b22a8c
propagate exception
mickjermsurawong-stripe Oct 3, 2020
c0c25a2
support no-arg and write more negative tests
mickjermsurawong-stripe Oct 5, 2020
6f60d5f
use assert helper for future test extension
mickjermsurawong-stripe Oct 5, 2020
c14078a
order and update python props
mickjermsurawong-stripe Oct 5, 2020
a34b5d2
prefix with new prop with WRITE
mickjermsurawong-stripe Oct 5, 2020
299ed00
update docs
mickjermsurawong-stripe Oct 5, 2020
0eee288
add doc stiring
mickjermsurawong-stripe Oct 5, 2020
5906be0
fix formatting
mickjermsurawong-stripe Oct 5, 2020
5f9be8c
simplify constructor
mickjermsurawong-stripe Oct 5, 2020
3aeb9e5
add doc to new property
mickjermsurawong-stripe Oct 5, 2020
2729b79
revert manual html updates
mickjermsurawong-stripe Oct 5, 2020
8c324c5
propagate cause
mickjermsurawong-stripe Oct 5, 2020
25 changes: 24 additions & 1 deletion core/src/main/java/org/apache/iceberg/LocationProviders.java
@@ -21,6 +21,7 @@

import java.util.Map;
import org.apache.hadoop.fs.Path;
import org.apache.iceberg.common.DynConstructors;
import org.apache.iceberg.io.LocationProvider;
import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
import org.apache.iceberg.transforms.Transform;
@@ -36,7 +37,29 @@ private LocationProviders() {
}

public static LocationProvider locationsFor(String location, Map<String, String> properties) {
- if (PropertyUtil.propertyAsBoolean(properties,
if (properties.containsKey(TableProperties.WRITE_LOCATION_PROVIDER_IMPL)) {
String impl = properties.get(TableProperties.WRITE_LOCATION_PROVIDER_IMPL);
DynConstructors.Ctor<LocationProvider> ctor;
try {
ctor = DynConstructors.builder(LocationProvider.class)
.impl(impl, String.class, Map.class)
.impl(impl).build(); // fall back to no-arg constructor
} catch (RuntimeException e) {
throw new IllegalArgumentException(String.format(
"Unable to find a constructor for implementation %s of %s. " +
"Make sure the implementation is in classpath, and that it either " +
"has a public no-arg constructor or a two-arg constructor " +
"taking in the string base table location and its property string map.",
impl, LocationProvider.class));
Contributor

Since you're catching the exception, it probably makes more sense to use buildChecked so the exception class isn't generic. That exception also has good information about why all of the implementations failed, so I would recommend adding that exception as a cause to this one that is thrown. I like throwing this one for context, though!
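
One way to read the suggestion above, sketched with plain `java.lang.reflect` instead of Iceberg's `DynConstructors` so that it runs without Iceberg on the classpath. `CtorLookupSketch` and `findCtor` are hypothetical names, not part of this PR; the only point is that the lookup failure is kept as the cause of the contextual `IllegalArgumentException`.

```java
import java.lang.reflect.Constructor;
import java.util.Map;

final class CtorLookupSketch {
  private CtorLookupSketch() {
  }

  // Hypothetical helper: prefer a (String, Map) constructor, fall back to no-arg.
  static Constructor<?> findCtor(String impl) {
    try {
      Class<?> clazz = Class.forName(impl);
      try {
        return clazz.getConstructor(String.class, Map.class);
      } catch (NoSuchMethodException e) {
        // No two-arg constructor; a failure here is handled by the outer catch.
        return clazz.getConstructor();
      }
    } catch (ClassNotFoundException | NoSuchMethodException e) {
      // Keep the context message, and pass the original failure along as the cause.
      throw new IllegalArgumentException(
          String.format("Unable to find a constructor for implementation %s", impl), e);
    }
  }
}
```

With the cause attached, the stack trace shows whether the class was missing from the classpath or only lacked a matching constructor.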

}
Contributor Author

If the provided class has the specified constructor signature but does not implement LocationProvider, it can still be loaded into the DynConstructors builder. The following block reports the missing interface explicitly.
It might be a bit more code churn, but it gives users a clearer signal.
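
The distinction described above can be illustrated without Iceberg: reflection will construct a class that has a usable constructor even if it does not implement the expected interface, and the mismatch only surfaces at the cast. A minimal sketch, assuming a hypothetical stand-in interface `PathProvider` and hypothetical classes `PrefixProvider` and `InterfaceCheckSketch` (none of these are in the PR):

```java
// Hypothetical stand-in for LocationProvider, reduced to one method.
interface PathProvider {
  String newDataLocation(String filename);
}

// Hypothetical provider that does implement the interface.
class PrefixProvider implements PathProvider {
  public PrefixProvider() {
  }

  @Override
  public String newDataLocation(String filename) {
    return "custom/" + filename;
  }
}

final class InterfaceCheckSketch {
  private InterfaceCheckSketch() {
  }

  static PathProvider instantiate(Class<?> implClass) {
    Object instance;
    try {
      // Succeeds for any class with a public no-arg constructor.
      instance = implClass.getConstructor().newInstance();
    } catch (ReflectiveOperationException e) {
      throw new IllegalArgumentException("Cannot instantiate " + implClass.getName(), e);
    }
    try {
      // The interface check happens only here, at the cast.
      return PathProvider.class.cast(instance);
    } catch (ClassCastException e) {
      throw new IllegalArgumentException(
          String.format("%s should implement %s",
              implClass.getName(), PathProvider.class.getName()), e);
    }
  }
}
```

Passing `Object.class` to `instantiate` constructs the object successfully and then fails at the cast, which is the case the separate `catch (ClassCastException e)` block in the diff is meant to report.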

try {
return ctor.newInstance(location, properties);
} catch (ClassCastException e) {
throw new IllegalArgumentException(
String.format("Provided implementation for dynamic instantiation should implement %s, " +
"but found dynamic constructor %s.", LocationProvider.class, ctor));
Contributor

Dynamic constructor?

Contributor

Here as well you should probably throw the exception with the original as a cause so the context isn't lost.

}
} else if (PropertyUtil.propertyAsBoolean(properties,
TableProperties.OBJECT_STORE_ENABLED,
TableProperties.OBJECT_STORE_ENABLED_DEFAULT)) {
return new ObjectStoreLocationProvider(location, properties);
2 changes: 2 additions & 0 deletions core/src/main/java/org/apache/iceberg/TableProperties.java
@@ -89,6 +89,8 @@ private TableProperties() {

public static final String OBJECT_STORE_PATH = "write.object-storage.path";

public static final String WRITE_LOCATION_PROVIDER_IMPL = "write.location-provider.impl";

// This only applies to files written after this property is set. Files previously written aren't
// relocated to reflect this parameter.
// If not set, defaults to a "data" folder underneath the root path of the table.
215 changes: 215 additions & 0 deletions core/src/test/java/org/apache/iceberg/TestLocationProvider.java
@@ -0,0 +1,215 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/

package org.apache.iceberg;

import java.util.Map;
import org.apache.iceberg.io.LocationProvider;
import org.junit.Assert;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;

@RunWith(Parameterized.class)
public class TestLocationProvider extends TableTestBase {
@Parameterized.Parameters
public static Object[][] parameters() {
return new Object[][] {
new Object[] { 1 },
new Object[] { 2 },
};
}

public TestLocationProvider(int formatVersion) {
super(formatVersion);
}

// publicly visible for testing to be dynamically loaded
public static class TwoArgDynamicallyLoadedLocationProvider implements LocationProvider {
String tableLocation;
Map<String, String> properties;

public TwoArgDynamicallyLoadedLocationProvider(String tableLocation, Map<String, String> properties) {
this.tableLocation = tableLocation;
this.properties = properties;
}

@Override
public String newDataLocation(String filename) {
return String.format("%s/test_custom_provider/%s", this.tableLocation, filename);
}

@Override
public String newDataLocation(PartitionSpec spec, StructLike partitionData, String filename) {
throw new RuntimeException("Test custom provider does not expect any invocation");
}
}

// publicly visible for testing to be dynamically loaded
public static class NoArgDynamicallyLoadedLocationProvider implements LocationProvider {
// No-arg public constructor

@Override
public String newDataLocation(String filename) {
return String.format("test_no_arg_provider/%s", filename);
}

@Override
public String newDataLocation(PartitionSpec spec, StructLike partitionData, String filename) {
throw new RuntimeException("Test custom provider does not expect any invocation");
}
}

// publicly visible for testing to be dynamically loaded
public static class InvalidArgTypesDynamicallyLoadedLocationProvider implements LocationProvider {

public InvalidArgTypesDynamicallyLoadedLocationProvider(Integer bogusArg1, String bogusArg2) {
}

@Override
public String newDataLocation(String filename) {
throw new RuntimeException("Invalid provider should not have been instantiated!");
}

@Override
public String newDataLocation(PartitionSpec spec, StructLike partitionData, String filename) {
throw new RuntimeException("Invalid provider should not have been instantiated!");
}
}

// publicly visible for testing to be dynamically loaded
public static class InvalidNoInterfaceDynamicallyLoadedLocationProvider {
// Default no-arg constructor is present, but does not implement interface LocationProvider
}

@Test
public void testDefaultLocationProvider() {
this.table.updateProperties()
.commit();

this.table.locationProvider().newDataLocation("my_file");
Assert.assertEquals(
"Default data path should have table location as root",
String.format("%s/data/%s", this.table.location(), "my_file"),
this.table.locationProvider().newDataLocation("my_file")
);
}

@Test
public void testDefaultLocationProviderWithCustomDataLocation() {
this.table.updateProperties()
.set(TableProperties.WRITE_NEW_DATA_LOCATION, "new_location")
.commit();

this.table.locationProvider().newDataLocation("my_file");
Assert.assertEquals(
"Default location provider should allow custom path location",
"new_location/my_file",
this.table.locationProvider().newDataLocation("my_file")
);
}

@Test
public void testNoArgDynamicallyLoadedLocationProvider() {
String noArgImpl = String.format("%s$%s",
this.getClass().getCanonicalName(),
NoArgDynamicallyLoadedLocationProvider.class.getSimpleName());
this.table.updateProperties()
.set(TableProperties.WRITE_LOCATION_PROVIDER_IMPL, noArgImpl)
.commit();

Assert.assertEquals(
"No-arg custom provider should return its own path",
"test_no_arg_provider/my_file",
this.table.locationProvider().newDataLocation("my_file")
);
}

@Test
public void testTwoArgDynamicallyLoadedLocationProvider() {
this.table.updateProperties()
.set(TableProperties.WRITE_LOCATION_PROVIDER_IMPL,
String.format("%s$%s",
this.getClass().getCanonicalName(),
TwoArgDynamicallyLoadedLocationProvider.class.getSimpleName()))
.commit();

Assert.assertTrue("Table should load impl defined in its properties",
this.table.locationProvider() instanceof TwoArgDynamicallyLoadedLocationProvider
);

Assert.assertEquals(
"Custom provider should take base table location",
String.format("%s/test_custom_provider/%s", this.table.location(), "my_file"),
this.table.locationProvider().newDataLocation("my_file")
);
}

@Test
public void testDynamicallyLoadedLocationProviderNotFound() {
String nonExistentImpl = String.format("%s$NonExistent%s",
this.getClass().getCanonicalName(),
TwoArgDynamicallyLoadedLocationProvider.class.getSimpleName());
this.table.updateProperties()
.set(TableProperties.WRITE_LOCATION_PROVIDER_IMPL, nonExistentImpl)
.commit();

AssertHelpers.assertThrows("Non-existent implementation should fail on finding constructor",
IllegalArgumentException.class,
String.format("Unable to find a constructor for implementation %s of %s. ",
nonExistentImpl, LocationProvider.class),
() -> table.locationProvider()
);
}

@Test
public void testInvalidNoInterfaceDynamicallyLoadedLocationProvider() {
String invalidImpl = String.format("%s$%s",
this.getClass().getCanonicalName(),
InvalidNoInterfaceDynamicallyLoadedLocationProvider.class.getSimpleName());
this.table.updateProperties()
.set(TableProperties.WRITE_LOCATION_PROVIDER_IMPL, invalidImpl)
.commit();

AssertHelpers.assertThrows(
"Class with missing interface implementation should fail on instantiation.",
IllegalArgumentException.class,
String.format("Provided implementation for dynamic instantiation should implement %s",
LocationProvider.class),
() -> table.locationProvider()
);
}

@Test
public void testInvalidArgTypesDynamicallyLoadedLocationProvider() {
String invalidImpl = String.format("%s$%s",
this.getClass().getCanonicalName(),
InvalidArgTypesDynamicallyLoadedLocationProvider.class.getSimpleName());
this.table.updateProperties()
.set(TableProperties.WRITE_LOCATION_PROVIDER_IMPL, invalidImpl)
.commit();

AssertHelpers.assertThrows("Implementation with invalid arg types should fail on finding constructor",
IllegalArgumentException.class,
String.format("Unable to find a constructor for implementation %s of %s. ",
invalidImpl, LocationProvider.class),
() -> table.locationProvider()
);
}
}
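
The tests above build the implementation name as `String.format("%s$%s", outerCanonicalName, innerSimpleName)`. For a static nested class, that string matches the JVM binary name returned by `Class.getName()`, which is the form class loading by name expects. A small sketch with hypothetical class names (`NestedNameSketch`, `InnerProvider`) that compares the two:

```java
final class NestedNameSketch {
  private NestedNameSketch() {
  }

  // Hypothetical nested class standing in for a test provider.
  static class InnerProvider {
  }

  // Builds the name the way the tests do: outer canonical name + "$" + simple name.
  static String formattedName() {
    return String.format("%s$%s",
        NestedNameSketch.class.getCanonicalName(),
        InnerProvider.class.getSimpleName());
  }

  // The JVM binary name of the nested class.
  static String binaryName() {
    return InnerProvider.class.getName();
  }
}
```

Using `getName()` directly would be a shorter way to get the same string. By contrast, `getCanonicalName()` on the nested class itself uses a `.` separator, which is not the binary name.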
2 changes: 2 additions & 0 deletions python/iceberg/core/table_properties.py
@@ -67,6 +67,8 @@ class TableProperties(object):

OBJECT_STORE_PATH = "write.object-storage.path"

WRITE_LOCATION_PROVIDER_IMPL = "write.location-provider.impl"

WRITE_NEW_DATA_LOCATION = "write.folder-storage.path"

WRITE_METADATA_LOCATION = "write.metadata.path"
1 change: 1 addition & 0 deletions site/docs/configuration.md
@@ -41,6 +41,7 @@ Iceberg tables support table properties to configure table behavior, like the de
| write.parquet.compression-codec | gzip | Parquet compression codec |
| write.parquet.compression-level | null | Parquet compression level |
| write.avro.compression-codec | gzip | Avro compression codec |
| write.location-provider.impl | null | Optional custom implementation for LocationProvider |
| write.metadata.compression-codec | none | Metadata compression codec; none or gzip |
| write.metadata.metrics.default | truncate(16) | Default metrics mode for all columns in the table; none, counts, truncate(length), or full |
| write.metadata.metrics.column.col1 | (not set) | Metrics mode for column 'col1' to allow per-column tuning; none, counts, truncate(length), or full |
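
Per the error message added in this PR, a class named by `write.location-provider.impl` needs either a public no-arg constructor or a public two-arg constructor taking the base table location and the property map. A minimal sketch of the two-arg shape follows. It uses a hypothetical stand-in interface `DataLocationProvider` so it compiles without Iceberg; a real implementation would implement `org.apache.iceberg.io.LocationProvider`, including the partition-aware overload. The class name and the `example.location-prefix` property are assumptions made up for illustration.

```java
import java.util.Map;

// Hypothetical stand-in for org.apache.iceberg.io.LocationProvider (one method only).
interface DataLocationProvider {
  String newDataLocation(String filename);
}

// Hypothetical custom provider with the two-arg constructor shape.
class PrefixedLocationProvider implements DataLocationProvider {
  // Hypothetical property name, used only in this sketch.
  static final String PREFIX_PROP = "example.location-prefix";

  private final String tableLocation;
  private final String prefix;

  public PrefixedLocationProvider(String tableLocation, Map<String, String> properties) {
    this.tableLocation = tableLocation;
    // Fall back to "data" when the sketch's property is not set.
    this.prefix = properties.getOrDefault(PREFIX_PROP, "data");
  }

  @Override
  public String newDataLocation(String filename) {
    return String.format("%s/%s/%s", tableLocation, prefix, filename);
  }
}
```

As in the tests in this PR, a table would opt in by setting the property to the class name through `table.updateProperties().set(TableProperties.WRITE_LOCATION_PROVIDER_IMPL, ...).commit()`.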