-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow impl for LocationProvider for to be created during runtime #1531
Allow impl for LocationProvider for to be created during runtime #1531
Conversation
I have 2 concerns with this PR:
|
hi @jackye1995 thanks for the feedback here! This is aligned with the discussion we had during the sync for solution that folks can easily customizable their locations. I've re-purposed this PR to address this. If this is in, we can easily implement our data localization here. |
Thank you! I created an issue for this, can you link it to this PR? #1538 |
cc @stripe/iceberg-reviewers, I iterated on this based on feedback to have pluggable impl for location provider instead |
Thanks for working on this, @mickjermsurawong-stripe! I like dynamically loading a location provider, and I like that the changes required are pretty small. That looks good to me. What I'd like to change is how this is tested. There are two main issues with it. First, the tests are in an unrelated suite for Spark that is mainly testing that Spark works with a variety of different schemas. It doesn't seem appropriate to put the tests for injecting a location provider in a different module and into this suite. Second, the I would prefer a more targeted test in a new suited in core, We also need negative tests for when the provider can't be found. I'd like to have a good error message like "Cannot find location provider class: %s". |
@jackye1995, I think this dynamic loading approach might be a good idea for injecting S3FileIO as well. What do you think? |
@rdblue yes, that is exactly how I implement, just waiting for your branch to cross check all details about S3FileIO before opening a PR |
@rdblue, ah thank you for the feedback on test! I guess the complicated test here is the artifacts of me trying to test the logic in my first data localization feature. I created a separate test as suggested; I made use of the existing |
String.format("Unable to instantiate the provided implementation %s for %s.", | ||
nonExistentImpl, | ||
LocationProvider.class)); | ||
this.table.locationProvider(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of exception rules, we like to use AssertHelpers.assertThrows
. That way, the test doesn't need to end when the exception is thrown and you can run other validations.
Also, we don't typically use the prefix this.
unless setting an instance field, unless it helps understanding what's going on. We always using when assigning, though.
|
||
// publicly visible for testing to be dynamically loaded | ||
public static class InvalidDynamicallyLoadedLocationProvider implements LocationProvider { | ||
// No public constructor |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we support a no-arg constructor as well as the one that passes the table location and properties?
} | ||
|
||
@Test | ||
public void testInvalidDynamicallyLoadedLocationProvider() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if the class doesn't implement LocationProvider
?
Could you add a test case for that?
@@ -87,6 +87,8 @@ private TableProperties() { | |||
public static final String OBJECT_STORE_ENABLED = "write.object-storage.enabled"; | |||
public static final boolean OBJECT_STORE_ENABLED_DEFAULT = false; | |||
|
|||
public static final String LOCATION_PROVIDER_IMPL = "write.location-provider.impl"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to add this to the docs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes
DynConstructors.builder(LocationProvider.class) | ||
.impl(impl).build() | ||
); | ||
Optional<DynConstructors.Ctor<LocationProvider>> twoArgConstructor = Optional.empty(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When the arguments passed to one constructor are a prefix of the args passed to the other, there is no need to use a different builder or call because the arguments actually passed are limited by the number of parameters defined by the constructor. That makes it possible to pass both arguments to either one, and any extras are ignored.
That helps simplify cases like this because you don't need the extra optional logic:
ctor = DynConstructors.builder(LocationProvider.class)
.impl(impl, String.class, Map.class)
.impl(impl) // fall back to no-arg constructor
.build()
return ctor.newInstance(location, properties);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the explanation here
<td class="colFirst"><a name="org.apache.iceberg.TableProperties.WRITE_METADATA_LOCATION"> | ||
<!-- --> | ||
</a><code>public static final java.lang.String</code></td> | ||
<td><code><a href="org/apache/iceberg/TableProperties.html#WRITE_METADATA_LOCATION">WRITE_METADATA_LOCATION</a></code></td> | ||
<td class="colLast"><code>"write.metadata.path"</code></td> | ||
</tr> | ||
<tr class="altColor"> | ||
<tr class="rowColor"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you remove the html changes please? The docs that @aokolnychyi was referring to are built from markdown source located in site/docs/configuration.md
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ahh. ack!
Thanks for the update, @mickjermsurawong-stripe! I think the implementation can be quite a bit simpler and more like the original, even with support for the no-arg constructor. Sorry I didn't mention that earlier, before you went ahead and implemented it this way. |
"has a public no-arg constructor or a two-arg constructor " + | ||
"taking in the string base table location and its property string map.", | ||
impl, LocationProvider.class)); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If provided class has the specified constructor signature, but not implementing LocationProvider
, it can still be loaded into the dynConstructor. The following block differentiates this missing interface more explicitly.
Might be a bit more code churn, but it does give more signals to users.
thanks @rdblue for the review so far! I believe this should be ready for another review. |
} catch (ClassCastException e) { | ||
throw new IllegalArgumentException( | ||
String.format("Provided implementation for dynamic instantiation should implement %s, " + | ||
"but found dynamic constructor %s.", LocationProvider.class, ctor)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dynamic constructor?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here as well you should probably throw the exception with the original as a cause so the context isn't lost.
"Make sure the implementation is in classpath, and that it either " + | ||
"has a public no-arg constructor or a two-arg constructor " + | ||
"taking in the string base table location and its property string map.", | ||
impl, LocationProvider.class)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you're catching the exception, it probably makes more sense to use buildChecked
so the exception class isn't generic. That exception also has good information about why all of the implementations failed, so I would recommend adding that exception as a cause to this one that is thrown. I like throwing this one for context, though!
Propagating the causes now should give stacktrace for these two cases, with more specific causes.
and
|
Thanks, @mickjermsurawong-stripe! I merged this. Nice work, and thank you for your patience and quick turn-around with the review comments! |
Thank you Ryan! This will help with our internal feature :) |
This PR allows users to supply implementation of
LocationProvider
to be created during runtime.This is motivated from the proposal to solve data localization.
From the community sync, the feedback is to make to leave implementation to users so other use cases outside data localization could make use of this too