-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix injection failure of StorageLocationSelectorStrategy objects #10363
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👍
...ain/java/org/apache/druid/segment/loading/LeastBytesUsedStorageLocationSelectorStrategy.java
Outdated
Show resolved
Hide resolved
.../java/org/apache/druid/segment/loading/MostAvailableSizeStorageLocationSelectorStrategy.java
Outdated
Show resolved
Hide resolved
...er/src/main/java/org/apache/druid/segment/loading/RandomStorageLocationSelectorStrategy.java
Outdated
Show resolved
Hide resolved
...rc/main/java/org/apache/druid/segment/loading/RoundRobinStorageLocationSelectorStrategy.java
Outdated
Show resolved
Hide resolved
server/src/test/java/org/apache/druid/segment/loading/StorageLocationSelectorStrategyTest.java
Outdated
Show resolved
Hide resolved
server/src/test/java/org/apache/druid/segment/loading/StorageLocationSelectorStrategyTest.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/apache/druid/segment/loading/SegmentLoaderLocalCacheManager.java
Outdated
Show resolved
Hide resolved
Tagged "Design Review" since it changes user-facing configurations. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have an alternate approach recommended that I think will be more robust.
} | ||
|
||
@VisibleForTesting | ||
LeastBytesUsedStorageLocationSelectorStrategy(List<StorageLocation> storageLocations) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this approach is brittle and might break when a new implementation of StorageLocationSelectorStrategy
is added. Instead, I think we should make List<StorageLocation> storageLocations
injectable instead via a module, something like
@Provides
@Singleton
public List<StorageLocation> provideStorageLocations(SegmentLoaderConfig config)
{
this.locations = new ArrayList<>();
for (StorageLocationConfig locationConfig : config.getLocations()) {
locations.add(
new StorageLocation(
locationConfig.getPath(),
locationConfig.getMaxSize(),
locationConfig.getFreeSpacePercent()
)
);
return locations;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @suneet-s I check the code and find that this approach is a little bit complex for the existing code.
Storage selector strategy object is constructed during deserialization of SegmentLoaderConfig
, and only after the construction of SegmentLoaderConfig
, could it be possible to inject SegmentLoaderConfig
to other objects to get list of StorageLocationConfig
inside the object.
If we want to inject StorageLocation
objects as the way you suggest, the strategy object must be separated from SegmentLoaderConfig
into another new config class so that both SegmentLoaderConfig
and this new config class can be injected into SegmentLocalCacheManager
. There will be lots of test cases involved to change to meet the new ctor of SegmentLocalCacheManager
. So I think these change involve more complexity.
Back to the concern you mentioned, I think there's no need to worry that new implementation would break the constraints. Because if it breaks, the problem would be easily detected during unit test or integrated test.
What do you think ?
docs/configuration/index.md
Outdated
@@ -1379,7 +1379,7 @@ These Historical configurations can be defined in the `historical/runtime.proper | |||
|Property|Description|Default| | |||
|--------|-----------|-------| | |||
|`druid.segmentCache.locations`|Segments assigned to a Historical process are first stored on the local file system (in a disk cache) and then served by the Historical process. These locations define where that local cache resides. This value cannot be NULL or EMPTY. Here is an example `druid.segmentCache.locations=[{"path": "/mnt/druidSegments", "maxSize": "10k", "freeSpacePercent": 1.0}]`. "freeSpacePercent" is optional, if provided then enforces that much of free disk partition space while storing segments. But, it depends on File.getTotalSpace() and File.getFreeSpace() methods, so enable if only if they work for your File System.| none | | |||
|`druid.segmentCache.locationSelectorStrategy`|The strategy used to select a location from the configured `druid.segmentCache.locations` for segment distribution. Possible values are `leastBytesUsed`, `roundRobin`, `random`, or `mostAvailableSize`. |leastBytesUsed| | |||
|`druid.segmentCache.locationSelectorStrategy.type`|The strategy used to select a location from the configured `druid.segmentCache.locations` for segment distribution. Possible values are `leastBytesUsed`, `roundRobin`, `random`, or `mostAvailableSize`. |leastBytesUsed| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a doc fix? I don't see an associated change in the code. Am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a fix. According to original code, annotation on StorageLocationSelectorStrategy
indicates the name of json property is type
. When using druid.segmentCache.locationSelectorStrategy
, jackson tries to find ctor with a String parameter , which causes the issue.
Hi @jihoonson @suneet-s , here's a clarification for the change of user facing configuration. This PR has no code change with the configuration name, but a rectification of the doc from the wrong configuration item name of selector strategy named as When I first checked the issue, the doc also made me confused and took me sometime on the question which was the right name. Looking at the annotation of
Based upon the code, the right configuration name should I don't know why the
And the doc says the property name should be |
@jihoonson @suneet-s I come up with an idea that requires no change to the existing configuration.
The disadvantage is that all properties extended by implementations of What do u think ? |
@FrankChen021 I'll look through these changes over the next couple of days and get back to you. Thanks for the fix and your patience :) |
The latest CI reports that
I don't know how to handle it. Do you have any idea ? @suneet-s @asdf2014 |
This is because of a mismatch between the version used in Other CI failures look unrelated. I just restarted them. |
@@ -54,9 +54,6 @@ | |||
@JsonProperty("numBootstrapThreads") | |||
private Integer numBootstrapThreads = null; | |||
|
|||
@JsonProperty("locationSelectorStrategy") | |||
private StorageLocationSelectorStrategy locationSelectorStrategy; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the previous implementation is better since you can use the configuration name druid.segmentCache.locationSelectorStrategy
without type
. Is there a reason that locationSelectorStrategy
cannot be here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is because StorageLocationStrategy
depends on SegmentLoaderConfig.locations
. If StorageLocationStrategy
is placed here, when SegmentLoaderConfig
is being deserialized, locations
required by StorageLocationStrategy
can't be found and injected into it.
After moving this property out of SegmentLoaderConfig
, both SegmentLoaderConfig
and StorageLocationStrategy
are deserialized during construction of SegmentLoaderLocalCacheManager
, and jackson could find the locations
objects required by strategy object through google guice framework to create strategy object correctly.
Using the configuration name without type
I think is wrong. Please take a look at this configuration druid.coordinator.balancer.strategy
, or the clarification on the change of configuration name I left above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. Then, how about using a similar naming to the coordinator balancer? For example, we can bind StorageLocationSelectorStrategy
to druid.segmentCache.locationSelector
, and use a strategy
property name for StorageLocationSelectorStrategy
instead of type
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the late reply.
Your suggestion makes the naming more meaningful. I'll update this PR later this day.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. @FrankChen021 thank you!
@suneet-s do you have more comments? |
@@ -52,6 +55,7 @@ public void configure(Binder binder) | |||
{ | |||
JsonConfigProvider.bind(binder, "druid.server", DruidServerConfig.class); | |||
JsonConfigProvider.bind(binder, "druid.segmentCache", SegmentLoaderConfig.class); | |||
JsonConfigProvider.bind(binder, "druid.segmentCache.locationSelector.strategy", StorageLocationSelectorStrategy.class); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be druid.segmentCache.locationSelector
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh sorry, this is a bug. I remember I tested it in my cluster, maybe I missed something then. The code has been updated in the latest commit.
binder.bind(Properties.class).toInstance(props); | ||
|
||
JsonConfigProvider.bind(binder, "druid.segmentCache", SegmentLoaderConfig.class); | ||
JsonConfigProvider.bind(binder, "druid.segmentCache.locationSelector", StorageLocationSelectorStrategy.class); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This binding is different from what actual module binds. I think this is why we missed the wrong binding. Can we use the same StorageNodeModule
here? Or can we add a helper method which does the proper binding for both StorageNodeModule
and tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason why StorageModule
is not used here is because there're some extra dependencies in StorageModule
, which might introduce lots of injection code in test cases. A helper method is extracted to do the correct binding.
@FrankChen021 Can you describe how you tested this in your test cluster. What was the user-visible behavior before and after this change so that we can update the release notes if needed. I didn't see any integration tests added for this change, so it's possible that someone might break this behavior again in the future - would it be possible to add integration tests? |
Hi @suneet-s , in this PR a log is added in the ctor of
In this way, I know whether the configuration takes effect. As the test cases, I've added some unit test cases to test whether this configuration takes effect by setting |
Hi @jihoonson @suneet-s , CI reports that compaction integration test fails while all other checks are ok. I don't know why, could you help me take a look at it? |
@FrankChen021 It appears these tests are flaky. I re-triggered them and the job passed - could you file an issue for this flaky test please? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I tested the changed property working in my local machine.
Hi @suneet-s , do you have any further comments ? |
@FrankChen021 Sorry for the delay in this review. Since there are no integration tests, I'm trying to see what would happen on an upgrade, and whether or not this change would introduce a change in behavior. If you have integration tests that prove nothing has changed on the upgrade path, it would make this review a lot faster. Thanks for your patience |
@suneet-s Got it. I'm not familiar with IT. One thing I don't understand is what we should expect from the integration tests. To verify if the right type of selector object injected for corresponding configuration? |
@FrankChen021 I think the ITs should test that Druid uses the |
I see. It's an end-to-end test case. Usually it is tricky to set up an environment for such a case. I also studied some IT cases, I found that test cases send HTTP requests to nodes to verify whether they run successfully or not. But for historical nodes, there're no such interfaces exposed for us to observe the behavior of selector strategy. As you say, it's a little tricky. |
Hi @suneet-s , Is this PR ready to be merged ? or is there anything I need to do ? |
@FrankChen021 - I got pulled in to some other issues and haven't had the time to look at this yet. I will look at this over the weekend and get back to you. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM thanks for the tests and fix @FrankChen021 - sorry for the delay in reviewing this.
I'm happy to merge the fix as soon as the conflict is resolved.
….type to druid.segmentCache.locationSelector.strategy
46862d4
to
90dfa73
Compare
Hi @suneet-s The conflict has been resolved by rebasing the branch onto the latest master. But CI failed, I checked it, and it seems that it's not related to the changes in PR. Could you re-trigger it to see if it passes ? |
Thanks @FrankChen021 - looks like someone else beat me to the re-trigger. |
…che#10363) * fix to allow customer storage location selector strategy * add test cases to check instance of selector strategy * update doc * code format * resolve code review comments * inject StorageLocation * fix CI * fix mismatched license item reported by CI * change property path from druid.segmentCache.locationSelectorStrategy.type to druid.segmentCache.locationSelector.strategy * using a helper method to bind to correct property path
This PR fixes #10348 , which is caused by injection failure of storage selector strategy.
Description
Currently, all 4 implementations of
StorageLocationSelectorStrategy
requires a list ofStorageLocation
objects during construction. AndStorageLocation
differs fromStorageLocationConfig
deserialized from configuration file, and the former is instantiated bySegmentLoaderLocalCacheManager
. This also meansStorageLocation
could not be injected intoStorageLocationSelectorStrategy
when they are being constructed.In this PR,
the ctor of implementations of
StorageLocationSelectorStrategy
are removed, instead, a setter of storage location method is provided in this interface so thatSegmentLoaderLocalCacheManager
could pass the objects to the strategy objectbased on the original code, configuration property should be
druid.segmentCache.locationSelectorStrategy.type
, related docs are also updatedsome unit test cases are added to check whether these strategy objects are correctly instantiated from configuration properties.
This PR has: