Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix type cast issue in transport action call #223

Conversation

dai-chen
Copy link
Contributor

@dai-chen dai-chen commented Jul 13, 2021

Signed-off-by: Chen Dai daichen@amazon.com

Description

Problem Statement

OpenSearch loads each plugin by different class loader separately. When two plugins need to communicate by transport action, it’s common to put request and response class into a module shared by both. However, as per the forum post, this may cause problem because OpenSearch does "optimization to avoid serialization" for local request between plugins on same JVM.

Here is the sample error when Reporting plugin tries to call Notification plugin by common code in common-utils module. You can see the full class names are exactly the same, though type cast happened and failed due to the 2 classes loaded by different FactoryURLClassLoader.

java.lang.ClassCastException: class org.opensearch.commons.notifications.action.SendNotificationRequest 
cannot be cast to class org.opensearch.commons.notifications.action.SendNotificationRequest
(org.opensearch.commons.notifications.action.SendNotificationRequest
  is in unnamed module of loader java.net.FactoryURLClassLoader @6965f207;
 org.opensearch.commons.notifications.action.SendNotificationRequest 
  is in unnamed module of loader java.net.FactoryURLClassLoader @4b54bed2)

The common-utils jar is shared by Reporting and Notification plugin and loaded by different class loaders. When Reporting tries to call send notification API, OpenSearch passes instance of class SendNotificationRequest loaded by reporting class loader. However, Notification plugin expects an instance of same class though loaded by its own class loader. The same problem for SendNotificationResponse class in ActionListener.

classloader

Proposed Solutions

The following solutions are proposed as discussed in opensearch-project/common-utils#37:

  1. Copy common jar to OpenSearch lib at runtime and make each plugin compileOnly depends on it
  2. Move common code to OpenSearch core engine codebase
  3. Extend OpenSearch core engine to support this kind of communication
  4. Force serialization/de-serialization request/response class even for local request [CHOSEN]

The options 4 is chosen although it has small overhead. It is favored because it’s not hacky as the first 2 options and much less complex than the option 3 (design and feasibility is unclear).

Implementation

To force (de-)serialization for request and response class, a type checking and object recreating logic needs to add at the beginning of request/response handling. Actually this logic is already existing in Notification codebase. In the case of Notification plugin, the root cause is the wrong generic type. Concrete request/response class is used in executeRequest(XXXRequest) and onResponse(XXXResponse) respectively. This leads to type cast exception occurred before entering the main logic of the function (which includes the type checking and object recreating logic).

internal class SendNotificationAction @Inject constructor(
    ...
) : PluginBaseAction<SendNotificationRequest, SendNotificationResponse>(
    ...
) {

    override fun doExecute(
        task: Task?,
        request: SendNotificationRequest,  -- root cause!!!
        listener: ActionListener<SendNotificationResponse>
    ) {
        val transformedRequest = request as? SendNotificationRequest
            ?: recreateObject(request) { SendNotificationRequest(it) }
        super.doExecute(task, transformedRequest, listener)
    }

Thus this PR is to change the generic type to base class ActionRequest (and ActionResponse in the other PR in common-utils). This enables the control flow enters the method, does the type check and recreates request/response object if needed.

Issues Resolved

opensearch-project/common-utils#37

Check List

  • New functionality includes testing.
    • All tests pass, including unit test, integration test and doctest
  • New functionality has been documented.
    • New functionality has javadoc added
    • New functionality has user manual doc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Chen Dai <daichen@amazon.com>
@dai-chen dai-chen self-assigned this Jul 13, 2021
@dai-chen dai-chen added the bug Something isn't working label Jul 13, 2021
Signed-off-by: Chen Dai <daichen@amazon.com>
@dai-chen dai-chen changed the title Fix classloader issue in transport action call between plugins Fix type cast issue in transport action call Jul 13, 2021
joshuali925
joshuali925 previously approved these changes Jul 14, 2021
@dai-chen dai-chen marked this pull request as ready for review July 14, 2021 17:29
@dai-chen dai-chen requested a review from zhongnansu July 14, 2021 17:29
zhongnansu
zhongnansu previously approved these changes Jul 14, 2021
@dblock
Copy link
Member

dblock commented Jul 14, 2021

Does this require opensearch-project/common-utils#38? Can you please explain in this and the other PR what's going on? Also a good blog post idea as this is getting technically complicated and fun ;)

@dai-chen
Copy link
Contributor Author

Does this require opensearch-project/common-utils#38? Can you please explain in this and the other PR what's going on? Also a good blog post idea as this is getting technically complicated and fun ;)

Sounds good! Will put more details in the PR description. Thanks!

@dblock
Copy link
Member

dblock commented Jul 14, 2021

Good work!

@dblock
Copy link
Member

dblock commented Jul 14, 2021

This still needs tests, I am worried that nothing was failing before and nothing is succeeding after this change.

@dai-chen
Copy link
Contributor Author

This still needs tests, I am worried that nothing was failing before and nothing is succeeding after this change.

Sure, will check if UT is missing or not.

Signed-off-by: Chen Dai <daichen@amazon.com>
@dai-chen dai-chen dismissed stale reviews from zhongnansu and joshuali925 via dde3fe6 July 19, 2021 21:46
dai-chen added 5 commits July 19, 2021 15:24
Signed-off-by: Chen Dai <daichen@amazon.com>
Signed-off-by: Chen Dai <daichen@amazon.com>
Signed-off-by: Chen Dai <daichen@amazon.com>
Signed-off-by: Chen Dai <daichen@amazon.com>
Signed-off-by: Chen Dai <daichen@amazon.com>
@dai-chen
Copy link
Contributor Author

This still needs tests, I am worried that nothing was failing before and nothing is succeeding after this change.

Sure, will check if UT is missing or not.

@dblock I've added all missing UT.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants