
[Performance Bug] Repeated deserialization of headers for a direct channel request #2724

Closed
parasjain1 opened this issue Apr 26, 2023 · 9 comments · Fixed by #2765

@parasjain1
Contributor

What is the issue?
During a direct channel request, i.e., a request that originates on and is handled by the same node, the security plugin kicks in and deserializes the userHeader and originalRemoteAddress objects again, consuming more time to process the request.

How can one reproduce the bug?
A simple bulk request on a multi-node cluster should reproduce the issue.

What is the expected behavior?
Do not deserialize the objects again for direct channel requests, and re-use the previously deserialized values.

What is your host/environment?

  • OpenSearch v2.3
  • Security plugin

Do you have any additional context?
SecurityInterceptor.sendMessageDecorate stashes the current thread context, which clears the transient headers. SecurityRequestHandler.messageReceivedDecorate then deserializes the userHeader and originalRemoteAddress objects just to put them back into the thread context's transient headers.

For a direct channel request (channelType == "direct"), SecurityInterceptor.sendMessageDecorate could retain these values in the transient headers, so SecurityRequestHandler.messageReceivedDecorate would no longer need to deserialize them again.
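
A minimal sketch of the send-side idea, assuming hypothetical names (the transient keys and the targetIsLocalNode flag are placeholders, not the plugin's actual identifiers):

// Illustrative sketch only; the transient keys and the targetIsLocalNode flag are
// assumptions for this example, not the security plugin's actual identifiers.
import org.opensearch.common.util.concurrent.ThreadContext;

class DirectRequestSendSketch {

    static final String USER_TRANSIENT = "_security_user";                     // assumed key
    static final String REMOTE_ADDRESS_TRANSIENT = "_security_remote_address"; // assumed key

    // Sending side: when the request targets the local node, re-stash the
    // already-deserialized objects as transients so the receiving side does
    // not have to deserialize the corresponding headers again.
    static void retainTransientsForLocalRequest(ThreadContext ctx, Object user, Object remoteAddress,
                                                boolean targetIsLocalNode) {
        if (targetIsLocalNode) {
            ctx.putTransient(USER_TRANSIENT, user);
            ctx.putTransient(REMOTE_ADDRESS_TRANSIENT, remoteAddress);
        }
    }
}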

@parasjain1 added the bug and untriaged labels Apr 26, 2023
@stephen-crawford
Contributor

[Triage] Hi @parasjain1, thank you for filing this issue. If you would be interested in opening a PR, that may be the fastest way to correct the behavior you documented. Please reach out to @cwperks with any additional context.

@stephen-crawford added the triaged label and removed the untriaged label May 1, 2023
@DarshitChanpura
Member

This might not be a straightforward if-else fix, as sendRequestDecorate doesn't have direct access to determine whether the request is being sent to the same node. I'm going to dive deeper and update my findings here.

Serializing:

Background: The transport service uses the direct channelType via the localNodeConnection object when it determines that the target DiscoveryNode is the local node while trying to open a connection via the openConnection method.

When the request flow reaches sendRequestDecorate, it doesn't have access to channelType or isLocalNode to identify whether the request is going to be handled by the same node. I need to take a deeper look into Connection.java to understand how this value can be made accessible to SecurityInterceptor.

Deserializing:

Since messageReceivedDecorate has access to the TransportChannel object and already has a check in place for the direct channelType, we can simply remove the calls that deserialize userHeader and originalRemoteAddress.
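
A rough sketch of the receive-side check (getChannelType() is the accessor on TransportChannel; the transient key and helper name here are assumptions):

// Sketch only: getChannelType() exists on TransportChannel, but the transient key
// and this helper are illustrative, not the plugin's actual code.
import org.opensearch.common.util.concurrent.ThreadContext;
import org.opensearch.transport.TransportChannel;

class DirectRequestReceiveSketch {

    static final String USER_TRANSIENT = "_security_user"; // assumed key

    // Receiving side: for a direct channel the sender ran in the same JVM, so if the
    // transients survived there is nothing left to deserialize.
    static boolean needsHeaderDeserialization(TransportChannel channel, ThreadContext ctx) {
        boolean direct = "direct".equals(channel.getChannelType());
        return !(direct && ctx.getTransient(USER_TRANSIENT) != null);
    }
}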

@DarshitChanpura
Member

Update:

One possible solution on the serializing side is to compare the ephemeralIds of the local node and the connection's node. This is derived from DiscoveryNode.java's equals() method, which compares two nodes this way.

I will raise a PR for this.
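
A sketch of that local-node check; the wiring into SecurityInterceptor is omitted and the helper name is an assumption:

// Sketch of the ephemeral-ID comparison; isSameNodeRequest is an assumed helper name.
import org.opensearch.cluster.node.DiscoveryNode;
import org.opensearch.transport.Transport;

class LocalNodeCheckSketch {

    // A request is "local" when the connection's target node is the local node itself.
    // DiscoveryNode equality ultimately hinges on the ephemeral ID, so comparing the
    // ephemeral IDs answers "is this connection pointing back at this node?".
    static boolean isSameNodeRequest(DiscoveryNode localNode, Transport.Connection connection) {
        return localNode.getEphemeralId().equals(connection.getNode().getEphemeralId());
    }
}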

@DarshitChanpura self-assigned this May 10, 2023
@parasjain1
Contributor Author

parasjain1 commented May 11, 2023

HeaderHelper.isDirectRequest() has the logic to determine whether the current request is a direct (local-node) request. It relies on the OPENDISTRO_SECURITY_CHANNEL_TYPE header in the ThreadContext. IMO it might solve the issue of identifying whether the request is direct.
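
For reference, a hedged sketch of what such a channel-type check looks like (the literal key value below is an assumption; the plugin defines the real constant itself):

// Hedged sketch along the lines of HeaderHelper.isDirectRequest(); the literal key
// below is an assumption, not necessarily the plugin's actual constant value.
import org.opensearch.common.util.concurrent.ThreadContext;

class ChannelTypeCheckSketch {

    static final String CHANNEL_TYPE_TRANSIENT = "_opendistro_security_channel_type"; // assumed

    // True when the channel type recorded on the receiving side says "direct".
    static boolean isDirectRequest(ThreadContext ctx) {
        return "direct".equals(ctx.getTransient(CHANNEL_TYPE_TRANSIENT));
    }
}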

@mgodwan
Member

mgodwan commented May 11, 2023

@parasjain1

  1. The OPENDISTRO_SECURITY_CHANNEL_TYPE header is populated based on the transport channel when a message is received. Is the header populated while sending a request as well?

  2. While sending a message, given that the interceptors may run before the transport channel is created, we may not know the channel type when the security interceptor is invoked. Is that not the case?

@parasjain1
Contributor Author

parasjain1 commented May 11, 2023

Checked this, and this is indeed the flow. The method name HeaderHelper.isDirectRequest() led me to believe that it would work anywhere in the security plugin flow. Thanks for pointing this out, @mgodwan.

@DarshitChanpura
Member

@parasjain1 @mgodwan I have raised a PR for this fix. Please review: #2765

@DarshitChanpura
Member

Post-fix observations:


Profiling Security Transport Interceptors

Prerequisites:

Pre #2765

Setup:

Steps:

  • Get the opensearch process id: ps aux | grep 'opensearch'
  • Run the profiler: /profiler.sh -e wall -t -f without-changes.html -d 20 -i 50us -I 'Security.' -I 'java/' -X 'Unsafe.park' 60626
  • Run bulk requests: python3 bulk-index.py

Results:

  • Open without-changes.html

[flame graph screenshot]

sendMessageDecorate takes 0.18% CPU (0.16% is on the thread shown above)

[flame graph screenshot]

messageReceivedDecorate takes 1.12% of the CPU

Post #2765

Setup:

  • Fetch the latest changes from main in the OpenSearch and Security repos
  • Following the DEVELOPER_GUIDE, get OpenSearch up and running with the security plugin installed

Steps:

  • Get the opensearch process id: ps aux | grep 'opensearch'
  • Run the profiler: /profiler.sh -e wall -t -f with-changes.html -d 20 -i 50us -I 'Security.' -I 'java/' -X 'Unsafe.park' 60626
  • Run bulk requests: python3 bulk-index.py

Results:

  • Open with-changes.html

[flame graph screenshot]

sendMessageDecorate takes 0.04% CPU (0.03% is on the thread shown above)

[flame graph screenshot]

messageReceivedDecorate consumes 1.02% CPU

Conclusion:

We can infer that for a relatively small number of bulk requests (100k documents) we see a decrease of 0.15% in CPU usage. Although not highly visible, this saving would grow as the number of requests increases significantly.


Appendix:

bulk-index.py
from opensearchpy import OpenSearch
from opensearchpy import helpers
import random as r
import requests
import json
import base64

proto = 'https'
host = 'localhost'
port = 9200
auth = ('admin', 'admin') # For testing only. Don't store credentials in code.

# Create the client with SSL/TLS enabled, but hostname verification disabled.
client = OpenSearch(
    hosts = [{'host': host, 'port': port}],
    http_compress = True, # enables gzip compression for request bodies
    http_auth = auth,
    use_ssl = True,
    verify_certs = False,
    ssl_assert_hostname = False,
    ssl_show_warn = False
)

# Create an index with non-default settings.
index_name = 'movies'
docs_to_index = 100000

# Uncomment to create many indices

# for i in range(100):
#   response = client.indices.create(f'movies-{i}')
#   print('\nCreating index:')
#   print(response)
#   document = {
#     'title': 'Moneyball',
#     'director': 'Bennett Miller',
#     'year': '2011'
#   }

#   response = client.index(
#       index = f'movies-{i}',
#       body = document,
#       id = '1',
#       refresh = True
#   )


title_choices = ['Titanic', 'Jurassic Park', 'Star Wars']
year_choices = ['2013', '1992', '2023', '2001', '1985']

lorem = [
    'Lorem ipsum dolor sit amet, consectetur adipiscing elit.',
    'Quisque vitae varius ex, eu volutpat orci.',
    'Aenean ullamcorper orci et vulputate fermentum.',
    'Cras erat dui, finibus vel lectus ac, pharetra dictum odio.',
    'Nullam tempus scelerisque purus, sed mattis elit condimentum nec.',
    'Etiam risus sapien, auctor eu volutpat sit amet, porta in nunc.',
    'Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas.',
    'Proin ipsum purus, laoreet quis dictum a, laoreet sed ligula.',
    'Integer ultricies malesuada quam.',
    'Cras vel elit sed mi placerat pharetra eget vel odio.',
    'Duis ac nulla varius diam ultrices rutrum.'
]

basic_auth_token = base64.b64encode('admin:admin'.encode()).decode()

headers = {
    'Content-Type': 'application/json',
    'Authorization': f'Basic {basic_auth_token}'
}


bulk = []
for i in range(docs_to_index):
    document = {
        'title': r.choice(title_choices),
        'year': r.choice(year_choices),
        'description': r.choice(lorem)
    }

    bulk.append({"_index": index_name, "_id": f"{i}", "_source": document})

    if (i+1) % 100 == 0:
        print(f'\nAdding documents {i - 99} - {i}')
        try:
            helpers.bulk(client, bulk)
        except Exception as e:
            print(f"An exception occurred while indexing: *{*e*}*")
        bulk = []

@DarshitChanpura
Member

A bug introduced in the original PR (#2765) caused flaky CI. This was addressed via #3066.
