Optimise ESF #148
nice work! a couple of comments, no blockers.
docs/README-AWS.md
**Notes:**

`inputs.[].json_content_type` can be defined as a string with on the of the following values:
- *single*: indicates that the content of a single entry in the input payload is a single JSON object. The content can either be on a single line or spanning multiple lines. In this case the whole content of the payload is decoded as JSON object, with no limit on the number of lines the JSON object is spanning on.
- *ndjson*: indicates that the content of a single entry in the input payload is a valid NDJSON format. In NDJSON format multiple single JSON objects formatted on a single line each are separated by a newline delimiter. In this case each line will be decoded as JSON object, improving the parsing performance.
- *disabled*: instructs the Elastic Server Forwarder to not attempt any JSON content automatic discovery and threat the content as plain text, improving the performance.
couple of small typos here: Server = Serverless, threat = treat.
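For reference, the three `json_content_type` values described in the hunk above could behave roughly as in the sketch below. This is not the forwarder's implementation: `decode_payload` is a hypothetical helper, the standard `json` module stands in for whichever library the project uses, and yielding plain text line by line for *disabled* is an assumption.

```python
import json
from typing import Any, Iterator


def decode_payload(payload: str, json_content_type: str) -> Iterator[Any]:
    """Hypothetical helper: yield events from `payload` following the hint."""
    if json_content_type == "single":
        # The whole payload is one JSON object, however many lines it spans.
        yield json.loads(payload)
    elif json_content_type == "ndjson":
        # One JSON object per line; blank lines are skipped.
        for line in payload.splitlines():
            if line.strip():
                yield json.loads(line)
    elif json_content_type == "disabled":
        # No JSON discovery at all (assumption: lines pass through as text).
        for line in payload.splitlines():
            yield line
    else:
        raise ValueError(f"unsupported json_content_type: {json_content_type!r}")
```

With *single* the line count never matters, with *ndjson* each line is decoded independently, and with *disabled* no decoding is attempted at all.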
docs/README-AWS.md
@@ -470,11 +470,13 @@ In case of JSON objects spanning multiple lines a limit of 1000 lines is applied

Sometimes relaying on the Elastic Serverless Forwarder JSON content auto-discovery feature might have a huge impact on performance, or you have a known payload content of a single JSON object spanning more than 1000 lines. In this case you can provide in the input configuration and hint on the nature of the JSON content: this will change the parsing logic applied and improve performance or overcome the 1000 lines limit.

This setting allows also to disable at all any attempt of JSON content automatic discovery, in case of known plain text content.
i think it would be good to move the reason you might do this up to this section. currently we only mention it lower down ("improving performance"), but i think we could expand on it a little here.
from typing import Any, Union

import boto3
from botocore.client import BaseClient as BotoBaseClient
from ujson import JSONDecodeError
it would be good if we could swap out the implementation library at will without any additional code changes here (see above comment)
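One way to get there is a thin wrapper module, so that callers never import the JSON library directly. The sketch below is only an illustration of that idea: the names `json_parser`, `json_dumper` and `JSONParseError` are hypothetical, and the fallback to the standard `json` module is an assumption so that the example runs without `ujson` installed.

```python
from typing import Any

try:
    # Assumption: ujson is the preferred backend when it is installed.
    import ujson as _backend
except ImportError:
    # Fall back to the standard library so the sketch runs anywhere.
    import json as _backend


class JSONParseError(ValueError):
    """Hypothetical library-agnostic error raised when decoding fails."""


def json_parser(payload: str) -> Any:
    """Decode `payload` with whichever backend was selected above."""
    try:
        return _backend.loads(payload)
    except ValueError as exc:
        # Both json and ujson signal decode failures with ValueError subclasses.
        raise JSONParseError(str(exc)) from exc


def json_dumper(obj: Any) -> str:
    """Encode `obj` with whichever backend was selected above."""
    return _backend.dumps(obj)
```

Callers would then catch `JSONParseError` instead of importing `JSONDecodeError` from a specific package, and swapping the library becomes a change to this one module.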
fixes #151
Enhancement
What does this PR do?
- `expand_event_list_from_field` and json dumper
- `json_content_type` not passed as argument to storage factories
- `disabled` value for `json_content_type` in order to totally skip json auto discovery

Why is it important?
We need to optimise the handling of json content as much as possible, since it is the most performance-impacting code in the forwarder.
Having benchmarks available for the different use cases will help decide which json library to use, and possibly even to switch between libraries according to the matrix of `expand_event_list_from_field` and `json_content_type` settings provided by the user.
I've run the new benchmark and identified `ujson` as the best-performing json package.
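A comparison of this kind can be sketched with `timeit`. The snippet below is not the benchmark added by this PR: the candidate list, the payload shape (including the `Records` field) and the function name `benchmark_loads` are all assumptions made for illustration, and libraries that are not installed are simply skipped.

```python
import importlib
import json
import timeit
from typing import Dict

# Assumption: candidate libraries to compare; extend as needed.
CANDIDATES = ("json", "ujson")

# Hypothetical payload: a list of small records under a single field.
PAYLOAD = json.dumps({"Records": [{"id": i, "message": "x" * 64} for i in range(100)]})


def benchmark_loads(number: int = 200) -> Dict[str, float]:
    """Return total seconds spent decoding PAYLOAD `number` times per library."""
    results: Dict[str, float] = {}
    for name in CANDIDATES:
        try:
            module = importlib.import_module(name)
        except ImportError:
            continue  # library not installed: skip it
        results[name] = timeit.timeit(lambda: module.loads(PAYLOAD), number=number)
    return results


if __name__ == "__main__":
    for name, seconds in sorted(benchmark_loads().items(), key=lambda item: item[1]):
        print(f"{name}: {seconds:.4f}s")
```

Timings depend on the machine and on the payload, so each combination of `expand_event_list_from_field` and `json_content_type` would need its own representative payload.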
Here is the outcome of several iterations of optimisation: