Drop event batch when get HTTP status 413 from ES #29368
Conversation
To prevent infinite loops when having `http.max_content_length` set too low or `bulk_max_size` set too high, we now handle this status code separately and drop the whole event batch, producing a detailed error message on the console.
This pull request does not have a backport label. Could you fix it @rdner? 🙏
I'm also going to add a unit test a bit later, but figured the earlier we start to discuss this, the better.
Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)
@faec please have a look when you have time, I think it's ready for a review.
Adding @elastic/elastic-agent-data-plane as reviewers for awareness
Great work, thanks!
Hi @rdner
Kibana was running on the default configuration.
We created a
We installed filebeat and ran the provided command.
We added the required configuration and restarted Elasticsearch.
file.log and ras.sh are shared below. Data was added to file.log through We even ran However, we observed no data under the Discover tab. Please let us know if we are missing anything. Thanks
Hello @amolnater-qasource
Looks like you didn't wait until this step finished in filebeat; it can take up to a few minutes. You should be looking at the logs, making sure they stop coming and the running filebeat is idle, waiting for data to process. The way it works: when you configure filebeat for the first time, it creates a lot of different things in Elasticsearch, and some of them can exceed the set limit (100KB). If you set the parameters:

http:
  max_content_length: 100KB

it would not be able to create those things, including the template the error message on the screenshot refers to. There are a few things you can try (1 or 2):

echo "[DEBUG] $(date) $(./ras.sh 512100)" >> /path/to/your/file.log

Let me know if either helped.
Thanks for looking into this @rdner
When I ran the command for the first time, it ran successfully till the end. It created the indices successfully.
We again ran the same command as shared in step 6.
We followed exactly the same steps.
UPDATE: Thanks again!
@amolnater-qasource what happens when you try the second option?
@rdner We attempted the second way too, using the below configuration in elasticsearch.yml:
And we have run
We even re-attempted the whole test to avoid any gaps; however, the results were still the same. Thanks
@amolnater-qasource I tried to follow the steps described in the description of this PR using 8.1 and it worked fine, as I initially described. But I never tried to follow these steps on Windows. There are several issues with your setup:
NOTE: You're not supposed to see any data in Kibana after Also, Kibana MUST be If you are still not able to make it work with your current setup, you might want to try a different approach using Docker; here is how it's done:
filebeat.inputs:
- type: log
  paths:
    - "path/to/your/input.log"
output:
  elasticsearch:
    hosts: ["http://localhost:9200"]
    username: "admin"
    password: "testing"
setup.kibana:
  host: "localhost:5601"

You need to change "path/to/your/input.log" to your actual filename.
environment:
- "ES_JAVA_OPTS=-Xms1g -Xmx1g"
- "network.host="
- "transport.host=127.0.0.1"
- "http.host=0.0.0.0"
- "http.max_content_length=100KB"
- "xpack.security.enabled=true"
# We want something as unlimited compilation rate, but 'unlimited' is not valid.
- "script.max_compilations_rate=100000/1m"
- "action.destructive_requires_name=false"
# Disable geoip updates to prevent golden file test failures when the database
# changes and prevent race conditions between tests and database updates.
- "ingest.geoip.downloader.enabled=false"
{"log.level":"error","@timestamp":"2022-02-24T13:27:52.997+0100","log.logger":"elasticsearch","log.origin":{"file.name":"elasticsearch/client.go","file.line":240},"message":"failed to perform any bulk index operations: the bulk payload is too large for the server. Consider to adjust `http.max_content_length` parameter in Elasticsearch or `bulk_max_size` in the beat. The batch has been dropped","service.name":"filebeat","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2022-02-24T13:27:54.504+0100","log.logger":"publisher_pipeline_output","log.origin":{"file.name":"pipeline/client_worker.go","file.line":176},"message":"failed to publish events: the bulk payload is too large for the server. Consider to adjust `http.max_content_length` parameter in Elasticsearch or `bulk_max_size` in the beat. The batch has been dropped","service.name":"filebeat","ecs.version":"1.6.0"}
Let me know whether you're able to fix your current environment or whether the new approach worked for you.
Hi @rdner
Build details:
Steps followed:
file.log is attached below. We even ran the below set of commands and we observed the relevant data under the Discover tab.
We have run the same without Thanks
@amolnater-qasource according to step 21 that I wrote earlier, you should have seen the expected error message in the logs (console output), not in Discover. This error is not displayed in any way in Kibana Discover. The messages that are too long are just dropped, and you should see the error message (step 21) according to that.
Hi @rdner
We have observed the expected error logs in the filebeat console. Build details:
Please let us know if anything else is required from our end. Thanks!
What does this PR do?
To prevent infinite loops when having `http.max_content_length` set too low or `bulk_max_size` set too high, we now handle this status code separately and drop the whole event batch, producing a detailed error message on the console.
Why is it important?
Because without this change beats are forever stuck once they get the first 413 response from Elasticsearch.
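To make the behaviour concrete, here is a minimal Go sketch of the idea. It is illustrative only, not the actual elasticsearch/client.go change; publishBatch, sendBulk, and the event type are made-up names. On a 413 response the batch is logged and dropped instead of being handed back to the pipeline for retry:

package main

import (
	"errors"
	"log"
	"net/http"
)

// The wording mirrors the error message this PR adds.
var errPayloadTooLarge = errors.New("the bulk payload is too large for the server. " +
	"Consider to adjust `http.max_content_length` parameter in Elasticsearch " +
	"or `bulk_max_size` in the beat. The batch has been dropped")

// publishBatch is an illustrative stand-in for the Elasticsearch output's
// bulk-publish step. sendBulk performs the HTTP bulk request and returns the
// response status code. The returned slice holds events that should be retried.
func publishBatch(events []string, sendBulk func([]string) (int, error)) ([]string, error) {
	status, err := sendBulk(events)
	if status == http.StatusRequestEntityTooLarge {
		// Before this PR the whole batch was returned for retry, so the same
		// oversized payload was resent forever. Now it is dropped with a
		// descriptive error instead.
		log.Printf("failed to perform any bulk index operations: %v", errPayloadTooLarge)
		return nil, errPayloadTooLarge
	}
	if err != nil {
		// Any other error keeps the old behaviour: hand the events back so
		// the pipeline can retry them.
		return events, err
	}
	return nil, nil
}

func main() {
	// Simulate a server that always answers 413 Request Entity Too Large.
	fakeSend := func(batch []string) (int, error) { return http.StatusRequestEntityTooLarge, nil }
	retry, err := publishBatch([]string{"event-1", "event-2"}, fakeSend)
	log.Printf("events left to retry: %d, err: %v", len(retry), err)
}

Dropping eagerly on 413 is reasonable because retrying the identical oversized payload can never succeed while the server-side limit stays the same; other errors keep the usual retry path.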
Checklist
I have made corresponding changes to the documentation
I have made corresponding change to the default configuration files
I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.
How to test this PR locally

your_filebeat.yml
Notice, the filename must be changed to your location.

filebeat setup -e -c your_filebeat.yml

Now it's easier to cause a 413 in Elasticsearch because of the lower limit; however, we had to run filebeat setup before we set this limit in order to create the required indices out of templates, otherwise the templates would be too large.

filebeat -e -c your_filebeat.yml

ras.sh

This will write a message with 10 characters to the log file:

This should work just fine, but this one should cause an error:

Before this PR you would see this error and filebeat would go into an infinite loop:

After this PR you would see this error and no infinite loop:

You should see 5 more 10-character messages in the index (via Kibana Discover) after the last command, and the big message should be dropped. If there are no sleep commands in between, the dropped batch might include short messages as well. For test purposes it's better to keep them.

Related issues
Closes #14350