The plugin does not retry on 502 errors from the server #67

Closed
im-dim opened this issue Sep 10, 2021 · 9 comments · Fixed by #68

im-dim commented Sep 10, 2021

Does the plugin do transmission on failure?


im-dim commented Sep 10, 2021

I meant re-transmission...

andrzej-stencel (Contributor) commented

According to the docs, the plugin retries after a number of seconds, as described under the sleep_before_requeue configuration option in the README.

Digging deeper, I found this line of code:

if response.code == 429 || response.code == 503 || response.code == 504

which would mean the plugin retries on 429, 503 and 504 errors from the server, but not on the 502 errors you have been hitting.

Honestly, I don't see why the plugin shouldn't retry on 502 as well. What do you think?
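
For illustration, here's a minimal sketch of how that condition could be extended to also treat 502 as retriable (this is not the actual change; the constant and method names are hypothetical):

    # Hypothetical sketch: treat 502 as retriable alongside the existing codes.
    RETRIABLE_CODES = [429, 502, 503, 504].freeze

    def retriable?(response)
      RETRIABLE_CODES.include?(response.code)
    end

    # The sender could then requeue the payload instead of dropping it:
    # requeue_message(message) if retriable?(response)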


im-dim commented Sep 15, 2021

I'm going back and forth with Sumo Logic support...
I read the logs below as us losing data when there is a mix of 429 and 502 errors...
Any advice?

[2021-09-15T08:26:03,373][INFO ][logstash.outputs.elasticsearch][elastiflow_out] retrying failed action with response code: 429 ({"type"=>"circuit_breaking_exception", "reason"=>"[parent] Data too large, data for [<transport_request>] would be [4058089810/3.7gb], which is larger than the limit of [4047097036/3.7gb], real usage: [4057899904/3.7gb], new bytes reserved: [189906/185.4kb], usages [request=72/72b, fielddata=11489703/10.9mb, in_flight_requests=764276/746.3kb, accounting=108807531/103.7mb]", "bytes_wanted"=>4058089810, "bytes_limit"=>4047097036, "durability"=>"PERMANENT"})
[2021-09-15T08:26:03,374][INFO ][logstash.outputs.elasticsearch][elastiflow_out] retrying failed action with response code: 429 ({"type"=>"circuit_breaking_exception", "reason"=>"[parent] Data too large, data for [<transport_request>] would be [4058089810/3.7gb], which is larger than the limit of [4047097036/3.7gb], real usage: [4057899904/3.7gb], new bytes reserved: [189906/185.4kb], usages [request=72/72b, fielddata=11489703/10.9mb, in_flight_requests=764276/746.3kb, accounting=108807531/103.7mb]", "bytes_wanted"=>4058089810, "bytes_limit"=>4047097036, "durability"=>"PERMANENT"})
[2021-09-15T08:26:03,374][INFO ][logstash.outputs.elasticsearch][elastiflow_out] Retrying individual bulk actions that failed or were rejected by the previous bulk request. {:count=>42}
[2021-09-15T08:29:34,758][ERROR][logstash.outputs.sumologic][sumo_out] request rejected {:token=>173, :code=>502, :headers=>{"X-Sumo-Client"=>"logstash-output-sumologic", "X-Sumo-Category"=>"elastiflow", "X-Sumo-Host"=>"ct-centos001", "X-Sumo-Name"=>"elastiflow", "Content-Type"=>"text/plain"}, :contet=>"{"event":{"type":"net"}
[2021-09-15T08:30:32,581][ERROR][logstash.outputs.sumologic][sumo_out] request rejected {:token=>94, :code=>502, :headers=>{"X-Sumo-Client"=>"logstash-output-sumologic", "X-Sumo-Category"=>"elastiflow", "X-Sumo-Host"=>"ct-centos001", "X-Sumo-Name"=>"elastiflow", "Content-Type"=>"text/plain"}, :contet=>"{"event":{"type":"ipf"}
[2021-09-15T08:30:35,188][ERROR][logstash.outputs.sumologic][sumo_out] request rejected {:token=>50, :code=>502, :headers=>{"X-Sumo-Client"=>"logstash-output-sumologic", "X-Sumo-Category"=>"elastiflow", "X-Sumo-Host"=>"ct-centos001", "X-Sumo-Name"=>"elastiflow", "Content-Type"=>"text/plain"}, :contet=>"{"event":{"type":"ipf"}

andrzej-stencel changed the title from "Question - transmission" to "The plugin does not retry on 502 errors from the server" on Sep 24, 2021
kasia-kujawa linked a pull request on Sep 27, 2021 that will close this issue
andrzej-stencel (Contributor) commented

Hey @im-dim,

I've just released a new version, v1.4.0, which retries on 502 error codes.

I've reached out internally at Sumo to the team responsible for log ingestion about the 502 errors. They replied that they monitor 502 errors and that their alerting wasn't triggered, which I think means the problem was intermittent. In that case, the retry logic should help here.


im-dim commented Sep 27, 2021

Thank you, Andrzej. We'll try the new version. Would there be a specific log message if a retry fails? In other words, we'd like to know how much data we are missing...

About "intermittent"... On average, we get between 10,000 and 20,000 502 responses per day, meaning we miss thousands of records, so I'm not sure that can be called intermittent...

andrzej-stencel (Contributor) commented

Looking at the code, the plugin appears to retry indefinitely, so there is no such thing as a failed retry: a failed attempt simply results in another retry, and so on.
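
Roughly, the behavior looks like this (a simplified sketch, not the plugin's actual code; send_request, response_success? and sleep_before_requeue are placeholder names):

    # Simplified sketch of the retry-forever behavior.
    loop do
      message  = queue.pop                # take the next payload to send
      response = send_request(message)    # POST to the Sumo Logic HTTP endpoint
      unless response_success?(response)
        sleep(sleep_before_requeue)       # wait the configured number of seconds
        queue.push(message)               # put the payload back; it will be retried
      end
    end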

Thanks for the numbers on 502 errors. I will pass them on to the ingestion team.


im-dim commented Sep 28, 2021

If there is no configurable limit on retries, then there is a chance of indefinite memory growth when the Sumo server has a problem and keeps responding with 502, right? Could a retry count be added, with a separate error like "X retries failed" or something along those lines?

andrzej-stencel (Contributor) commented

I haven't run any tests, but again - looking at the code :) - the plugin internally uses Ruby's SizedQueue class, which is size-capped: when the queue fills up, it stops accepting new data, and enqueue operations block until space is freed (see the docs for the push method).
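
For reference, here's a small standalone example of how Ruby's SizedQueue behaves when it is full (not plugin code):

    # SizedQueue is part of Ruby's core library; no extra require is needed.
    queue = SizedQueue.new(2)   # capacity of 2 items
    queue.push("a")
    queue.push("b")

    # Default (blocking) push: this call would wait until a consumer pops an item.
    # queue.push("c")

    # Non-blocking push: raises ThreadError immediately when the queue is full.
    begin
      queue.push("c", true)
    rescue ThreadError
      puts "queue is full, item rejected"
    end

With the default blocking push, a full queue therefore shows up as back-pressure on the pipeline rather than as unbounded memory growth.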


im-dim commented Oct 8, 2021

Thank you for the reply.
How do I configure non_block=true to raise ThreadError upon queue overfill?
