[proofpoint_on_demand]: Datastreams do not recover after websocket error #11816
Comments
Pinging @elastic/security-service-integrations (Team:Security-Service Integrations) |
I'm not sure about the message that failed to index, but it looks like the websocket input does not attempt to reconnect on disconnect by default (see: https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-input-streaming.html), and the integration doesn't set the retry behavior. I'm not familiar enough with the input to know why this was chosen as the default. The fix may be as simple as adding the retry configuration or making it configurable via the integration. |
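For readers unfamiliar with what "retry behavior" means here, the following is a minimal sketch of a reconnect loop with capped exponential backoff. It is not the Filebeat input's implementation. The names `run_with_retry`, `ConnectionClosed`, `max_attempts`, `wait_min`, and `wait_max` are hypothetical and chosen only for illustration, and `session` stands in for whatever function opens the websocket and reads from it until it closes.

```python
import time
from typing import Callable


class ConnectionClosed(Exception):
    """Hypothetical error raised by `session` when the socket drops."""


def run_with_retry(
    session: Callable[[], None],
    max_attempts: int = 5,
    wait_min: float = 1.0,
    wait_max: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run `session`, reconnecting whenever it raises ConnectionClosed.

    Returns the number of attempts used once `session` returns cleanly.
    Re-raises the last ConnectionClosed when `max_attempts` is exhausted.
    """
    attempt = 0
    while True:
        attempt += 1
        try:
            session()
            return attempt
        except ConnectionClosed:
            if attempt >= max_attempts:
                # Give up: without this branch the loop would retry forever.
                raise
            # Exponential backoff starting at wait_min, capped at wait_max.
            wait = min(wait_max, wait_min * 2 ** (attempt - 1))
            sleep(wait)
```

Without a loop of this kind, a single close such as `1006 (abnormal closure)` ends the input for good, which matches the behavior reported in this issue. The `sleep` parameter is injectable only so the sketch can be exercised without real delays.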
We have observed similar behavior, and noticed that restarting the Elastic Agent is the only way to resume collecting Proofpoint data. In addition to addressing this issue, it would be great to add more detailed error logging that explains why the "Unregistering" event occurs. |
The retry options were added in 8.16.0 via elastic/beats#40271. They are not enabled by default. @ShourieG, @efd6, For the Without retry enabled by default, we do need to update each of the integrations that use it to perform the retry for robustness. |
I think so. |
@zacharycox-tamu @andrewkroh, Configurable retry options with default values were added recently in the integration via this PR. This is available if you have the base 8.16.0 stack. Default values were added at the input level in this PR and back-ported to 8.16 & 8.17, though that is not available until 8.16.3 & 8.17.1. The integration version upgrade should solve the issue for now. |
Integration Name
Proofpoint On Demand [proofpoint_on_demand]
Dataset Name
proofpoint_on_demand.audit, proofpoint_on_demand.messages
Integration Version
1.0.1
Agent Version
8.15.2
Agent Output Type
elasticsearch
Elasticsearch Version
8.15.2
OS Version and Architecture
RHEL 9.5
Software/API Version
No response
Error Message
2024-11-13T17:54:29.833 ERROR Input 'websocket' failed with: input websocket-proofpoint_on_demand.message-6d3cb712-1ad6-48fe-9f6d-9fbca2fbff84 failed: websocket: close 1006 (abnormal closure): unexpected EOF
2024-11-13T17:54:29.835 INFO Input 'websocket' starting
2024-11-13T17:54:29.836 ERROR add_cloud_metadata: received error failed requesting GCP metadata: Get "http://169.254.169.254/computeMetadata/v1/?recursive=true&alt=json": dial tcp 169.254.169.254:80: i/o timeout
2024-11-13T17:54:29.834 WARN EXPERIMENTAL: The websocket input is experimental
2024-11-13T17:54:31.804 INFO Input 'websocket' starting
2024-11-13T17:54:31.804 INFO add_cloud_metadata: hosting provider type not detected.
2024-11-13T17:54:36.615 INFO Connecting to backoff(elasticsearch(https://bc785b55a36c4bbaaa4732eba04467e7.us-east-2.aws.elastic-cloud.com:443))
2024-11-13T17:54:36.736 INFO Attempting to connect to Elasticsearch version 8.15.2
2024-11-13T17:54:37.182 INFO Connection to backoff(elasticsearch(https://bc785b55a36c4bbaaa4732eba04467e7.us-east-2.aws.elastic-cloud.com:443)) established
2024-11-13T17:59:36.230 ERROR WebSocket connection closed
2024-11-13T17:59:36.230 INFO Unregistering
2024-11-13T21:31:42.764 ERROR Input 'websocket' failed with: input websocket-proofpoint_on_demand.audit-6d3cb712-1ad6-48fe-9f6d-9fbca2fbff84 failed: websocket: close 1001 (going away): java.util.concurrent.TimeoutException: Idle timeout expired: 300000/300000 ms
2024-11-13T21:32:22.326 WARN Cannot index event (status=400): dropping event! Look at the event log to view the event and cause.
Event Original
Last data_stream.dataset: proofpoint_on_demand.message before break
Last data_stream.dataset: proofpoint_on_demand.audit before breaking
What did you do?
The integration was added through "Browse Integrations". The websocket authentication credentials work, and logs were ingested from all three datastreams until the websocket encountered an unrecoverable error. To resume ingestion, the integration has to be manually disabled and re-enabled.
What did you see?
Ingestion on all three datastreams proceeded as expected until it eventually broke and did not recover from an error in the websocket component
(websocket: close 1006 (abnormal closure): unexpected EOF).
What did you expect to see?
Ingestion proceeding uninterrupted.
Anything else?
No response