[proofpoint_on_demand]: Datastreams do not recover after websocket error #11816

Open
zacharycox-tamu opened this issue Nov 21, 2024 · 6 comments
Assignees
Labels
bug Something isn't working, use only for issues Integration:proofpoint_on_demand Proofpoint On Demand Team:Security-Service Integrations Security Service Integrations Team [elastic/security-service-integrations]

Comments

@zacharycox-tamu

Integration Name

Proofpoint On Demand [proofpoint_on_demand]

Dataset Name

proofpoint_on_demand.audit, proofpoint_on_demand.messages

Integration Version

1.0.1

Agent Version

8.15.2

Agent Output Type

elasticsearch

Elasticsearch Version

8.15.2

OS Version and Architecture

RHEL 9.5

Software/API Version

No response

Error Message

2024-11-13T17:54:29.833 ERROR Input 'websocket' failed with: input websocket-proofpoint_on_demand.message-6d3cb712-1ad6-48fe-9f6d-9fbca2fbff84 failed: websocket: close 1006 (abnormal closure): unexpected EOF

2024-11-13T17:54:29.835 INFO Input 'websocket' starting

2024-11-13T17:54:29.836 ERROR add_cloud_metadata: received error failed requesting GCP metadata: Get "http://169.254.169.254/computeMetadata/v1/?recursive=true&alt=json": dial tcp 169.254.169.254:80: i/o timeout

2024-11-13T17:54:29.834 WARN EXPERIMENTAL: The websocket input is experimental

2024-11-13T17:54:31.804 INFO Input 'websocket' starting

2024-11-13T17:54:31.804 INFO add_cloud_metadata: hosting provider type not detected.

2024-11-13T17:54:36.615 INFO Connecting to backoff(elasticsearch(https://bc785b55a36c4bbaaa4732eba04467e7.us-east-2.aws.elastic-cloud.com:443))

2024-11-13T17:54:36.736 INFO Attempting to connect to Elasticsearch version 8.15.2

2024-11-13T17:54:37.182 INFO Connection to backoff(elasticsearch(https://bc785b55a36c4bbaaa4732eba04467e7.us-east-2.aws.elastic-cloud.com:443)) established

2024-11-13T17:59:36.230 ERROR WebSocket connection closed

2024-11-13T17:59:36.230 INFO Unregistering

2024-11-13T21:31:42.764 ERROR Input 'websocket' failed with: input websocket-proofpoint_on_demand.audit-6d3cb712-1ad6-48fe-9f6d-9fbca2fbff84 failed: websocket: close 1001 (going away): java.util.concurrent.TimeoutException: Idle timeout expired: 300000/300000 ms

2024-11-13T21:32:22.326 WARN Cannot index event (status=400): dropping event! Look at the event log to view the event and cause.

Event Original

Last data_stream.dataset: proofpoint_on_demand.message before break

{
  "connection": {
    "country": "US",
    "helo": "[redacted-helo]",
    "host": "[redacted-host]",
    "ip": "[redacted-ip]",
    "protocol": "smtp:smtp",
    "resolveStatus": "ok",
    "sid": "[redacted-sid]",
    "tls": {
      "inbound": {
        "cipher": "ECDHE-RSA-AES256-GCM-SHA384",
        "cipherBits": 256,
        "version": "TLSv1.2"
      }
    }
  },
  "envelope": {
    "from": "[redacted-from]",
    "rcpts": ["[redacted-rcpt]"]
  },
  "filter": {
    "actions": [
      {
        "action": "annotate-text",
        "module": "access",
        "rule": "[redacted-rule]"
      },
      {
        "action": "continue",
        "isFinal": true,
        "module": "access",
        "rule": "[redacted-rule]"
      },
      {
        "action": "add-header",
        "module": "av",
        "rule": "clean"
      },
      {
        "action": "continue",
        "module": "av",
        "rule": "clean"
      },
      {
        "action": "add-header",
        "module": "spam",
        "rule": "notspam"
      }
    ],
    "delivered": {
      "rcpts": ["[redacted-rcpt]"]
    },
    "disposition": "continue",
    "durationSecs": 0.232333,
    "modules": {
      "spam": {
        "authority": {
          "analysis": "[redacted-analysis]",
          "cartVersion": "[redacted-version]",
          "isComplete": true,
          "isTruncated": false,
          "resultAttributeSet": [
            {
              "attribute": "context attribute",
              "values": ["c_pps"]
            }
          ],
          "score": 0,
          "sigs": [
            {
              "engine": 117,
              "isPresent": false,
              "signature": "[redacted-signature]"
            }
          ]
        },
        "langs": ["en", "pt", "es"],
        "scores": {
          "classifiers": {
            "adult": 0,
            "bulk": 0,
            "impostor": 0,
            "lowpriority": 0,
            "malware": 0,
            "mlx": 0,
            "phish": 0,
            "spam": 0,
            "suspect": 0
          },
          "overall": 0
        },
        "version": {
          "engine": "[redacted-engine]"
        }
      }
    },
    "msgSizeBytes": 11420,
    "qid": "[redacted-qid]",
    "routeDirection": "internal",
    "routes": ["allow_relay", "default_inbound"],
    "verified": {
      "rcpts": ["[redacted-rcpt]"]
    }
  },
  "guid": "[redacted-guid]",
  "metadata": {
    "origin": {
      "data": {
        "agent": "[redacted-agent]",
        "cid": "[redacted-cid]",
        "version": "[redacted-version]"
      }
    }
  },
  "msg": {
    "header": {
      "from": ["[redacted-from]"],
      "message-id": ["[redacted-message-id]"],
      "return-path": ["[redacted-return-path]"],
      "subject": ["[redacted-subject]"],
      "to": ["[redacted-to]"]
    },
    "lang": "en",
    "sizeBytes": 8565
  },
  "msgParts": [
    {
      "dataBase64": "[redacted-data]",
      "detectedMime": "text/html",
      "md5": "[redacted-md5]",
      "sha256": "[redacted-sha256]",
      "urls": [
        {
          "isRewritten": true,
          "url": "[redacted-url]"
        }
      ]
    }
  ],
  "ts": "2024-11-14T03:28:51.317080-0600"
}

Last data_stream.dataset: proofpoint_on_demand.audit before breaking

{
  "audit": {
    "action": "read",
    "level": "INFO",
    "resourceName": "[redacted-resource-name]",
    "resourceType": "smart_search",
    "tags": [
      {
        "name": "eventSubCategory",
        "value": "quarantine"
      },
      {
        "name": "eventDetails",
        "value": "GUID: [redacted-guid]"
      },
      {
        "name": "read.quarantine",
        "value": "true"
      }
    ],
    "user": {
      "email": "[redacted-email]",
      "id": "[redacted-user-id]",
      "ipAddress": "[redacted-ip]"
    }
  },
  "guid": "[redacted-guid]",
  "metadata": {
    "customerId": "[redacted-customer-id]",
    "origin": {
      "data": {
        "agent": "[redacted-agent]",
        "cid": "[redacted-cid]",
        "version": "[redacted-version]"
      },
      "schemaVersion": "1.0",
      "type": "cadmin-api-gateway"
    }
  },
  "ts": "2024-11-13T23:01:46.988006+0000"
}

What did you do?

The integration was added through "Browse Integrations". The websocket authentication credentials work, and logs were ingested from all three datastreams until the websocket encountered an error that it did not recover from. To restore ingestion, the integration has to be manually disabled and re-enabled.

What did you see?

Ingestion on all three datastreams proceeded as expected until it eventually broke and did not recover from an error in the websocket component:

websocket: close 1006 (abnormal closure): unexpected EOF
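
For readers unfamiliar with the two close codes that appear in the error messages, the following is a small illustrative sketch (not part of the integration) that extracts the code from a log line and maps it to its meaning in RFC 6455, section 7.4.1. The helper names `close_code` and `describe_close` are hypothetical, and the regular expression assumes the `websocket: close <code>` wording seen in the logs of this report.

```python
import re

# Meanings taken from RFC 6455, section 7.4.1. Only the codes relevant
# to this report (plus the normal case) are listed.
CLOSE_CODES = {
    1000: "normal closure",
    1001: "going away (the peer is shutting down or timing the connection out)",
    # 1006 is never sent on the wire; the local side reports it when the
    # connection drops without a Close frame, e.g. on an unexpected EOF.
    1006: "abnormal closure (connection lost without a Close frame)",
}

# Assumption: log lines use the "websocket: close <code>" wording shown above.
CLOSE_RE = re.compile(r"websocket: close (\d+)")


def close_code(log_line):
    """Return the numeric close code found in a log line, or None."""
    match = CLOSE_RE.search(log_line)
    return int(match.group(1)) if match else None


def describe_close(log_line):
    """Return a short description of the close code in a log line, or None."""
    code = close_code(log_line)
    if code is None:
        return None
    return CLOSE_CODES.get(code, "unlisted close code")
```

Both codes seen here (1006 on the message stream, 1001 with an idle timeout on the audit stream) describe a dropped or expired connection rather than an authentication failure, which fits the observation that re-enabling the integration restores ingestion.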

What did you expect to see?

Ingestion proceeding uninterrupted.

Anything else?

No response

@zacharycox-tamu zacharycox-tamu changed the title from "[Integration Name]: Brief description of the issue" to "[proofpoint_on_demand]: Datastreams do not recover after websocket error" Nov 21, 2024
@andrewkroh andrewkroh added Integration:proofpoint_on_demand Proofpoint On Demand Team:Security-Service Integrations Security Service Integrations Team [elastic/security-service-integrations] labels Nov 21, 2024
@elasticmachine

Pinging @elastic/security-service-integrations (Team:Security-Service Integrations)

@strawgate
Contributor

strawgate commented Nov 27, 2024

I'm not sure about the message that failed to index, but it looks like the websocket input does not attempt to reconnect on disconnect by default (see: https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-input-streaming.html), and the integration doesn't set the retry behavior. I'm not familiar enough with the input to know why this may have been chosen as the default.

It may be as simple as adding the retry configuration or making it configurable via the integration:

  retry:
    max_attempts: 5
    wait_min: 1s
    wait_max: 10s
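
To make the three settings concrete, here is a minimal sketch of the waits such a configuration could produce. It assumes the wait doubles after each failed attempt and is capped at `wait_max`; that growth rule is an assumption for illustration, not a statement of how the input computes its waits, and `backoff_waits` is a hypothetical helper.

```python
def backoff_waits(max_attempts, wait_min, wait_max):
    """Return the wait (in seconds) before each retry attempt.

    Assumption: the wait starts at wait_min, doubles after every
    attempt, and never exceeds wait_max.
    """
    waits = []
    wait = wait_min
    for _ in range(max_attempts):
        waits.append(min(wait, wait_max))
        wait *= 2
    return waits


# Values from the configuration above: 5 attempts, 1s minimum, 10s maximum.
print(backoff_waits(5, 1.0, 10.0))  # [1.0, 2.0, 4.0, 8.0, 10.0]
```

Under that assumption the input would give up after roughly 25 seconds of waiting, so a bounded `max_attempts` still leaves the datastream stopped after a longer outage.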

@arshein123

We have observed similar behavior and noticed that restarting Elastic Agent is the only way to resume collecting Proofpoint data. In addition to addressing this issue, it would be great to add more detailed error logging to explain why the "Unregistering" event occurs.

@andrewkroh
Member

andrewkroh commented Nov 27, 2024

The retry options were added in 8.16.0 via elastic/beats#40271. They are not enabled by default.

@ShourieG, @efd6, for the streaming input, are there situations where the input should not retry a connection? At first glance it seems like we would want to always retry indefinitely (with a good-sized backoff). So perhaps the default behavior should be to retry? I also think there should be a way to retry indefinitely (infinite max_attempts).

Without retry enabled by default, we need to update each of the integrations that use the input to enable retries for robustness.
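
As a rough illustration of the "retry indefinitely with a backoff" idea, the sketch below treats `max_attempts=None` as "never give up". It is a generic reconnect loop under stated assumptions, not the streaming input's implementation: `run_with_retry`, its parameters, and the doubling backoff are all hypothetical, and `max_attempts` counts total connection attempts.

```python
import time


def run_with_retry(connect, max_attempts=None, wait_min=1.0, wait_max=60.0,
                   sleep=time.sleep):
    """Call connect() until it succeeds and return its result.

    max_attempts=None retries forever; otherwise the last ConnectionError
    is re-raised once max_attempts attempts have failed. The wait between
    attempts doubles each time and is capped at wait_max (an assumption).
    """
    attempt = 0
    wait = wait_min
    while True:
        attempt += 1
        try:
            return connect()
        except ConnectionError:
            if max_attempts is not None and attempt >= max_attempts:
                raise
            sleep(wait)
            wait = min(wait * 2, wait_max)
```

The cap on the wait is what makes an unbounded retry count reasonable: after a long outage the loop settles at one attempt per `wait_max` instead of either giving up or retrying in a tight loop.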

@efd6
Contributor

efd6 commented Nov 27, 2024

So perhaps the default behavior should be to retry?

I think so.

@andrewkroh andrewkroh added bug Something isn't working, use only for issues and removed needs:triage labels Nov 27, 2024
@ShourieG
Contributor

ShourieG commented Jan 6, 2025

@zacharycox-tamu @andrewkroh, configurable retry options with default values were recently added to the integration via this PR. They are available if you are on at least the 8.16.0 stack. Default values were also added at the input level in this PR and back-ported to 8.16 and 8.17, though those are not available until 8.16.3 and 8.17.1.

The integration version upgrade should solve the issue for now.
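
The version thresholds mentioned in this thread can be summarized in a short sketch. `retry_support` is a hypothetical helper, it only handles plain `major.minor.patch` strings, and it assumes that every release after 8.17.1 also carries the input-level defaults, which the thread does not state explicitly.

```python
def parse_version(version):
    """Turn a 'major.minor.patch' string into a comparable tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def retry_support(stack_version):
    """Return (options_available, input_defaults_present) for a stack version.

    - Retry options exist in the input from 8.16.0 onward.
    - Input-level defaults arrive in 8.16.3 (8.16 line) and 8.17.1.
    - Assumption: all versions after 8.17.1 include the defaults too.
    """
    v = parse_version(stack_version)
    options_available = v >= (8, 16, 0)
    input_defaults = (8, 16, 3) <= v < (8, 17, 0) or v >= (8, 17, 1)
    return options_available, input_defaults


print(retry_support("8.15.2"))  # (False, False)
```

By this reading, the 8.15.2 agent in the original report predates the retry options entirely, so a stack upgrade is needed in addition to the integration upgrade.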

@ShourieG ShourieG self-assigned this Jan 6, 2025

7 participants