Persistence queue #861

Merged
merged 31 commits into from
Sep 6, 2023
7d75c48
fix: Add persistent queue
wojtekzyla Apr 18, 2023
0429ea5
Merge branch 'main' into persistent-queue-updated
atoulme May 5, 2023
57e3fbb
fix: rename splunkSendingQueue to splunkPlatformSendingQueue, add a c…
wojtekzyla May 9, 2023
c47660b
Merge branch 'main' into persistent-queue-updated
wojtekzyla May 11, 2023
2a4c183
Update helm-charts/splunk-otel-collector/values.yaml
wojtekzyla May 11, 2023
bb3588c
fix: describe where persistent queue is currently supported
wojtekzyla May 22, 2023
c76ed52
fix: Add persistent buffering for cluster receiver, agent and gateway
Jun 28, 2023
9fe3bb0
Merge remote-tracking branch 'origin/main' into persistent-queue
Jun 28, 2023
c281b62
Fix pre-commit
Jun 28, 2023
81f9c57
Merge branch 'main' into persistent-queue-updated
VihasMakwana Jun 29, 2023
2f640b3
Merge remote-tracking branch 'origin/main' into wojciech/persistent-q…
Jul 19, 2023
2c81de2
Exclude persistent buffering for eks/fargate and gke/autopilot
Jul 27, 2023
a330d6c
Merge remote-tracking branch 'origin/main' into persistent-queue-updated
Jul 27, 2023
e79bbf4
remove persistence queue for gateway
Jul 27, 2023
efaf983
fix: have granular control while adding persistent queue
Jul 28, 2023
ab14b57
fix: get rid of "persistentQueueEnabled" helper and update changelog
Jul 28, 2023
e2894d0
chore: add docs
Aug 1, 2023
79aa19a
chore: add examples
Aug 1, 2023
81c3eb1
chore: add persistent buffering for traces
Aug 1, 2023
825af68
Update docs/advanced-configuration.md
VihasMakwana Aug 23, 2023
e93fc17
Merge branch 'main' into persistent-queue-updated
jvoravong Aug 24, 2023
117137a
chore: remove persistent buffering for cluster receiver and add note
Aug 29, 2023
10c7488
fix: pre-commit
Aug 29, 2023
7e85307
FIX: test case failure
Aug 29, 2023
91e309e
fix: linting
Aug 29, 2023
5fdf7ba
fix: improve readability
Aug 30, 2023
36ae510
Update helm-charts/splunk-otel-collector/values.yaml
VihasMakwana Aug 30, 2023
231277a
chore: remove unnecessary details
Sep 2, 2023
52aebc5
Merge branch 'main' into persistent-queue-updated
Sep 2, 2023
90b802d
chore: add functional test cases covering persistent queue
Sep 3, 2023
f95f343
Merge branch 'main' into persistent-queue-updated
dmitryax Sep 6, 2023
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -10,6 +10,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).

### Added

- Configuration of persistent buffering for agent [#861](https://github.com/signalfx/splunk-otel-collector-chart/pull/861)
- Add option to disable Openshift SecurityContextConstraint resource [#843](https://github.com/signalfx/splunk-otel-collector-chart/pull/843)

## [0.83.0] - 2023-08-18
60 changes: 60 additions & 0 deletions docs/advanced-configuration.md
@@ -720,3 +720,63 @@ rbac:
```
helm install my-splunk-otel-collector -f my_values.yaml splunk-otel-collector-chart/splunk-otel-collector
```

## Data Persistence

By default, with no extra configuration, data is queued in memory only. When data cannot be sent, it is retried for a limited time (up to 5 minutes by default) and then dropped.

If the collector restarts during this period for any reason, the queued data is lost.
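
The retry window described above maps onto the exporter's retry and queue settings. As a rough sketch, assuming that `agent.config` overrides are merged into the collector configuration and that the upstream `retry_on_failure` and `sending_queue` exporter options apply to `splunk_hec/platform_logs`, a longer in-memory retry window might be configured like this (the values are illustrative, not recommendations):

```yaml
agent:
  config:
    exporters:
      splunk_hec/platform_logs:
        retry_on_failure:
          # Assumed upstream option: stop retrying a batch after this long.
          max_elapsed_time: 600s
        sending_queue:
          # Assumed upstream option: number of batches held in memory.
          queue_size: 2000
```

A longer window only postpones the drop. The queue still lives in memory, so a restart loses whatever is waiting in it.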

To persist the queue on disk across collector restarts, set `splunkPlatform.sendingQueue.persistentQueue.enabled` to `true`. This enables persistence for logs, metrics, and traces.

By default, data is persisted in the `/var/addon/splunk/exporter_queue` directory.
Override this behaviour by setting the `splunkPlatform.sendingQueue.persistentQueue.storagePath` option.
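
A minimal values.yaml sketch that uses both options named above. The path shown is the documented default, so the `storagePath` line is only needed when a different directory is wanted:

```yaml
splunkPlatform:
  sendingQueue:
    persistentQueue:
      # Persist queued logs, metrics, and traces on disk.
      enabled: true
      # Optional: shown here with the documented default value.
      storagePath: /var/addon/splunk/exporter_queue
```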

See [Data Persistence in the OpenTelemetry Collector](https://community.splunk.com/t5/Community-Blog/Data-Persistence-in-the-OpenTelemetry-Collector/ba-p/624583) for a detailed explanation.

Note: Data persistence applies only to the agent daemonset.
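
For orientation, the sketch below shows roughly how a persistent queue is wired in a collector configuration: a file storage extension owns the directory, and each exporter's `sending_queue.storage` refers to that extension. The extension ID `file_storage/persistent_queue` is an assumption made for illustration; inspect the rendered agent ConfigMap for the names the chart actually generates.

```yaml
extensions:
  # Hypothetical extension ID; the chart may use a different name.
  file_storage/persistent_queue:
    directory: /var/addon/splunk/exporter_queue
exporters:
  splunk_hec/platform_logs:
    sending_queue:
      # Points the queue at the storage extension declared above.
      storage: file_storage/persistent_queue
service:
  extensions:
    - file_storage/persistent_queue
```

Setting `storage` to `null` on an exporter drops this reference, so that exporter's queue stays in memory.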

Use the following in values.yaml to disable data persistence for logs, metrics, or traces:

```yaml
agent:
  config:
    exporters:
      splunk_hec/platform_logs:
        sending_queue:
          storage: null
```
or
```yaml
agent:
  config:
    exporters:
      splunk_hec/platform_metrics:
        sending_queue:
          storage: null
```
or
```yaml
agent:
  config:
    exporters:
      splunk_hec/platform_traces:
        sending_queue:
          storage: null
```
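
The overrides can also be combined in one values file. A sketch, assuming only logs should keep a persistent queue while metrics and traces fall back to in-memory queuing:

```yaml
agent:
  config:
    exporters:
      # Metrics: queue in memory only.
      splunk_hec/platform_metrics:
        sending_queue:
          storage: null
      # Traces: queue in memory only.
      splunk_hec/platform_traces:
        sending_queue:
          storage: null
```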

### Support for persistent queue

* `GKE/Autopilot` and `EKS/Fargate` support
  * Neither of these distributions allows volume mounts, because they are effectively `serverless` and we do not manage the underlying infrastructure.
  * Persistent buffering is not supported on them, because the queue directory needs to be mounted via `hostPath`.
  * Refer to [aws/fargate](https://docs.aws.amazon.com/eks/latest/userguide/fargate.html) and [gke/autopilot](https://cloud.google.com/kubernetes-engine/docs/concepts/autopilot-security#built-in-security).
* Gateway support
  * The file storage extension acquires an exclusive lock on the queue directory.
  * Persistent buffering cannot run when a pod has multiple replicas, and `gateway` runs 3 replicas by default.
  * Even if support were somehow provided, only one of the pods would be able to acquire the lock and run; the others would be blocked and unable to operate.
* Cluster Receiver support
  * The cluster receiver is a 1-replica deployment of the OpenTelemetry Collector.
  * Because the Kubernetes control plane can select any available node to run the cluster receiver pod (unless `clusterReceiver.nodeSelector` is set explicitly to pin the pod to a specific node), `hostPath` or `local` volume mounts won't work in such environments.
  * Data persistence is currently not applicable to k8s cluster metrics and k8s events.
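
The `hostPath` requirement mentioned in the list above can be pictured with a generic Kubernetes pod spec fragment. This is a sketch, not the chart's actual template: the volume name `persistent-queue` and the container name `otel-collector` are hypothetical.

```yaml
spec:
  containers:
    - name: otel-collector          # hypothetical container name
      volumeMounts:
        - name: persistent-queue    # hypothetical volume name
          mountPath: /var/addon/splunk/exporter_queue
  volumes:
    - name: persistent-queue
      hostPath:
        # Directory on the node itself, so it outlives the pod.
        path: /var/addon/splunk/exporter_queue
        type: DirectoryOrCreate
```

Because the directory lives on the node, a daemonset pod finds its queue again after a restart, whereas a deployment pod rescheduled to a different node would start with an empty directory.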
7 changes: 7 additions & 0 deletions examples/disable-persistence-queue-traces/README.md
@@ -0,0 +1,7 @@
# Example of chart configuration

## Disable Persistent Queue for traces only

This example shows how to disable data persistence for trace data only.

Refer to: https://github.com/signalfx/splunk-otel-collector-chart/blob/main/docs/advanced-configuration.md#data-persistence
@@ -0,0 +1,21 @@
clusterName: CHANGEME
splunkPlatform:
  endpoint: CHANGEME
  token: CHANGEME
  index: CHANGEME
  metricsIndex: CHANGEME
  metricsEnabled: true
  tracesEnabled: true
  logsEnabled: true
  sendingQueue:
    persistentQueue:
      enabled: true

logsEngine: otel

agent:
  config:
    exporters:
      splunk_hec/platform_traces:
        sending_queue:
          storage: null
@@ -0,0 +1,83 @@
---
# Source: splunk-otel-collector/templates/clusterRole.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: default-splunk-otel-collector
  labels:
    app.kubernetes.io/name: splunk-otel-collector
    helm.sh/chart: splunk-otel-collector-0.83.0
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/instance: default
    app.kubernetes.io/version: "0.83.0"
    app: splunk-otel-collector
    chart: splunk-otel-collector-0.83.0
    release: default
    heritage: Helm
rules:
- apiGroups:
  - ""
  resources:
  - events
  - namespaces
  - namespaces/status
  - nodes
  - nodes/spec
  - nodes/stats
  - nodes/proxy
  - pods
  - pods/status
  - persistentvolumeclaims
  - persistentvolumes
  - replicationcontrollers
  - replicationcontrollers/status
  - resourcequotas
  - services
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - apps
  resources:
  - daemonsets
  - deployments
  - replicasets
  - statefulsets
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - extensions
  resources:
  - daemonsets
  - deployments
  - replicasets
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - batch
  resources:
  - jobs
  - cronjobs
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - autoscaling
  resources:
  - horizontalpodautoscalers
  verbs:
  - get
  - list
  - watch
- nonResourceURLs:
  - /metrics
  verbs:
  - get
  - list
  - watch
@@ -0,0 +1,24 @@
---
# Source: splunk-otel-collector/templates/clusterRoleBinding.yaml
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: default-splunk-otel-collector
  labels:
    app.kubernetes.io/name: splunk-otel-collector
    helm.sh/chart: splunk-otel-collector-0.83.0
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/instance: default
    app.kubernetes.io/version: "0.83.0"
    app: splunk-otel-collector
    chart: splunk-otel-collector-0.83.0
    release: default
    heritage: Helm
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: default-splunk-otel-collector
subjects:
- kind: ServiceAccount
  name: default-splunk-otel-collector
  namespace: default