Persistence queue #861

Merged on Sep 6, 2023 (31 commits)
Changes from 14 commits
7d75c48
fix: Add persistent queue
wojtekzyla Apr 18, 2023
0429ea5
Merge branch 'main' into persistent-queue-updated
atoulme May 5, 2023
57e3fbb
fix: rename splunkSendingQueue to splunkPlatformSendingQueue, add a c…
wojtekzyla May 9, 2023
c47660b
Merge branch 'main' into persistent-queue-updated
wojtekzyla May 11, 2023
2a4c183
Update helm-charts/splunk-otel-collector/values.yaml
wojtekzyla May 11, 2023
bb3588c
fix: describe where persistent queue is currently supported
wojtekzyla May 22, 2023
c76ed52
fix: Add persistent buffering for cluster receiver, agent and gateway
Jun 28, 2023
9fe3bb0
Merge remote-tracking branch 'origin/main' into persistent-queue
Jun 28, 2023
c281b62
Fix pre-commit
Jun 28, 2023
81f9c57
Merge branch 'main' into persistent-queue-updated
VihasMakwana Jun 29, 2023
2f640b3
Merge remote-tracking branch 'origin/main' into wojciech/persistent-q…
Jul 19, 2023
2c81de2
Exclude persistent buffering for eks/fargate and gke/autopilot
Jul 27, 2023
a330d6c
Merge remote-tracking branch 'origin/main' into persistent-queue-updated
Jul 27, 2023
e79bbf4
remove persistence queue for gateway
Jul 27, 2023
efaf983
fix: have granular control while adding persistent queue
Jul 28, 2023
ab14b57
fix: get rid of "persistentQueueEnabled" helper and update changelog
Jul 28, 2023
e2894d0
chore: add docs
Aug 1, 2023
79aa19a
chore: add examples
Aug 1, 2023
81c3eb1
chore: add persistent buffering for traces
Aug 1, 2023
825af68
Update docs/advanced-configuration.md
VihasMakwana Aug 23, 2023
e93fc17
Merge branch 'main' into persistent-queue-updated
jvoravong Aug 24, 2023
117137a
chore: remove persistent buffering for cluster receiver and add note
Aug 29, 2023
10c7488
fix: pre-commit
Aug 29, 2023
7e85307
FIX: test case failure
Aug 29, 2023
91e309e
fix: linting
Aug 29, 2023
5fdf7ba
fix: improve readability
Aug 30, 2023
36ae510
Update helm-charts/splunk-otel-collector/values.yaml
VihasMakwana Aug 30, 2023
231277a
chore: remove unnecessary details
Sep 2, 2023
52aebc5
Merge branch 'main' into persistent-queue-updated
Sep 2, 2023
90b802d
chore: add functional test cases covering persistent queue
Sep 3, 2023
f95f343
Merge branch 'main' into persistent-queue-updated
dmitryax Sep 6, 2023
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -86,6 +86,7 @@ This Splunk OpenTelemetry Collector for Kubernetes release adopts the [Splunk Op
### Added

- Option to use lightprometheus receiver through a feature gate for metrics collection from discovered Prometheus endpoints [757](https://github.com/signalfx/splunk-otel-collector-chart/pull/757)
- Configuration of persistent buffering for agent [753](https://github.com/signalfx/splunk-otel-collector-chart/pull/753)

### Changed

20 changes: 20 additions & 0 deletions helm-charts/splunk-otel-collector/templates/_helpers.tpl
@@ -441,6 +441,26 @@ Whether clusterReceiver should be enabled
{{- end -}}


{{/*
Whether persistentQueue should be enabled
*/}}
{{- define "splunk-otel-collector.persistentQueueEnabledLogs" -}}
{{ $gatewayDisabled := ne (include "splunk-otel-collector.gatewayEnabled" .) "true" }}
{{- and $gatewayDisabled (.Values.splunkPlatform.sendingQueue.persistentQueueEnabled.logs) (and (ne (include "splunk-otel-collector.distribution" .) "eks/fargate") (ne (include "splunk-otel-collector.distribution" .) "gke/autopilot")) -}}
{{- end -}}


{{- define "splunk-otel-collector.persistentQueueEnabledMetrics" -}}
{{ $gatewayDisabled := ne (include "splunk-otel-collector.gatewayEnabled" .) "true" }}
{{- and $gatewayDisabled (.Values.splunkPlatform.sendingQueue.persistentQueueEnabled.metrics) (and (ne (include "splunk-otel-collector.distribution" .) "eks/fargate") (ne (include "splunk-otel-collector.distribution" .) "gke/autopilot")) -}}
{{- end -}}


{{- define "splunk-otel-collector.persistentQueueEnabled" -}}
{{- or (eq (include "splunk-otel-collector.persistentQueueEnabledLogs" .) "true") (eq (include "splunk-otel-collector.persistentQueueEnabledMetrics" .) "true") }}
{{- end -}}


{{/*
Build the securityContext for Linux and Windows
*/}}
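
The helpers above gate each signal's persistent queue on three conditions: the per-signal flag is set, the gateway is not enabled, and the distribution is neither `eks/fargate` nor `gke/autopilot`. As a rough sketch, a values override like the following would leave both queues disabled even though the flags are set. It assumes the `splunk-otel-collector.distribution` helper reads a top-level `distribution` value, which this diff does not show.

```yaml
# Hypothetical values override; the top-level `distribution` key is an assumption.
distribution: eks/fargate
splunkPlatform:
  sendingQueue:
    persistentQueueEnabled:
      logs: true     # set, but persistentQueueEnabledLogs evaluates to "false" on eks/fargate
      metrics: true  # likewise for persistentQueueEnabledMetrics
```

The combined `persistentQueueEnabled` helper is "true" only when at least one of the two per-signal helpers is "true", so it would also be "false" here.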
26 changes: 26 additions & 0 deletions helm-charts/splunk-otel-collector/templates/config/_common.tpl
@@ -235,6 +235,26 @@ filter/logs:
value: "true"
{{- end }}

{{- define "splunk-otel-collector.persistentQueueLogs" -}}
file_storage/persistent_queue_logs:
{{- if .forAgent }}
directory: {{ .Values.splunkPlatform.sendingQueue.persistentQueueEnabled.storagePath }}/agent/logs
{{- else }}
directory: {{ .Values.splunkPlatform.sendingQueue.persistentQueueEnabled.storagePath }}/clusterReceiver/logs
{{- end }}
timeout: 0
{{- end }}

{{- define "splunk-otel-collector.persistentQueueMetrics" -}}
file_storage/persistent_queue_metrics:
{{- if .forAgent }}
directory: {{ .Values.splunkPlatform.sendingQueue.persistentQueueEnabled.storagePath }}/agent/metrics
{{- else }}
directory: {{ .Values.splunkPlatform.sendingQueue.persistentQueueEnabled.storagePath }}/clusterReceiver/metrics
{{- end }}
timeout: 0
{{- end }}

{{/*
Splunk Platform Logs exporter
*/}}
@@ -271,6 +291,9 @@ splunk_hec/platform_logs:
enabled: {{ .Values.splunkPlatform.sendingQueue.enabled }}
num_consumers: {{ .Values.splunkPlatform.sendingQueue.numConsumers }}
queue_size: {{ .Values.splunkPlatform.sendingQueue.queueSize }}
{{- if (eq (include "splunk-otel-collector.persistentQueueEnabledLogs" .) "true") }}
storage: file_storage/persistent_queue_logs
{{- end }}
{{- end }}

{{/*
@@ -308,6 +331,9 @@ splunk_hec/platform_metrics:
enabled: {{ .Values.splunkPlatform.sendingQueue.enabled }}
num_consumers: {{ .Values.splunkPlatform.sendingQueue.numConsumers }}
queue_size: {{ .Values.splunkPlatform.sendingQueue.queueSize }}
{{- if (eq (include "splunk-otel-collector.persistentQueueEnabledMetrics" .) "true") }}
storage: file_storage/persistent_queue_metrics
{{- end }}
{{- end }}

{{/*
@@ -11,6 +11,11 @@ extensions:
directory: {{ .Values.logsCollection.checkpointPath }}
{{- end }}

{{- if (eq (include "splunk-otel-collector.persistentQueueEnabled" .) "true") }}
{{- include "splunk-otel-collector.persistentQueueLogs" (dict "Values" .Values "forAgent" true) | nindent 2 }}
{{- include "splunk-otel-collector.persistentQueueMetrics" (dict "Values" .Values "forAgent" true) | nindent 2 }}
{{- end }}

memory_ballast:
size_mib: ${SPLUNK_BALLAST_SIZE_MIB}

@@ -648,6 +653,10 @@ service:
{{- if and (eq (include "splunk-otel-collector.logsEnabled" .) "true") (eq .Values.logsEngine "otel") }}
- file_storage
{{- end }}
{{- if (eq (include "splunk-otel-collector.persistentQueueEnabled" .) "true") }}
- file_storage/persistent_queue_logs
- file_storage/persistent_queue_metrics
{{- end }}
- health_check
- k8s_observer
- memory_ballast
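
Taken together with the exporter changes in `_common.tpl`, the agent config would render roughly as follows when both `logs` and `metrics` are enabled with the default `storagePath`. This is a sketch, not actual chart output: the `sending_queue` parent key and the `enabled` and `num_consumers` values are assumptions, while `queue_size: 5000` and the directories follow from the values shown in this PR. Unrelated keys are omitted.

```yaml
extensions:
  file_storage/persistent_queue_logs:
    directory: /var/addon/splunk/persist/agent/logs
    timeout: 0
  file_storage/persistent_queue_metrics:
    directory: /var/addon/splunk/persist/agent/metrics
    timeout: 0

exporters:
  splunk_hec/platform_logs:
    sending_queue:        # parent key assumed; not visible in the hunk
      enabled: true       # assumed value of sendingQueue.enabled
      num_consumers: 10   # assumed value of sendingQueue.numConsumers
      queue_size: 5000
      storage: file_storage/persistent_queue_logs

service:
  extensions:
    - file_storage/persistent_queue_logs
    - file_storage/persistent_queue_metrics
```

Note that the agent template includes both `file_storage` extensions whenever either signal is enabled, whereas the `storage` key is added to each exporter only when its own signal is enabled.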
@@ -5,6 +5,11 @@ The values can be overridden in .Values.clusterReceiver.config
{{- define "splunk-otel-collector.clusterReceiverConfig" -}}
{{ $clusterReceiver := fromYaml (include "splunk-otel-collector.clusterReceiver" .) -}}
extensions:
{{- if (eq (include "splunk-otel-collector.persistentQueueEnabled" .) "true") }}
{{- include "splunk-otel-collector.persistentQueueLogs" (dict "Values" .Values "forAgent" false) | nindent 2 }}
{{- include "splunk-otel-collector.persistentQueueMetrics" (dict "Values" .Values "forAgent" false) | nindent 2 }}
{{- end }}

health_check:

memory_ballast:
@@ -198,11 +203,16 @@ service:
telemetry:
metrics:
address: 0.0.0.0:8889
{{- if eq (include "splunk-otel-collector.distribution" .) "eks/fargate" }}
extensions: [health_check, memory_ballast, k8s_observer]
{{- else }}
extensions: [health_check, memory_ballast]
{{- end }}
extensions:
- health_check
- memory_ballast
{{- if eq (include "splunk-otel-collector.distribution" .) "eks/fargate" }}
- k8s_observer
{{- end }}
{{- if (eq (include "splunk-otel-collector.persistentQueueEnabled" .) "true") }}
- file_storage/persistent_queue_metrics
- file_storage/persistent_queue_logs
{{- end }}
pipelines:
{{- if or (eq (include "splunk-otel-collector.o11yMetricsEnabled" $) "true") (eq (include "splunk-otel-collector.platformMetricsEnabled" $) "true") }}
# k8s metrics pipeline
40 changes: 40 additions & 0 deletions helm-charts/splunk-otel-collector/templates/daemonset.yaml
@@ -160,6 +160,18 @@ spec:
then
setfacl -n -Rm d:m::rx,m::rx,d:g:{{ $agent.securityContext.runAsGroup | default 999 }}:rx,g:{{ $agent.securityContext.runAsGroup | default 999 }}:rx {{ .Values.logsCollection.journald.directory }};
fi;
{{- end }}
{{- if (eq (include "splunk-otel-collector.persistentQueueEnabledLogs" .) "true") }}
mkdir -p {{ .Values.splunkPlatform.sendingQueue.persistentQueueEnabled.storagePath }}/agent/logs;
chown -Rv {{ $agent.securityContext.runAsUser | default 999 }}:{{ $agent.securityContext.runAsGroup | default 999 }} {{ .Values.splunkPlatform.sendingQueue.persistentQueueEnabled.storagePath }}/agent/logs;
chmod -v g+rwxs {{ .Values.splunkPlatform.sendingQueue.persistentQueueEnabled.storagePath }}/agent/logs;
setfacl -n -Rm d:m::rx,m::rx,d:g:{{ $agent.securityContext.runAsGroup | default 999 }}:rx,g:{{ $agent.securityContext.runAsGroup | default 999 }}:rx {{ .Values.splunkPlatform.sendingQueue.persistentQueueEnabled.storagePath }}/agent/logs;
{{- end }}
{{- if (eq (include "splunk-otel-collector.persistentQueueEnabledMetrics" .) "true") }}
mkdir -p {{ .Values.splunkPlatform.sendingQueue.persistentQueueEnabled.storagePath }}/agent/metrics;
chown -Rv {{ $agent.securityContext.runAsUser | default 999 }}:{{ $agent.securityContext.runAsGroup | default 999 }} {{ .Values.splunkPlatform.sendingQueue.persistentQueueEnabled.storagePath }}/agent/metrics;
chmod -v g+rwxs {{ .Values.splunkPlatform.sendingQueue.persistentQueueEnabled.storagePath }}/agent/metrics;
setfacl -n -Rm d:m::rx,m::rx,d:g:{{ $agent.securityContext.runAsGroup | default 999 }}:rx,g:{{ $agent.securityContext.runAsGroup | default 999 }}:rx {{ .Values.splunkPlatform.sendingQueue.persistentQueueEnabled.storagePath }}/agent/metrics;
{{- end }}']
securityContext:
runAsUser: 0
@@ -178,6 +190,14 @@ spec:
- name: journaldlogs
mountPath: {{ .Values.logsCollection.journald.directory }}
{{- end }}
{{- if (eq (include "splunk-otel-collector.persistentQueueEnabledLogs" .) "true") }}
- name: persistent-queue-logs
mountPath: {{ .Values.splunkPlatform.sendingQueue.persistentQueueEnabled.storagePath }}/agent/logs
{{- end }}
{{- if (eq (include "splunk-otel-collector.persistentQueueEnabledMetrics" .) "true") }}
- name: persistent-queue-metrics
mountPath: {{ .Values.splunkPlatform.sendingQueue.persistentQueueEnabled.storagePath }}/agent/metrics
{{- end }}
{{- end }}
{{- end }}
{{- end }}
@@ -398,6 +418,14 @@ spec:
{{- end }}
- name: checkpoint
mountPath: {{ .Values.logsCollection.checkpointPath }}
{{- if (eq (include "splunk-otel-collector.persistentQueueEnabledLogs" .) "true") }}
- name: persistent-queue-logs
mountPath: {{ .Values.splunkPlatform.sendingQueue.persistentQueueEnabled.storagePath }}/agent/logs
{{- end }}
{{- if (eq (include "splunk-otel-collector.persistentQueueEnabledMetrics" .) "true") }}
- name: persistent-queue-metrics
mountPath: {{ .Values.splunkPlatform.sendingQueue.persistentQueueEnabled.storagePath }}/agent/metrics
{{- end }}
{{- if .Values.logsCollection.journald.enabled }}
- mountPath: {{.Values.logsCollection.journald.directory}}
name: journaldlogs
@@ -464,6 +492,18 @@ spec:
hostPath:
path: {{ .Values.logsCollection.checkpointPath }}
type: DirectoryOrCreate
{{- if (eq (include "splunk-otel-collector.persistentQueueEnabledLogs" .) "true") }}
- name: persistent-queue-logs
hostPath:
path: {{ .Values.splunkPlatform.sendingQueue.persistentQueueEnabled.storagePath }}/agent/logs
type: DirectoryOrCreate
{{- end }}
{{- if (eq (include "splunk-otel-collector.persistentQueueEnabledMetrics" .) "true") }}
- name: persistent-queue-metrics
hostPath:
path: {{ .Values.splunkPlatform.sendingQueue.persistentQueueEnabled.storagePath }}/agent/metrics
type: DirectoryOrCreate
{{- end }}
{{- if .Values.logsCollection.journald.enabled }}
- name: journaldlogs
hostPath:
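
For orientation, the volume and mount added by the hunks above would render roughly like this for the logs signal with the default `storagePath`. This is a hand-written sketch of the relevant fragment only; the container name is illustrative and all other pod spec fields are omitted.

```yaml
# Fragment of the rendered DaemonSet pod spec (sketch, other fields omitted).
containers:
  - name: otel-collector   # illustrative name
    volumeMounts:
      - name: persistent-queue-logs
        mountPath: /var/addon/splunk/persist/agent/logs
volumes:
  - name: persistent-queue-logs
    hostPath:
      path: /var/addon/splunk/persist/agent/logs
      type: DirectoryOrCreate
```

Because the volume is a `hostPath`, the queue data lives on the node and can outlast an individual agent pod, which matches the note in `values.yaml` about persisting state to the node's local file system.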
@@ -33,6 +33,11 @@ spec:
app: {{ template "splunk-otel-collector.name" . }}
component: otel-k8s-cluster-receiver
release: {{ .Release.Name }}
{{- if (eq (include "splunk-otel-collector.persistentQueueEnabled" .) "true") }}
strategy:
rollingUpdate:
maxUnavailable: 1
{{- end }}
template:
metadata:
labels:
@@ -81,8 +86,9 @@ spec:
securityContext:
{{- include "splunk-otel-collector.securityContext" (dict "isWindows" .Values.isWindows "securityContext" $clusterReceiver.securityContext) | nindent 8 }}
{{- end }}
{{- if eq (include "splunk-otel-collector.distribution" .) "eks/fargate" }}
{{- if or (eq (include "splunk-otel-collector.distribution" .) "eks/fargate") (.Values.splunkPlatform.sendingQueue.persistentQueueEnabled.logs) (.Values.splunkPlatform.sendingQueue.persistentQueueEnabled.metrics) }}
initContainers:
{{- if eq (include "splunk-otel-collector.distribution" .) "eks/fargate" }}
- name: cluster-receiver-node-discoverer
image: public.ecr.aws/amazonlinux/amazonlinux:latest
imagePullPolicy: IfNotPresent
@@ -103,6 +109,36 @@ spec:
mountPath: /splunk-messages
- mountPath: /conf
name: collector-configmap
{{- end }}
{{- if (eq (include "splunk-otel-collector.persistentQueueEnabled" .) "true") }}
- name: patch-log-dirs
image: {{ template "splunk-otel-collector.image.initPatchLogDirs" . }}
imagePullPolicy: {{ .Values.image.initPatchLogDirs.pullPolicy }}
command: ['sh', '-c', '
{{- if (eq (include "splunk-otel-collector.persistentQueueEnabledLogs" .) "true") }}
mkdir -p {{ .Values.splunkPlatform.sendingQueue.persistentQueueEnabled.storagePath }}/clusterReceiver/logs;
chown -Rv {{ $clusterReceiver.securityContext.runAsUser | default 999 }}:{{ $clusterReceiver.securityContext.runAsGroup | default 999 }} {{ .Values.splunkPlatform.sendingQueue.persistentQueueEnabled.storagePath }}/clusterReceiver/logs;
chmod -v g+rwxs {{ .Values.splunkPlatform.sendingQueue.persistentQueueEnabled.storagePath }}/clusterReceiver/logs;
setfacl -n -Rm d:m::rx,m::rx,d:g:{{ $clusterReceiver.securityContext.runAsGroup | default 999 }}:rx,g:{{ $clusterReceiver.securityContext.runAsGroup | default 999 }}:rx {{ .Values.splunkPlatform.sendingQueue.persistentQueueEnabled.storagePath }}/clusterReceiver/logs;
{{- end }}
{{- if (eq (include "splunk-otel-collector.persistentQueueEnabledMetrics" .) "true") }}
mkdir -p {{ .Values.splunkPlatform.sendingQueue.persistentQueueEnabled.storagePath }}/clusterReceiver/metrics;
chown -Rv {{ $clusterReceiver.securityContext.runAsUser | default 999 }}:{{ $clusterReceiver.securityContext.runAsGroup | default 999 }} {{ .Values.splunkPlatform.sendingQueue.persistentQueueEnabled.storagePath }}/clusterReceiver/metrics;
chmod -v g+rwxs {{ .Values.splunkPlatform.sendingQueue.persistentQueueEnabled.storagePath }}/clusterReceiver/metrics;
setfacl -n -Rm d:m::rx,m::rx,d:g:{{ $clusterReceiver.securityContext.runAsGroup | default 999 }}:rx,g:{{ $clusterReceiver.securityContext.runAsGroup | default 999 }}:rx {{ .Values.splunkPlatform.sendingQueue.persistentQueueEnabled.storagePath }}/clusterReceiver/metrics;
{{- end }}']
securityContext:
runAsUser: 0
volumeMounts:
{{- if (eq (include "splunk-otel-collector.persistentQueueEnabledMetrics" .) "true") }}
- name: persistent-queue-metrics
mountPath: {{ .Values.splunkPlatform.sendingQueue.persistentQueueEnabled.storagePath }}/clusterReceiver/metrics
{{- end }}
{{- if (eq (include "splunk-otel-collector.persistentQueueEnabledLogs" .) "true") }}
- name: persistent-queue-logs
mountPath: {{ .Values.splunkPlatform.sendingQueue.persistentQueueEnabled.storagePath }}/clusterReceiver/logs
{{- end }}
{{- end }}
{{- end }}
containers:
- name: otel-collector
@@ -194,6 +230,14 @@ spec:
- mountPath: /splunk-messages
name: messages
{{- end }}
{{- if (eq (include "splunk-otel-collector.persistentQueueEnabledMetrics" .) "true") }}
- mountPath: {{ .Values.splunkPlatform.sendingQueue.persistentQueueEnabled.storagePath }}/clusterReceiver/metrics
name: persistent-queue-metrics
{{- end }}
{{- if (eq (include "splunk-otel-collector.persistentQueueEnabledLogs" .) "true") }}
- mountPath: {{ .Values.splunkPlatform.sendingQueue.persistentQueueEnabled.storagePath }}/clusterReceiver/logs
name: persistent-queue-logs
{{- end }}
- mountPath: /usr/lib/splunk-otel-collector/agent-bundle/run/collectd
name: run-collectd
readOnly: false
@@ -227,6 +271,18 @@ spec:
- name: messages
emptyDir: {}
{{- end }}
{{- if (eq (include "splunk-otel-collector.persistentQueueEnabledMetrics" .) "true") }}
- name: persistent-queue-metrics
hostPath:
path: {{ .Values.splunkPlatform.sendingQueue.persistentQueueEnabled.storagePath }}/clusterReceiver/metrics
type: DirectoryOrCreate
{{- end }}
{{- if (eq (include "splunk-otel-collector.persistentQueueEnabledLogs" .) "true") }}
- name: persistent-queue-logs
hostPath:
path: {{ .Values.splunkPlatform.sendingQueue.persistentQueueEnabled.storagePath }}/clusterReceiver/logs
type: DirectoryOrCreate
{{- end }}
{{- if $clusterReceiver.extraVolumes }}
{{- toYaml $clusterReceiver.extraVolumes | nindent 6 }}
{{- end }}
14 changes: 14 additions & 0 deletions helm-charts/splunk-otel-collector/values.schema.json
@@ -135,6 +135,20 @@
},
"queueSize": {
"type": "integer"
},
"persistentQueueEnabled": {
"type": "object",
"properties": {
"logs": {
"type": "boolean"
},
"metrics": {
"type": "boolean"
},
"storagePath": {
"type": "string"
}
}
}
}
}
12 changes: 12 additions & 0 deletions helm-charts/splunk-otel-collector/values.yaml
@@ -106,6 +106,18 @@ splunkPlatform:
# requests_per_second is the average number of requests per second.
queueSize: 5000

# When enabled, the file_storage extension persists the sending queue data to disk.
# This can be used as an alternative redundancy mechanism for data being exported.
# NOTE: The File Storage extension persists state to the node's local file system.
# When using the persistent queue, it is advised to increase the agent memory limit
# (agent.resources.limits.memory) to 1Gi. The persistent queue is currently supported
# only for logs and metrics of the agent.
persistentQueueEnabled:
# Specifies whether to persist log data.
logs: false
# Specifies whether to persist metric data.
metrics: false
# Base directory on the node under which the queue data is stored.
storagePath: "/var/addon/splunk/persist"

################################################################################
# Splunk Observability configuration
################################################################################
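
As a usage sketch, a values override that turns on the persistent queue for logs only, and raises the agent memory limit as the comment above advises, might look like this. The keys come from this PR's `values.yaml` and its note on `agent.resources.limits.memory`; the choice of which signal to enable is just an example.

```yaml
# Example values override (sketch); merge with the rest of your chart values.
splunkPlatform:
  sendingQueue:
    persistentQueueEnabled:
      logs: true
      metrics: false
      storagePath: "/var/addon/splunk/persist"

agent:
  resources:
    limits:
      memory: 1Gi
```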