🚧 The profiler feature is experimental.
The splunk-otel-java agent includes a continuous profiler that can be enabled with a configuration setting. This profiler periodically captures the call stacks of all JVM threads and sends them to Splunk Observability Cloud. You can then view a flame graph of application call stacks and inspect individual code-level call stacks for relevant traces.
The profiler leverages the JVM's JDK Flight Recorder (JFR) to perform periodic call stack sampling. For every recording period, a new JFR recording file is flushed to disk and events are replayed.
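For illustration, the replay step can be approximated with the JDK's own `jdk.jfr.consumer` API. This is a minimal sketch, not the agent's actual implementation; the file name `profile.jfr` is a placeholder for one recording period's output:

```java
import java.nio.file.Paths;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordingFile;

public class ReplayThreadDumps {
    public static void main(String[] args) throws Exception {
        // Read back a flushed recording file and replay its events in order.
        for (RecordedEvent event : RecordingFile.readAllEvents(Paths.get("profile.jfr"))) {
            if ("jdk.ThreadDump".equals(event.getEventType().getName())) {
                // The jdk.ThreadDump event carries the full textual dump in its "result" field.
                System.out.println(event.getString("result"));
            }
        }
    }
}
```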
To associate `jdk.ThreadDump` events from JFR with OpenTelemetry spans, a custom OpenTelemetry `ContextStorage` implementation emits `otel.ContextAttached` events each time a span context change occurs. These context changes are then tracked during replay so that each call stack can be associated with the appropriate span.
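Conceptually, this is similar to wrapping the default `ContextStorage` so that every context attach also commits a JFR event. The sketch below illustrates the idea; the class and field names here are illustrative, not the agent's actual code:

```java
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.SpanContext;
import io.opentelemetry.context.Context;
import io.opentelemetry.context.ContextStorage;
import io.opentelemetry.context.Scope;
import jdk.jfr.Event;
import jdk.jfr.Name;

// Illustrative JFR event using the otel.ContextAttached event name
@Name("otel.ContextAttached")
class ContextAttachedEvent extends Event {
    String traceId;
    String spanId;
}

// Delegating storage that records an event on every span context change
class EventEmittingContextStorage implements ContextStorage {
    private final ContextStorage delegate;

    EventEmittingContextStorage(ContextStorage delegate) {
        this.delegate = delegate;
    }

    @Override
    public Scope attach(Context context) {
        SpanContext spanContext = Span.fromContext(context).getSpanContext();
        ContextAttachedEvent event = new ContextAttachedEvent();
        event.traceId = spanContext.getTraceId();
        event.spanId = spanContext.getSpanId();
        event.commit(); // lands in the same JFR recording as jdk.ThreadDump events
        return delegate.attach(context);
    }

    @Override
    public Context current() {
        return delegate.current();
    }
}
```

A wrapper like this could be registered with `ContextStorage.addWrapper(EventEmittingContextStorage::new)` before the first `Context` is used.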
Stack trace data is embedded as a string inside an OTLP logs payload. The Splunk OpenTelemetry Connector detects this profiling data inside OTLP logs and routes it along its ingest path.
- Java 8 (8u262 or higher) or Java 11+
- 100 MB of free disk space
- Splunk OpenTelemetry Connector version 0.33.1 or higher. Sending profiling data directly to ingest is not supported at this time.
- Profiler is enabled at startup (disabled by default, see the Configuration section)
To enable the profiler, pass the -Dsplunk.profiler.enabled=true argument to your JVM or set the SPLUNK_PROFILER_ENABLED environment variable to true for your JVM process.
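For example (the agent jar path and application jar below are placeholders; adjust both for your environment):

```shell
# Via system property
java -javaagent:./splunk-otel-javaagent-all.jar \
     -Dsplunk.profiler.enabled=true \
     -jar myapp.jar

# Or via environment variable
export SPLUNK_PROFILER_ENABLED=true
java -javaagent:./splunk-otel-javaagent-all.jar -jar myapp.jar
```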
Like all other OpenTelemetry Java configuration settings, the following can be set as system properties or as environment variables (uppercase, with dots replaced by underscores).
We strongly recommend using the defaults for the following settings.
Setting | Default | Description |
---|---|---|
`splunk.profiler.enabled` | `false` | set to `true` to enable the profiler |
`splunk.profiler.directory` | `"."` | location of JFR files |
`splunk.profiler.recording.duration` | `20s` | duration of each recording unit |
`splunk.profiler.keep-files` | `false` | leave JFR files on disk if `true` |
`splunk.profiler.logs-endpoint` | value of `otel.exporter.otlp.endpoint`, or `http://localhost:4317` | where to send OTLP logs |
`splunk.profiler.call.stack.interval` | `10000ms` | how often to sample call stacks |
`splunk.profiler.memory.enabled` | `false` | set to `true` to enable all other memory profiling options unless explicitly disabled |
`splunk.profiler.tlab.enabled` | value of `splunk.profiler.memory.enabled` | set to `true` to enable TLAB events even if `splunk.profiler.memory.enabled` is `false` |
`splunk.profiler.memory.sampler.interval` | `1` | set to 2 or larger to sample every Nth allocation event, where N is the value of this property |
`splunk.profiler.include.internal.stacks` | `false` | set to `true` to include stack traces of agent internal threads and stack traces with only JDK internal frames |
`splunk.profiler.tracing.stacks.only` | `false` | set to `true` to include only stack traces that are linked to a span context |
`splunk.profiler.period.{eventName}` | n/a | deprecated; use `splunk.profiler.call.stack.interval` instead |
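For example, the following hypothetical invocation enables the profiler together with memory profiling and a more frequent call stack sampling interval (paths and values are for illustration only):

```shell
java -javaagent:./splunk-otel-javaagent-all.jar \
     -Dsplunk.profiler.enabled=true \
     -Dsplunk.profiler.memory.enabled=true \
     -Dsplunk.profiler.call.stack.interval=1000ms \
     -jar myapp.jar
```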
If `splunk.profiler.enabled` is not set to `true`, all profiling features are disabled. For example, setting `splunk.profiler.memory.enabled` to `true` has no effect if `splunk.profiler.enabled` is `false`. Similarly, there is no separate toggle for the periodic collection of call stacks (thread dumps); that feature is active whenever the profiler is enabled.
The profiler limits its own behavior under two conditions:
- Free disk space is too low (less than 100 MB free)
- More than 5 minutes' worth of JFR files are backed up on disk
If a recording is already running when either condition is met, the recording stops. When neither condition applies any longer, the recording and the profiler resume operation.
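In pseudocode terms, the check resembles the following sketch; the class, method, and parameter names are illustrative, not the agent's actual implementation:

```java
import java.io.File;

class EscapeHatchSketch {
    static final long MIN_FREE_BYTES = 100L * 1024 * 1024; // 100 MB
    static final long MAX_BACKLOG_MILLIS = 5L * 60 * 1000; // 5 minutes

    // Returns true when profiling should pause under either condition above
    static boolean shouldPauseProfiling(File jfrDirectory, long backlogMillis) {
        boolean lowDiskSpace = jfrDirectory.getUsableSpace() < MIN_FREE_BYTES;
        boolean tooMuchBacklog = backlogMillis > MAX_BACKLOG_MILLIS;
        return lowDiskSpace || tooMuchBacklog;
    }
}
```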
At startup, the agent logs the string "JFR profiler is active" at INFO level. You can grep your logs for this string to see something like the following:
```
[otel.javaagent 2021-09-28 18:17:04:246 +0000] [main] INFO com.splunk.opentelemetry.profiler.JfrActivator - JFR profiler is active.
```
The agent also logs the profiling configuration at INFO level during startup. You can grep for the string `com.splunk.opentelemetry.profiler.ConfigurationLogger` to see entries like the following:
```
[otel.javaagent 2021-09-28 18:17:04:237 +0000] [main] INFO <snip> - -----------------------
[otel.javaagent 2021-09-28 18:17:04:237 +0000] [main] INFO <snip> - Profiler configuration:
[otel.javaagent 2021-09-28 18:17:04:238 +0000] [main] INFO <snip> - splunk.profiler.enabled : true
[otel.javaagent 2021-09-28 18:17:04:239 +0000] [main] INFO <snip> - splunk.profiler.directory : .
[otel.javaagent 2021-09-28 18:17:04:244 +0000] [main] INFO <snip> - splunk.profiler.recording.duration : 20s
[otel.javaagent 2021-09-28 18:17:04:244 +0000] [main] INFO <snip> - splunk.profiler.keep-files : false
[otel.javaagent 2021-09-28 18:17:04:245 +0000] [main] INFO <snip> - splunk.profiler.logs-endpoint : null
[otel.javaagent 2021-09-28 18:17:04:245 +0000] [main] INFO <snip> - otel.exporter.otlp.endpoint : http://collector:4317
[otel.javaagent 2021-09-28 18:17:04:245 +0000] [main] INFO <snip> - splunk.profiler.tlab.enabled : false
[otel.javaagent 2021-09-28 18:17:04:246 +0000] [main] INFO <snip> - splunk.profiler.period.jdk.threaddump : null
[otel.javaagent 2021-09-28 18:17:04:246 +0000] [main] INFO <snip> - -----------------------
```
If the escape hatch becomes active, it logs under `com.splunk.opentelemetry.profiler.RecordingEscapeHatch` (you can grep for this in the logs). The message "** THIS WILL RESULT IN LOSS OF PROFILING DATA **" is another strong hint that things are not well. You may need to free up disk space and/or give the JVM more resources.
If your JVM does not support JFR, the profiler logs a warning at startup with the message `JDK Flight Recorder (JFR) is not available in this JVM. Profiling is disabled`. If you want to use the profiler and you see this message, upgrade your JVM to version 8u262 or later.
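If you are unsure whether your JVM bundles JFR, a quick standalone check is possible; this is a minimal sketch:

```java
public class JfrCheck {
    public static void main(String[] args) {
        try {
            // jdk.jfr is available on Java 11+ and on Java 8 starting with 8u262
            System.out.println("JFR available: " + jdk.jfr.FlightRecorder.isAvailable());
        } catch (NoClassDefFoundError e) {
            System.out.println("JFR classes are not present in this JVM");
        }
    }
}
```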
Collector configuration issues may prevent logs from being exported and profiling data from showing in Splunk Observability Cloud.
Check for the following common issues:
- Look at the values of the agent's configuration settings `splunk.profiler.logs-endpoint` and `otel.exporter.otlp.endpoint`. Hint: they are logged at startup (see above).
- Verify that a collector is actually running at that endpoint and that the application host/container can resolve its hostname and connect to the given OTLP port (default: 4317).
- Make sure you are running the Splunk OpenTelemetry Connector and that the version is 0.33.1 or greater. Other collector distributions may not be able to route the log data containing profiles correctly.
- Make sure that the collector is configured correctly to handle profiling data. By default, the Splunk OpenTelemetry Connector handles this, but a custom configuration might have overridden some settings. Make sure that an OTLP receiver is configured in the collector and that an exporter is configured for `splunk_hec` export. Ensure that the `token` and `endpoint` fields are correctly configured. Lastly, double-check that the logs pipeline is configured to use the OTLP receiver and the `splunk_hec` exporter. A minimal configuration sketch follows this list.
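For reference, a minimal collector configuration along these lines might look like the following sketch. The token and endpoint values are placeholders; consult the Splunk OpenTelemetry Connector documentation for the authoritative settings:

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: "0.0.0.0:4317"

exporters:
  splunk_hec:
    # Placeholders: substitute your real access token and ingest endpoint
    token: "${SPLUNK_ACCESS_TOKEN}"
    endpoint: "https://ingest.<realm>.signalfx.com/v1/log"

service:
  pipelines:
    logs:
      receivers: [otlp]
      exporters: [splunk_hec]
```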
Can the profiler be configured to omit certain threads? Not yet. Some JVM internal threads are automatically omitted, but there is no user-serviceable mechanism for omitting or filtering threads. If you need this, file an issue.