This project has moved to: https://github.com/StephenOTT/Camunda-Monitoring
A Camunda Process Engine Plugin that provides a Prometheus Client HTTP server, collectors for the Camunda metrics system, and a Groovy-based custom collector system in which custom collectors are Groovy scripts configured through a YAML file.
See the usage in ./docker/Dockerfile
Add JitPack as a repository source in your build file.
If you are using Maven, then add the following to your pom.xml
<project>
...
<repositories>
<repository>
<id>jitpack.io</id>
<url>https://jitpack.io</url>
</repository>
</repositories>
...
This snippet enables Maven to download dependencies directly from GitHub.
Then add the following dependency:
...
<dependency>
<groupId>com.github.StephenOTT</groupId>
<artifactId>camunda-prometheus-process-engine-plugin</artifactId>
<version>v0.0.0-Replace-This-With-Real-Version</version>
</dependency>
...
❗ See the Releases for the latest version number ❗
<!-- engine plugins -->
<property name="processEnginePlugins">
<list>
...
<bean id="prometheusPlugin" class="io.digitalstate.camunda.prometheus.PrometheusProcessEnginePlugin">
<property name="port" value="9999" />
<property name="camundaReportingIntervalInSeconds" value="5"/>
<property name="collectorYmlFilePath" value="/camunda-prometheus/prometheus-metrics.yml"/>
</bean>
...
</list>
</property>
The port property sets the port of the HTTP server that Prometheus uses to access the metrics.
All collectors are configured through a single YAML file. The path to the YAML file is set in the plugin's XML configuration, for example:
<bean id="prometheusPlugin" class="io.digitalstate.camunda.prometheus.PrometheusProcessEnginePlugin">
...
<property name="collectorYmlFilePath" value="/camunda-prometheus/prometheus-metrics.yml"/>
</bean>
The System collectors use the following configuration
system:
  - collector: io.digitalstate.camunda.prometheus.collectors.camunda.BpmnExecution
    enable: true
    startDate: 2015-10-03T17:59:38+00:00
    endDate: now
    startDelay: 0
    frequency: 5000
Where:
- collector (class name): the fully qualified Java class name of the collector. The default Camunda metrics are implemented in the following classes: io.digitalstate.camunda.prometheus.collectors.camunda.BpmnExecution, io.digitalstate.camunda.prometheus.collectors.camunda.DmnExecution, io.digitalstate.camunda.prometheus.collectors.camunda.JobExecutor.
- enable (boolean): enables or disables the collector.
- startDate (ISO 8601 date string): the start date from which metrics are queried using the Camunda metrics Java API.
- endDate (ISO 8601 date string, or the special keyword now): the end date up to which metrics are queried using the Camunda metrics Java API. The special keyword now tells the engine to always return the most recent data.
- startDelay (long): the time in milliseconds that the collector waits before its first execution after startup. This is useful if you have other plugins or systems that must finish starting before the collector executes for the first time.
- frequency (long): the time in milliseconds between executions of the collector.
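The handling of the now keyword for endDate can be pictured with a small sketch. This is only an illustration under assumptions: the class EndDateSketch and its method are hypothetical names, not part of the plugin, and the plugin's real parsing logic may differ.

```java
import java.time.OffsetDateTime;
import java.time.ZoneOffset;

// Hypothetical helper (not part of the plugin): resolves the endDate setting.
final class EndDateSketch {

    // Returns the current time for the keyword "now"; otherwise parses
    // the value as an ISO 8601 date-time with an offset.
    static OffsetDateTime resolveEndDate(String configured, OffsetDateTime currentTime) {
        String value = configured.trim();
        if ("now".equals(value)) {
            return currentTime;
        }
        return OffsetDateTime.parse(value);
    }

    public static void main(String[] args) {
        OffsetDateTime now = OffsetDateTime.now(ZoneOffset.UTC);
        System.out.println(resolveEndDate("now", now));
        System.out.println(resolveEndDate("2015-10-03T17:59:38+00:00", now));
    }
}
```

Because now is resolved on every call, each collector execution would query up to the moment it runs rather than up to a fixed timestamp.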
The Custom collectors use the following configuration
custom:
  - collector: /customcollectors/UserTasks.groovy
    enable: true
    startDelay: 0
    frequency: 5000

custom:
  - collector: classpath:/prometheus/customcollectors/IncidentMetrics.groovy
    enable: true
    startDelay: 0
    frequency: 5000
Where:
- collector (string, file path): the file path to the Groovy script file used for collector execution. The script is pre-compiled on timer creation; if the script is changed at runtime, the engine must be restarted for the changes to take effect. If the path is prefixed with classpath:, the collector location is resolved as a classpath resource.
- enable (boolean): enables or disables the collector.
- startDelay (long): the time in milliseconds that the collector waits before its first execution after startup. This is useful if you have other plugins or systems that must finish starting before the collector executes for the first time.
- frequency (long): the time in milliseconds between executions of the collector.
- config (object/map) (❗ EXPERIMENTAL): a key/value (<String, Object>) map for storing custom configuration used during script execution. The map can be accessed in the Groovy script using config.getConfig(), where config is the CustomMetricsConfig object exposed to the script through bindings, and .getConfig() returns the map from the config property of the collector.
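The way startDelay and frequency relate to each other can be sketched with a plain JDK scheduler. This is a stand-in, not the plugin's timer code: CollectorTimerSketch and runCollector are hypothetical names, and the plugin may use a different timer mechanism internally.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch (not the plugin's implementation) of a collector timer.
final class CollectorTimerSketch {

    // Runs the collector once after startDelayMillis, then every frequencyMillis,
    // until it has run `runs` times or timeoutMillis has elapsed.
    // Returns true if all runs completed in time.
    static boolean runCollector(Runnable collector, long startDelayMillis,
                                long frequencyMillis, int runs, long timeoutMillis) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        CountDownLatch remaining = new CountDownLatch(runs);
        try {
            scheduler.scheduleAtFixedRate(() -> {
                collector.run();
                remaining.countDown();
            }, startDelayMillis, frequencyMillis, TimeUnit.MILLISECONDS);
            return remaining.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            scheduler.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Short values for demonstration; the YAML examples use frequency: 5000.
        boolean completed = runCollector(() -> System.out.println("collecting..."), 0, 50, 3, 5_000);
        System.out.println("completed: " + completed);
    }
}
```

With a fixed-rate schedule, a collector that takes longer than its frequency delays its own next run, which is one reason to keep collector scripts fast.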
A series of custom collectors are included.
See: ./src/main/resources/prometheus/customcollectors
# Camunda Prometheus Metrics configuration
# each object is a timer configuration
---
system:
  - collector: io.digitalstate.camunda.prometheus.collectors.camunda.BpmnExecution
    enable: true
    startDate: 2015-10-03T17:59:38+00:00
    endDate: now
    startDelay: 0
    frequency: 5000
  - collector: io.digitalstate.camunda.prometheus.collectors.camunda.DmnExecution
    enable: true
    startDate: 2015-10-03T17:59:38+00:00
    endDate: now
    startDelay: 0
    frequency: 5000
  - collector: io.digitalstate.camunda.prometheus.collectors.camunda.JobExecutor
    enable: true
    startDate: 2015-10-03T17:59:38+00:00
    endDate: now
    startDelay: 0
    frequency: 5000
custom:
  - collector: /customcollectors/UserTasks.groovy
    enable: true
    startDelay: 0
    frequency: 5000
  - collector: /customcollectors/BpmnProcessDefinition.groovy
    enable: true
    startDelay: 0
    frequency: 5000
  - collector: /customcollectors/EventsMetrics.groovy
    enable: true
    startDelay: 0
    frequency: 5000
  - collector: classpath:/customcollectors/IdentityServiceMetrics.groovy
    enable: true
    startDelay: 0
    frequency: 5000
  - collector: classpath:/customcollectors/IncidentMetrics.groovy
    enable: true
    startDelay: 0
    frequency: 5000
  - collector: /customcollectors/ProcessInstances.groovy
    enable: true
    startDelay: 0
    frequency: 5000
  - collector: /customcollectors/TimerMetrics.groovy
    enable: true
    startDelay: 0
    frequency: 5000
- Prometheus/Grafana Setup: in ./docker/prometheus-grafana, run USERNAME=admin PASSWORD=admin docker-compose up
- Camunda 7.9.0 Setup: in ./docker, run docker-compose up
Prometheus will attempt to scrape the Camunda metrics through the endpoint exposed by the plugin.
Make sure that Camunda and Prometheus are part of the same network, or that Prometheus is otherwise able to access the metrics HTTP endpoint exposed on the Camunda server.
See the examples in the ./docker folder of this project.
A default Grafana dashboard with common queries is provided:
See folder: ./grafana/dashboards
- Current working template for Generic Metrics: Camunda Metrics-1.json
- Current working template for Process and Activity Duration Tracking: Camunda Duration Tracking-1.json
The scripts execute without any class restrictions, and provide the following bindings for easy access:
- config (io.digitalstate.camunda.prometheus.config.yaml.CustomMetricsConfig): the CustomMetricsConfig object containing all data from the YAML config of the specific collector.
- processEngine (org.camunda.bpm.engine.ProcessEngine): the process engine object, allowing full access to process engine services.
- LOGGER (org.slf4j.Logger): a logger to be used specifically by script executions. Implemented as: LoggerFactory.getLogger("CamundaCustomMetrics-ScriptLOGGER");
Take the execution time of your metric collectors into consideration. Each metric collector runs on its own timer thread, but the more collectors you add, and the larger the data processing and/or database query load each collector incurs per execution, the greater the performance impact on the engine.
Simple but reusable metrics are provided for ease of use by BPMN process builders.
- SimpleGaugeMetric (io.digitalstate.camunda.prometheus.collectors.SimpleGaugeMetric)
- SimpleCounterMetric (io.digitalstate.camunda.prometheus.collectors.SimpleCounterMetric)
- SimpleHistogramMetric (io.digitalstate.camunda.prometheus.collectors.SimpleHistogramMetric)
- SimpleSummaryMetric (io.digitalstate.camunda.prometheus.collectors.SimpleSummaryMetric)
See the Test folder for further usage, and see the metric classes.
They are generally simplifications of the existing metrics API, designed to remove "extras" and simplify usage.
The camunda namespace is given to all metrics generated by the Simple Metric classes.
The default registry is used.
Labels are supported and are generally implemented as an optional parameter of the method. See the examples below.
Groovy script Examples:
import io.digitalstate.camunda.prometheus.collectors.SimpleGaugeMetric;
def openCases = new SimpleGaugeMetric('open_cases', 'Number of Open Cases, labeled by Case Type', ['type'])
openCases.increment(['standard'])
import io.digitalstate.camunda.prometheus.collectors.SimpleHistogramMetric
def httpRequest = new SimpleHistogramMetric('legacy_system_123_request', 'Connection duration time, labeled by HTTP Method', null, ['method'])
httpRequest.startTimer(['POST'])
sleep(Math.abs(new Random().nextInt() % 5000) + 650) // Simulates a delay
httpRequest.observeDuration()
import io.digitalstate.camunda.prometheus.collectors.SimpleGaugeMetric
def money = new SimpleGaugeMetric('money_collected', 'dollar values collected, labeled by form of payment', ['payment_form'])
def amount = new Random().nextDouble() * 284.03 + 23.54 // Random dollar value between 23.54 and 307.57
money.increment(amount, ['credit-card'])
import io.digitalstate.camunda.prometheus.collectors.SimpleGaugeMetric
def openCases = new SimpleGaugeMetric('open_cases')
openCases.decrement(['standard'])
def closedCases = new SimpleGaugeMetric('closed_cases', 'Number of Closed Cases, labeled by Case Type', ['type'])
closedCases.increment(['standard'])
Notes:
- All default metrics are configured through the plugin properties pollingFrequencyMills and pollingStartDelayMills.
- All default metrics use an engine_name label, which identifies the engine collecting the metrics.
There are two default metrics that are loaded:
All custom metrics as defined in the Camunda Metrics documentation are implemented:
The available metrics are listed in the Camunda Metrics documentation.
Metric names follow the pattern metric_[metric name using underscores].
Example: the Camunda metric activity-instance-start is created as metric_activity_instance_start and appears in Prometheus / Grafana as camunda_metric_activity_instance_start, where camunda_ is the namespace of the metric.
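The naming pattern can be expressed as a small transformation. The sketch below is illustrative only: MetricNameSketch and its methods are hypothetical names, and the plugin's own code may build the names differently.

```java
// Hypothetical sketch (not the plugin's code) of the metric naming pattern.
final class MetricNameSketch {

    // Namespace that the README says is prepended to the metric name.
    static final String NAMESPACE = "camunda";

    // "activity-instance-start" -> "metric_activity_instance_start"
    static String toPrometheusName(String camundaMetricName) {
        return "metric_" + camundaMetricName.replace('-', '_');
    }

    // "metric_activity_instance_start" -> "camunda_metric_activity_instance_start"
    static String withNamespace(String metricName) {
        return NAMESPACE + "_" + metricName;
    }

    public static void main(String[] args) {
        String name = toPrometheusName("activity-instance-start");
        System.out.println(name);                // metric_activity_instance_start
        System.out.println(withNamespace(name)); // camunda_metric_activity_instance_start
    }
}
```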
Reusable/configurable groovy scripts are provided within the classpath of the jar for common usage cases:
You can use these scripts in production, or as a basis for your own re-usable script for a different metric within the engine.
Tracking external tasks can be configured for "per-workerId" and "per-topicName":
custom:
  - collector: classpath:prometheus/customcollectors/ExternalTasksCustomTopics.groovy
    enable: true
    startDelay: 0
    frequency: 10000
    config:
      topics:
        - myTopic1
        - myTopic2
        - someCustomTopic
  - collector: classpath:prometheus/customcollectors/ExternalTasksCustomTopics.groovy
    enable: true
    startDelay: 0
    frequency: 8000
    config:
      topics:
        - myTopic3
        - myTopic4
        - someOtherCustomTopic
  - collector: classpath:prometheus/customcollectors/ExternalTasksCustomWorkers.groovy
    enable: true
    startDelay: 0
    frequency: 2000
    config:
      workers:
        - someWorkerID123
        - someOtherWorkerId567890
        - someLegacyWorker001
Take note of the following:
- The config property allows you to define an array of topics or workers.
- The same collector .groovy file is used multiple times. This lets you run the script under different timer configurations, which is often needed when not all workers or topics need to be queried frequently due to low usage.
Gathers Activity instance counts (finished and active) for each activity for a process definition ID.
custom:
  - collector: classpath:prometheus/customcollectors/HistoricActivityStatisticsPerProcessDefinition.groovy
    enable: true
    startDelay: 0
    frequency: 5000
    config:
      processDefinitionKeys:
        - myTestProcess
The same collector .groovy file can be used multiple times. This lets you run the script under different timer configurations for different process definition keys, which is often needed when not all processes need to be queried frequently due to low usage.
Instance durations can be tracked with Prometheus Histograms for both Activity Duration and Process Duration.
Duration tracking is handled by a Transaction Listener that executes once the transaction has been committed to the database, so that the duration of the activity or process instance has been calculated and "confirmed".
The cached value is used for the duration lookup, so no additional DB queries are required.
To enable Instance Duration Tracking, the Parse Listener must be activated. During BPMN deployment, the Parse Listener parses all relevant BPMN Activities and Processes and adds an End Listener, which in turn adds a Transaction Listener that collects the duration once the data has been confirmed as committed.
In the plugin XML, set the bpmnDurationParseListener property to "true".
...
<bean id="prometheusPlugin" class="io.digitalstate.camunda.prometheus.PrometheusProcessEnginePlugin">
<property name="port" value="9999" />
<property name="camundaReportingIntervalInSeconds" value="5"/>
<property name="collectorYmlFilePath" value="src/test/resources/prometheus-metrics.yml"/>
<property name="bpmnDurationParseListener" value="true"/>
</bean>
...
Once the parse listener is active, you can configure the YAML and BPMN.
❗ Duration Reporting is not subject to the camundaReportingIntervalInSeconds property.
Activity Durations are reported in real time, as they are observed/collected.
In the YAML file (as defined in the collectorYmlFilePath property of the plugin's XML configuration), you can add a durationTracking section:
...
durationTracking:
  activity_instance_duration:
    help: "Core activity instance duration tracking. Used to track all activity instances."
    buckets: [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 2.0, 3.0]
  process_instance_duration:
    help: "The generic process instance duration buckets"
    buckets: [1, 5, 10, 15, 20, 30, 60, 120, 300, 600, 1200, 2400]
  mycustom_metric_duration:
    help: "Some custom metric i am tracking"
    buckets: [1, 5, 10, 20, 50]
  some_userTask_metric:
    help: "Tracking the duration of specific user tasks: 1m, 2m, 3m, 4m, 5m, 10m, 15m, 30m, 60m, 8h, 24h."
    buckets: [60, 120, 180, 240, 300, 600, 900, 1800, 3600, 28800, 86400]
...
The activity_instance_duration entry is used by the "core" activity duration tracker.
If the global activity duration tracker is activated at the BPMN level, it will look for the activity_instance_duration object, and the deployment will fail to parse if the object is not found.
You can fully configure the help and buckets properties as needed, but the activity_instance_duration key is required if you are using Process Wide Activity Tracking.
The process_instance_duration entry is used as the fallback default process instance duration tracker.
If a process instance duration tracker is active on a BPMN and does not have a metric name defined, it will default to process_instance_duration.
You can fully configure the help and buckets properties as needed, but the process_instance_duration key is required if you are using Process Instance Duration tracking without defining metric names in the tracker parameters.
Each duration tracking metric is configured with three settings:
- Histogram Name (the key of the object. See Prometheus metric naming rules)
- Help Text (the help property)
- List of buckets to use in the Histogram (the buckets property). Each bucket is a number of seconds as a <double>. For fractions of a second you can use decimals such as 0.1 (100 milliseconds), 0.01 (10 milliseconds), or 0.001 (1 millisecond).
Note that histogram names are "cleaned" of invalid characters before being sent to Prometheus.
This means that you can configure your YAML and BPMN with something like "my-metric-name", but when the metric is created and sent to Prometheus, any invalid characters are replaced with an underscore (_).
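The cleaning step can be sketched as follows. This is an assumption-based illustration: HistogramNameCleaner is a hypothetical name, and the rule used here follows the metric name pattern documented by Prometheus ([a-zA-Z_:][a-zA-Z0-9_:]*), which may not match the plugin's exact behavior.

```java
// Hypothetical sketch (not the plugin's code) of metric name cleaning.
final class HistogramNameCleaner {

    // Replaces every character outside [a-zA-Z0-9_:] with an underscore and
    // prefixes an underscore if the name would otherwise start with a digit.
    static String clean(String rawName) {
        String cleaned = rawName.replaceAll("[^a-zA-Z0-9_:]", "_");
        if (!cleaned.isEmpty() && Character.isDigit(cleaned.charAt(0))) {
            cleaned = "_" + cleaned;
        }
        return cleaned;
    }

    public static void main(String[] args) {
        System.out.println(clean("my-metric-name")); // my_metric_name
        System.out.println(clean("1st.metric"));     // _1st_metric
    }
}
```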
Once the YAML has been configured, you can access these configurations within the BPMN.
Multiple levels of duration tracking can be activated:
- BPMN Wide Activity Durations
- Activity Definition Specific
- BPMN Process Instance Duration
BPMN Wide duration tracking is managed through a configuration at the BPMN level.
Example:
prometheus.track:{type:'activity-duration', metric:'activity_instance_duration'}
This value is placed into the "Element Documentation" field of the BPMN Process. The value is space-sensitive, and thus must be exactly as the example.
❗ The Element Documentation field is used due to limitations in the BPMN Parse Listener of the Camunda Engine.
The BPMN extension properties are not available on a per Activity Element parsing basis, but the built-in properties of the Process Definition are. Thus the BPMN's "documentation" property is available across all BPMN Elements.
Activating this form of tracking will track duration across all relevant BPMN Activities.
❗ The appendPdId=true property (which is supported in Activity Definition Specific tracking) is not currently supported for process wide tracking, but will be added soon.
In the meantime, you can still track the Process Definition Id through the metric's process_definition_id label.
Activity Definition specific tracking is the specific selection of BPMN activities to track with the Duration tracker.
On a specific BPMN element you can use the Camunda Extension Properties to add:
Name: prometheus.track
Value: {type:'activity-duration', metric:'mycustom_metric_duration'}
An alternate value can be used: {type:'activity-duration', metric:'mycustom_metric_duration', appendPdId=true}.
The appendPdId property means "append the Process Definition Id to the Metric Name". If this value is true and the metric name is mycustom_metric_duration, the resulting name would be mycustom_metric_duration_someProcessDefinitionId.
BPMN Process Instance Duration tracking is managed through a configuration at the BPMN level.
Example: Camunda Extension Property at the BPMN level:
Name: prometheus.track
Value: {type:'process-duration'}, or {type:'process-duration', appendPdId=true}, or {type:'process-duration', appendPdId=true, metric:'someMetricName'}
This value is placed into the "Extension Properties" at the BPMN level.
Only the type parameter is required. appendPdId and metric are optional. If metric is omitted, the metric name defaults to process_instance_duration, and you are required to have this durationTracking config in your YAML file.
You can have multiple instances of this property in the extension properties, allowing you to track process instance durations against multiple metrics configured in the YAML file.
Note that using appendPdId=true is typically for advanced / special-case usage, as the Process Definition ID is already applied as a label and available for filtering within Prometheus.
For activity duration tracking, the following labels are applied:
- engine_name: the name of the engine
- element_type: the element type such as startEvent, userTask, endEvent, etc.
- process_definition_id: the specific process definition Id
- activity_id: the activity Id (not the activity instance Id)
- deployment_id: the Deployment Id of the process definition
- process_definition_version: the version number of the process definition assigned at time of deployment
- process_definition_version_tag: the version tag value of the process definition. If no tag was provided in the BPMN, the value defaults to an empty string ("")
For process duration tracking, the following labels are applied:
- engine_name: the name of the engine
- process_definition_id: the specific process definition Id
- deployment_id: the Deployment Id of the process definition
- process_definition_version: the version number of the process definition assigned at time of deployment
- process_definition_version_tag: the version tag value of the process definition. If no tag was provided in the BPMN, the value defaults to an empty string ("")
Additional labels are not currently configurable through the BPMN or YAML.
Metrics defined in the YAML file under the durationTracking section are only initialized as a Histogram metric once they are used for the first time.
This means that if duration metrics are being collected on a process definition that has never run, then the metrics will not be reporting anything.
Further, if a specific Activity has never executed, then it will not appear in the metrics until it has executed for the first time.
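The first-use initialization can be pictured with a small registry sketch. Everything here is hypothetical (the class LazyHistogramSketch, its methods, and the use of a plain list in place of a real histogram); it only illustrates why a configured metric stays invisible until its first observation.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical sketch (not the plugin's code) of lazily created duration metrics.
final class LazyHistogramSketch {

    // Bucket settings as they would come from the YAML durationTracking section.
    private final Map<String, double[]> configuredBuckets;

    // Metrics that have actually been created; a list stands in for a histogram.
    private final Map<String, List<Double>> created = new ConcurrentHashMap<>();

    LazyHistogramSketch(Map<String, double[]> configuredBuckets) {
        this.configuredBuckets = configuredBuckets;
    }

    // Records a duration, creating the metric on first use.
    void observe(String metricName, double seconds) {
        if (!configuredBuckets.containsKey(metricName)) {
            throw new IllegalArgumentException("No durationTracking entry for " + metricName);
        }
        created.computeIfAbsent(metricName, name -> new CopyOnWriteArrayList<>()).add(seconds);
    }

    // Only metrics that were observed at least once are visible.
    Set<String> exposedMetricNames() {
        return created.keySet();
    }

    public static void main(String[] args) {
        LazyHistogramSketch registry = new LazyHistogramSketch(Map.of(
                "activity_instance_duration", new double[] {0.1, 0.5, 1.0},
                "process_instance_duration", new double[] {1, 5, 10}));
        System.out.println(registry.exposedMetricNames()); // nothing observed yet
        registry.observe("activity_instance_duration", 0.3);
        System.out.println(registry.exposedMetricNames());
    }
}
```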
Prometheus Histograms have specific usage that should be understood before making assumptions on how to read the results.
Specifically look at:
- https://www.robustperception.io/why-are-prometheus-histograms-cumulative
- https://prometheus.io/docs/concepts/metric_types/#histogram
- https://prometheus.io/docs/practices/histograms/
- http://linuxczar.net/blog/2017/06/15/prometheus-histogram-2/
It is most important to understand that Prometheus Histograms are "Cumulative". See the links above for further details.
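To make "cumulative" concrete, the sketch below counts observations the way a cumulative histogram does: each bucket counts every observation less than or equal to its upper bound, and a final +Inf bucket counts all observations. The class name is hypothetical and the code does not use the Prometheus client; the bucket bounds are the mycustom_metric_duration values from the YAML example, and the observations are made up for illustration.

```java
import java.util.Arrays;

// Hypothetical sketch of cumulative bucket counting (no Prometheus client involved).
final class CumulativeBucketsSketch {

    // Returns one count per upper bound plus a trailing +Inf bucket.
    static long[] cumulativeCounts(double[] upperBounds, double[] observations) {
        long[] counts = new long[upperBounds.length + 1];
        for (double value : observations) {
            for (int i = 0; i < upperBounds.length; i++) {
                if (value <= upperBounds[i]) {
                    counts[i]++; // an observation lands in every bucket whose bound covers it
                }
            }
            counts[upperBounds.length]++; // +Inf bucket always counts the observation
        }
        return counts;
    }

    public static void main(String[] args) {
        double[] buckets = {1, 5, 10, 20, 50};
        double[] durationsInSeconds = {0.5, 3, 7, 7, 60};
        // prints [1, 2, 4, 4, 4, 5]
        System.out.println(Arrays.toString(cumulativeCounts(buckets, durationsInSeconds)));
    }
}
```

The two 7-second observations appear in the 10, 20 and 50 buckets, so per-bucket counts never decrease as the bound grows; the number of observations between two bounds is the difference of their counts.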
Note that multiple duration trackers can be active at the same time.
- It is possible to have the BPMN Wide tracking active, and add specific Activity duration trackers.
- It is possible to have the BPMN Wide tracking active, and also activate the same tracker on a per-activity basis. This is generally considered a configuration error. It can also impact performance, so pay attention!
- It is possible to track the duration of Process Instances against multiple metrics
- DONE: Ability to enable BPMN Wide Tracking, but disable the tracker on specific activities.
- Ability to enable other types of tracking using Camunda Extension Properties.
- Ability to enable/disable specific trackers using a boolean rather than having to remove the extension property.
- Have an idea? Please post in the Issue Queue!
This plugin contains the ability to generate Grafana Annotations using the Grafana REST API.
When a BPMN is deployed and parsed, a Parse Listener creates a Grafana Annotation upon successful deployment of the BPMN process.
<bean id="prometheusPlugin" class="io.digitalstate.camunda.prometheus.PrometheusProcessEnginePlugin">
<property name="port" value="9999" />
<property name="camundaReportingIntervalInSeconds" value="5"/>
<property name="collectorYmlFilePath" value="src/test/resources/prometheus-metrics.yml"/>
<property name="bpmnDurationParseListener" value="true"/>
<property name="grafanaAnnotationReporting" value="true"/>
<property name="grafanaServer" value="http://localhost:3000"/>
<property name="grafanaAuthTokenPath" value="./target/test-classes/grafana-token.txt"/>
</bean>
- grafanaAnnotationReporting: Boolean to enable or disable Annotation reporting.
- grafanaServer: URI of the Grafana server used for Annotations. Defaults to http://localhost:3000.
- grafanaAuthTokenPath: file system path to a text file that contains the Bearer Token.
Upon successful deployment of a BPMN process, the plugin creates a Grafana Annotation on the configured Grafana server.
- Key: The process definition key
- Id: The process definition id
- Version: The process definition version (not to be confused with Version Tag)
- camunda: fixed string to indicate this is a Camunda related annotation.
- bpmn: fixed string to indicate this is a BPMN related annotation.
- deployment: fixed string to indicate this is a deployment related annotation.
- engine:[engineName]: dynamic string where [engineName] is the engine name configured in the Process Engine Configuration.
- processDefKey:[processDefinitionKey]: dynamic string where [processDefinitionKey] is the key defined in the BPMN XML that is deployed.
Grafana has authentication enabled by default; as a result, you must generate an Auth token with at least the editor role so that the Camunda Prometheus Metrics plugin can use the Grafana Annotation HTTP API to create Annotations.
To set up an Auth Token, follow the steps in the Grafana UI as per: http://docs.grafana.org/http_api/auth/#create-api-token. Copy the generated token into a text file and point the grafanaAuthTokenPath plugin configuration value to that path.
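For orientation, the request that such a token authorizes can be sketched with the JDK HTTP client (Java 11+). This is not the plugin's code: the class and method names are hypothetical, the annotation text layout is an assumption, and the example only builds the request without sending it. It relies on Grafana's annotation endpoint (POST /api/annotations) and Bearer token authentication.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch (not the plugin's code) of a Grafana annotation request.
final class GrafanaAnnotationRequestSketch {

    // Reads the token from the text file referenced by grafanaAuthTokenPath.
    static String readToken(Path tokenFile) {
        try {
            return Files.readString(tokenFile, StandardCharsets.UTF_8).trim();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds (but does not send) a POST to Grafana's annotation endpoint.
    static HttpRequest buildRequest(String grafanaServer, String token, String jsonBody) {
        return HttpRequest.newBuilder(URI.create(grafanaServer + "/api/annotations"))
                .header("Authorization", "Bearer " + token)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }

    public static void main(String[] args) {
        // The text layout is assumed; the tags follow the list in this README.
        String body = "{\"text\":\"Key: myTestProcess, Version: 1\","
                + "\"tags\":[\"camunda\",\"bpmn\",\"deployment\","
                + "\"engine:default\",\"processDefKey:myTestProcess\"]}";
        HttpRequest request = buildRequest("http://localhost:3000", "example-token", body);
        System.out.println(request.method() + " " + request.uri());
    }
}
```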
./mvnw clean package
./mvnw clean test
- Rate of Process Start Per Process Def (Per Hour): 3600 * sum(rate(processInstanceStartCount{processDefKey="someKey"}[1h]))
- New Users Per Day / Week / Month / Year