feat(core): Add more Prometheus metrics (experimental) #5187

Merged: 2 commits, Jan 19, 2023

csuermann (Contributor) commented on Jan 18, 2023

This PR adds more configurable metrics in Prometheus exposition format to the /metrics endpoint.

Environment variables control, to some extent, which metric groups and labels are exposed.

The example output of the /metrics endpoint below was generated with the following environment variables set:

Environment variables:

N8N_METRICS=true
N8N_METRICS_PREFIX=n8n_
N8N_METRICS_INCLUDE_DEFAULT_METRICS=true
N8N_METRICS_INCLUDE_WORKFLOW_ID_LABEL=true
N8N_METRICS_INCLUDE_NODE_TYPE_LABEL=true
N8N_METRICS_INCLUDE_CREDENTIAL_TYPE_LABEL=true
N8N_METRICS_INCLUDE_API_ENDPOINTS=true
N8N_METRICS_INCLUDE_API_PATH_LABEL=true
N8N_METRICS_INCLUDE_API_METHOD_LABEL=true
N8N_METRICS_INCLUDE_API_STATUS_CODE_LABEL=true
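
To make the effect of these flags concrete, here is a minimal sketch of how such variables could be read into a typed config object. The `MetricsConfig` shape, the `flag` and `readMetricsConfig` helpers, and the `n8n_` fallback prefix are assumptions made for illustration; they are not taken from the n8n code in this PR.

```typescript
// Hypothetical config shape: field names are illustrative and are not
// taken from n8n's internal config schema.
interface MetricsConfig {
  enabled: boolean;
  prefix: string;
  includeDefaultMetrics: boolean;
  includeApiEndpoints: boolean;
  labels: {
    workflowId: boolean;
    nodeType: boolean;
    credentialType: boolean;
    apiPath: boolean;
    apiMethod: boolean;
    apiStatusCode: boolean;
  };
}

type Env = Record<string, string | undefined>;

// Treat only the string "true" (case-insensitive) as enabled; unset means false.
function flag(env: Env, name: string): boolean {
  const raw = env[name];
  return raw !== undefined && raw.trim().toLowerCase() === "true";
}

function readMetricsConfig(env: Env): MetricsConfig {
  return {
    enabled: flag(env, "N8N_METRICS"),
    // Assumed fallback: the PR text only shows the prefix being set explicitly.
    prefix: env["N8N_METRICS_PREFIX"] ?? "n8n_",
    includeDefaultMetrics: flag(env, "N8N_METRICS_INCLUDE_DEFAULT_METRICS"),
    includeApiEndpoints: flag(env, "N8N_METRICS_INCLUDE_API_ENDPOINTS"),
    labels: {
      workflowId: flag(env, "N8N_METRICS_INCLUDE_WORKFLOW_ID_LABEL"),
      nodeType: flag(env, "N8N_METRICS_INCLUDE_NODE_TYPE_LABEL"),
      credentialType: flag(env, "N8N_METRICS_INCLUDE_CREDENTIAL_TYPE_LABEL"),
      apiPath: flag(env, "N8N_METRICS_INCLUDE_API_PATH_LABEL"),
      apiMethod: flag(env, "N8N_METRICS_INCLUDE_API_METHOD_LABEL"),
      apiStatusCode: flag(env, "N8N_METRICS_INCLUDE_API_STATUS_CODE_LABEL"),
    },
  };
}

// Usage with a subset of the variables listed above
// (in a real Node.js process, process.env would be passed instead).
const config = readMetricsConfig({
  N8N_METRICS: "true",
  N8N_METRICS_PREFIX: "n8n_",
  N8N_METRICS_INCLUDE_WORKFLOW_ID_LABEL: "true",
});
console.log(config.enabled, config.labels.workflowId, config.labels.nodeType);
// prints: true true false
```

Passing the environment in as a parameter, rather than reading a global, keeps the sketch easy to test with different flag combinations.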

Example output of GET /metrics

# HELP process_cpu_user_seconds_total Total user CPU time spent in seconds.
# TYPE process_cpu_user_seconds_total counter
process_cpu_user_seconds_total 1.611351

# HELP process_cpu_system_seconds_total Total system CPU time spent in seconds.
# TYPE process_cpu_system_seconds_total counter
process_cpu_system_seconds_total 0.261071

# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 1.872422

# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1674121593

# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 257355776

# HELP nodejs_eventloop_lag_seconds Lag of event loop in seconds.
# TYPE nodejs_eventloop_lag_seconds gauge
nodejs_eventloop_lag_seconds 0.005210362

# HELP nodejs_eventloop_lag_min_seconds The minimum recorded event loop delay.
# TYPE nodejs_eventloop_lag_min_seconds gauge
nodejs_eventloop_lag_min_seconds 0.00868352

# HELP nodejs_eventloop_lag_max_seconds The maximum recorded event loop delay.
# TYPE nodejs_eventloop_lag_max_seconds gauge
nodejs_eventloop_lag_max_seconds 0.460849151

# HELP nodejs_eventloop_lag_mean_seconds The mean of the recorded event loop delays.
# TYPE nodejs_eventloop_lag_mean_seconds gauge
nodejs_eventloop_lag_mean_seconds 0.011563190651685392

# HELP nodejs_eventloop_lag_stddev_seconds The standard deviation of the recorded event loop delays.
# TYPE nodejs_eventloop_lag_stddev_seconds gauge
nodejs_eventloop_lag_stddev_seconds 0.006139661652063662

# HELP nodejs_eventloop_lag_p50_seconds The 50th percentile of the recorded event loop delays.
# TYPE nodejs_eventloop_lag_p50_seconds gauge
nodejs_eventloop_lag_p50_seconds 0.012001279

# HELP nodejs_eventloop_lag_p90_seconds The 90th percentile of the recorded event loop delays.
# TYPE nodejs_eventloop_lag_p90_seconds gauge
nodejs_eventloop_lag_p90_seconds 0.012148735

# HELP nodejs_eventloop_lag_p99_seconds The 99th percentile of the recorded event loop delays.
# TYPE nodejs_eventloop_lag_p99_seconds gauge
nodejs_eventloop_lag_p99_seconds 0.012197887

# HELP nodejs_active_handles Number of active libuv handles grouped by handle type. Every handle type is C++ class name.
# TYPE nodejs_active_handles gauge
nodejs_active_handles{type="WriteStream"} 2
nodejs_active_handles{type="ReadStream"} 1
nodejs_active_handles{type="Server"} 1
nodejs_active_handles{type="Socket"} 2

# HELP nodejs_active_handles_total Total number of active handles.
# TYPE nodejs_active_handles_total gauge
nodejs_active_handles_total 6

# HELP nodejs_active_requests Number of active libuv requests grouped by request type. Every request type is C++ class name.
# TYPE nodejs_active_requests gauge

# HELP nodejs_active_requests_total Total number of active requests.
# TYPE nodejs_active_requests_total gauge
nodejs_active_requests_total 0

# HELP nodejs_heap_size_total_bytes Process heap size from Node.js in bytes.
# TYPE nodejs_heap_size_total_bytes gauge
nodejs_heap_size_total_bytes 137408512

# HELP nodejs_heap_size_used_bytes Process heap size used from Node.js in bytes.
# TYPE nodejs_heap_size_used_bytes gauge
nodejs_heap_size_used_bytes 129931048

# HELP nodejs_external_memory_bytes Node.js external memory size in bytes.
# TYPE nodejs_external_memory_bytes gauge
nodejs_external_memory_bytes 2207464

# HELP nodejs_heap_space_size_total_bytes Process heap space size total from Node.js in bytes.
# TYPE nodejs_heap_space_size_total_bytes gauge
nodejs_heap_space_size_total_bytes{space="read_only"} 176128
nodejs_heap_space_size_total_bytes{space="old"} 106844160
nodejs_heap_space_size_total_bytes{space="code"} 3776512
nodejs_heap_space_size_total_bytes{space="map"} 4202496
nodejs_heap_space_size_total_bytes{space="large_object"} 20779008
nodejs_heap_space_size_total_bytes{space="code_large_object"} 581632
nodejs_heap_space_size_total_bytes{space="new_large_object"} 0
nodejs_heap_space_size_total_bytes{space="new"} 1048576

# HELP nodejs_heap_space_size_used_bytes Process heap space size used from Node.js in bytes.
# TYPE nodejs_heap_space_size_used_bytes gauge
nodejs_heap_space_size_used_bytes{space="read_only"} 170944
nodejs_heap_space_size_used_bytes{space="old"} 101076768
nodejs_heap_space_size_used_bytes{space="code"} 3421824
nodejs_heap_space_size_used_bytes{space="map"} 3306456
nodejs_heap_space_size_used_bytes{space="large_object"} 20534376
nodejs_heap_space_size_used_bytes{space="code_large_object"} 542048
nodejs_heap_space_size_used_bytes{space="new_large_object"} 0
nodejs_heap_space_size_used_bytes{space="new"} 887696

# HELP nodejs_heap_space_size_available_bytes Process heap space size available from Node.js in bytes.
# TYPE nodejs_heap_space_size_available_bytes gauge
nodejs_heap_space_size_available_bytes{space="read_only"} 0
nodejs_heap_space_size_available_bytes{space="old"} 3857736
nodejs_heap_space_size_available_bytes{space="code"} 108928
nodejs_heap_space_size_available_bytes{space="map"} 821552
nodejs_heap_space_size_available_bytes{space="large_object"} 0
nodejs_heap_space_size_available_bytes{space="code_large_object"} 0
nodejs_heap_space_size_available_bytes{space="new_large_object"} 1031072
nodejs_heap_space_size_available_bytes{space="new"} 143376

# HELP nodejs_version_info Node.js version info.
# TYPE nodejs_version_info gauge
nodejs_version_info{version="v16.15.0",major="16",minor="15",patch="0"} 1

# HELP nodejs_gc_duration_seconds Garbage collection duration by kind, one of major, minor, incremental or weakcb.
# TYPE nodejs_gc_duration_seconds histogram
nodejs_gc_duration_seconds_bucket{le="0.001",kind="minor"} 8
nodejs_gc_duration_seconds_bucket{le="0.01",kind="minor"} 10
nodejs_gc_duration_seconds_bucket{le="0.1",kind="minor"} 10
nodejs_gc_duration_seconds_bucket{le="1",kind="minor"} 10
nodejs_gc_duration_seconds_bucket{le="2",kind="minor"} 10
nodejs_gc_duration_seconds_bucket{le="5",kind="minor"} 10
nodejs_gc_duration_seconds_bucket{le="+Inf",kind="minor"} 10
nodejs_gc_duration_seconds_sum{kind="minor"} 0.011158959000371397
nodejs_gc_duration_seconds_count{kind="minor"} 10
nodejs_gc_duration_seconds_bucket{le="0.001",kind="incremental"} 5
nodejs_gc_duration_seconds_bucket{le="0.01",kind="incremental"} 6
nodejs_gc_duration_seconds_bucket{le="0.1",kind="incremental"} 6
nodejs_gc_duration_seconds_bucket{le="1",kind="incremental"} 6
nodejs_gc_duration_seconds_bucket{le="2",kind="incremental"} 6
nodejs_gc_duration_seconds_bucket{le="5",kind="incremental"} 6
nodejs_gc_duration_seconds_bucket{le="+Inf",kind="incremental"} 6
nodejs_gc_duration_seconds_sum{kind="incremental"} 0.002568940999917686
nodejs_gc_duration_seconds_count{kind="incremental"} 6
nodejs_gc_duration_seconds_bucket{le="0.001",kind="major"} 0
nodejs_gc_duration_seconds_bucket{le="0.01",kind="major"} 1
nodejs_gc_duration_seconds_bucket{le="0.1",kind="major"} 3
nodejs_gc_duration_seconds_bucket{le="1",kind="major"} 3
nodejs_gc_duration_seconds_bucket{le="2",kind="major"} 3
nodejs_gc_duration_seconds_bucket{le="5",kind="major"} 3
nodejs_gc_duration_seconds_bucket{le="+Inf",kind="major"} 3
nodejs_gc_duration_seconds_sum{kind="major"} 0.03339293400011956
nodejs_gc_duration_seconds_count{kind="major"} 3
nodejs_gc_duration_seconds_bucket{le="0.001",kind="weakcb"} 4
nodejs_gc_duration_seconds_bucket{le="0.01",kind="weakcb"} 4
nodejs_gc_duration_seconds_bucket{le="0.1",kind="weakcb"} 4
nodejs_gc_duration_seconds_bucket{le="1",kind="weakcb"} 4
nodejs_gc_duration_seconds_bucket{le="2",kind="weakcb"} 4
nodejs_gc_duration_seconds_bucket{le="5",kind="weakcb"} 4
nodejs_gc_duration_seconds_bucket{le="+Inf",kind="weakcb"} 4
nodejs_gc_duration_seconds_sum{kind="weakcb"} 0.000053920999867841604
nodejs_gc_duration_seconds_count{kind="weakcb"} 4

# HELP n8n_version_info n8n version info.
# TYPE n8n_version_info gauge
n8n_version_info{version="0.211.1",major="0",minor="211",patch="1"} 1

# HELP http_request_duration_seconds duration histogram of http responses labeled with: status_code, method, path
# TYPE http_request_duration_seconds histogram
http_request_duration_seconds_bucket{le="0.003",status_code="401",method="POST",path="/api/v1/audit"} 0
http_request_duration_seconds_bucket{le="0.03",status_code="401",method="POST",path="/api/v1/audit"} 1
http_request_duration_seconds_bucket{le="0.1",status_code="401",method="POST",path="/api/v1/audit"} 1
http_request_duration_seconds_bucket{le="0.3",status_code="401",method="POST",path="/api/v1/audit"} 1
http_request_duration_seconds_bucket{le="1.5",status_code="401",method="POST",path="/api/v1/audit"} 1
http_request_duration_seconds_bucket{le="10",status_code="401",method="POST",path="/api/v1/audit"} 1
http_request_duration_seconds_bucket{le="+Inf",status_code="401",method="POST",path="/api/v1/audit"} 1
http_request_duration_seconds_sum{status_code="401",method="POST",path="/api/v1/audit"} 0.009538856
http_request_duration_seconds_count{status_code="401",method="POST",path="/api/v1/audit"} 1
http_request_duration_seconds_bucket{le="0.003",status_code="401",method="POST",path="/api/v1/workflows"} 1
http_request_duration_seconds_bucket{le="0.03",status_code="401",method="POST",path="/api/v1/workflows"} 1
http_request_duration_seconds_bucket{le="0.1",status_code="401",method="POST",path="/api/v1/workflows"} 1
http_request_duration_seconds_bucket{le="0.3",status_code="401",method="POST",path="/api/v1/workflows"} 1
http_request_duration_seconds_bucket{le="1.5",status_code="401",method="POST",path="/api/v1/workflows"} 1
http_request_duration_seconds_bucket{le="10",status_code="401",method="POST",path="/api/v1/workflows"} 1
http_request_duration_seconds_bucket{le="+Inf",status_code="401",method="POST",path="/api/v1/workflows"} 1
http_request_duration_seconds_sum{status_code="401",method="POST",path="/api/v1/workflows"} 0.001645051
http_request_duration_seconds_count{status_code="401",method="POST",path="/api/v1/workflows"} 1
http_request_duration_seconds_bucket{le="0.003",status_code="200",method="POST",path="/rest/workflows/run"} 0
http_request_duration_seconds_bucket{le="0.03",status_code="200",method="POST",path="/rest/workflows/run"} 1
http_request_duration_seconds_bucket{le="0.1",status_code="200",method="POST",path="/rest/workflows/run"} 2
http_request_duration_seconds_bucket{le="0.3",status_code="200",method="POST",path="/rest/workflows/run"} 2
http_request_duration_seconds_bucket{le="1.5",status_code="200",method="POST",path="/rest/workflows/run"} 2
http_request_duration_seconds_bucket{le="10",status_code="200",method="POST",path="/rest/workflows/run"} 2
http_request_duration_seconds_bucket{le="+Inf",status_code="200",method="POST",path="/rest/workflows/run"} 2
http_request_duration_seconds_sum{status_code="200",method="POST",path="/rest/workflows/run"} 0.071720752
http_request_duration_seconds_count{status_code="200",method="POST",path="/rest/workflows/run"} 2

# HELP n8n_workflow_started_total Total number of n8n.workflow.started events.
# TYPE n8n_workflow_started_total counter
n8n_workflow_started_total{workflow_id="1"} 2

# HELP n8n_node_started_total Total number of n8n.node.started events.
# TYPE n8n_node_started_total counter
n8n_node_started_total{node_type="base_start"} 2
n8n_node_started_total{node_type="base_set"} 2
n8n_node_started_total{node_type="base_code"} 2

# HELP n8n_node_finished_total Total number of n8n.node.finished events.
# TYPE n8n_node_finished_total counter
n8n_node_finished_total{node_type="base_start"} 2
n8n_node_finished_total{node_type="base_set"} 2
n8n_node_finished_total{node_type="base_code"} 2

# HELP n8n_workflow_success_total Total number of n8n.workflow.success events.
# TYPE n8n_workflow_success_total counter
n8n_workflow_success_total{workflow_id="1"} 2
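
For readers who want to consume this output without a Prometheus server, the following is a rough sketch of parsing sample lines from the text exposition format and deriving a mean request duration from a histogram's `_sum` and `_count` series. The `Sample` type and the `parseSampleLine` and `meanDurationSeconds` helpers are hypothetical names introduced here. The parser is deliberately simplified: it keeps escape sequences in label values verbatim and ignores optional timestamps.

```typescript
interface Sample {
  name: string;
  labels: Record<string, string>;
  value: number;
}

// Map the special float spellings of the text format onto JS numbers.
function parseValue(raw: string): number {
  if (raw === "+Inf") return Infinity;
  if (raw === "-Inf") return -Infinity;
  return Number(raw); // "NaN" parses to NaN
}

// Parse one sample line. Returns null for blank lines, comment lines
// (# HELP / # TYPE), and anything this simplified pattern cannot match.
function parseSampleLine(line: string): Sample | null {
  const trimmed = line.trim();
  if (trimmed === "" || trimmed.charAt(0) === "#") return null;
  const match = /^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)/.exec(trimmed);
  if (match === null) return null;

  const labels: Record<string, string> = {};
  const labelRe = /([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"/g;
  const labelText = match[2] || "";
  let m: RegExpExecArray | null;
  while ((m = labelRe.exec(labelText)) !== null) {
    labels[m[1]] = m[2];
  }
  return { name: match[1], labels, value: parseValue(match[3]) };
}

// Mean duration = sum of observed durations / number of observations,
// aggregated over every series of the histogram that carries the given path.
function meanDurationSeconds(samples: Sample[], metric: string, path: string): number | null {
  let sum = 0;
  let count = 0;
  for (const s of samples) {
    if (s.labels["path"] !== path) continue;
    if (s.name === metric + "_sum") sum += s.value;
    if (s.name === metric + "_count") count += s.value;
  }
  return count === 0 ? null : sum / count;
}

// Usage with two lines copied from the output above.
const text = [
  "# TYPE http_request_duration_seconds histogram",
  'http_request_duration_seconds_sum{status_code="200",method="POST",path="/rest/workflows/run"} 0.071720752',
  'http_request_duration_seconds_count{status_code="200",method="POST",path="/rest/workflows/run"} 2',
].join("\n");

const samples: Sample[] = [];
for (const line of text.split("\n")) {
  const sample = parseSampleLine(line);
  if (sample !== null) samples.push(sample);
}
// Roughly 0.0359 seconds per request for /rest/workflows/run.
console.log(meanDurationSeconds(samples, "http_request_duration_seconds", "/rest/workflows/run"));
```

Note that the `_bucket` series are cumulative: each `le` bucket counts all observations less than or equal to its bound, which is why the counts never decrease from one bucket to the next.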

n8n-assistant (bot) added the labels "core" (Enhancement outside /nodes-base and /editor-ui) and "n8n team" (Authored by the n8n team) on Jan 18, 2023
csuermann changed the title from "refactor(core): Add Prometheus labels to relevant metrics" to "feat(core): Add more Prometheus metrics (experimental)" on Jan 19, 2023
csuermann merged commit c8f820b into ENG-25-adds-Prometheus-event-counter on Jan 19, 2023
csuermann deleted the ENG-25-refactor branch on January 19, 2023 at 10:43
csuermann added a commit that referenced this pull request Jan 19, 2023
… (experimental) (#5177)

* create prometheus metrics from events

* feat(core): Add more Prometheus metrics (experimental) (#5187)

* refactor(core): Add Prometheus labels to relevant metrics

* feat(core): Add more Prometheus metrics (experimental)

* add 'v' prefix to value of version label

Co-authored-by: Cornelius Suermann <cornelius@n8n.io>