Metrics #18

novoj · 2023-02-24T22:11:52Z

Introduce metrics into evitaDB. The metrics servlet should start as a separate API on a different port (or as part of the system API). Although we are used to the Prometheus API, we should analyze other options, namely OpenTelemetry.

Metrics proposals

consider removing the metric type from the metric name
add tracing to more places in the evitaDB core, not only to QueryPlan

System metrics

Storage metrics

Transactions

Storage

Per collection

Per catalog

Per instance

Engine metrics

Queries

Per instance

query process time (tag: catalog, collection) - ∑ of catalogs
query complexity (tag: catalog, collection) - ∑ of catalogs
query records returned (tag: catalog, collection) - ∑ of catalogs
active sessions (tag: catalog) - ∑ of catalogs
executor threads
executor used threads (tag: process name, catalog)
executor thread execution time (tag: process name, catalog)
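The "∑ of catalogs" notation above can be illustrated with a small tagged counter: values are recorded per catalog and collection, and the per-catalog and per-instance figures are derived by summing. This is only a sketch using plain JDK classes; the class name `TaggedCounter` and its methods are hypothetical and are not taken from the evitaDB code base.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical sketch: a counter tagged by catalog and collection. */
final class TaggedCounter {
    // catalog -> collection -> running total
    private final Map<String, Map<String, LongAdder>> series = new ConcurrentHashMap<>();

    /** Records a value (e.g. records returned by one query) under both tags. */
    void add(String catalog, String collection, long amount) {
        series.computeIfAbsent(catalog, c -> new ConcurrentHashMap<>())
              .computeIfAbsent(collection, c -> new LongAdder())
              .add(amount);
    }

    /** Value of a single (catalog, collection) series, 0 when never recorded. */
    long perCollection(String catalog, String collection) {
        LongAdder adder = series.getOrDefault(catalog, Map.of()).get(collection);
        return adder == null ? 0L : adder.sum();
    }

    /** Per-catalog value: the sum over all collections of that catalog. */
    long perCatalog(String catalog) {
        return series.getOrDefault(catalog, Map.of()).values().stream()
                     .mapToLong(LongAdder::sum)
                     .sum();
    }

    /** Per-instance value: the sum over all catalogs. */
    long perInstance() {
        return series.keySet().stream().mapToLong(this::perCatalog).sum();
    }
}
```

With a metrics backend that supports labels, the same aggregation would normally be done on the query side (for example by summing over the catalog label) rather than in the application.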

Cache

Web API metrics

smejdil · 2023-09-20T07:21:04Z

I will be happy to help and build a Zabbix template "evitaDB by Prom".

novoj · 2023-09-20T08:36:19Z

> I will be happy to help and build a Zabbix template "evitaDB by Prom".

We'll get in touch before we start working on this issue. The ETA is December 2023 / January 2024.

novoj · 2023-12-07T09:54:23Z

We should also investigate this approach:

novoj · 2023-12-12T15:38:55Z

Interesting slide - three pillars of observability:

I think it might be beneficial to provide basic access to all three of them in evitaLab.

novoj · 2023-12-13T07:06:03Z

I'd suggest creating a prototype where:

try to implement example JFR events according to the blog post: https://www.morling.dev/blog/rest-api-monitoring-with-custom-jdk-flight-recorder-events/
create an event that covers evitaDB QueryPlan execution
try to record / stream events and visualize them
try to filter them by a predefined template (e.g. by collection type)
try to integrate them into metrics: https://opentelemetry.io/docs/instrumentation/java/ and measure the slowdown of the system (I'd like to integrate directly with OpenTelemetry and avoid MicroProfile Metrics)
open servlets for Prometheus scraping
create an example dashboard in Grafana
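As an illustration of the first two steps, a custom JFR event wrapping a query plan execution could look roughly like the sketch below. It uses only the standard `jdk.jfr` API; the event name `sketch.QueryPlanExecuted`, its fields, and the helper `QueryPlanTracing.traced` are assumptions made for this example and do not describe the actual evitaDB events.

```java
import java.util.function.IntSupplier;
import jdk.jfr.Category;
import jdk.jfr.Description;
import jdk.jfr.Event;
import jdk.jfr.Label;
import jdk.jfr.Name;

/** Hypothetical JFR event describing one query plan execution. */
@Name("sketch.QueryPlanExecuted")
@Label("Query plan executed")
@Category({"evitaDB", "Query"})
@Description("Duration and result size of a single query plan execution")
class QueryPlanExecutedEvent extends Event {
    @Label("Catalog")
    String catalog;

    @Label("Collection")
    String collection;

    @Label("Records returned")
    int recordsReturned;
}

final class QueryPlanTracing {
    /**
     * Runs the (stand-in) query plan and wraps it in a JFR event.
     * The supplier returns the number of records the plan produced.
     */
    static QueryPlanExecutedEvent traced(String catalog, String collection, IntSupplier plan) {
        QueryPlanExecutedEvent event = new QueryPlanExecutedEvent();
        event.catalog = catalog;
        event.collection = collection;
        event.begin(); // start of the measured interval
        try {
            event.recordsReturned = plan.getAsInt();
        } finally {
            // commit() also ends the event when end() was not called explicitly;
            // nothing is written when no recording has the event enabled
            event.commit();
        }
        return event;
    }
}
```

Because `commit()` writes nothing unless a recording has the event enabled, such a wrapper can stay in the code path permanently; the real overhead would still have to be measured as the list above suggests.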

novoj · 2023-12-13T07:15:16Z

It would be interesting also to test https://www.jaegertracing.io/ and its integration into https://grafana.com/docs/grafana/latest/datasources/jaeger/ - it's somehow similar to our #148 and we should discuss whether it makes sense to move toward some standard instead of our proprietary solution (the principle should be very similar so it shouldn't be hard to migrate).

novoj · 2023-12-13T10:46:39Z

This should help us too: https://plugins.jetbrains.com/plugin/20937-java-jfr-profiler

smejdil · 2023-12-16T13:34:11Z

> I'd suggest creating a prototype where:
>
> * try to implement example JFR events according to blog post: https://www.morling.dev/blog/rest-api-monitoring-with-custom-jdk-flight-recorder-events/
> * create event that covers evitaDB `QueryPlan` execution
> * try to record / stream events and visualize them
> * try to filter them by predefined template (e.g. collection type for example)
> * try to integrate them to metrics: https://opentelemetry.io/docs/instrumentation/java/ and measure the slowdown of the system (I'd like to integrate directly with OpenTelemetry and avoid MicroProfile Metrics)
> * open servlets for Prometheus scraping
> * create example dashboard in Grafana

If evitaDB is able to expose metrics in the Prometheus format, it is possible to get them into Zabbix using https://www.zabbix.com/documentation/current/en/manual/config/items/itemtypes/prometheus. It is also possible to use the LLD (low-level discovery) technique for some schema instances etc.

novoj · 2024-01-04T09:52:01Z

Notes from the first prototype showdown and what needs to be added to the prototype:

how to filter JFR events to be generated = stored
- maybe prepare an alternative to JvmMetrics for evitaMetrics
- check how to make JFR Event enabled work properly
try to see if some JVM events can be "disabled" - e.g. the flamegraph must be quite demanding?!
find out how to plug in the OpenTelemetry abstraction
metrics will have their own API
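On the first note (which JFR events are generated and therefore stored): with the standard `jdk.jfr.Recording` API, events can be enabled or disabled per recording, and a disabled event never reaches the recording file. The sketch below demonstrates this with two made-up events; the names `sketch.QueryExecuted` and `sketch.CacheLookup` are assumptions for the example only.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import jdk.jfr.Event;
import jdk.jfr.Name;
import jdk.jfr.Recording;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordingFile;

final class JfrFilterSketch {
    @Name("sketch.QueryExecuted")
    static class QueryExecuted extends Event {
        String catalog;
    }

    @Name("sketch.CacheLookup")
    static class CacheLookup extends Event {
    }

    /** Returns {stored QueryExecuted events, stored CacheLookup events}. */
    static long[] recordAndCount() {
        try {
            Path file = Files.createTempFile("jfr-filter-sketch", ".jfr");
            try (Recording recording = new Recording()) {
                recording.enable(QueryExecuted.class);  // generated and stored
                recording.disable(CacheLookup.class);   // dropped at commit()
                recording.start();
                for (int i = 0; i < 3; i++) {
                    QueryExecuted query = new QueryExecuted();
                    query.catalog = "evita";
                    query.commit();
                    new CacheLookup().commit();
                }
                recording.stop();
                recording.dump(file);
            }
            long queries = 0;
            long lookups = 0;
            // Count by event name so that any other events in the file are ignored
            for (RecordedEvent event : RecordingFile.readAllEvents(file)) {
                String name = event.getEventType().getName();
                if (name.equals("sketch.QueryExecuted")) queries++;
                if (name.equals("sketch.CacheLookup")) lookups++;
            }
            Files.deleteIfExists(file);
            return new long[] {queries, lookups};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The same `enable` / `disable` calls also accept event names as strings (for example built-in JVM events), which is one possible way to switch off expensive JVM events in a recording.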

novoj · 2024-01-05T13:32:27Z

We've been recommended by Láďa Prskavec to stick to Prometheus metrics and not to use OTEL for database monitoring purposes. The recommendation was:

expose metrics via Prometheus endpoint
log data including "tracing information" - i.e. client id + request id to logs in standardized format
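The second recommendation could be sketched with `java.util.logging` as follows: the client id and request id are kept in a thread-local context and a custom formatter adds them to every log line in a fixed `key=value` layout. The class names `TraceContext` and `TracingFormatter` and the exact line layout are assumptions for illustration; the source does not say which logging framework or format would be used.

```java
import java.util.logging.Formatter;
import java.util.logging.LogRecord;

/** Hypothetical holder of the tracing identifiers of the current request. */
final class TraceContext {
    private static final ThreadLocal<String[]> CURRENT =
        ThreadLocal.withInitial(() -> new String[] {"-", "-"});

    static void set(String clientId, String requestId) {
        CURRENT.set(new String[] {clientId, requestId});
    }

    static void clear() {
        CURRENT.remove();
    }

    static String clientId() {
        return CURRENT.get()[0];
    }

    static String requestId() {
        return CURRENT.get()[1];
    }
}

/** Formats each record as: timestamp level clientId=... requestId=... message */
final class TracingFormatter extends Formatter {
    @Override
    public String format(LogRecord record) {
        return record.getInstant() + " " + record.getLevel().getName()
            + " clientId=" + TraceContext.clientId()
            + " requestId=" + TraceContext.requestId()
            + " " + formatMessage(record)
            + System.lineSeparator();
    }
}
```

A request handler would call `TraceContext.set(...)` on entry and `TraceContext.clear()` in a `finally` block, so that pooled threads do not leak identifiers between requests.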

OTEL is then used on the SRE side to integrate multiple vendors together.

As a localhost tracing viewer, we've been recommended to use https://github.com/CtrlSpice/otel-desktop-viewer and, for a shared Grafana service, https://grafana.com/oss/tempo/


novoj · 2024-02-06T08:13:42Z

A very minimal set of metrics is published at: http://demo.evitadb.io:5557/observability/metrics ... when we get the prototype up and running, we'll expand the metrics list according to the set defined in the issue header.
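For readers unfamiliar with what such an endpoint serves, here is a minimal sketch of a Prometheus-style text endpoint built on the JDK's own `com.sun.net.httpserver.HttpServer`. It is not how evitaDB implements its observability API; the class name, the gauge map and the metric name used in the usage note are assumptions, and only the `/observability/metrics` path is taken from the URL above.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

final class MetricsEndpointSketch {
    /** Renders gauges in the Prometheus text exposition format, sorted by name. */
    static String render(Map<String, Double> gauges) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, Double> gauge : new TreeMap<>(gauges).entrySet()) {
            out.append("# TYPE ").append(gauge.getKey()).append(" gauge\n");
            out.append(gauge.getKey()).append(' ').append(gauge.getValue()).append('\n');
        }
        return out.toString();
    }

    /** Starts an HTTP server that serves the current gauge values on each scrape. */
    static HttpServer start(int port, Map<String, Double> gauges) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/observability/metrics", exchange -> {
            byte[] body = render(gauges).getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders()
                    .add("Content-Type", "text/plain; version=0.0.4; charset=utf-8");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream response = exchange.getResponseBody()) {
                response.write(body);
            }
        });
        server.start();
        return server;
    }
}
```

Calling `MetricsEndpointSketch.start(5557, gauges)` with a concurrent map of values would make them scrapeable by Prometheus; `server.stop(0)` shuts the endpoint down again.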

lukashornych · 2024-02-06T09:30:03Z

We need to update the Monitor docs https://evitadb.io/documentation/operate/monitor with the new IDs.

novoj · 2024-02-06T09:42:01Z

We probably don't need to suffix metric names with their type - it's already visible in the metrics output as a comment:

# HELP io_evitadb_core_metric_event_query_plan_step_executed_event_timegauge Time taken
# TYPE io_evitadb_core_metric_event_query_plan_step_executed_event_timegauge gauge
io_evitadb_core_metric_event_query_plan_step_executed_event_timegauge 72500.0
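To make the naming proposal concrete, the sketch below derives a metric name from an event class name and a field name without appending the metric type. The derivation rule (dots to underscores, camel case to snake case) and the class name `io.evitadb.core.metric.event.QueryPlanStepExecutedEvent` are guesses reverse-engineered from the sample output above, not the actual evitaDB naming code.

```java
/** Hypothetical helper that builds Prometheus metric names without a type suffix. */
final class MetricNames {
    /** Converts camelCase / PascalCase to snake_case; existing underscores are kept. */
    static String toSnakeCase(String value) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (Character.isUpperCase(c)) {
                // separate words, but avoid a double underscore after a package separator
                if (i > 0 && value.charAt(i - 1) != '_') {
                    out.append('_');
                }
                out.append(Character.toLowerCase(c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    /** Joins the event class name and the field name; no "gauge"/"counter" suffix. */
    static String metricName(String eventClassName, String fieldName) {
        return toSnakeCase(eventClassName.replace('.', '_')) + "_" + toSnakeCase(fieldName);
    }
}
```

The metric type then lives only in the `# TYPE` comment line, which Prometheus already reads during scraping.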


novoj · 2024-05-27T15:01:38Z

The initial version of the metrics is done and was released in the 2024.7 release:

The issue won't be closed yet, since some metrics are still missing and we also need to document them properly.


novoj · 2024-06-24T07:27:13Z

Most of the metrics are done by now. We also have a nice-looking dashboard in Grafana that's becoming useful. I'm postponing the closing of this issue until the remaining work is finalized. We still need to:

visualize cache metrics, but this depends on Cache inconsistency #37 being solved
visualize thread pool usage
finalize the documentation (descriptions etc.) - the base is already available at https://evitadb.io/documentation/operate/observe?lang=evitaql#metrics
extract the Grafana dashboard to JSON, document it and make it available for download - we also know that the current filters will not match the requirements of K8S pod selection, so we need to investigate this as well
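For the thread pool item, the values a dashboard would need can be read directly from a standard `java.util.concurrent.ThreadPoolExecutor`. The sketch below samples pool size, active threads and queue length; the class `ThreadPoolGauges` and the demo pool are illustrative assumptions, and evitaDB's own executors may expose different data.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class ThreadPoolGauges {
    /** Returns {threads in pool, threads running a task, tasks waiting in queue}. */
    static int[] sample(ThreadPoolExecutor pool) {
        return new int[] {pool.getPoolSize(), pool.getActiveCount(), pool.getQueue().size()};
    }

    /** Samples a two-thread pool while exactly one task is known to be running. */
    static int[] demo() {
        ThreadPoolExecutor pool =
            new ThreadPoolExecutor(2, 2, 0L, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        try {
            pool.execute(() -> {
                started.countDown(); // signal that the task is running
                try {
                    release.await(); // keep the worker busy until sampled
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            started.await();
            return sample(pool);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            release.countDown();
            pool.shutdown();
        }
    }
}
```

In a real setup, `sample` would be called on every scrape and the three values published as gauges tagged with the pool name.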

novoj self-assigned this Feb 24, 2023

novoj added the enhancement (New feature or request) label Feb 24, 2023

novoj added this to the Alpha milestone Feb 24, 2023

novoj removed this from the Alpha milestone Jul 18, 2023

novoj assigned lukashornych and Khertys Dec 18, 2023

Khertys pushed a commit that referenced this issue Jan 23, 2024

feat(#18): prometheus endpoint for metrics publishing, JFR recording,…

e008fee

… WIP traces

Khertys pushed a commit that referenced this issue Jan 23, 2024

feat(#18): tracing integration

531fea9

Khertys pushed a commit that referenced this issue Jan 30, 2024

feat(#18): refactor, observability capabilities moved to separate mod…

818f616

…ules which could be enabled via linking the libraries to the project

Khertys pushed a commit that referenced this issue Jan 30, 2024

feat(#18): refactor, added missing docs

8e821d6

novoj added a commit that referenced this issue Jan 31, 2024

feat(#18): Metrics

e0bb23e

Refactoring during team status.

Khertys pushed a commit that referenced this issue Feb 1, 2024

feat(#18): refactor, docs test fixes + generated missing examples, ad…

3fb110f

…ded order to ExternalApiProviderRegistrar.java to ensure that ObservabilityProviderRegistrar loads before GrpcProviderRegistrar

Khertys pushed a commit that referenced this issue Feb 1, 2024

fix(#18): revert localhost environment in UserDocumentationTest

f10fdb3

novoj added a commit that referenced this issue Feb 6, 2024

feat(#18): added documentation and minor cleaning

cc61278

novoj added a commit that referenced this issue Feb 6, 2024

feat(#18): removed unknown classes reference

6bdf47c

lukashornych mentioned this issue Feb 6, 2024

Send clientId and requestId automatically with every request to evitaDB instance lukashornych/evitalab#72

Closed

novoj added a commit that referenced this issue May 26, 2024

fix(#18): NPE correction

f0674d7

novoj added a commit that referenced this issue May 26, 2024

fix(#18): closing event was not committed

3df3843

novoj added a commit that referenced this issue May 26, 2024

fix(#18): optimized access to currently opened sessions

2f1d503

novoj added a commit that referenced this issue May 26, 2024

fix(#18): renamed kill event

4f732d4

novoj added a commit that referenced this issue May 26, 2024

fix(#18): session event rename

e0aeb42

novoj added a commit that referenced this issue May 26, 2024

fix(#18): corrected thresholds and naming

98c1673

novoj added a commit that referenced this issue May 27, 2024

fix(#18): attempt to switch to native histograms

8c5c5cc

novoj added a commit that referenced this issue May 27, 2024

fix(#18): updated naming

e0dfe2a

novoj added a commit that referenced this issue May 27, 2024

fix(#18): another attempt for native histograms

39dde78

novoj added a commit that referenced this issue May 27, 2024

fix(#18): switching back to classic histograms

5b03ba7

Unfortunately, Prometheus doesn't expose native histograms in the text format (see prometheus/prometheus#11265), only in the Protobuf format, which is not easily scrapeable.

novoj added a commit that referenced this issue May 27, 2024

fix(#18): corrected observed errors in metrics

241ce09

novoj added a commit that referenced this issue May 27, 2024

fix(#18): avoiding CompressionKeyUnknownException in standard applica…

4a55833

…tion flow

novoj added a commit that referenced this issue May 27, 2024

fix(#18): corrected gRPC metrics

27b66e7

novoj added a commit that referenced this issue May 28, 2024

fix(#18): storage should be updated upon catalog removal

f6c072a

novoj added a commit that referenced this issue Jun 5, 2024

fix(#18): corrected execution metrics

07453ab

lukashornych added a commit that referenced this issue Jun 6, 2024

fix(#18): new GraphQL endpoint execution event and metrics

728eef4

novoj added a commit that referenced this issue Jun 7, 2024

fix(#18): corrected execution metrics

12a53fe

(cherry picked from commit 07453ab)

lukashornych added a commit that referenced this issue Jun 11, 2024

fix(#18): new JFR event and metrics for building GQL schemas/instances

f1dbe07

lukashornych added a commit that referenced this issue Jun 11, 2024

Merge pull request #602 from FgForrest/18-web-api-metrics

9f4fc88

feat(#18): GraphQL API JFR events and metrics

lukashornych added a commit that referenced this issue Jun 12, 2024

fix(#18): REST API request metrics

72a9373

lukashornych added a commit that referenced this issue Jun 17, 2024

fix(#18): missing metrics in schema data fetchers

92f5db1

lukashornych added a commit that referenced this issue Jun 17, 2024

fix(#18): support REST instance creation metrics, GQL instance creati…

fe9be80

…on metrics fixes and adjustments

lukashornych added a commit that referenced this issue Jun 17, 2024

Merge pull request #604 from FgForrest/18-web-api-metrics

cc6276a

feat(#18): REST and GraphQL API metrics improvements

lukashornych added a commit that referenced this issue Jun 17, 2024

fix(#18): add OpenAPI operation ID to REST metrics to distinguish req…

886330d

…uests

lukashornych added a commit that referenced this issue Jun 17, 2024

Merge pull request #605 from FgForrest/18-web-api-metrics

aa3740d

feat(#18): add OpenAPI operation ID to REST metrics to distinguish requests

lukashornych added a commit that referenced this issue Jun 17, 2024

fix(#18): count REST endpoints metric

841cf72

lukashornych added a commit that referenced this issue Jun 17, 2024

Merge pull request #606 from FgForrest/18-web-api-metrics

2b31ac4

feat(#18): count REST endpoints metric

lukashornych added a commit that referenced this issue Jun 17, 2024

docs(#18): metrics and JFR events docs update

e823bc9
