@josecelano (Member) commented Mar 25, 2025

Relates to: #1263 (comment)

This PR adds a new REST API endpoint that exposes metrics in a new, extensible format.

In this PR, I will only add metrics for:

  • Performance
  • Announce requests
  • Scrape requests

See the complete list here.

This requires emitting the new metrics from the following packages:

  • HTTP Tracker Core package
  • UDP Tracker Core package
  • UDP Tracker Server package

The remaining metrics (torrents and peers) require adding metrics to the tracker core package.

JSON format

URL: http://0.0.0.0:1212/api/v1/metrics?token=MyAccessToken

Sample response:

```json
{
  "metrics":[
    {
      "kind":"counter",
      "name":"http_tracker_core_announce_requests_received_total",
      "samples":[
        {
          "value":1,
          "recorded_at":"2025-04-02T00:00:00+00:00",
          "labels":[
            {
              "name":"server_binding_ip",
              "value":"0.0.0.0"
            },
            {
              "name":"server_binding_port",
              "value":"7070"
            },
            {
              "name":"server_binding_protocol",
              "value":"http"
            }
          ]
        }
      ]
    },
    {
      "kind":"gauge",
      "name":"udp_tracker_server_performance_avg_announce_processing_time_ns",
      "samples":[
        {
          "value":1.0,
          "recorded_at":"2025-04-02T00:00:00+00:00",
          "labels":[
            {
              "name":"server_binding_ip",
              "value":"0.0.0.0"
            },
            {
              "name":"server_binding_port",
              "value":"7070"
            },
            {
              "name":"server_binding_protocol",
              "value":"http"
            }
          ]
        }
      ]
    }
  ]
}
```
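Each sample in the response is identified by the socket its server is bound to. As a minimal sketch, assuming labels are plain `(name, value)` string pairs, the three `server_binding_*` labels could be derived from a `SocketAddr` like this (`server_binding_labels` is a hypothetical helper, not part of the tracker):

```rust
use std::net::SocketAddr;

/// Hypothetical helper: builds the `server_binding_*` labels shown in the
/// sample response from the socket address a tracker server is bound to.
/// The protocol is passed in explicitly because a `SocketAddr` does not carry it.
fn server_binding_labels(addr: SocketAddr, protocol: &str) -> Vec<(String, String)> {
    vec![
        ("server_binding_ip".to_string(), addr.ip().to_string()),
        ("server_binding_port".to_string(), addr.port().to_string()),
        ("server_binding_protocol".to_string(), protocol.to_string()),
    ]
}
```

For example, `server_binding_labels("0.0.0.0:7070".parse().unwrap(), "http")` yields the same three label pairs as the first sample above.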

Prometheus format

URL: http://0.0.0.0:1212/api/v1/metrics?token=MyAccessToken&format=prometheus

```
http_tracker_core_announce_requests_received_total{server_binding_ip="0.0.0.0",server_binding_port="7070",server_binding_protocol="http"} 1
udp_tracker_server_performance_avg_announce_processing_time_ns{server_binding_ip="0.0.0.0",server_binding_port="7070",server_binding_protocol="http"} 1
```
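Each Prometheus line is the metric name, the label set in braces, and the sample value. A minimal sketch of producing such a line, assuming label values need no escaping (`render_prometheus_line` is a hypothetical helper, not the package's exporter):

```rust
/// Hypothetical helper: renders one sample as a Prometheus text-format line,
/// `name{label="value",...} value`. Label values are assumed not to need escaping.
fn render_prometheus_line(name: &str, labels: &[(&str, &str)], value: f64) -> String {
    if labels.is_empty() {
        // A sample without labels is written without braces.
        return format!("{} {}", name, value);
    }
    let rendered: Vec<String> = labels
        .iter()
        .map(|(label, val)| format!("{}=\"{}\"", label, val))
        .collect();
    // `{{` and `}}` are literal braces around the comma-separated labels.
    format!("{}{{{}}} {}", name, rendered.join(","), value)
}
```

Rust's `Display` for `f64` prints `1.0` as `1`, which is why a gauge value of `1.0` and a counter value of `1` render identically in the sample output.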

Manual Tests

You can increase the announce and scrape request counters by sending requests with the console tracker client:

HTTP:

```
cargo run -p torrust-tracker-client --bin http_tracker_client announce http://127.0.0.1:7070 443c7602b4fde83d1154d6d9da48808418b181b6 | jq
cargo run -p torrust-tracker-client --bin http_tracker_client scrape http://127.0.0.1:7070 443c7602b4fde83d1154d6d9da48808418b181b6 | jq
```

UDP:

```
cargo run -p torrust-tracker-client --bin udp_tracker_client announce udp://127.0.0.1:6969 443c7602b4fde83d1154d6d9da48808418b181b6 | jq
cargo run -p torrust-tracker-client --bin udp_tracker_client scrape udp://127.0.0.1:6969 443c7602b4fde83d1154d6d9da48808418b181b6 | jq
```

TODO

  • Move labeled metrics to the metrics package.
  • Enforce the constraint that each combination of metric name and label set is unique in the array of labeled metrics.
  • Replace primitive types in the new labeled metrics with new types to enforce constraints, for example, for metric names, labels, metric types, etc.
  • Serialize to Prometheus format.
  • Serialize to and deserialize from JSON (needed in the Index).
  • Inject the right URL scheme. It's hardcoded now.
  • Review which labels we include per metric. For example, the "protocol" label can be derived from the URL, but that requires parsing the URL and makes it harder to build graphs in Grafana. I guess we should include labels for the kinds of aggregate data we want to get.
  • Implement versioning. A given API version must contain a set of labeled metrics, and clients expect certain labeled metrics to be included in an API version. We need to initialize the array of metrics with all the expected labeled metrics and their initial values.
  • Write more tests.
  • Implement UDP metrics. Only HTTP tracker metrics are implemented so far. This would require merging labeled metrics.
    • Emit metrics from UDP tracker core.
    • Emit metrics from UDP tracker server.
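The uniqueness constraint in the TODO list can be enforced when a series is registered. A minimal sketch, assuming a label set is modeled as a sorted `BTreeMap` (`LabelSet` and `LabeledMetrics` here are hypothetical, not the metrics package's types):

```rust
use std::collections::{BTreeMap, HashSet};

/// Hypothetical label set: a sorted map, so label order does not affect equality.
type LabelSet = BTreeMap<String, String>;

/// Hypothetical collection enforcing the constraint that a metric name plus
/// a label set may appear at most once.
#[derive(Default)]
struct LabeledMetrics {
    series: HashSet<(String, LabelSet)>,
}

impl LabeledMetrics {
    /// Registers a series, or returns an error describing the duplicate.
    fn add(&mut self, name: &str, labels: LabelSet) -> Result<(), String> {
        let key = (name.to_string(), labels);
        if self.series.contains(&key) {
            return Err(format!("duplicate series: {} {:?}", key.0, key.1));
        }
        self.series.insert(key);
        Ok(())
    }
}
```

Because the label set is a sorted map, `{a, b}` and `{b, a}` compare equal, so the same labels inserted in a different order are still caught as a duplicate.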

Discarded:

  • Use f64 for all metric values instead of u64. We use u64 for counters and f64 for gauges.
  • Add a new namespace label to avoid conflicts between packages that use the same metric name. A metric in a different context is not the same metric: it comes from a different source, even if it represents the same thing. For example, a counter for announce requests can be added at the server level, the HTTP core level, and the tracker core level. I think it's better to use a different metric for each of them.
    • In Grafana, for example, you would have to explicitly exclude labels; otherwise, you could get duplicate values if two metrics from different packages had the same name.
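The value split kept in the first item, `u64` for counters and `f64` for gauges, can be expressed as an enum. A minimal sketch with hypothetical names; the package's actual `Counter` and `Gauge` types may differ:

```rust
/// Hypothetical metric value: counters are monotonically increasing integers,
/// gauges are floating-point values that can go up or down.
#[derive(Debug, Clone, Copy, PartialEq)]
enum MetricValue {
    Counter(u64),
    Gauge(f64),
}

impl MetricValue {
    /// Adds one to a counter. Returns `false` and changes nothing for a gauge.
    fn increment(&mut self) -> bool {
        match self {
            MetricValue::Counter(n) => {
                *n += 1;
                true
            }
            MetricValue::Gauge(_) => false,
        }
    }

    /// Overwrites a gauge. Returns `false` and changes nothing for a counter.
    fn set(&mut self, value: f64) -> bool {
        match self {
            MetricValue::Gauge(g) => {
                *g = value;
                true
            }
            MetricValue::Counter(_) => false,
        }
    }
}
```

Returning `bool` keeps the sketch short; a real API might prefer separate types so that calling `set` on a counter does not compile at all.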

Notes:

  • Regarding what I called "versioning": you can describe metrics so that they appear in the JSON response even if there are no samples yet. However, they will not be included in the Prometheus export, because a line without a metric value would be an error. We could set an initial value, but that only makes sense for counters.
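The asymmetry in this note can be shown with a small model: a described metric is always listed in the JSON output, but contributes Prometheus lines only once it has samples. A sketch with hypothetical types, omitting labels for brevity:

```rust
/// Hypothetical described metric: declared up front so clients can rely on it,
/// but possibly without samples yet. Labels are omitted for brevity.
struct DescribedMetric {
    name: &'static str,
    sample_values: Vec<f64>,
}

/// JSON-style listing: every described metric is included, samples or not.
fn json_metric_names(metrics: &[DescribedMetric]) -> Vec<&'static str> {
    metrics.iter().map(|m| m.name).collect()
}

/// Prometheus-style export: one line per sample, so a metric with no samples
/// produces no lines (a line without a value would be invalid).
fn prometheus_lines(metrics: &[DescribedMetric]) -> Vec<String> {
    metrics
        .iter()
        .flat_map(|m| m.sample_values.iter().map(move |v| format!("{} {}", m.name, v)))
        .collect()
}
```

Initializing a counter with a sample of `0` would make it appear in both outputs; a gauge has no equally natural starting value.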

Future work

  • We will probably need some methods on MetricCollection to search for metrics, apply aggregate functions, etc. For example:
    • Sum all sample values for a given counter metric (total): the total number of requests regardless of their label values.
  • We can't deprecate the current http://0.0.0.0:1212/api/v1/stats API endpoint until the new endpoint http://0.0.0.0:1212/api/v1/metrics provides the same information. We need to add this type of metric to the tracker core package too, because that is the package that provides torrent and peer metrics. See Overhaul stats: Segregated metrics for each tracker running on a different socket address #1263 (comment)
  • Add tests for the new endpoint.
  • Deploy to the demo tracker, build a new dashboard in Grafana, and compare it with the current dashboard to check that metrics are being collected correctly.
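The aggregate in the first item, summing a counter's samples regardless of labels, reduces to a sum over the sample map, and filtering on one label gives a partial total. A minimal sketch, assuming samples are stored in a map keyed by label set (hypothetical types and functions, not the `MetricCollection` API):

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical label set used as the sample key.
type LabelSet = BTreeMap<String, String>;

/// Total of a counter across all label sets.
fn sum_counter(samples: &HashMap<LabelSet, u64>) -> u64 {
    samples.values().sum()
}

/// Total of a counter across the label sets where `name` has value `value`.
fn sum_counter_where(samples: &HashMap<LabelSet, u64>, name: &str, value: &str) -> u64 {
    samples
        .iter()
        .filter(|(labels, _)| labels.get(name).map(String::as_str) == Some(value))
        .map(|(_, count)| *count)
        .sum()
}
```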


codecov bot commented Mar 25, 2025

Codecov Report

Attention: Patch coverage is 94.54722% with 112 lines in your changes missing coverage. Please review.

Project coverage is 84.56%. Comparing base (3816446) to head (e3b84a4).
Report is 11 commits behind head on develop.

Files with missing lines:

| File | Patch % | Lines |
| --- | --- | --- |
| ...s/rest-tracker-api-core/src/statistics/services.rs | 0.00% | 25 Missing ⚠️ |
| ...racker-api-server/src/v1/context/stats/handlers.rs | 0.00% | 24 Missing ⚠️ |
| packages/metrics/src/metric_collection.rs | 95.18% | 20 Missing and 4 partials ⚠️ |
| packages/metrics/src/label/set.rs | 93.24% | 14 Missing and 1 partial ⚠️ |
| ...acker-api-server/src/v1/context/stats/responses.rs | 0.00% | 6 Missing ⚠️ |
| ...acker-api-server/src/v1/context/stats/resources.rs | 0.00% | 5 Missing ⚠️ |
| packages/metrics/src/sample.rs | 98.02% | 1 Missing and 4 partials ⚠️ |
| ...ckages/http-tracker-core/src/statistics/metrics.rs | 50.00% | 3 Missing ⚠️ |
| ...ackages/udp-tracker-core/src/statistics/metrics.rs | 50.00% | 3 Missing ⚠️ |
| packages/metrics/src/sample_collection.rs | 99.63% | 0 Missing and 1 partial ⚠️ |

... and 1 more
```diff
@@             Coverage Diff             @@
##           develop    #1414      +/-   ##
===========================================
+ Coverage    83.45%   84.56%   +1.11%
===========================================
  Files          234      254      +20
  Lines        17197    19189    +1992
  Branches     17197    19189    +1992
===========================================
+ Hits         14351    16227    +1876
- Misses        2590     2690     +100
- Partials       256      272      +16
```


@josecelano force-pushed the 1403-overhaul-stats-start-collecting-stats-per-server-instance-per-socket branch from b766cc7 to c159351 on March 26, 2025 09:54
@josecelano force-pushed the 1403-overhaul-stats-start-collecting-stats-per-server-instance-per-socket branch from c159351 to 4b5fd1b on March 31, 2025 14:50
@josecelano changed the title from "POC: New API endpoint with extendable metrics" to "New API endpoint with extendable metrics" on Mar 31, 2025
@josecelano force-pushed the 1403-overhaul-stats-start-collecting-stats-per-server-instance-per-socket branch from 59cd15c to f0999f0 on April 1, 2025 17:50
@josecelano force-pushed the 1403-overhaul-stats-start-collecting-stats-per-server-instance-per-socket branch from 5b5ecde to 4fc8fa1 on April 4, 2025 17:33
@josecelano force-pushed the 1403-overhaul-stats-start-collecting-stats-per-server-instance-per-socket branch from 384adf9 to 6573a02 on April 4, 2025 17:49
@josecelano force-pushed the 1403-overhaul-stats-start-collecting-stats-per-server-instance-per-socket branch from 6573a02 to db26b61 on April 4, 2025 17:51
@josecelano force-pushed the 1403-overhaul-stats-start-collecting-stats-per-server-instance-per-socket branch from d46044f to e4b4194 on April 9, 2025 12:55
@josecelano force-pushed the 1403-overhaul-stats-start-collecting-stats-per-server-instance-per-socket branch from e4b4194 to b9c8220 on April 9, 2025 15:29
@josecelano force-pushed the 1403-overhaul-stats-start-collecting-stats-per-server-instance-per-socket branch from 73f17a9 to 0dc7b73 on April 9, 2025 16:57
@josecelano marked this pull request as ready for review on April 9, 2025 16:57
@josecelano requested a review from a team as a code owner on April 9, 2025 16:57
This package allows creating collections of metrics that can have labels.

It's similar to the `metrics` crate. There are two types of metrics:

- Counter
- Gauge

For example, you can increase a counter with:

```rust
let time = DurationSinceUnixEpoch::from_secs(1_743_552_000);
let label_set: LabelSet = (LabelName::new("label_name"), LabelValue::new("value")).into();

let mut metric_collection = MetricCollection::new(
    // Collection of counter-type metrics
    MetricKindCollection::new(vec![Metric::new(
        MetricName::new("test_counter"),
        SampleCollection::new(vec![Sample::new(Counter::new(0), time, label_set.clone())]),
    )]),
    // Empty collection of gauge-type metrics
    MetricKindCollection::new(vec![]),
);

// Increments the `test_counter` sample identified by `label_set` at `time`.
metric_collection.increase_counter(&MetricName::new("test_counter"), &label_set, time);
```

Metric collections can be serialized to JSON and exported to the
Prometheus format.
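When label values are exported to the Prometheus text format, backslashes, double quotes, and line feeds inside a value must be escaped. The commit does not say how the package handles this; the following is a minimal sketch of the escaping rule itself (`escape_label_value` is a hypothetical helper):

```rust
/// Hypothetical helper: escapes a label value for the Prometheus text
/// exposition format, where `\`, `"` and line feed must be escaped.
fn escape_label_value(value: &str) -> String {
    let mut escaped = String::with_capacity(value.len());
    for c in value.chars() {
        match c {
            '\\' => escaped.push_str("\\\\"),
            '"' => escaped.push_str("\\\""),
            '\n' => escaped.push_str("\\n"),
            other => escaped.push(other),
        }
    }
    escaped
}
```

Values such as `0.0.0.0` or `7070` pass through unchanged, so the sample output in this PR is unaffected by escaping.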
…ore and expose in REST API

**URL:** http://0.0.0.0:1212/api/v1/metrics?token=MyAccessToken

**Sample response:**

```json
{
  "metrics":[
    {
      "kind":"counter",
      "name":"http_tracker_core_announce_requests_received_total",
      "samples":[
        {
          "value":1,
          "update_at":"2025-04-02T00:00:00+00:00",
          "labels":[
            {
              "name":"server_binding_ip",
              "value":"0.0.0.0"
            },
            {
              "name":"server_binding_port",
              "value":"7070"
            },
            {
              "name":"server_binding_protocol",
              "value":"http"
            }
          ]
        }
      ]
    },
    {
      "kind":"gauge",
      "name":"udp_tracker_server_performance_avg_announce_processing_time_ns",
      "samples":[
        {
          "value":1.0,
          "update_at":"2025-04-02T00:00:00+00:00",
          "labels":[
            {
              "name":"server_binding_ip",
              "value":"0.0.0.0"
            },
            {
              "name":"server_binding_port",
              "value":"7070"
            },
            {
              "name":"server_binding_protocol",
              "value":"http"
            }
          ]
        }
      ]
    }
  ]
}
```

**URL:** http://0.0.0.0:1212/api/v1/metrics?token=MyAccessToken&format=prometheus

```
http_tracker_core_announce_requests_received_total{server_binding_ip="0.0.0.0",server_binding_port="7070",server_binding_protocol="http"} 1
udp_tracker_server_performance_avg_announce_processing_time_ns{server_binding_ip="0.0.0.0",server_binding_port="7070",server_binding_protocol="http"} 1
```
@josecelano force-pushed the 1403-overhaul-stats-start-collecting-stats-per-server-instance-per-socket branch from 681b8c0 to af8dbfa on April 10, 2025 06:48
After discussing with @da2ce7, we don't think this is necessary.
This removes duplicate data: the LabelSet is the HashMap key, and it was
also included in the HashMap value.
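Keying the map by the label set means each stored sample only needs its value and timestamp. A minimal sketch of that layout, assuming a sorted-map label set and a timestamp in seconds since the Unix epoch (the `Sample` struct and `increase_counter` function here are hypothetical illustrations, not the package's code):

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical label set used as the map key.
type LabelSet = BTreeMap<String, String>;

/// Hypothetical sample: value plus the time it was last recorded. It does not
/// store the LabelSet, because the map key already holds it.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Sample {
    value: u64,
    recorded_at_secs: u64,
}

/// Increments the counter sample for `labels`, creating it at 0 if absent,
/// and records `now_secs` as its timestamp.
fn increase_counter(samples: &mut HashMap<LabelSet, Sample>, labels: &LabelSet, now_secs: u64) {
    let sample = samples
        .entry(labels.clone())
        .or_insert(Sample { value: 0, recorded_at_secs: now_secs });
    sample.value += 1;
    sample.recorded_at_secs = now_secs;
}
```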
… Sample

The new name is more common in the context of metrics and time-series
data packages like Prometheus.
@josecelano (Member, Author)

ACK e3b84a4

@josecelano (Member, Author)

Hi @da2ce7, I've solved the problem we discussed today here.

@josecelano merged commit 90259b5 into torrust:develop on Apr 10, 2025
23 checks passed

Development

Successfully merging this pull request may close these issues.

Overhaul stats: Start collecting stats per server instance (per socket)
