Merge #512: Implement a health check for the tracker container
5e0a686 refactor: [#508] extract health check methods (Jose Celano)
bf23479 feat: [#508] add health check for UDP tracker (Jose Celano)
2a05590 refactor: [#508] move UDP tracker client to production code (Jose Celano)
7421306 feat: [#508] add health check enpoint to HTTP tracker (Jose Celano)
ef296f7 feat: [#508] app health check endpoint checks API (Jose Celano)
e1a45a2 feat: [#508] Health Check API but no checks yet (Jose Celano)
48ac64f feat: [#508] add container healthcheck for API (Jose Celano)
0ef4e34 feat: [#508] add new binary HTTP health check (Jose Celano)
f1c7ccc feat: add cargo dependency reqwest (Jose Celano)

Pull request description:

  We need to check the three services provided by the container:

  - [x] API
  - [x] HTTP Tracker (1 or more)
  - [x] UDP Tracker (1 or more)

  And we also need to:

  - [x] Check them only when they are enabled in the configuration.

  ### Implementation

  - [x] High-level health-checker API (in the future management API). It only checks enabled services. It makes a request to the service healthcheck endpoints.
  - [x] Healthcheck API endpoint
  - [x] Healthcheck HTTP Tracker endpoint
  - [x] Healthcheck UDP Tracker request (using the `connect` request endpoint).
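
The UDP check relies on the tracker's `connect` request. As a rough sketch, assuming the standard BEP 15 layout (a 16-byte request made of a 64-bit protocol ID `0x41727101980`, a 32-bit action `0` and a 32-bit transaction ID, all big-endian), the packet and the response validation could look like the code below. The function names are hypothetical and are not taken from this PR.

```rust
/// Magic constant that identifies a BEP 15 `connect` request.
const PROTOCOL_ID: u64 = 0x0417_2710_1980;
/// Action code for `connect`, used in both the request and the response.
const ACTION_CONNECT: u32 = 0;

/// Reads a big-endian `u32` from the first four bytes of `bytes`.
fn read_u32(bytes: &[u8]) -> u32 {
    let mut buf = [0u8; 4];
    buf.copy_from_slice(&bytes[0..4]);
    u32::from_be_bytes(buf)
}

/// Builds the 16-byte `connect` request (hypothetical helper).
pub fn build_connect_request(transaction_id: u32) -> [u8; 16] {
    let mut packet = [0u8; 16];
    packet[0..8].copy_from_slice(&PROTOCOL_ID.to_be_bytes());
    packet[8..12].copy_from_slice(&ACTION_CONNECT.to_be_bytes());
    packet[12..16].copy_from_slice(&transaction_id.to_be_bytes());
    packet
}

/// Returns the connection ID when `response` is a well-formed `connect`
/// response that echoes `transaction_id`; returns `None` otherwise.
pub fn parse_connect_response(response: &[u8], transaction_id: u32) -> Option<u64> {
    if response.len() < 16 {
        return None;
    }
    let action = read_u32(&response[0..4]);
    let echoed_transaction_id = read_u32(&response[4..8]);
    if action != ACTION_CONNECT || echoed_transaction_id != transaction_id {
        return None;
    }
    let mut connection_id = [0u8; 8];
    connection_id.copy_from_slice(&response[8..16]);
    Some(u64::from_be_bytes(connection_id))
}
```

A health checker built this way would send the request over a UDP socket and treat any reply that `parse_connect_response` accepts as a healthy tracker.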

  With the default configuration, you can use the endpoint: http://localhost:1313/health_check

  It returns an OK JSON response:

  ```json
  {
    "status": "Ok",
    "message": ""
  }
  ```

  or an Error response:

  ```json
  {
    "status": "Error",
    "message": "API is not healthy. Health check endpoint: http://127.0.0.1:1212/health_check"
  }
  ```

  **NOTICE**: health checks are not executed when services use port 0 in the configuration. Service launchers must be changed to return the bound port so the health checker handler can connect.
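
The behaviour described above (check only enabled services, skip services bound to port 0, report the first unhealthy one) can be sketched with the standard library alone. This is an illustration under stated assumptions, not the code merged in this PR: `Service`, `Report`, `Status` and `check_services` are hypothetical names, and the probe is passed in as a closure so the sketch stays independent of any HTTP or UDP client.

```rust
use std::net::SocketAddr;

/// Mirrors the `status` field of the JSON response (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum Status {
    Ok,
    Error,
}

/// Mirrors the JSON response body: a status plus a message.
#[derive(Debug, PartialEq)]
pub struct Report {
    pub status: Status,
    pub message: String,
}

/// A service as the checker sees it (hypothetical type).
pub struct Service {
    pub name: &'static str,
    pub enabled: bool,
    pub bind_address: SocketAddr,
}

/// Probes every enabled service that has a known port and stops at the
/// first unhealthy one. `is_healthy` stands in for the real request.
pub fn check_services<F>(services: &[Service], is_healthy: F) -> Report
where
    F: Fn(&Service) -> bool,
{
    for service in services {
        // Disabled services are not checked. Port 0 means the OS picked the
        // port at bind time, so the checker does not know where to connect.
        if !service.enabled || service.bind_address.port() == 0 {
            continue;
        }
        if !is_healthy(service) {
            return Report {
                status: Status::Error,
                message: format!(
                    "{} is not healthy. Health check endpoint: http://{}/health_check",
                    service.name, service.bind_address
                ),
            };
        }
    }
    Report {
        status: Status::Ok,
        message: String::new(),
    }
}
```

Passing the probe as a closure keeps the skip rules testable without starting any server.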

ACKs for top commit:
  josecelano:
    ACK 5e0a686

Tree-SHA512: d5e3dc788a10654c7b7b11388c4b559aecc899fafeaffc4dce30c9309c1c42feeb671f66db8aa1f474dc3b27dff9dbd5df4e02b2634e433cffb4772bedd2e115
josecelano committed Nov 27, 2023
2 parents 50372d8 + 5e0a686 commit ae8ef29
Showing 44 changed files with 687 additions and 58 deletions.
2 changes: 2 additions & 0 deletions Cargo.toml
@@ -1,4 +1,5 @@
[package]
default-run = "torrust-tracker"
name = "torrust-tracker"
readme = "README.md"

@@ -50,6 +51,7 @@ r2d2 = "0"
r2d2_mysql = "24"
r2d2_sqlite = { version = "0", features = ["bundled"] }
rand = "0"
reqwest = "0"
serde = { version = "1", features = ["derive"] }
serde_bencode = "0"
serde_json = "1"
9 changes: 7 additions & 2 deletions Containerfile
@@ -85,7 +85,7 @@ COPY --from=build \
RUN cargo nextest run --workspace-remap /test/src/ --extract-to /test/src/ --no-run --archive-file /test/torrust-tracker.tar.zst
RUN cargo nextest run --workspace-remap /test/src/ --target-dir-remap /test/src/target/ --cargo-metadata /test/src/target/nextest/cargo-metadata.json --binaries-metadata /test/src/target/nextest/binaries-metadata.json

RUN mkdir -p /app/bin/; cp -l /test/src/target/release/torrust-tracker /app/bin/torrust-tracker
RUN mkdir -p /app/bin/; cp -l /test/src/target/release/torrust-tracker /app/bin/torrust-tracker; cp -l /test/src/target/release/http_health_check /app/bin/http_health_check
RUN mkdir -p /app/lib/; cp -l $(realpath $(ldd /app/bin/torrust-tracker | grep "libz\.so\.1" | awk '{print $3}')) /app/lib/libz.so.1
RUN chown -R root:root /app; chmod -R u=rw,go=r,a+X /app; chmod -R a+x /app/bin

@@ -101,18 +101,21 @@ ARG USER_ID=1000
ARG UDP_PORT=6969
ARG HTTP_PORT=7070
ARG API_PORT=1212
ARG HEALTH_CHECK_API_PORT=1313

ENV TORRUST_TRACKER_PATH_CONFIG=${TORRUST_TRACKER_PATH_CONFIG}
ENV TORRUST_TRACKER_DATABASE_DRIVER=${TORRUST_TRACKER_DATABASE_DRIVER}
ENV USER_ID=${USER_ID}
ENV UDP_PORT=${UDP_PORT}
ENV HTTP_PORT=${HTTP_PORT}
ENV API_PORT=${API_PORT}
ENV HEALTH_CHECK_API_PORT=${HEALTH_CHECK_API_PORT}
ENV TZ=Etc/UTC

EXPOSE ${UDP_PORT}/udp
EXPOSE ${HTTP_PORT}/tcp
EXPOSE ${API_PORT}/tcp
EXPOSE ${HEALTH_CHECK_API_PORT}/tcp

RUN mkdir -p /var/lib/torrust/tracker /var/log/torrust/tracker /etc/torrust/tracker

@@ -136,5 +139,7 @@ CMD ["sh"]
FROM runtime as release
ENV RUNTIME="release"
COPY --from=test /app/ /usr/
# HEALTHCHECK CMD ["/usr/bin/wget", "--no-verbose", "--tries=1", "--spider", "localhost:${API_PORT}/version"]
HEALTHCHECK --interval=5s --timeout=5s --start-period=3s --retries=3 \
CMD /usr/bin/http_health_check http://localhost:${HEALTH_CHECK_API_PORT}/health_check \
|| exit 1
CMD ["/usr/bin/torrust-tracker"]
1 change: 1 addition & 0 deletions docs/containers.md
@@ -146,6 +146,7 @@ The following environmental variables can be set:
- `UDP_PORT` - The port for the UDP tracker. This should match the port used in the configuration, (default `6969`).
- `HTTP_PORT` - The port for the HTTP tracker. This should match the port used in the configuration, (default `7070`).
- `API_PORT` - The port for the tracker API. This should match the port used in the configuration, (default `1212`).
- `HEALTH_CHECK_API_PORT` - The port for the Health Check API. This should match the port used in the configuration, (default `1313`).


### Sockets
50 changes: 35 additions & 15 deletions packages/configuration/src/lib.rs
@@ -191,40 +191,43 @@
//! The default configuration is:
//!
//! ```toml
//! log_level = "info"
//! mode = "public"
//! announce_interval = 120
//! db_driver = "Sqlite3"
//! db_path = "./storage/tracker/lib/database/sqlite3.db"
//! announce_interval = 120
//! min_announce_interval = 120
//! external_ip = "0.0.0.0"
//! inactive_peer_cleanup_interval = 600
//! log_level = "info"
//! max_peer_timeout = 900
//! min_announce_interval = 120
//! mode = "public"
//! on_reverse_proxy = false
//! external_ip = "0.0.0.0"
//! tracker_usage_statistics = true
//! persistent_torrent_completed_stat = false
//! inactive_peer_cleanup_interval = 600
//! remove_peerless_torrents = true
//! tracker_usage_statistics = true
//!
//! [[udp_trackers]]
//! enabled = false
//! bind_address = "0.0.0.0:6969"
//! enabled = false
//!
//! [[http_trackers]]
//! enabled = false
//! bind_address = "0.0.0.0:7070"
//! ssl_enabled = false
//! enabled = false
//! ssl_cert_path = ""
//! ssl_enabled = false
//! ssl_key_path = ""
//!
//! [http_api]
//! enabled = true
//! bind_address = "127.0.0.1:1212"
//! ssl_enabled = false
//! enabled = true
//! ssl_cert_path = ""
//! ssl_enabled = false
//! ssl_key_path = ""
//!
//! [http_api.access_tokens]
//! admin = "MyAccessToken"
//!
//! [health_check_api]
//! bind_address = "127.0.0.1:1313"
//!```
use std::collections::{HashMap, HashSet};
use std::net::IpAddr;
@@ -342,7 +345,7 @@ pub struct HttpApi {
/// The address the tracker will bind to.
/// The format is `ip:port`, for example `0.0.0.0:6969`. If you want to
/// listen to all interfaces, use `0.0.0.0`. If you want the operating
/// system to choose a random port, use port `0`.
/// system to choose a random port, use port `0`.
pub bind_address: String,
/// Whether the HTTP API will use SSL or not.
pub ssl_enabled: bool,
@@ -363,9 +366,7 @@ impl HttpApi {
fn override_admin_token(&mut self, api_admin_token: &str) {
self.access_tokens.insert("admin".to_string(), api_admin_token.to_string());
}
}

impl HttpApi {
/// Checks if the given token is one of the tokens in the configuration.
#[must_use]
pub fn contains_token(&self, token: &str) -> bool {
@@ -375,6 +376,17 @@ impl HttpApi {
}
}

/// Configuration for the Health Check API.
#[serde_as]
#[derive(Serialize, Deserialize, PartialEq, Eq, Debug, Clone)]
pub struct HealthCheckApi {
/// The address the API will bind to.
/// The format is `ip:port`, for example `127.0.0.1:1313`. If you want to
/// listen to all interfaces, use `0.0.0.0`. If you want the operating
/// system to choose a random port, use port `0`.
pub bind_address: String,
}

/// Core configuration for the tracker.
#[allow(clippy::struct_excessive_bools)]
#[derive(Serialize, Deserialize, PartialEq, Eq, Debug)]
@@ -465,6 +477,8 @@ pub struct Configuration {
pub http_trackers: Vec<HttpTracker>,
/// The HTTP API configuration.
pub http_api: HttpApi,
/// The Health Check API configuration.
pub health_check_api: HealthCheckApi,
}

/// Errors that can occur when loading the configuration.
@@ -529,6 +543,9 @@ impl Default for Configuration {
.cloned()
.collect(),
},
health_check_api: HealthCheckApi {
bind_address: String::from("127.0.0.1:1313"),
},
};
configuration.udp_trackers.push(UdpTracker {
enabled: false,
@@ -676,6 +693,9 @@ mod tests {
[http_api.access_tokens]
admin = "MyAccessToken"
[health_check_api]
bind_address = "127.0.0.1:1313"
"#
.lines()
.map(str::trim_start)
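
The `bind_address` documentation in this file says the value has the form `ip:port` and that port `0` lets the operating system choose the port. A minimal sketch of how a consumer might interpret the string follows, using only `std::net`. `BindPort` and `classify_bind_address` are hypothetical names introduced for illustration and are not part of the configuration crate.

```rust
use std::net::{AddrParseError, SocketAddr};

/// How the port in a `bind_address` should be interpreted (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum BindPort {
    /// The configuration names a concrete port.
    Fixed(u16),
    /// Port `0`: the operating system assigns a free port at bind time.
    OsAssigned,
}

/// Parses an `ip:port` string and classifies its port.
/// Host names such as `localhost:1313` are rejected because `SocketAddr`
/// parsing does not perform DNS resolution.
pub fn classify_bind_address(bind_address: &str) -> Result<BindPort, AddrParseError> {
    let addr: SocketAddr = bind_address.parse()?;
    Ok(if addr.port() == 0 {
        BindPort::OsAssigned
    } else {
        BindPort::Fixed(addr.port())
    })
}
```

Returning the parse error instead of panicking lets the caller decide whether an invalid address is fatal.
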
16 changes: 16 additions & 0 deletions packages/test-helpers/src/configuration.rs
@@ -37,6 +37,10 @@ pub fn ephemeral() -> Configuration {
config.http_api.enabled = true;
config.http_api.bind_address = format!("127.0.0.1:{}", &api_port);

// Ephemeral socket address for Health Check API
let health_check_api_port = 0u16;
config.health_check_api.bind_address = format!("127.0.0.1:{}", &health_check_api_port);

// Ephemeral socket address for UDP tracker
let udp_port = 0u16;
config.udp_trackers[0].enabled = true;
@@ -140,3 +144,15 @@ pub fn ephemeral_ipv6() -> Configuration {

cfg
}

/// Ephemeral without running any services.
#[must_use]
pub fn ephemeral_with_no_services() -> Configuration {
let mut cfg = ephemeral();

cfg.http_api.enabled = false;
cfg.http_trackers[0].enabled = false;
cfg.udp_trackers[0].enabled = false;

cfg
}
3 changes: 3 additions & 0 deletions share/default/config/tracker.container.mysql.toml
@@ -36,3 +36,6 @@ ssl_key_path = "/var/lib/torrust/tracker/tls/localhost.key"

[http_api.access_tokens]
admin = "MyAccessToken"

[health_check_api]
bind_address = "127.0.0.1:1313"
3 changes: 3 additions & 0 deletions share/default/config/tracker.container.sqlite3.toml
@@ -36,3 +36,6 @@ ssl_key_path = "/var/lib/torrust/tracker/tls/localhost.key"

[http_api.access_tokens]
admin = "MyAccessToken"

[health_check_api]
bind_address = "127.0.0.1:1313"
3 changes: 3 additions & 0 deletions share/default/config/tracker.development.sqlite3.toml
@@ -32,3 +32,6 @@ ssl_key_path = ""

[http_api.access_tokens]
admin = "MyAccessToken"

[health_check_api]
bind_address = "127.0.0.1:1313"
18 changes: 14 additions & 4 deletions src/app.rs
@@ -11,7 +11,11 @@
//! - Loading data from the database when it's needed.
//! - Starting some jobs depending on the configuration.
//!
//! The started jobs may be:
//! Jobs executed always:
//!
//! - Health Check API
//!
//! Optional jobs:
//!
//! - Torrent cleaner: it removes inactive peers and (optionally) peerless torrents.
//! - UDP trackers: the user can enable multiple UDP tracker on several ports.
@@ -23,13 +27,16 @@ use log::warn;
use tokio::task::JoinHandle;
use torrust_tracker_configuration::Configuration;

use crate::bootstrap::jobs::{http_tracker, torrent_cleanup, tracker_apis, udp_tracker};
use crate::bootstrap::jobs::{health_check_api, http_tracker, torrent_cleanup, tracker_apis, udp_tracker};
use crate::servers::http::Version;
use crate::tracker;

/// # Panics
///
/// Will panic if the socket address for API can't be parsed.
/// Will panic if:
///
/// - Can't retrieve tracker keys from database.
/// - Can't load whitelist from database.
pub async fn start(config: Arc<Configuration>, tracker: Arc<tracker::Tracker>) -> Vec<JoinHandle<()>> {
let mut jobs: Vec<JoinHandle<()>> = Vec::new();

@@ -78,10 +85,13 @@ pub async fn start(config: Arc<Configuration>, tracker: Arc<tracker::Tracker>) -
jobs.push(tracker_apis::start_job(&config.http_api, tracker.clone()).await);
}

// Remove torrents without peers, every interval
// Start runners to remove torrents without peers, every interval
if config.inactive_peer_cleanup_interval > 0 {
jobs.push(torrent_cleanup::start_job(&config, &tracker));
}

// Start Health Check API
jobs.push(health_check_api::start_job(config).await);

jobs
}
37 changes: 37 additions & 0 deletions src/bin/http_health_check.rs
@@ -0,0 +1,37 @@
//! Minimal `curl` or `wget` to be used for container health checks.
//!
//! It's convenient to avoid using third-party libraries because:
//!
//! - They are harder to maintain.
//! - They introduce new attack vectors.
use std::{env, process};

#[tokio::main]
async fn main() {
let args: Vec<String> = env::args().collect();
if args.len() != 2 {
eprintln!("Usage: cargo run --bin http_health_check <HEALTH_URL>");
eprintln!("Example: cargo run --bin http_health_check http://127.0.0.1:1212/health_check");
std::process::exit(1);
}

println!("Health check ...");

let url = &args[1].clone();

match reqwest::get(url).await {
Ok(response) => {
if response.status().is_success() {
println!("STATUS: {}", response.status());
process::exit(0);
} else {
println!("Non-success status received.");
process::exit(1);
}
}
Err(err) => {
println!("ERROR: {err}");
process::exit(1);
}
}
}
76 changes: 76 additions & 0 deletions src/bootstrap/jobs/health_check_api.rs
@@ -0,0 +1,76 @@
//! Health Check API job starter.
//!
//! The [`health_check_api::start_job`](crate::bootstrap::jobs::health_check_api::start_job)
//! function starts the Health Check REST API.
//!
//! The [`health_check_api::start_job`](crate::bootstrap::jobs::health_check_api::start_job)
//! function spawns a new asynchronous task; that task is the "**launcher**".
//! The "**launcher**" starts the actual server and sends a message back
//! to the main application. The main application waits until it receives
//! the message [`ApiServerJobStarted`]
//! from the "**launcher**".
//!
//! The "**launcher**" is an intermediary task that decouples the Health Check
//! API server from the process that handles it.
//!
//! Refer to the [configuration documentation](https://docs.rs/torrust-tracker-configuration)
//! for the API configuration options.
use std::net::SocketAddr;
use std::sync::Arc;

use log::info;
use tokio::sync::oneshot;
use tokio::task::JoinHandle;
use torrust_tracker_configuration::Configuration;

use crate::servers::health_check_api::server;

/// This is the message that the "launcher" spawned task sends to the main
/// application process to notify the API server was successfully started.
///
/// > **NOTICE**: it does not mean the API server is ready to receive requests.
/// It only means the new server started. It might take some time for the server
/// to be ready to accept requests.
#[derive(Debug)]
pub struct ApiServerJobStarted {
pub bound_addr: SocketAddr,
}

/// This function starts a new Health Check API server with the provided
/// configuration.
///
/// The function starts a new concurrent task that will run the API server.
/// This task will send a message to the main application process to notify
/// that the API server was successfully started.
///
/// # Panics
///
/// Panics if the configured `bind_address` cannot be parsed, or if the
/// `ApiServerJobStarted` notice is never received because the sender was dropped.
pub async fn start_job(config: Arc<Configuration>) -> JoinHandle<()> {
let bind_addr = config
.health_check_api
.bind_address
.parse::<std::net::SocketAddr>()
.expect("Health Check API bind_address invalid.");

let (tx, rx) = oneshot::channel::<ApiServerJobStarted>();

// Run the API server
let join_handle = tokio::spawn(async move {
info!("Starting Health Check API server: http://{}", bind_addr);

let handle = server::start(bind_addr, tx, config.clone());

if let Ok(()) = handle.await {
info!("Health Check API server on http://{} stopped", bind_addr);
}
});

// Wait until the API server job is running
match rx.await {
Ok(_msg) => info!("Torrust Health Check API server started"),
Err(e) => panic!("the Health Check API server was dropped: {e}"),
}

join_handle
}
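
The launcher pattern described in the module documentation (spawn the server, then wait for a "started" message carrying the bound address) can be illustrated without an async runtime. The sketch below is a standard-library analogue, assuming a thread and `std::sync::mpsc` in place of the Tokio task and `oneshot` channel used in the PR. `ServerJobStarted` and this `start_job` are hypothetical and only mirror the shape of the real code.

```rust
use std::net::{SocketAddr, TcpListener};
use std::sync::mpsc;
use std::thread::{self, JoinHandle};

/// Message the launcher sends once the listener is bound (hypothetical type).
#[derive(Debug)]
pub struct ServerJobStarted {
    pub bound_addr: SocketAddr,
}

/// Spawns the launcher thread and blocks until it reports the bound address.
///
/// # Panics
///
/// Panics if the launcher exits before sending the notice, for example
/// because the address could not be bound.
pub fn start_job(bind_addr: SocketAddr) -> (JoinHandle<()>, SocketAddr) {
    let (tx, rx) = mpsc::channel::<ServerJobStarted>();

    let handle = thread::spawn(move || {
        let listener = TcpListener::bind(bind_addr).expect("could not bind the listener");
        // With port 0 in `bind_addr`, this is where the real port becomes known.
        let bound_addr = listener.local_addr().expect("listener has no local address");
        tx.send(ServerJobStarted { bound_addr })
            .expect("the main thread stopped waiting for the notice");
        // A real server would loop here; the sketch accepts one connection and stops.
        let _ = listener.accept();
    });

    let started = rx
        .recv()
        .expect("the launcher was dropped before it started");
    (handle, started.bound_addr)
}
```

Because the notice carries `bound_addr`, a caller that configured port 0 still learns the real port, which is the change the PR description says service launchers need before port-0 services can be health-checked.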