diff --git a/website/docs/assets/fluss-quickstart-observability.zip b/website/docs/assets/fluss-quickstart-observability.zip
index 8a1ecda7e7..9d8618e0a6 100644
Binary files a/website/docs/assets/fluss-quickstart-observability.zip and b/website/docs/assets/fluss-quickstart-observability.zip differ
diff --git a/website/docs/install-deploy/overview.md b/website/docs/install-deploy/overview.md
index 9704b8326f..2b3804d588 100644
--- a/website/docs/install-deploy/overview.md
+++ b/website/docs/install-deploy/overview.md
@@ -129,6 +129,7 @@ We have listed them in the table below the figure.
[JMX](maintenance/observability/metric-reporters.md#jmx)
[Prometheus](maintenance/observability/metric-reporters.md#prometheus)
+ [OpenTelemetry](maintenance/observability/metric-reporters.md#opentelemetry)
|
@@ -137,9 +138,7 @@ We have listed them in the table below the figure.
## How to deploy Fluss
Fluss can be deployed in three different ways:
-- [Local Cluster](install-deploy/deploying-local-cluster.md)
-- [Distributed Cluster](install-deploy/deploying-distributed-cluster.md)
-- [Docker run / Docker Compose](install-deploy/deploying-with-docker.md)
-**NOTE**:
-- Local Cluster is for testing purpose only.
\ No newline at end of file
+- [Local Cluster](install-deploy/deploying-local-cluster.md) (for testing purposes only)
+- [Distributed Cluster](install-deploy/deploying-distributed-cluster.md)
+- [Docker](install-deploy/deploying-with-docker.md)
diff --git a/website/docs/maintenance/observability/metric-reporters.md b/website/docs/maintenance/observability/metric-reporters.md
index 48f94770f8..aacd964014 100644
--- a/website/docs/maintenance/observability/metric-reporters.md
+++ b/website/docs/maintenance/observability/metric-reporters.md
@@ -12,7 +12,7 @@ reporters will be instantiated on each CoordinatorServer and TabletServers when
Example reporter configuration that specifies multiple reporters:
```yaml
-metrics.reporters: jmx,prometheus
+metrics.reporters: jmx,opentelemetry
```
## Push vs. Pull
@@ -57,6 +57,39 @@ An example for such a list would be `cluster_id=fluss1,host=localhost,server_id=
The domain thus identifies a metric class, while the key-property list identifies one (or multiple) instances of that metric.
+### OpenTelemetry
+
+Type: push
+
+
+:::info
+
+The OpenTelemetry metric reporter currently supports OTLP/gRPC only.
+
+:::
+
+Parameters:
+
+- `metrics.reporter.opentelemetry.endpoint` - Endpoint to which the OpenTelemetry metric reporter sends metrics.
+- `metrics.reporter.opentelemetry.export-interval` - (optional) Interval at which the OpenTelemetry metric reporter exports metrics to the endpoint. Default: 10s.
+- `metrics.reporter.opentelemetry.export-timeout` - (optional) Maximum time the OpenTelemetry metric reporter will wait for each metric export. Default: 10s.
+
+Example configuration:
+
+```yaml
+metrics.reporters: opentelemetry
+metrics.reporter.opentelemetry.endpoint: http://opentelemetry-collector:4317
+```
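+
+The optional parameters are set alongside the endpoint. The snippet below is a sketch that uses illustrative values (a 30s export interval and a 5s export timeout), not recommended settings:
+
+```yaml
+metrics.reporters: opentelemetry
+metrics.reporter.opentelemetry.endpoint: http://opentelemetry-collector:4317
+# optional: export metrics every 30 seconds instead of the default 10s (illustrative value)
+metrics.reporter.opentelemetry.export-interval: 30s
+# optional: wait at most 5 seconds for each export instead of the default 10s (illustrative value)
+metrics.reporter.opentelemetry.export-timeout: 5s
+```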
+
+Fluss metric types are mapped to OpenTelemetry metric types as follows:
+
+| Fluss | OpenTelemetry | Note |
+|-----------|-------------------------|---------------------------------------------------------------|
+| Counter | LONG_SUM | |
+| Gauge | LONG_GAUGE/DOUBLE_GAUGE | Automatically determined by the type of the Gauge. |
+| Histogram | SUMMARY | Quantiles .5, .75, .95, .99 |
+| Meter | LONG_SUM, DOUBLE_GAUGE | Exports the meter's count (LONG_SUM) and rate (DOUBLE_GAUGE). |
+
### Prometheus
Type: pull
@@ -74,9 +107,9 @@ metrics.reporter.prometheus.port: 9250
Fluss metric types are mapped to Prometheus metric types as follows:
-| Fluss | Prometheus | Note |
-| --------- |------------|------------------------------------------|
-| Counter | Gauge |Prometheus counters cannot be decremented.|
-| Gauge | Gauge |Only numbers and booleans are supported. |
-| Histogram | Summary |Quantiles .5, .75, .95, .98, .99 and .999 |
-| Meter | Gauge |The gauge exports the meter's rate. |
+| Fluss | Prometheus | Note |
+|-----------|------------|--------------------------------------------|
+| Counter | Gauge | Prometheus counters cannot be decremented. |
+| Gauge | Gauge | Only numbers and booleans are supported. |
+| Histogram | Summary | Quantiles .5, .75, .95, .98, .99 and .999 |
+| Meter | Gauge | The gauge exports the meter's rate. |
diff --git a/website/docs/maintenance/observability/quickstart.md b/website/docs/maintenance/observability/quickstart.md
deleted file mode 100644
index 8cee873c89..0000000000
--- a/website/docs/maintenance/observability/quickstart.md
+++ /dev/null
@@ -1,216 +0,0 @@
----
-sidebar_label: Quickstart Guides
-title: Observability Quickstart Guides
-sidebar_position: 1
----
-
-# Observability Quickstart Guides
-
-On this page, you can find the following guides to set up an observability stack **based on the instructions in the [Flink quickstart guide](quickstart/flink.md)**:
-
-- [Observability with Prometheus, Loki and Grafana](#observability-with-prometheus-loki-and-grafana)
-
-## Observability with Prometheus, Loki and Grafana
-
-We provide a minimal quickstart configuration for application observability with Prometheus (metric aggregation system), Loki (log aggregation system) and Grafana (dashboard system).
-
-The quickstart configuration comes with 2 metric dashboards.
-
-- `Fluss – overview`: Selected metrics to observe the overall cluster status
-- `Fluss – detail`: Majority of metrics listed in [metrics list](monitor-metrics.md#metrics-list)
-
-Follow the instructions below to add observability capabilities to your setup.
-
-1. Download the observability quickstart configuration and extract the ZIP archive in your working directory.
-After extracting the archive, the contents of the working directory should be as follows.
-
-```
-├── docker-compose.yml # docker compose manifest from quickstart guide
-└── fluss-quickstart-observability # downloaded and extracted ZIP archive
- ├── grafana
- │ ├── grafana.ini
- │ └── provisioning
- │ ├── dashboards
- │ │ ├── default.yml
- │ │ └── fluss
- │ │ └── ...
- │ └── datatsources
- │ └── default.yml
- ├── prometheus
- │ └── prometheus.yml
- └── slf4j
- └── ...
-```
-
-2. Next, you need to configure Fluss to expose logs to Loki. We will use [Loki4j](https://loki4j.github.io/loki-logback-appender/) which uses Logback as logging backend.
-The container manifest below configures Fluss to use Logback and Loki4j. Save it to a file named `fluss-slf4j-logback.Dockerfile` in your working directory.
-
-```dockerfile
-ARG FLUSS_VERSION
-
-FROM apache/fluss:$FLUSS_DOCKER_VERSION$
-
-# remove default logging backend from classpath and add logback to classpath
-RUN rm -rf ${FLUSS_HOME}/lib/log4j-slf4j-impl-*.jar && \
- wget https://repo1.maven.org/maven2/ch/qos/logback/logback-classic/1.2.13/logback-classic-1.2.13.jar -P ${FLUSS_HOME}/lib/ && \
- wget https://repo1.maven.org/maven2/ch/qos/logback/logback-core/1.2.13/logback-core-1.2.13.jar -P ${FLUSS_HOME}/lib/
-
-# add loki4j logback appender to classpath
-RUN wget https://repo1.maven.org/maven2/com/github/loki4j/loki-logback-appender/1.4.2/loki-logback-appender-1.4.2.jar -P ${FLUSS_HOME}/lib/
-
-# logback configuration that exposes metrics to loki
-COPY fluss-quickstart-observability/slf4j/logback-loki-console.xml ${FLUSS_HOME}/conf/logback-console.xml
-```
-
-:::note
-Detailed configuration instructions for Fluss and Logback can be found [here](logging.md#configuring-logback).
-:::
-
-3. Additionally, you need to adapt the `docker-compose.yml` and
-
-- add containers for Prometheus, Loki and Grafana and mount the corresponding configuration directories.
-- build and use the new Fluss image manifest (`fluss-slf4j-logback.Dockerfile`).
-- configure Fluss to expose metrics via Prometheus.
-- add the desired application name that should be used when displaying logs in Grafana as environment variable (`APP_NAME`).
-- configure Flink to expose metrics via Prometheus.
-
-To do this, you can simply copy the manifest below into your `docker-compose.yml`.
-
-```yaml
-services:
- #begin Fluss cluster
- coordinator-server:
- image: fluss-slf4j-logback:$FLUSS_DOCKER_VERSION$
- build:
- args:
- FLUSS_VERSION: $FLUSS_VERSION$
- dockerfile: fluss-slf4j-logback.Dockerfile
- command: coordinatorServer
- depends_on:
- - zookeeper
- environment:
- - |
- FLUSS_PROPERTIES=
- zookeeper.address: zookeeper:2181
- bind.listeners: FLUSS://coordinator-server:9123
- remote.data.dir: /tmp/fluss/remote-data
- datalake.format: paimon
- datalake.paimon.metastore: filesystem
- datalake.paimon.warehouse: /tmp/paimon
- metrics.reporters: prometheus
- metrics.reporter.prometheus.port: 9250
- logback.configurationFile: logback-loki-console.xml
- - APP_NAME=coordinator-server
- tablet-server:
- image: fluss-slf4j-logback:$FLUSS_DOCKER_VERSION$
- build:
- args:
- FLUSS_VERSION: $FLUSS_VERSION$
- dockerfile: fluss-slf4j-logback.Dockerfile
- command: tabletServer
- depends_on:
- - coordinator-server
- environment:
- - |
- FLUSS_PROPERTIES=
- zookeeper.address: zookeeper:2181
- bind.listeners: FLUSS://tablet-server:9123
- data.dir: /tmp/fluss/data
- remote.data.dir: /tmp/fluss/remote-data
- kv.snapshot.interval: 0s
- datalake.format: paimon
- datalake.paimon.metastore: filesystem
- datalake.paimon.warehouse: /tmp/paimon
- metrics.reporters: prometheus
- metrics.reporter.prometheus.port: 9250
- logback.configurationFile: logback-loki-console.xml
- - APP_NAME=tablet-server
- zookeeper:
- restart: always
- image: zookeeper:3.9.2
- #end
- #begin Flink cluster
- jobmanager:
- image: apache/fluss-quickstart-flink:1.20-$FLUSS_DOCKER_VERSION$
- ports:
- - "8083:8081"
- command: jobmanager
- environment:
- - |
- FLINK_PROPERTIES=
- jobmanager.rpc.address: jobmanager
- metrics.reporter.prom.factory.class: org.apache.flink.metrics.prometheus.PrometheusReporterFactory
- metrics.reporter.prom.port: 9250
- volumes:
- - shared-tmpfs:/tmp/paimon
- taskmanager:
- image: apache/fluss-quickstart-flink:1.20-$FLUSS_DOCKER_VERSION$
- depends_on:
- - jobmanager
- command: taskmanager
- environment:
- - |
- FLINK_PROPERTIES=
- jobmanager.rpc.address: jobmanager
- taskmanager.numberOfTaskSlots: 10
- taskmanager.memory.process.size: 2048m
- taskmanager.memory.framework.off-heap.size: 256m
- metrics.reporter.prom.factory.class: org.apache.flink.metrics.prometheus.PrometheusReporterFactory
- metrics.reporter.prom.port: 9250
- volumes:
- - shared-tmpfs:/tmp/paimon
- #end
- #begin observability
- prometheus:
- image: bitnami/prometheus:2.55.1-debian-12-r0
- ports:
- - "9092:9090"
- volumes:
- - ./fluss-quickstart-observability/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro
- loki:
- image: grafana/loki:3.3.2
- ports:
- - "3102:3100"
- grafana:
- image:
- grafana/grafana:11.4.0
- ports:
- - "3002:3000"
- depends_on:
- - prometheus
- - loki
- volumes:
- - ./fluss-quickstart-observability/grafana:/etc/grafana:ro
- #end
-
-volumes:
- shared-tmpfs:
- driver: local
- driver_opts:
- type: "tmpfs"
- device: "tmpfs"
-```
-
-Then run
-
-```shell
-# note the --build flag!
-docker compose up -d --build
-```
-
-to apply the changes.
-
-:::warning
-This recreates `shared-tmpfs` and all data is lost (created tables, running jobs, etc.)
-:::
-
-Make sure that the modified and added containers are up and running using
-
-```shell
-docker container ls -a
-```
-
-4. Now you are all set! You can visit
-
-- Grafana to view Fluss logs with the [log explorer](http://localhost:3002/a/grafana-lokiexplore-app/) and observe metrics of the Fluss and Flink cluster with the [provided dashboards](http://localhost:3002/dashboards) or
-- the [Prometheus Web UI](http://localhost:9092) to directly query Prometheus with [PromQL](https://prometheus.io/docs/prometheus/2.55/getting_started/).
\ No newline at end of file
diff --git a/website/docs/quickstart/flink.md b/website/docs/quickstart/flink.md
index 8faf764a3a..b4d3bab18d 100644
--- a/website/docs/quickstart/flink.md
+++ b/website/docs/quickstart/flink.md
@@ -135,7 +135,6 @@ to check whether all containers are running properly.
You can also visit http://localhost:8083/ to see if Flink is running normally.
:::note
-- If you want to additionally use an observability stack, follow one of the provided quickstart guides [here](maintenance/observability/quickstart.md) and then continue with this guide.
- If you want to run with your own Flink environment, remember to download the [fluss-flink connector jar](/downloads), [flink-connector-faker](https://github.com/knaufk/flink-faker/releases), [paimon-flink connector jar](https://paimon.apache.org/docs/1.0/flink/quick-start/) and then put them to `FLINK_HOME/lib/`.
- All the following commands involving `docker compose` should be executed in the created working directory that contains the `docker-compose.yml` file.
:::
@@ -508,4 +507,4 @@ docker compose down -v
to stop all containers.
## Learn more
-Now that you're up and running with Fluss and Flink, check out the [Apache Flink Engine](engine-flink/getting-started.md) docs to learn more features with Flink or [this guide](/maintenance/observability/quickstart.md) to learn how to set up an observability stack for Fluss and Flink.
+Now that you're up and running with Fluss and Flink, you may want to check out the [Apache Flink Engine](engine-flink/getting-started.md) docs to learn more features with Flink.
diff --git a/website/docs/quickstart/monitoring-observability.md b/website/docs/quickstart/monitoring-observability.md
new file mode 100644
index 0000000000..ca284426e5
--- /dev/null
+++ b/website/docs/quickstart/monitoring-observability.md
@@ -0,0 +1,422 @@
+---
+title: Cluster Monitoring and Observability
+sidebar_position: 4
+---
+
+# Cluster Monitoring and Observability
+
+This guide will show you how to set up a monitoring/observability stack for Fluss.
+It is based on the [Flink Quickstart Guide](flink.md), but the principles for other query engines are the same.
+
+We provide instructions for two different setups:
+
+- [Cluster Monitoring (Metrics, Logs) with Prometheus and Loki](#cluster-monitoring-metrics-logs-with-prometheus-and-loki)
+- [Cluster Monitoring (Metrics and Logs) with OpenTelemetry](#cluster-monitoring-metrics-and-logs-with-opentelemetry)
+
+The setups primarily differ in the way telemetry data is reported to the respective telemetry backends.
+While the first setup is tightly coupled with the chosen stack ([Prometheus](https://prometheus.io/), [Loki](https://grafana.com/oss/loki/)) and directly integrates with the telemetry backends, [OpenTelemetry](https://opentelemetry.io/) is vendor-neutral and uses an intermediate collector that decouples Fluss from telemetry backends.
+
+Both setups use [Grafana](https://grafana.com/oss/grafana/) as a visualization frontend. We provide two metric dashboards out of the box:
+
+- `Fluss – overview`: Selected metrics to get insights into the overall cluster status
+- `Fluss – detail`: Majority of metrics listed in [metrics list](maintenance/observability/monitor-metrics.md#metrics-list)
+
+All components used in this guide are open source and publicly available.
+
+:::note
+- This guide aims to get you up and running with a _minimal_ setup for cluster monitoring/observability. For production use, you need to adapt the setup accordingly, especially with respect to security-related configurations.
+- Only Fluss cluster components are instrumented. If you want to instrument other parts of your setup, e.g., Flink, please refer to the corresponding documentation.
+- We highly encourage you to use [OpenTelemetry](https://opentelemetry.io/), as it is vendor-neutral and can scale to large deployments.
+:::
+
+
+
+## Preparation
+
+In this section, you can find the common preparation steps that apply to all setups.
+
+1. Download the observability quickstart configuration.
+2. Extract the ZIP archive in your working directory.
+3. After extracting the archive, the contents of your working directory should be as follows.
+
+```
+├── docker-compose.yml # Docker compose manifest from the Flink quickstart guide
+└── fluss-quickstart-observability # Downloaded and extracted ZIP archive (has to have the exact name shown)
+ ├── grafana
+ │ ├── grafana.ini
+ │ └── provisioning
+ │ ├── dashboards
+ │ │ ├── default.yml
+ │ │ └── fluss
+ │ │ ├── fluss-detail.json
+ │ │ └── fluss-overview.json
+ │ └── datasources
+ │ └── default.yml
+ ├── opentelemetry
+ │ ├── opentelemetry.yml
+ │ └── opentelemetry-javaagent.properties
+ ├── prometheus
+ │ ├── prometheus-direct.yml
+ │ └── prometheus-opentelemetry.yml
+ └── slf4j
+ ├── log4j-opentelemetry-console.properties
+ └── logback-loki-console.xml
+```
+
+
+
+## Cluster Monitoring (Metrics, Logs) with Prometheus and Loki
+
+This section will show you how to monitor your cluster with [Prometheus](https://prometheus.io/) (metric aggregation system) and [Loki](https://grafana.com/oss/loki/) (log aggregation system).
+
+1. First, you need to configure Fluss to send its logs to Loki. We will use [Loki4j](https://loki4j.github.io/loki-logback-appender/), a Logback appender for Loki, which requires Fluss to use Logback as its logging backend.
+The container manifest below configures the Fluss image to use Logback and adds the Loki4j Logback appender to the classpath. Save it to a file named `fluss-slf4j-logback.Dockerfile` in your working directory.
+
+```dockerfile
+FROM apache/fluss:$FLUSS_DOCKER_VERSION$
+
+# remove default logging backend from classpath and add logback to classpath
+RUN rm -rf ${FLUSS_HOME}/lib/log4j-slf4j-impl-*.jar && \
+ wget https://repo1.maven.org/maven2/ch/qos/logback/logback-classic/1.2.13/logback-classic-1.2.13.jar -P ${FLUSS_HOME}/lib/ && \
+ wget https://repo1.maven.org/maven2/ch/qos/logback/logback-core/1.2.13/logback-core-1.2.13.jar -P ${FLUSS_HOME}/lib/
+
+# add loki4j logback appender to classpath
+RUN wget https://repo1.maven.org/maven2/com/github/loki4j/loki-logback-appender/1.4.2/loki-logback-appender-1.4.2.jar -P ${FLUSS_HOME}/lib/
+```
+
+:::note
+Detailed configuration instructions for Fluss and Logback can be found [here](maintenance/observability/logging.md#configuring-logback).
+:::
+
+2. Next, adapt the `docker-compose.yml` manifest to:
+
+- build and use the new Fluss image defined in `fluss-slf4j-logback.Dockerfile`.
+- configure Fluss to expose metrics via Prometheus.
+- set the application name under which Fluss logs are displayed in Grafana as an environment variable (`APP_NAME`).
+- add containers for Prometheus, Loki and Grafana.
+- mount the corresponding configuration files.
+
+To do so, replace the contents of your `docker-compose.yml` with the manifest below.
+
+```yaml
+services:
+ #begin Fluss cluster
+ coordinator-server:
+ image: fluss-slf4j-logback:$FLUSS_DOCKER_VERSION$
+ build:
+ dockerfile: fluss-slf4j-logback.Dockerfile
+ command: coordinatorServer
+ depends_on:
+ - zookeeper
+ environment:
+ - |
+ FLUSS_PROPERTIES=
+ zookeeper.address: zookeeper:2181
+ bind.listeners: FLUSS://coordinator-server:9123
+ remote.data.dir: /tmp/fluss/remote-data
+ datalake.format: paimon
+ datalake.paimon.metastore: filesystem
+ datalake.paimon.warehouse: /tmp/paimon
+ metrics.reporters: prometheus
+ metrics.reporter.prometheus.port: 9250
+ - APP_NAME=coordinator-server
+ volumes:
+ - ./fluss-quickstart-observability/slf4j/logback-loki-console.xml:/opt/fluss/conf/logback-console.xml:ro
+ tablet-server:
+ image: fluss-slf4j-logback:$FLUSS_DOCKER_VERSION$
+ build:
+ dockerfile: fluss-slf4j-logback.Dockerfile
+ command: tabletServer
+ depends_on:
+ - coordinator-server
+ environment:
+ - |
+ FLUSS_PROPERTIES=
+ zookeeper.address: zookeeper:2181
+ bind.listeners: FLUSS://tablet-server:9123
+ data.dir: /tmp/fluss/data
+ remote.data.dir: /tmp/fluss/remote-data
+ kv.snapshot.interval: 0s
+ datalake.format: paimon
+ datalake.paimon.metastore: filesystem
+ datalake.paimon.warehouse: /tmp/paimon
+ metrics.reporters: prometheus
+ metrics.reporter.prometheus.port: 9250
+ - APP_NAME=tablet-server
+ volumes:
+ - ./fluss-quickstart-observability/slf4j/logback-loki-console.xml:/opt/fluss/conf/logback-console.xml:ro
+ zookeeper:
+ restart: always
+ image: zookeeper:3.9.2
+ #end
+ #begin Flink cluster
+ jobmanager:
+ image: apache/fluss-quickstart-flink:1.20-$FLUSS_DOCKER_VERSION$
+ ports:
+ - "8083:8081"
+ command: jobmanager
+ environment:
+ - |
+ FLINK_PROPERTIES=
+ jobmanager.rpc.address: jobmanager
+ volumes:
+ - shared-tmpfs:/tmp/paimon
+ taskmanager:
+ image: apache/fluss-quickstart-flink:1.20-$FLUSS_DOCKER_VERSION$
+ depends_on:
+ - jobmanager
+ command: taskmanager
+ environment:
+ - |
+ FLINK_PROPERTIES=
+ jobmanager.rpc.address: jobmanager
+ taskmanager.numberOfTaskSlots: 10
+ taskmanager.memory.process.size: 2048m
+ taskmanager.memory.framework.off-heap.size: 256m
+ volumes:
+ - shared-tmpfs:/tmp/paimon
+ #end
+ #begin monitoring
+ prometheus:
+ image: bitnami/prometheus:2.55.1-debian-12-r0
+ ports:
+ - "9092:9090"
+ volumes:
+ - ./fluss-quickstart-observability/prometheus/prometheus-direct.yml:/etc/prometheus/prometheus.yml:ro
+ loki:
+ image: grafana/loki:3.3.2
+ ports:
+ - "3102:3100"
+ grafana:
+ image:
+ grafana/grafana:11.4.0
+ ports:
+ - "3002:3000"
+ depends_on:
+ - prometheus
+ - loki
+ volumes:
+ - ./fluss-quickstart-observability/grafana:/etc/grafana:ro
+ #end
+
+volumes:
+ shared-tmpfs:
+ driver: local
+ driver_opts:
+ type: "tmpfs"
+ device: "tmpfs"
+```
+
+3. Run
+
+```shell
+# note the --build flag!
+docker compose up -d --build
+```
+
+to apply the changes.
+
+:::warning
+This recreates `shared-tmpfs` and all data is lost (created tables, running jobs, etc.)
+:::
+
+Make sure that the modified and added containers are up and running using
+
+```shell
+docker container ls -a
+```
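+
+If you prefer to check the stack from the command line, the sketch below probes the monitoring containers through the host ports mapped in the manifest above. It assumes `curl` is installed and relies on the standard readiness/health endpoints of Prometheus (`/-/ready`), Loki (`/ready`) and Grafana (`/api/health`); `check_endpoint` is a hypothetical helper, not part of Fluss.
+
+```shell
+# Hypothetical helper: print whether an HTTP endpoint answers with a 2xx status.
+check_endpoint() {
+  local name="$1" url="$2"
+  if curl --silent --fail --max-time 5 --output /dev/null "$url"; then
+    echo "$name: OK"
+  else
+    echo "$name: NOT REACHABLE ($url)"
+    return 1
+  fi
+}
+
+# Host ports as mapped in docker-compose.yml (9092 -> Prometheus, 3102 -> Loki, 3002 -> Grafana).
+failures=0
+check_endpoint "Prometheus" "http://localhost:9092/-/ready" || failures=$((failures + 1))
+check_endpoint "Loki" "http://localhost:3102/ready" || failures=$((failures + 1))
+check_endpoint "Grafana" "http://localhost:3002/api/health" || failures=$((failures + 1))
+echo "$failures endpoint(s) not reachable"
+```
+
+Loki may report itself as not ready for a short while after startup, so re-run the check after a few seconds if it fails at first.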
+
+4. Now you are all set! You can visit
+
+- Grafana to view Fluss logs with the [log explorer](http://localhost:3002/a/grafana-lokiexplore-app/) and observe metrics of the Fluss cluster with the [provided dashboards](http://localhost:3002/dashboards) or
+- the [Prometheus Web UI](http://localhost:9092) to directly query Prometheus with [PromQL](https://prometheus.io/docs/prometheus/2.55/getting_started/).
+
+
+
+## Cluster Monitoring (Metrics and Logs) with OpenTelemetry
+
+This section will show you how to monitor your cluster with [OpenTelemetry](https://opentelemetry.io/).
+
+OpenTelemetry is a vendor-neutral collection of APIs, SDKs and tools that allows you to instrument your application to emit telemetry data.
+However, OpenTelemetry does not come with an integrated observability stack.
+Instead, it lets you choose any vendor that has [support for OpenTelemetry](https://opentelemetry.io/ecosystem/vendors/).
+For demonstration purposes, we will use [Prometheus](https://prometheus.io/) (metric aggregation system) and [Loki](https://grafana.com/oss/loki/) (log aggregation system).
+
+
+1. First, you need to download the [opentelemetry-javaagent](https://github.com/open-telemetry/opentelemetry-java-instrumentation/releases/download/v2.17.0/opentelemetry-javaagent.jar) into your working directory.
+
+The Java Agent offers zero-code instrumentation of telemetry data for many [popular libraries and frameworks](https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/main/docs/supported-libraries.md) without requiring any changes to the application code.
+We use the Java Agent to automatically emit Log4j **logs** to the OpenTelemetry collector.
+For **metrics**, we use the dedicated [Fluss OpenTelemetry metric reporter](maintenance/observability/metric-reporters.md#opentelemetry).
+
+2. Next, adapt the `docker-compose.yml` manifest to:
+
+- configure Fluss to expose metrics via OpenTelemetry.
+- set the corresponding [configuration options for the OpenTelemetry Java Agent](https://opentelemetry.io/docs/zero-code/java/agent/configuration/) and attach the agent to the Fluss application.
+- add containers for the OpenTelemetry Collector, Prometheus, Loki and Grafana.
+- mount the corresponding configuration files.
+
+:::note
+The OpenTelemetry Collector can be deployed in different [modes](https://opentelemetry.io/docs/collector/deployment/).
+In this guide, we use the [agent collector deployment pattern](https://opentelemetry.io/docs/collector/deployment/agent/).
+:::
+
+To do so, replace the contents of your `docker-compose.yml` with the manifest below.
+
+```yaml
+services:
+ #begin Fluss cluster
+ coordinator-server:
+ image: apache/fluss:$FLUSS_DOCKER_VERSION$
+ command: coordinatorServer
+ depends_on:
+ - zookeeper
+ environment:
+ - |
+ FLUSS_PROPERTIES=
+ zookeeper.address: zookeeper:2181
+ bind.listeners: FLUSS://coordinator-server:9123
+ remote.data.dir: /tmp/fluss/remote-data
+ datalake.format: paimon
+ datalake.paimon.metastore: filesystem
+ datalake.paimon.warehouse: /tmp/paimon
+ metrics.reporters: opentelemetry
+ metrics.reporter.opentelemetry.endpoint: http://opentelemetry-collector:4317
+ metrics.reporter.opentelemetry.service.name: coordinator-server
+ metrics.reporter.opentelemetry.service.version: $FLUSS_DOCKER_VERSION$
+ - OTEL_SERVICE_NAME=coordinator-server
+ - OTEL_SERVICE_VERSION=$FLUSS_DOCKER_VERSION$
+ - OTEL_JAVAAGENT_CONFIGURATION_FILE=/etc/otel/opentelemetry-javaagent.properties
+ - JAVA_TOOL_OPTIONS="-javaagent:/opt/opentelemetry-javaagent.jar"
+ volumes:
+ - ./fluss-quickstart-observability/opentelemetry/opentelemetry-javaagent.properties:/etc/otel/opentelemetry-javaagent.properties:ro
+ - ./fluss-quickstart-observability/slf4j/log4j-opentelemetry-console.properties:/opt/fluss/conf/log4j-console.properties:ro
+ - ./opentelemetry-javaagent.jar:/opt/opentelemetry-javaagent.jar:ro
+ tablet-server:
+ image: apache/fluss:$FLUSS_DOCKER_VERSION$
+ command: tabletServer
+ depends_on:
+ - coordinator-server
+ environment:
+ - |
+ FLUSS_PROPERTIES=
+ zookeeper.address: zookeeper:2181
+ bind.listeners: FLUSS://tablet-server:9123
+ data.dir: /tmp/fluss/data
+ remote.data.dir: /tmp/fluss/remote-data
+ kv.snapshot.interval: 0s
+ datalake.format: paimon
+ datalake.paimon.metastore: filesystem
+ datalake.paimon.warehouse: /tmp/paimon
+ metrics.reporters: opentelemetry
+ metrics.reporter.opentelemetry.endpoint: http://opentelemetry-collector:4317
+ metrics.reporter.opentelemetry.service.name: tablet-server
+ metrics.reporter.opentelemetry.service.version: $FLUSS_DOCKER_VERSION$
+ - OTEL_SERVICE_NAME=tablet-server
+ - OTEL_SERVICE_VERSION=$FLUSS_DOCKER_VERSION$
+ - OTEL_JAVAAGENT_CONFIGURATION_FILE=/etc/otel/opentelemetry-javaagent.properties
+ - JAVA_TOOL_OPTIONS="-javaagent:/opt/opentelemetry-javaagent.jar"
+ volumes:
+ - ./fluss-quickstart-observability/opentelemetry/opentelemetry-javaagent.properties:/etc/otel/opentelemetry-javaagent.properties:ro
+ - ./fluss-quickstart-observability/slf4j/log4j-opentelemetry-console.properties:/opt/fluss/conf/log4j-console.properties:ro
+ - ./opentelemetry-javaagent.jar:/opt/opentelemetry-javaagent.jar:ro
+ zookeeper:
+ restart: always
+ image: zookeeper:3.9.2
+ #end
+ #begin Flink cluster
+ jobmanager:
+ image: apache/fluss-quickstart-flink:1.20-$FLUSS_DOCKER_VERSION$
+ ports:
+ - "8083:8081"
+ command: jobmanager
+ environment:
+ - |
+ FLINK_PROPERTIES=
+ jobmanager.rpc.address: jobmanager
+ volumes:
+ - shared-tmpfs:/tmp/paimon
+ taskmanager:
+ image: apache/fluss-quickstart-flink:1.20-$FLUSS_DOCKER_VERSION$
+ depends_on:
+ - jobmanager
+ command: taskmanager
+ environment:
+ - |
+ FLINK_PROPERTIES=
+ jobmanager.rpc.address: jobmanager
+ taskmanager.numberOfTaskSlots: 10
+ taskmanager.memory.process.size: 2048m
+ taskmanager.memory.framework.off-heap.size: 256m
+ volumes:
+ - shared-tmpfs:/tmp/paimon
+ #end
+ #begin monitoring
+ opentelemetry-collector:
+ image: otel/opentelemetry-collector:0.128.0
+ command: "--config=/etc/otel/config.yml"
+ ports:
+ - "55681:55679"
+ volumes:
+ - ./fluss-quickstart-observability/opentelemetry/opentelemetry.yml:/etc/otel/config.yml:ro
+ prometheus:
+ image: bitnami/prometheus:2.55.1-debian-12-r0
+ ports:
+ - "9092:9090"
+ depends_on:
+ - opentelemetry-collector
+ volumes:
+ - ./fluss-quickstart-observability/prometheus/prometheus-opentelemetry.yml:/etc/prometheus/prometheus.yml:ro
+ loki:
+ # Do NOT use loki 2.x or older with OpenTelemetry, as this might require additional configuration!
+ image: grafana/loki:3.3.2
+ depends_on:
+ - opentelemetry-collector
+ ports:
+ - "3102:3100"
+ grafana:
+ image:
+ grafana/grafana:11.4.0
+ ports:
+ - "3002:3000"
+ depends_on:
+ - prometheus
+ - loki
+ volumes:
+ - ./fluss-quickstart-observability/grafana:/etc/grafana:ro
+ #end
+
+volumes:
+ shared-tmpfs:
+ driver: local
+ driver_opts:
+ type: "tmpfs"
+ device: "tmpfs"
+```
+
+:::warning
+The `OTEL_SERVICE_NAME` and `OTEL_SERVICE_VERSION` settings and their equivalents (e.g., in the agent configuration file) only apply to the Java Agent. To set the service name and version for the Fluss OpenTelemetry metric reporter, use the `metrics.reporter.opentelemetry.service.name` and `metrics.reporter.opentelemetry.service.version` configuration options, as shown in the manifest above.
+:::
+
+3. Run
+
+```shell
+docker compose up -d
+```
+
+to apply the changes.
+
+:::warning
+This recreates `shared-tmpfs` and all data is lost (created tables, running jobs, etc.)
+:::
+
+Make sure that the modified and added containers are up and running using
+
+```shell
+docker container ls -a
+```
+
+and also make sure that the OpenTelemetry Collector is up and running by visiting its [zPages service page](http://localhost:55681/debug/servicez).
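+
+Because the collector and the backends may take a moment to start, you can also poll the zPages page from a script. The sketch below assumes `curl` is available; `wait_for_url` is a hypothetical helper, not part of Fluss or OpenTelemetry.
+
+```shell
+# Hypothetical helper: poll a URL until it answers with a 2xx status, sleeping one
+# second between attempts; gives up after sleeping `timeout` seconds in total (default 60).
+wait_for_url() {
+  local url="$1" timeout="${2:-60}" waited=0
+  until curl --silent --fail --max-time 2 --output /dev/null "$url"; do
+    if [ "$waited" -ge "$timeout" ]; then
+      echo "timed out after ${timeout}s waiting for $url"
+      return 1
+    fi
+    sleep 1
+    waited=$((waited + 1))
+  done
+  echo "$url is reachable"
+}
+```
+
+For example, `wait_for_url http://localhost:55681/debug/servicez 60` keeps polling the collector for roughly a minute. If it times out, `docker compose logs opentelemetry-collector` shows the collector's output.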
+
+4. Now you are all set! You can visit
+
+- Grafana to view Fluss logs with the [log explorer](http://localhost:3002/a/grafana-lokiexplore-app/) and observe metrics of the Fluss cluster with the [provided dashboards](http://localhost:3002/dashboards) or
+- the [Prometheus Web UI](http://localhost:9092) to directly query Prometheus with [PromQL](https://prometheus.io/docs/prometheus/2.55/getting_started/).
diff --git a/website/docs/quickstart/security.md b/website/docs/quickstart/security.md
index 31ac59de7d..b975457d1e 100644
--- a/website/docs/quickstart/security.md
+++ b/website/docs/quickstart/security.md
@@ -1,6 +1,6 @@
---
title: Secure Your Fluss Cluster
-sidebar_position: 2
+sidebar_position: 3
---