Opensearch otel demo observability tutorial (#5)
* add a tutorial on using observability to diagnose a real use case

* adding otel landing page markdown dashboard

* adding tutorial markdown dashboards and documents

* adding Observability Introduction.md tutorial markdown dashboards and documents

* add dependencies to docker-compose.yml

* exposing docker ports locally

Signed-off-by: YANGDB <yang.db.dev@gmail.com>

---------

Signed-off-by: YANGDB <yang.db.dev@gmail.com>
YANG-DB committed Jun 2, 2023
1 parent 9d1b3ea commit ec75650
Showing 17 changed files with 470 additions and 9 deletions.
17 changes: 10 additions & 7 deletions .github/README.md
@@ -15,7 +15,7 @@ The following guide describes how to setup the OpenTelemetry demo with OpenSearc
```bash
git clone https://github.com/opensearch/opentelemetry-demo.git
cd opentelemetry-demo
-docker-compose up -d
+docker compose up -d
```

### Services
@@ -33,22 +33,25 @@ OpenSearch has [documented](https://opensearch.org/docs/latest/observing-your-da
The next instructions are similar and use the same docker compose file.
1. Start the demo with the following command from the repository's root directory:
```
-docker-compose up -d
+docker compose up -d
```
-**Note:** The docker-compose `--no-build` flag is used to fetch released docker images from [ghcr](http://ghcr.io/open-telemetry/demo) instead of building from source.
+**Note:** The docker compose `--no-build` flag is used to fetch released docker images from [ghcr](http://ghcr.io/open-telemetry/demo) instead of building from source.
Removing the `--no-build` command line option will rebuild all images from source. It may take more than 20 minutes to build if the flag is omitted.

### Explore and analyze the data With OpenSearch Observability
Review revised OpenSearch [Observability Architecture](architecture.md)

-### Service map
+### Start learning OpenSearch Observability using our tutorial
+[Getting started Tutorial](../tutorial/README.md)
+
+#### Service map
![Service map](https://docs.aws.amazon.com/images/opensearch-service/latest/developerguide/images/ta-dashboards-services.png)

-### Traces
+#### Traces
![Traces](https://opensearch.org/docs/2.6/images/ta-trace.png)

-### Correlation
+#### Correlation
![Correlation](https://opensearch.org/docs/latest/images/observability-trace.png)

-### Logs
+#### Logs
![Logs](https://opensearch.org/docs/latest/images/trace_log_correlation.gif)
5 changes: 5 additions & 0 deletions .github/architecture.md
@@ -46,6 +46,11 @@ Backend supportive services
- See [description](../src/featureflagservice/README.md)
- [Grafana](http://grafana:3000)
- See [description](https://github.com/YANG-DB/opentelemetry-demo/blob/12d52cbb23bbf4226f6de2dfec840482a0a7d054/docker-compose.yml#L637)
+
+### Services Topology
+The following diagram shows the dependencies between the docker compose services.
+
+![Docker services topology](img/docker-services-topology.png)
---

## Purpose
Binary file added .github/img/DemoFlow.png
Binary file added .github/img/docker-services-topology.png
3 changes: 3 additions & 0 deletions docker-compose.yml
@@ -296,6 +296,7 @@ services:
- OTEL_EXPORTER_OTLP_METRICS_TEMPORALITY_PREFERENCE
- WEB_OTEL_SERVICE_NAME=frontend-web
depends_on:
+- accountingservice
- adservice
- cartservice
- checkoutservice
@@ -770,6 +771,8 @@ services:
OPENSEARCH_HOSTS: '["https://opensearch-node1:9200","https://opensearch-node2:9200"]' # Define the OpenSearch nodes that OpenSearch Dashboards will query
depends_on:
- opensearch-node1
+- opensearch-node2
+- prometheus

# Observability OSD Integrations
integrations:
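The hunks above extend two `depends_on` lists. Compose starts a service's dependencies before the service itself, so these lists form a dependency graph whose topological order is a valid start order. The Python sketch below computes one such order with the standard-library `graphlib.TopologicalSorter`. The `DEPENDS_ON` map is hypothetical: it holds only the edges visible in this diff, and the keys `frontend` and `opensearch-dashboards` are assumed names for the two services these hunks modify.

```python
from graphlib import TopologicalSorter

# Hypothetical excerpt of depends_on edges: only the entries visible in this diff.
# The service names "frontend" and "opensearch-dashboards" are assumptions.
DEPENDS_ON: dict[str, set[str]] = {
    "frontend": {"accountingservice", "adservice", "cartservice", "checkoutservice"},
    "opensearch-dashboards": {"opensearch-node1", "opensearch-node2", "prometheus"},
}


def startup_order(depends_on: dict[str, set[str]]) -> list[str]:
    """Return one start order in which every dependency precedes its dependents.

    TopologicalSorter takes {node: predecessors}, which matches depends_on semantics.
    A dependency cycle raises graphlib.CycleError.
    """
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    for position, service in enumerate(startup_order(DEPENDS_ON), start=1):
        print(position, service)
```

Plain `depends_on` only orders container start-up; Compose does not wait for a dependency to be ready unless the long form with `condition: service_healthy` (backed by a healthcheck) is used.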
4 changes: 2 additions & 2 deletions src/currencyservice/README.md
@@ -9,15 +9,15 @@ To build the currency service, run the following from root directory
of opentelemetry-demo

```sh
-docker-compose build currencyservice
+docker compose build currencyservice
```

## Run the service

Execute the below command to run the service.

```sh
-docker-compose up currencyservice
+docker compose up currencyservice
```

## Run the client
3 changes: 3 additions & 0 deletions src/integrations/display/demo-landing-page.ndjson
@@ -0,0 +1,3 @@
{"attributes":{"description":"OTEL demo landing page","kibanaSavedObjectMeta":{"searchSourceJSON":"{\"query\":{\"query\":\"\",\"language\":\"kuery\"},\"filter\":[]}"},"title":"OTEL demo landing page","uiStateJSON":"{}","version":1,"visState":"{\"title\":\"OTEL demo landing page\",\"type\":\"markdown\",\"aggs\":[],\"params\":{\"fontSize\":12,\"openLinksInNewTab\":false,\"markdown\":\"\\n![](https://raw.githubusercontent.com/opensearch-project/.github/main/profile/banner.jpg)\\n# OpenSearch Observability OTEL Demo\\n\\nWelcome to the [OpenSearch](https://opensearch.org/docs/latest) OpenTelemetry [Demo](https://opentelemetry.io/docs/demo/) documentation, which covers how to install and run the demo, and some scenarios you can use to view OpenTelemetry in action.\\n\\n## Purpose\\nThe purpose of this demo is to demonstrate the capabilities of OpenSearch Observability for observing and investigating your system.\\n\\n![](../../../.github/img/DemoFlow.png)\\n\\n### Services\\n[OTEL DEMO](https://opentelemetry.io/docs/demo/services/) describes the services that make up the Astronomy Shop.\\n\\nThe main services open to user interaction:\\n\\n- [Dashboards](https://observability.playground.opensearch.org/)\\n\\n- [Demo Proxy](https://observability.playground.demo-proxy.opensearch.org/)\\n\\n- [Demo loader](https://observability.playground.demo-loader.opensearch.org/)\\n\\n- [Demo feature-flag](https://observability.playground.demo-feature-flag.opensearch.org/)\\n\\n### Screenshots\\n![](https://opentelemetry.io/docs/demo/screenshots/frontend-1.png)\\n\\n_**The shopping App**_\\n![](https://opentelemetry.io/docs/demo/screenshots/frontend-2.png)\\n\\n_**The feature flag**_\\n![](https://opentelemetry.io/docs/demo/screenshots/feature-flag-ui.png)\\n\\n_**The load generator**_\\n![](https://opentelemetry.io/docs/demo/screenshots/load-generator-ui.png)\\n\\n---\\n### Ingestion\\nOpenSearch supports ingestion through multiple pipelines:\\n- [Data-Prepper](https://github.com/opensearch-project/data-prepper/) is an OpenSearch ingestion project that ingests OTEL standard signals sent by the OpenTelemetry Collector\\n- [Jaeger](https://opensearch.org/docs/latest/observing-your-data/trace/trace-analytics-jaeger/) is an ingestion framework with a built-in capability for pushing OTEL signals into OpenSearch\\n- [Fluent-Bit](https://docs.fluentbit.io/manual/pipeline/outputs/opensearch) is an ingestion framework with a built-in capability for pushing OTEL signals into OpenSearch\\n\\n### Integrations\\nThe integration service is a set of pre-canned assets that are loaded together, giving users a simple, automatic way to discover and review their services topology.\\n\\nThese (demo-sample) integrations contain the following assets:\\n- components & index template mapping\\n- datasources\\n- data-stream & indices\\n- queries\\n- dashboards\\n \\n\"}}"},"id":"dd4bebe0-f66a-11ed-9518-f5d5eb1d70bf","migrationVersion":{"visualization":"7.10.0"},"references":[],"type":"visualization","updated_at":"2023-05-19T17:30:35.804Z","version":"WzM0LDJd"}
{"attributes":{"description":"OTEL demo landing page","hits":0,"kibanaSavedObjectMeta":{"searchSourceJSON":"{\"query\":{\"language\":\"kuery\",\"query\":\"\"},\"filter\":[]}"},"optionsJSON":"{\"hidePanelTitles\":false,\"useMargins\":true}","panelsJSON":"[{\"version\":\"2.7.0\",\"gridData\":{\"x\":0,\"y\":0,\"w\":24,\"h\":15,\"i\":\"0e0c418a-81f8-4d85-8ba7-8d8ef6e2b1d7\"},\"panelIndex\":\"0e0c418a-81f8-4d85-8ba7-8d8ef6e2b1d7\",\"embeddableConfig\":{},\"panelRefName\":\"panel_0\"}]","timeRestore":false,"title":"OTEL demo landing page","version":1},"id":"e66e2da0-f66a-11ed-9518-f5d5eb1d70bf","migrationVersion":{"dashboard":"7.9.3"},"references":[{"id":"dd4bebe0-f66a-11ed-9518-f5d5eb1d70bf","name":"panel_0","type":"visualization"}],"type":"dashboard","updated_at":"2023-05-19T17:30:51.130Z","version":"WzM1LDJd"}
{"exportedCount":2,"missingRefCount":0,"missingReferences":[]}
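Each `.ndjson` file in this commit is an OpenSearch Dashboards saved-objects export: one JSON object per line, a dashboard that references its markdown visualization by `id`, and a trailing summary line carrying `exportedCount`. A record wrapped across two physical lines is no longer valid NDJSON, so a quick pre-import check can catch it. The Python sketch below is a hypothetical helper (`validate_saved_objects` is not part of any OpenSearch tool). It assumes the layout seen in these files and checks that every line parses, that each nested `visState` string is itself valid JSON, that every reference resolves inside the file, and that the summary count matches.

```python
import json


def validate_saved_objects(ndjson_text: str) -> list[str]:
    """Return human-readable problems found in a saved-objects NDJSON export.

    Assumes the layout used in this commit: one object per line plus a final
    summary line that carries "exportedCount". An empty list means no problems.
    """
    problems: list[str] = []
    objects: list[dict] = []
    summary = None

    for lineno, line in enumerate(ndjson_text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines, e.g. a trailing newline
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            # A record wrapped across two physical lines fails to parse here.
            problems.append(f"line {lineno}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(record, dict):
            problems.append(f"line {lineno}: expected a JSON object")
        elif "exportedCount" in record:
            summary = record
        else:
            objects.append(record)

    known = {(obj.get("type"), obj.get("id")) for obj in objects}
    for obj in objects:
        # visState is JSON encoded inside a JSON string, so it must parse separately.
        vis_state = obj.get("attributes", {}).get("visState")
        if vis_state is not None:
            try:
                json.loads(vis_state)
            except json.JSONDecodeError as exc:
                problems.append(f"{obj.get('id')}: visState is not valid JSON ({exc.msg})")
        for ref in obj.get("references", []):
            if (ref.get("type"), ref.get("id")) not in known:
                problems.append(f"{obj.get('id')}: unresolved reference {ref.get('type')}/{ref.get('id')}")

    if summary is not None and summary.get("exportedCount") != len(objects):
        problems.append(f"summary reports {summary.get('exportedCount')} objects, found {len(objects)}")
    return problems
```

Assuming the three exports are stored one record per line, calling `validate_saved_objects(pathlib.Path("demo-landing-page.ndjson").read_text())` should return an empty list.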
3 changes: 3 additions & 0 deletions src/integrations/display/memory-leak-tutorial.ndjson
@@ -0,0 +1,3 @@
{"attributes":{"description":"This pattern presents a memory leak diagnostic procedure tutorial","kibanaSavedObjectMeta":{"searchSourceJSON":"{\"query\":{\"query\":\"\",\"language\":\"kuery\"},\"filter\":[]}"},"title":"mem-leak-diagnostic","uiStateJSON":"{}","version":1,"visState":"{\"title\":\"mem-leak-diagnostic\",\"type\":\"markdown\",\"aggs\":[],\"params\":{\"fontSize\":12,\"openLinksInNewTab\":false,\"markdown\":\"# Memory Leak Investigation Tutorial\\n\\n## Tutorial Definition\\n\\nThis tutorial describes using metrics and traces to diagnose a memory leak.\\nApplication telemetry, such as the kind that OpenTelemetry can provide, is very useful for diagnosing issues in a\\ndistributed system. Here we walk through a scenario that demonstrates how to move from high-level metrics\\nand traces to the cause of a memory leak.\\n\\n## Diagnosis\\n\\nThe first step in diagnosing a problem is to determine that a problem exists. Often the first stop will be a metrics\\ndashboard provided by a tool such as Metrics Analytics in OpenSearch Observability.\\n\\n## Dashboards\\n\\nThis tutorial contains the OTEL demo dashboards with a number of charts:\\n\\n- Recommendation Service (CPU% and Memory)\\n- Service Latency (from SpanMetrics)\\n- Error Rate\\n\\nRecommendation Service charts are generated from OpenTelemetry Metrics exported to Prometheus, while the Service Latency\\nand Error Rate charts are generated through the OpenTelemetry Collector Span Metrics processor.\\n\\nFrom our dashboard, we can see that there seems to be anomalous behavior in the recommendation service: spiky CPU\\nutilization, as well as long tail latency in our p95, 99, and 99.9 histograms. We can also see that there are\\nintermittent spikes in the memory utilization of this service.\\nWe know that we’re emitting trace data from our application as well, so let’s think about another way that we’d be able\\nto determine that a problem exists.\\n\\n### Traces exploration\\n\\nOpenSearch Observability Trace Analytics allows us to search for traces and display the end-to-end latency of an entire\\nrequest with visibility into each individual part of the overall request. Perhaps we noticed an increase in tail latency\\non our frontend requests. The Traces dashboard then lets us search and filter our traces to include only those that\\ninclude requests to the recommendation service.\\n\\nBy sorting by latency, we’re able to quickly find specific traces that took a long time. Clicking a trace in the\\nright panel opens the waterfall view.\\nWe can see that the recommendation service is taking a long time to complete its work, and viewing the details allows us\\nto get a better idea of what’s going on.\\n\\n### Confirming the Diagnosis\\n\\nWe can see in our waterfall view that the `app.cache_hit` attribute is set to false, and that the `app.products.count` value\\nis extremely high.\\n\\nReturning to the search UI, filter to `recommendationservice` in the Service dropdown, and search for `app.cache_hit=true`\\nin the Tags box.\\n\\nNotice that requests tend to be faster when the cache is hit. Now search for `app.cache_hit=false` and compare the\\nlatency.\\n\\nYou should notice some changes in the visualization at the top of the trace list.\\n\\nSince this is a contrived scenario, we know where to find the underlying bug in our code. However, in a real-world\\nscenario, we may need to search further to find out what’s going on in our code, or which interactions between\\nservices cause it.\\n\\n### Context-aware SOP flow\\n\\nThe following diagram shows the context-aware phases of this SOP (standard operating procedure).\\n\\nUsers can see a summary of the whole flow for resolving their issue while focusing on the step they are\\ncurrently performing.\\n\\nThe overall process is mapped into a **state machine** in which each step has a state with a **transition**.\\n\\nWhen the user moves into a different **scope** (`time based`, `service based`, `log based`), that scope is defined as an indexed context (`Ctx[1]`, `Ctx[2]`, ...).\\n\\n---\\n\\nThis sequence outlines a process for investigating memory leaks that begins with gathering service data from both Prometheus and OpenSearch. After the data is combined and service latency is reviewed, an identified anomaly leads to a review of service traces, followed by log correlation, log fetching, and finally an overlay of logs that highlights differences.\\n\\n```mermaid\\nflowchart TD\\n    A[Memory Leak Investigation] --> B[Get All Services]\\n    B --> C[Query Prometheus]\\n    B --> D[Query OpenSearch]\\n    C --> E[Combine]\\n    D --> E\\n    E --> F[Review Services Latency]\\n    F --> G[Identify Anomaly]\\n    G --> H[Query Service traces]\\n    H --> I[Time-based review of service traces]\\n    I --> J{What next?}\\n    J --> K[Suggest correlation with logs]\\n    K --> L[Fetch logs and review them]\\n    L --> M{What next?}\\n    M --> N[Suggest logs overlay]\\n    N --> O[Fetch logs and review logs diff]\\n    O --> P[End]\\n```\\n\"}}"},"id":"92546710-f751-11ed-b6d0-850581e4a72d","migrationVersion":{"visualization":"7.10.0"},"references":[],"type":"visualization","updated_at":"2023-05-20T21:02:03.776Z","version":"WzUxLDVd"}
{"attributes":{"description":"This pattern presents a memory leak diagnostic procedure tutorial","hits":0,"kibanaSavedObjectMeta":{"searchSourceJSON":"{\"query\":{\"language\":\"kuery\",\"query\":\"\"},\"filter\":[]}"},"optionsJSON":"{\"hidePanelTitles\":false,\"useMargins\":true}","panelsJSON":"[{\"version\":\"2.7.0\",\"gridData\":{\"x\":0,\"y\":0,\"w\":24,\"h\":15,\"i\":\"a1954dc7-8655-4ea8-9a75-67cbe201b80c\"},\"panelIndex\":\"a1954dc7-8655-4ea8-9a75-67cbe201b80c\",\"embeddableConfig\":{},\"panelRefName\":\"panel_0\"}]","timeRestore":false,"title":"mem-leak-diagnostic","version":1},"id":"9aa66080-f751-11ed-b6d0-850581e4a72d","migrationVersion":{"dashboard":"7.9.3"},"references":[{"id":"92546710-f751-11ed-b6d0-850581e4a72d","name":"panel_0","type":"visualization"}],"type":"dashboard","updated_at":"2023-05-20T21:02:17.736Z","version":"WzUyLDVd"}
{"exportedCount":2,"missingRefCount":0,"missingReferences":[]}
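The "Confirming the Diagnosis" step of this tutorial compares `recommendationservice` latency for `app.cache_hit=true` against `app.cache_hit=false`. The same comparison can be sketched in a few lines of Python under stated assumptions: spans are already exported as simple records (the `Span` dataclass is hypothetical, not a Trace Analytics or OpenTelemetry type), and percentiles use the nearest-rank method, which may differ slightly from how a dashboard or histogram computes p95 and p99.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Span:
    """Hypothetical flattened span record: service name, duration, and attributes."""
    service: str
    duration_ms: float
    attributes: dict = field(default_factory=dict)


def percentile(values: list[float], q: float) -> float:
    """Nearest-rank percentile for 0 < q <= 100."""
    if not values:
        raise ValueError("percentile of an empty list")
    ordered = sorted(values)
    rank = math.ceil(q / 100 * len(ordered))  # 1-based rank
    return ordered[max(rank, 1) - 1]


def latency_by_cache_hit(spans, service="recommendationservice", qs=(50, 95, 99)):
    """Group one service's span durations by app.cache_hit and report percentiles per group."""
    groups: dict[bool, list[float]] = {True: [], False: []}
    for span in spans:
        if span.service != service:
            continue
        hit = span.attributes.get("app.cache_hit")
        if isinstance(hit, bool):  # skip spans that lack the attribute
            groups[hit].append(span.duration_ms)
    return {hit: {q: percentile(durations, q) for q in qs}
            for hit, durations in groups.items() if durations}
```

On real data, a large gap between the two groups' upper percentiles would support the cache-miss explanation, but it does not by itself prove a memory leak.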
3 changes: 3 additions & 0 deletions src/integrations/display/otel-architecture.ndjson
@@ -0,0 +1,3 @@
{"attributes":{"description":"OTEL Astronomy Demo Application architecture","kibanaSavedObjectMeta":{"searchSourceJSON":"{\"query\":{\"query\":\"\",\"language\":\"kuery\"},\"filter\":[]}"},"title":"otel-architecture","uiStateJSON":"{}","version":1,"visState":"{\"title\":\"otel-architecture\",\"type\":\"markdown\",\"aggs\":[],\"params\":{\"fontSize\":12,\"openLinksInNewTab\":false,\"markdown\":\"# OTEL Astronomy Demo Application\\n\\nThe following diagram presents the OTEL Astronomy shop services architecture:\\n\\n![](img/DemoServices.png)\\n\\n\\n### Trace Collectors\\nGaining a macro-level perspective on incoming data, such as sample counts and cardinality, is essential for comprehending the collector’s internal dynamics. However, when delving into the details, the interconnections can become complex. The Collector Data Flow Dashboard aims to demonstrate the capabilities of the OpenTelemetry demo application, offering a solid foundation for users to build upon.\\n\\nMonitoring data flow through the OpenTelemetry Collector is crucial. Tracing in the demo covers:\\n - all services, in every language used by the demo\\n - automatic instrumentation\\n - manual spans and attributes\\n - span events\\n - span links\\n\\nTrace headers are propagated across all services (**context propagation**).\\n\\n\\n### Metric Collectors\\nAll KPI information is collected into Prometheus time-series storage, including:\\n - runtime metrics\\n - HTTP / gRPC latency distribution\\n\\n### Data Flow Overview\\nThe Collector Data Flow Dashboard provides valuable guidance on which metrics to monitor. Users can tailor their own dashboard variations by adding metrics specific to their use cases, such as the `memory_limiter` processor or other data flow indicators. This demo dashboard serves as a starting point, enabling users to explore diverse usage scenarios and adapt the tool to their unique monitoring needs.\\n\\nThe diagram below provides an overview of the system components, showcasing the configuration derived from the OpenTelemetry Collector (otelcol) configuration file used by the OpenTelemetry demo application. It also highlights how observability data (traces and metrics) flows within the system.\\n\\n#### Simple purchase use case\\nThis flow diagram shows how a trace evolves as a purchase travels from the user through the backend services until it reaches the storage database.\\n![](img/DemoFlow.png)\\n\\n## Reference\\n**_OTEL Demo info_**\\n- [architecture](https://opentelemetry.io/docs/demo/architecture/)\\n- [collector-data-flow-dashboard](https://opentelemetry.io/docs/demo/collector-data-flow-dashboard/)\\n- [services](https://opentelemetry.io/docs/demo/services/)\\n\\n**_OTEL Demo videos_**\\n - [Cloud Native Live: OpenTelemetry community demo](https://www.youtube.com/watch?v=kD0EAjly9jc)\\n\"}}"},"id":"5c297aa0-f750-11ed-b6d0-850581e4a72d","migrationVersion":{"visualization":"7.10.0"},"references":[],"type":"visualization","updated_at":"2023-05-20T20:53:23.402Z","version":"WzQ3LDVd"}
{"attributes":{"description":"OTEL Astronomy Demo Application architecture","hits":0,"kibanaSavedObjectMeta":{"searchSourceJSON":"{\"query\":{\"language\":\"kuery\",\"query\":\"\"},\"filter\":[]}"},"optionsJSON":"{\"hidePanelTitles\":false,\"useMargins\":true}","panelsJSON":"[{\"version\":\"2.7.0\",\"gridData\":{\"x\":0,\"y\":0,\"w\":24,\"h\":15,\"i\":\"cc8d389f-6ab0-4590-bd7d-140ed04a28b1\"},\"panelIndex\":\"cc8d389f-6ab0-4590-bd7d-140ed04a28b1\",\"embeddableConfig\":{},\"panelRefName\":\"panel_0\"}]","timeRestore":false,"title":"otel-demo-architecture","version":1},"id":"67e37e40-f750-11ed-b6d0-850581e4a72d","migrationVersion":{"dashboard":"7.9.3"},"references":[{"id":"5c297aa0-f750-11ed-b6d0-850581e4a72d","name":"panel_0","type":"visualization"}],"type":"dashboard","updated_at":"2023-05-20T20:53:43.076Z","version":"WzQ4LDVd"}
{"exportedCount":2,"missingRefCount":0,"missingReferences":[]}
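The architecture page notes that trace headers are propagated across all services (context propagation). OpenTelemetry SDKs use the W3C Trace Context `traceparent` header by default; its version-00 form is `version-traceid-parentid-flags`. The Python sketch below shows what a service does with that header: it keeps the incoming 32-hex-digit trace ID and the flags, and substitutes a fresh 16-hex-digit span ID before calling downstream. The helper names are hypothetical, only version `00` is handled, and real services should rely on their OpenTelemetry SDK's propagator rather than code like this.

```python
import re
import secrets

# Version-00 layout from the W3C Trace Context spec: 00-<trace-id>-<parent-id>-<flags>
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def parse_traceparent(header: str) -> tuple[str, str, str]:
    """Return (trace_id, parent_id, flags) or raise ValueError for invalid headers."""
    match = TRACEPARENT_RE.match(header.strip())
    if match is None:
        raise ValueError(f"malformed traceparent: {header!r}")
    trace_id, parent_id, flags = match.groups()
    # All-zero IDs are explicitly invalid in the spec.
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        raise ValueError("traceparent contains an all-zero id")
    return trace_id, parent_id, flags


def child_traceparent(incoming: str) -> str:
    """Build the header a service would send downstream: same trace, new span id."""
    trace_id, _parent_id, flags = parse_traceparent(incoming)
    span_id = secrets.token_hex(8)  # 8 random bytes -> 16 lowercase hex digits
    while span_id == "0" * 16:  # astronomically unlikely, but all zeros is invalid
        span_id = secrets.token_hex(8)
    return f"00-{trace_id}-{span_id}-{flags}"
```

For the sample header `00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01` from the W3C specification, the child header keeps trace ID `0af7651916cd43dd8448eb211c80319c` and flags `01` while carrying a new span ID.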