[RFC] OpenSearch Tracing Collector #7352
Comments
I explored the alternative communication models for the Receiver (Tracing Framework) and the Reader (Collector), as summarized in the table below:
The recommended approach from our end is Shared Memory, for the following reasons:
Sizing: The shared memory (/dev/shm) usage depends on the number of spans generated on the node, which in turn is a function of the user workload and the number of shards allocated on the node. The default size will be 10% of total RAM; this setting will be configurable (if usage exceeds this capacity, traces will not be written to the shared memory). The implementation will ensure that on Agent crash/shutdown, all the data files in shared memory are cleaned up.
Access Permission:
Why not just add your own OpenSearch receiver, instead of creating your own collector?
Thanks @khushbr for putting this together. The overall solution looks overly complicated to me, so a couple of questions.
Thanks @khushbr for the doc. Would the processor component be modelled as part of the Performance Analyzer agent, or could it be pushed out to externally hosted components like Logstash to transform/enrich the data?
Thank you for the feedback @lewis262626 and @reta. For some additional context, the proposed core tenets for the OpenSearch Distributed Tracing feature (Instrumentation Framework and Collector) are:
The customization proposed is at the Collection and Processing layer. The first is discussed at #7352 (comment) in regard to Shared Memory. Let me elaborate on the processing part. In OpenSearch, there are long-running background tasks like peer recovery, and it would be expensive to persist trace data for such tasks in the core processes. Instead, this post-processing can be delegated to the Collector. In the future, the Collector can support other use-cases, such as:
Thoughts?
I would argue that the design you are suggesting has a considerably larger memory footprint than the simplified version:
This is the place where it would be great to see
I would argue collecting whole traces in memory is not sustainable. The traces / spans could (and probably should) be reconciled at the query level (with the data available at that moment).
I would argue this is not the responsibility of the collector: once the traces are stored, it is easy to answer such questions at query time, or any other questions that may come up in the future. The role of the collector (in my view) is as simple as: receive the traces / spans, possibly augment them with some metadata, and flush them to the storage.
Hello @reta! Reviving this thread. Picking up from our last conversation:
Agreed. This requirement has been updated: the trace data will now be reconciled with supplementary resource-usage data at query time.
Agreed. I thought about this more, and it makes sense to delegate the responsibility of aggregation and analysis to the DB layer. This can come out-of-the-box with stores like Prometheus + Grafana, or can be custom-built (per the use-case) as a thin layer on top of the data store. Coming back to the memory question and whether we need a Collector Agent: the alternative, where the writer periodically flushes data directly to the datastore over the network, has drawbacks:
I see the Collector Agent with shared memory (tmpfs) as following the Separation of Concerns principle and offering superior IPC between the writer (instrumentation code) and the Reader (Collector). In the long term, the Collector can be the single funnel through which all the telemetry data (traces, service metrics, Performance Analyzer metrics, slow logs) is written to the user's observability data store for visualization and monitoring. Let me know your thoughts.
Thanks @khushbr
Agree, I think OTel has the vendor-agnostic Collector implementation. I don't think we should not have one; I think we should not implement one ourselves.
@reta Agreed. The proposal here is to create a distribution of the OTel Collector: the OTel framework with a curated set of processors and exporters relevant to OpenSearch, plus custom Receiver/Reader logic.
@khushbr This feature sounds like it has a large intersection with the existing Audit Logs feature in the Security Plugin [1]; how do you envision the Audit Log being impacted by this change, or do you think there are components of that framework that should be reused?
@peternied I am afraid this feature serves a different purpose. Audit Logs are always-on, access-driven triggers (please correct me if I am wrong), whereas tracing is a purely optional infrastructure component designed to help troubleshoot issues in distributed systems.
Is your feature request related to a problem? Please describe.
#1061
Introduction
Colloquially known as Tracking/Traceability, Distributed Tracing is the ability to trace a request end-to-end in the system to get a complete view of the request execution. With respect to OpenSearch: from the coordinator, fanning out to all the respective nodes holding primary/replica shards, and aggregating the results back on the coordinator. The supported use-cases will include Bad Query Debugging, Critical Path Analysis and A/B Testing.
The Distributed Tracing system will consist of two major components: the Tracing Framework (details at #6750) and the Telemetry Collector. The Tracing Framework will instrument and collect local (node-level) and distributed (cluster-wide) traces for OpenSearch search and index workloads, background jobs, etc. The OpenSearch Telemetry Collector will offer a vendor-agnostic implementation to read, process and export telemetry data (initially traces, later Performance Analyzer metrics as well) to stores like Prometheus, Jaeger, AWS X-Ray, etc.
Proposed Implementation
Components
At a high level, the Collector Agent will have three main components. A user, based on their specific use-case, can configure the individual components in the collector pipeline YAML.
Packaging
The OpenSearch Collector will be a distribution of the OTel Collector, curated for OpenSearch-specific use-cases. The Collector Agent will be designed as a Golang binary, running as a side-car process on the same node to ensure process-level resource isolation and no over-the-network cost. It will run under a custom user/RBAC role with limited privileged access to the network, specific disk locations, etc.
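For illustration, such a distribution could be assembled with the OpenTelemetry Collector Builder (ocb). The manifest below is a rough sketch: the module versions and the custom shared-memory receiver module path are assumptions, not final choices.

```yaml
# Illustrative ocb (OpenTelemetry Collector Builder) manifest.
# Versions and the custom receiver module path are placeholders.
dist:
  name: opensearch-otel-collector
  description: OpenSearch-curated distribution of the OTel Collector
  output_path: ./dist

receivers:
  # Hypothetical custom receiver reading spans from shared memory (/dev/shm)
  - gomod: github.com/opensearch-project/opensearch-shm-receiver v0.1.0

processors:
  - gomod: go.opentelemetry.io/collector/processor/memorylimiterprocessor v0.81.0
  - gomod: go.opentelemetry.io/collector/processor/batchprocessor v0.81.0

exporters:
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/exporter/opensearchexporter v0.81.0
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusexporter v0.81.0
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/exporter/jaegerexporter v0.81.0
```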
High Level Design
Sample Pipeline Configuration:
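A minimal sketch of how such a pipeline could be wired, assuming standard OTel Collector YAML conventions; the `opensearch_shm` receiver name is hypothetical, and the per-component settings are sketched in the sections below:

```yaml
# Minimal service wiring: the shared-memory receiver feeds the default
# processors, which feed the trace exporters. Receiver name is hypothetical.
service:
  pipelines:
    traces:
      receivers: [opensearch_shm]
      processors: [memory_limiter, batch]
      exporters: [opensearch, jaeger]
```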
1. Reader
Borrowing from the current Performance Analyzer-RCA inter-process communication model, /dev/shm (also known as tmpfs), a shared memory implementation, is used to write and read the tracing data. This shared memory write/read model works without any need for explicit synchronization and locking between the writer and reader. However, it involves the risk of disk read congestion; the polling interval will be tuned for both lower disk access frequency and low memory usage.
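As a sketch, the receiver settings for such a shared-memory reader might look as follows; the receiver name and its fields (`path`, `poll_interval`) are hypothetical placeholders, not an existing OTel component:

```yaml
receivers:
  # Hypothetical shared-memory receiver; name and fields are placeholders.
  opensearch_shm:
    path: /dev/shm/opensearch-traces   # tmpfs location the Tracing Framework writes to
    poll_interval: 5s                  # tuned to balance read frequency vs. data held in shm
```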
2. Processor
Following the plug-and-play model, users can configure 'N' processor types depending on their use-case. By default, only the Memory Limiter and Batch processors will be enabled. The Collector will support dynamically turning individual processors on and off. The data will flow through the processors sequentially, strictly in the order defined, before reaching the exporter. The following processor types will be supported in the initial release:
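For the two defaults, a minimal configuration sketch, assuming the standard OTel Collector memory_limiter and batch processor settings (values are placeholders):

```yaml
processors:
  memory_limiter:
    check_interval: 1s     # how often memory usage is checked
    limit_mib: 512         # hard memory limit for the Collector process
    spike_limit_mib: 128   # headroom for short spikes
  batch:
    timeout: 5s            # flush at least this often
    send_batch_size: 512   # or when this many spans are buffered
```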
3. Exporter
The Exporter is the final step in the Collector pipeline; it translates the OpenSearch trace format into the data-store format, and the HTTP/gRPC client in the exporter writes the data to the data store. The initial release will support the OpenSearch, Prometheus and Jaeger exporters.
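A rough sketch of the corresponding exporter section, assuming the contrib OpenSearch, Prometheus, and Jaeger exporters; endpoints and field names are illustrative and should be checked against each exporter's documentation. Note that the Prometheus exporter would serve a metrics pipeline rather than traces.

```yaml
exporters:
  opensearch:
    http:
      endpoint: https://localhost:9200   # illustrative; exact fields per the contrib exporter's schema
  jaeger:
    endpoint: jaeger-collector:14250     # gRPC endpoint of the Jaeger collector
    tls:
      insecure: true
  prometheus:
    endpoint: 0.0.0.0:8889               # scrape endpoint; used by a metrics pipeline, not traces
```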
Telemetry Data Store
Users can configure the Agent to work with their telemetry data stores. A time-series database that optimizes space with built-in data folding and compression is highly recommended as the data store. The telemetry data store should satisfy the following requirements:
* Must provide Correlational Analytics, Aggregation, Filtering functionality - required for building the global view of the cluster and also to serve Visualization.
* Visualization - run visualizations on top of the data store, e.g. Grafana, OpenSearch Dashboards.
* Support multiple telemetry data types (metrics, stats, traces and logs)
* Archival for up to 30 days - configurable, depending on the use-case scenario.
How can you help?
Any feedback on the overall proposal is welcome. If you have specific suggestions on: