Jaeger v2 based on OpenTelemetry collector #3500
Comments
Direction-wise this makes total sense, as the Jaeger pipeline looks rudimentary; however, I believe there are some features that are not yet available in OTEL collectors. What's left to Jaeger then? A custom protocol as a receiver (which should go away at some point), the UI as an extension, and custom storage integration? Does that mean just coupling the querier with particular exporters? |
Jaeger will provide the storage layer and UI. Once we are based on the OTEL collector it will be easier to write extensions and provide additional functionality. |
@eugeniyk Think of this not from the architecture point of view but from the user impact - everything about Jaeger is still there, plus some additional capabilities we could inherit from OTEL (the most critical one - OTLP receiver). Jaeger is an end-to-end tracing platform, OTEL collector is not, just a piece of it. |
So, what about the data model used in storage, and how would the terminology in Jaeger evolve to align with Otel versus OT? I don't think we need to make those changes, but they would help for the future. Unfortunately they don't really help users so much; they are more for maintainers and the project. The only other concern is the UI: we haven't done anything really meaningful on the UI side beyond the recent monitoring tab for the last several years. I have been pushing the OpenSearch Dashboards team to add Jaeger format support; they are already working with an OpenTelemetry schema on OpenSearch, but as you know we are on an OT schema for Jaeger, which is "legacy". I think we need to address the schema and allow the Jaeger format to be more widely used by UIs that will evolve, as the Jaeger UI is likely to stagnate. |
The question I would ask here is: what "features" does the OTEL schema give compared to Jaeger/OT? They are pretty much interchangeable. The storage layer can change schema independently at any point; it's more like an implementation detail given the scope of this issue.
I agree the Jaeger UI needs migration work as well. At a minimum we can start with renaming some UI elements (Tags->Attributes etc.) and at some point migrate the UI to use the Jaeger V3 or another OpenTelemetry-compatible model. Do you have references to OpenSearch adopting the OTEL model? The OTEL JSON is (or was) not stable yet. In the past we had issues in Jaeger when the storage layer used the Jaeger model directly. The separation gives more flexibility and shielding from breaking changes. |
Just more about Jaeger being aligned with the current tracing standard versus being aligned with the deprecated tracing standard. The second idea is that if we adopt the Otel schema we could work with OpenSearch Dashboards trace analytics which is using an Otel schema too. More on this in the next paragraph.
Yes.... and more on this below
Currently, they are using "Data prepper" which does the writing to OpenSearch/ElasticSearch. They are not using Otel collector since they are doing aggregates similar to the spark component in Jaeger. Here is the schema for Data Prepper: https://github.com/opendistro-for-elasticsearch/data-prepper/blob/634a426ef0377ebc2e525e954177b350ebaeabe2/docs/schemas/trace-analytics/otel-v1-apm-span-index-template.md Ideally they will support a Jaeger schema for reading too, but without aggregates there will be a bunch of features that do not work (maps, aggregated metrics, etc). |
https://docs.aws.amazon.com/opensearch-service/latest/developerguide/trace-analytics.html Fairly primitive so far, but certainly a viable alternative to the Monitor tab (#2954). @jkowall do you know if OS is planning to venture more into trace visualizations like the timeline view, graph view, etc.? I think this brings up one other crucial question for Jaeger V2 - should we just focus on a single storage backend that can actually provide these analytical capabilities? TBH, it's not clear to me if querying OS for aggregate data is preferable to the Monitoring tab approach of using a metrics system for the same aggregates. It's certainly simpler dealing with just one backend, but likely at the expense of query latency. |
Still needs work, the plugin is not that old. We had the PM on a Jaeger community call last year. They are planning on timeline and graphical topology. We also want to build metric query capabilities into OpenSearch Dashboards so you can hook it up to a PromQL backend, but no one is actively working on that at this time. I think less backends is a good move, but I also think that the current backends are not very good with Metrics. I hope there is something better on the horizon that supports PromQL and unstructured data well. Hard to tell what and when that might happen. The number of breaking changes coming in v8 will be interesting: https://www.elastic.co/guide/en/elasticsearch/reference/8.0/migrating-8.0.html Welcome to the ongoing Elastic push to break open source and backwards compatibility. |
Another benefit of the Monitoring tab approach is its tighter integration within Jaeger UI, supporting use cases that rely on aggregated data to help narrow down the search space to the more "interesting" traces (high latency, error rate traces) with a single button click. I agree though, that the approach used with the Monitoring tab has quite a few moving parts: OTEL collector with the correct config to perform the aggregation from traces to metrics, a metrics store to persist the metrics, then Jaeger query + UI to visualize the metrics; and I like the idea of supporting a single storage backend. |
Jaeger v2 would reduce the dependency on the OTEL collector for the monitoring feature. We could include the processor that extracts metrics in the Jaeger main distribution. |
From a user's perspective, I would like to know what is really meant by "Jaeger collector based on Otel collector": is that a drop-in replacement for the Otel collector? The folks from Dow Jones have written an excellent blog post detailing how OS, Jaeger and Otel play together in enterprise settings, with Otel acting as a centralized trace pipeline and Jaeger being a sink. Perhaps Jaeger in the future can remove client libraries/agents and focus on how to deal with the trace data (e.g. adding anomaly detection features?). On a side note, recent enhancements in Grafana's Jaeger data source plugin are really impressive; I'm not saying that it can replace the Jaeger UI, but maybe in the future more users will use Grafana instead of the Jaeger UI. |
It would have some additional exporters or other capabilities the project needs, but otherwise it would be very similar and mostly upstream code.
The problem is that Data Prepper is a pain to handle, since it's doing the same things the collector is doing. It creates a lot of complexity, but yes, it can play well; you are storing the data twice, since it needs to be stored in both the Data Prepper and Jaeger schemas.
The problem is we'd have to become more opinionated about the backend. The client libraries would be replaced by Otel, in fact they already have been since we deprecated the old ones: https://www.jaegertracing.io/docs/1.30/client-libraries/
The problem is that Grafana's licensing is not friendly to CNCF, and in the future I wouldn't be surprised to see an Elastic move pulled creating a gap in the Prometheus community and potentially a gap in Jaeger. I prefer to see an Apache 2.0 licensed solution like OpenSearch Dashboards or Apache Superset. |
I have some good news for this ticket! I have created https://github.com/jaegertracing/jaeger-opentelemetry-collector to bootstrap the work on the jaeger v2/ rebase on top of the OpenTelemetry collector. It is a community project, anybody interested can start contributing by migrating Jaeger storage implementations as exporters. |
@pavolloffay may I suggest adding a tracking issue (or a project) in that repo that lists the overall plan / list of tasks that need to be done to achieve some success criteria? |
That is a great idea - jaegertracing/jaeger-opentelemetry-collector#49 |
Hello, I'm trying to send trace data from a WordPress website to OpenSearch through Jaeger. I'm using a WordPress plugin to send the data to Jaeger, and from there to OpenSearch: docker run --rm -it -v ${PWD}:/config With config.yaml : exporters: -> It gives the following error: ./opensearch-docker-entrypoint.sh: line 140: /usr/share/opensearch/jaegertracing/jaeger-opentelemetry-collector: No such file or directory I'm looking into this because using OpenTelemetry separately and Data Prepper is returning connection errors/no valid pipeline for execution. So if this works it would simplify the process so much. Any suggestions would be appreciated. |
where are you getting instructions to use this ^ name? It's not supported atm. If your Wordpress installation exports OpenTelemetry data, you can send it to regular Jaeger collector: https://medium.com/jaegertracing/introducing-native-support-for-opentelemetry-in-jaeger-eb661be8183c |
Hey folks @pavolloffay @yurishkuro @gai6948 @eugeniyk @albertteoh @issraee, I've been working on Jaeger integration with OpenSearch Dashboards Trace Analytics. Here's a quick demo; can you folks let me know what you think of it? We are hoping to get this into the next release, but would definitely like to hear feedback from the community on what is useful or could be improved for end users, and to work with the Jaeger community to drive future development. The GitHub issue tracking some of the PM/UI/UX work is here: opensearch-project/dashboards-observability#83. Caveat: This demo is made from Jaeger data ingested in a specific format that is more friendly to OpenSearch: |
For people subscribing to this issue - I have a new PR that I'd like to get feedback on. It currently implements a working all-in-one on top of OTEL Collector framework. #4766 |
## Which problem is this PR solving?
- Third prototype of "Jaeger-v2"
- Another alternative approach to #3500

## Description of the changes
- Adds a new binary `jaeger-v2` using OTEL Collector framework
- Minimal amount of extensions is included, to mimic what `jaeger-collector` normally has
- It will combine all previous functions of agent/collector/query in one binary, but controllable via config file

```
$ go run -tags=ui ./cmd/jaeger-v2 --config ./cmd/jaeger-v2/config.yaml
```

## Roadmap
https://docs.google.com/document/d/1s4_6VgAS7qAVp6iEm5KYvpiGw3h2Ja5T5HpKo29iv00/edit

## Design
* the ingestion and storing of traces will be done via standard receivers/processors/exporters OTEL Collector components
* the jaeger-query and UI are implemented as `jaeger_query` extension (already working in this PR)

### Storage
In order to keep the flexibility of mixing & matching storage implementations, all backends can be configured via `jaeger_storage` extension (we may need to add `jaeger_metrics_storage` extension in the future). It might look like this:

```yaml
jaeger_storage:
  memory: # defines Factory
    memstore:
      max_traces: 100000
  cassandra:
    cassandra_primary:
      servers: [...]
      namespace: jaeger
    cassandra_archive:
      servers: [...]
      namespace: jaeger_archive
```

The `jaeger_query` extension then references specific storage factories by name:

```yaml
jaeger_query:
  trace_storage: memstore
  dependencies: something_else
  metrics_store: prometheus_store
```

It's not clear yet if `jaeger_query` extension should simply subsume `jaeger_storage` extension, because Query is the only one that needs this _generic_ access to storage, while things like exporters or Kafka ingester (receiver) always deal with a single implementation (because OTEL Coll pipeline allows to connect them with each other, which is not possible with extensions).

## Trade-offs
- This is not using OTEL Collector builder `ocb`. That means people won't be able to assemble a different version of the collector with other extensions.
- We may want to support `ocb` in the future, as it makes it easier to write custom in-process exporters for custom storage. It will require converting all the components into their own modules.

## Next steps
* [x] Get feedback from the community on the approach
* [x] Fully implement all-in-one by wiring receivers / exporters correctly

## Open Questions
* How can we implement all-in-one equivalent that can be run without any config file?
* Do we want [healthcheckextension](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/extension/healthcheckextension/README.md) to be included by default?
* Investigate startup error `2023-09-23T19:55:46.661-0400 warn zapgrpc/zapgrpc.go:195 [core] [Channel #2 SubChannel #3] grpc: addrConn.createTransport failed to connect to {Addr: ":16685", ServerName: "localhost:16685", }. Err: connection error: desc = "transport: Error while dialing: dial tcp :16685: connect: connection refused" {"grpc_log": true}`

---------

Signed-off-by: Yuri Shkuro <github@ysh.us>
Signed-off-by: Yuri Shkuro <yurishkuro@users.noreply.github.com>
Co-authored-by: Albert <26584478+albertteoh@users.noreply.github.com>
I am closing this in favor of a new issue where the actual roadmap is tracked. |
Update (Sept 2023)
Superseded by #4843
Proposal (2022)
Creating top-level issue for Jaeger v2 based on OpenTelemetry collector.
This has already been discussed a couple of times:
I would like to bring the topic back, as I think it is vital for the future success of Jaeger and for staying relevant. Since we last worked on v2 a couple of things have changed, and therefore I have created a new proposal on how v2 could be designed and what the impact on the ecosystem is: https://docs.google.com/document/d/1d7j956tDVYacKHF-l0JL9sVhpbYkt0JECDyGeup-olc/edit?usp=sharing
I would like to open this for discussion. If you are a Jaeger user, please upvote the issue if you like the proposal, including the breaking changes and the impact on deployment (Helm/operator).