Investigation about tracing, logging, and metrics support. #482

Open
liurenjie1024 opened this issue Jul 26, 2024 · 12 comments
Labels: discussion (Discussion about idea of this project), enhancement (New feature or request)

Comments

@liurenjie1024
Contributor

As a library intended for production use, it would be useful to add logging, metrics, and tracing support. We need to investigate how to fit into the Rust ecosystem as much as possible.

@Xuanwo
Member

Xuanwo commented Jul 26, 2024

I would suggest the following set:


Why not tracing?

Tracing is widely used, but it often conflates the scopes of logging and tracing.

From a tracing perspective, it's 100 times slower than fastrace.

Meanwhile, log offers the most compatible solution for all logging systems, making integration easier for users.

Why not prometheus?

The prometheus crate maintained by the TiKV community is almost dead.

Why not metrics?

Introducing an abstraction based on metrics makes sense for my own usage. However, its abstraction does not effectively support prometheus, which is the most widely used metrics collector.

I suggest we first integrate directly with Prometheus, and consider metrics when we need to accommodate requests to support other metrics tools like tcp.

@sdd
Contributor

sdd commented Jul 26, 2024

Personally, I'd support the tracing crate over logging. Adoption of tracing has supplanted that of logging over the past few years. It is more capable and flexible than logging. I disagree with the sentiment that it confuses things.

@Xuanwo
Member

Xuanwo commented Jul 27, 2024

Adoption of tracing has supplanted that of logging over the past few years.

Users who rely on tracing can still integrate with our log, as tracing has native integration. However, if we adopt tracing, then log users will not be able to integrate with us. Therefore, I believe libraries like Iceberg should use log instead of tracing. Nonetheless, I'm open to using tracing_subscriber in our tests for a better user experience.

Here is a decision matrix:

| Does it work? | iceberg uses log | iceberg uses tracing |
| --- | --- | --- |
| User uses env_logger (log ecosystem) | YES | NO |
| User uses tracing_subscriber (tracing ecosystem) | YES | YES |

@liurenjie1024
Contributor Author

Hi, @sdd

It is more capable and flexible than logging.

Could you elaborate on this? Previously, I thought the tracing crate was better for logging because it provides structured logging, but log now provides structured logging as well.

As for combining tracing with log backends, there is tracing-log, but it does not seem to be actively maintained.

Let me try to summarize the proposed approaches:

log + fastrace

Pros:

  • Each crate focuses only on its own functionality, so each has the most widely supported ecosystem, and it may be easier to enable logging and tracing separately?
  • fastrace reports that it's much faster than tracing.

Cons:

  • tracing seems to have a better ecosystem, judging by its main page.
  • fastrace seems to be less popular than tracing, so it may be more difficult to integrate with other libraries. For example, if someone uses iceberg-rust in an application built on h2, which relies on the tracing library, would the user need to set up two collectors/subscribers?

tracing only

Pros:

  • The most widely used tracing library in Rust, with the best integrations.

Cons:

  • It's mixing logging with tracing. If we use tracing for logging, our logs will not be compatible with the log ecosystem.
  • It's reported to be slower than fastrace.

So how about we use log + tracing, with the rule that the tracing library is used for tracing only and log is used for logging?

@twuebi
Contributor

twuebi commented Jul 29, 2024

Hi all,

I agree with @sdd and suggest sticking with tracing. It is the default choice for logging in async applications and has the most widespread adoption.

Regarding the performance argument, I doubt it matters in any real-world benchmark of this crate: any overhead incurred by tracing will be dwarfed by the network latency of whatever cloud storage service serves the data files, round trips to a DB catalog, or API calls to a REST catalog. I'm strongly in favor of sticking with a tried-and-tested implementation, putting some real-world benchmarks in place, and considering a less active alternative only if those benchmarks show tracing to be a bottleneck.

I also disagree that it is a problem to address tracing and logging in the same crate. Rather, it provides very useful debugging utilities. For example, in a server, some middleware can create a span containing a request ID; every (tracing-)log statement within that span then has the request ID attached, so your favorite log aggregator can filter by it to get all logs pertaining to that request. This is incredibly helpful when dealing with production issues.

Regarding metrics, I'm not sure I understand the issue, there exists https://crates.io/crates/metrics-prometheus which provides integration of prometheus with metrics.rs.

@c-thiel
Collaborator

c-thiel commented Jul 30, 2024

Hey guys, some thoughts from my side:

  • I don't think performance matters much: we are talking nanoseconds, while round trips to S3 and catalogs take milliseconds. Thus, tracing's wider adoption is a clear plus for me.
  • We are an IO-heavy crate, with lots of round trips via OpenDAL to object stores, catalogs, and more. IMO we would benefit a lot from tracing's features over plain logging. They would let us consistently track operations across all affected systems, including forwarding trace IDs to them. Iceberg-Rust is one component in an inherently distributed system that we call a Lakehouse. Distributed systems are best monitored as a whole when traces are used everywhere along the way.
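Forwarding trace IDs to other systems needs an agreed propagation format. The thread does not name one; the sketch below assumes W3C Trace Context and only shows how a `traceparent` header value is laid out, using the example IDs from that specification. Obtaining the IDs from the current span (e.g. via OpenTelemetry) is out of scope here, and `traceparent` is a hypothetical helper.

```rust
/// Format a W3C Trace Context `traceparent` header value:
/// `{version}-{trace-id}-{parent-id}-{trace-flags}`, all lowercase hex.
fn traceparent(trace_id: u128, parent_span_id: u64, sampled: bool) -> String {
    // Version 00; bit 0 of trace-flags marks the trace as sampled.
    let flags: u8 = if sampled { 0x01 } else { 0x00 };
    format!("00-{trace_id:032x}-{parent_span_id:016x}-{flags:02x}")
}

fn main() {
    // Example IDs; in practice they would come from the current span's context.
    let header = traceparent(0x4bf92f3577b34da6a3ce929d0e0e4736, 0x00f067aa0ba902b7, true);
    // A client would attach this as the `traceparent` HTTP header on outgoing
    // catalog or object-store requests.
    println!("traceparent: {header}");
}
```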

@Xuanwo
Member

Xuanwo commented Jul 30, 2024

Thank you all for joining the discussion. It seems most people prefer using tracing for trace and log. I'm willing to accept this since it's more important for us to move forward.

The remaining question from my side is metrics: what do you think about prometheus_client and metrics?


Context from OpenDAL:

OpenDAL supports all of them. Regardless of which library we choose, we simply need to enable the corresponding layer.

| Name | Depends | Description |
| --- | --- | --- |
| DtraceLayer | probe | Supports User Statically-Defined Tracing (aka USDT) on Linux. |
| LoggingLayer | log | Adds logging for every operation. |
| MetricsLayer | metrics | Adds metrics for every operation. |
| FastraceLayer | fastrace | Adds fastrace for every operation. |
| OtelTraceLayer | opentelemetry::trace | Adds opentelemetry::trace for every operation. |
| PrometheusClientLayer | prometheus_client | Adds prometheus metrics for every operation. |
| PrometheusLayer | prometheus | Adds prometheus metrics for every operation. |
| TracingLayer | tracing | Adds tracing for every operation. |

@sdd
Contributor

sdd commented Jul 30, 2024

Some additional context. Renjie mentioned:

Cons:
It's mixing logging with tracing. If we use tracing for logging, our logs will not be compatible with the log ecosystem.

This is not quite right. You can use the tracing_log crate to redirect logs that other crates send to log into tracing:

use tracing_log::LogTracer;

// redirects all output sent to log::* to their tracing::* equivalent
LogTracer::init().expect("Failed to set logger");

Regarding metrics, my preference would be either metrics or opentelemetry. Both are vendor-agnostic with pluggable exporters. My first choice would be opentelemetry as it seems to have better cross-language recognition and a bit more active / vibrant community (I'm basing this on github stars, watchers, number of contributors and commit / merge frequency)

@sdd
Copy link
Contributor

sdd commented Jul 31, 2024

It sounds like tracing is the preferred option for tracing and logging. I'm happy to raise a PR to add this if we are all in agreement.

Regarding telemetry, does anyone have objections to the choice of opentelemetry?

@Xuanwo
Member

Xuanwo commented Aug 1, 2024

It sounds like tracing is the preferred option for tracing and logging. I'm happy to raise a PR to add this if we are all in agreement.

I'm fine with this. Also cc @liurenjie1024 for ideas.

Regarding telemetry, does anyone have objections to the choice of opentelemetry?

opentelemetry's metrics support is currently in the alpha stage. I believe most people primarily use it for tracing. See: https://opentelemetry.io/docs/languages/rust/

I would support using metrics instead if we don't want to depend directly on a library like prometheus_client.

@liurenjie1024
Contributor Author

It sounds like tracing is the preferred option for tracing and logging. I'm happy to raise a PR to add this if we are all in agreement.

+1.

I would support using metrics instead if we don't want to depend directly on a library like prometheus_client.

+1. I would also prefer to use metrics, since opentelemetry's metrics support is still at too early a stage.

@ZENOTME
Contributor

ZENOTME commented Aug 4, 2024

Some additional context. Renjie mentioned:

Cons:
It's mixing logging with tracing. If we use tracing for logging, our logs will not be compatible with the log ecosystem.

This is not quite right. You can use the tracing_log crate to redirect logs that other crates send to log into tracing:

use tracing_log::LogTracer;

// redirects all output sent to log::* to their tracing::* equivalent
LogTracer::init().expect("Failed to set logger");

Regarding metrics, my preference would be either metrics or opentelemetry. Both are vendor-agnostic with pluggable exporters. My first choice would be opentelemetry as it seems to have better cross-language recognition and a bit more active / vibrant community (I'm basing this on github stars, watchers, number of contributors and commit / merge frequency)

I have a question: the LogTracer in tracing_log converts log records into tracing events. Does this mean that a user who already uses log has two choices after introducing iceberg-rust:

  1. maintain tracing and log setups separately, or
  2. switch to tracing and redirect all existing log records into tracing events?

For users, wouldn't it be more ideal to redirect tracing events to their existing log setup instead? 🤔
