Change default trace-id format to be similar to AWS X-Ray (use timestamp) #1947
Comments
I think the important thing is that we are on the same page about whether the left or right bytes are random, or whether it does not matter, for use with probability sampling. For example, I think an old UUID v1 has the timestamp as the first bytes and a static node ID as the last bytes. https://datatracker.ietf.org/doc/html/rfc4122#section-4.1.2 Not sure if these are still relevant, though. I think it should not matter much for the "precision" of probability sampling whether we use a 32- or 64-bit number as input, or whether we decide to use some hash function anyway (CC @oertl @jmacd).
@Oberon00 The smallest sampling rate that you can achieve with X random bits is 1/2^X. For 32 bits we have 1/2^32 ≈ 2.3E-10. I am not 100% sure if this limit is future-proof.
Sorry, wrong number! If we are talking about the trace ID with only a 32-bit timestamp in it, we would still have 128 - 32 = 96 random bits. And isn't the 64-bit span ID more relevant for sampling anyway? It would still be fully random under this suggestion.
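The arithmetic in the two comments above can be checked with a short Python sketch. It assumes sampling works by comparing a threshold against uniformly random bits (so the smallest non-zero rate with X bits is 1/2^X) and that the proposed timestamp prefix takes exactly 32 of the 128 trace-id bits; `min_sampling_probability` is a hypothetical helper, not part of any OpenTelemetry API.

```python
from fractions import Fraction

TRACE_ID_BITS = 128
TIMESTAMP_BITS = 32  # assumed size of the proposed timestamp prefix


def min_sampling_probability(random_bits: int) -> Fraction:
    """Smallest non-zero probability expressible with `random_bits` uniform bits."""
    return Fraction(1, 2 ** random_bits)


remaining_bits = TRACE_ID_BITS - TIMESTAMP_BITS  # 96 random bits left

# Compare a 32-bit input, the 64-bit span ID, and the 96 bits left in the trace-id.
for bits in (32, 64, remaining_bits):
    rate = float(min_sampling_probability(bits))
    print(f"{bits:3d} random bits -> smallest rate {rate:.3e}")
```

Even with the timestamp prefix, the remaining 96 bits allow rates far below the 2.3E-10 floor of a 32-bit input.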
Also, this only concerns how OpenTelemetry will generate IDs. Backends should not expect all IDs to have this structure, but naturally, if most IDs are generated by OTel, backends can benefit.
I think this is a major feature that requires an OTEP going into detail on motivation and trade-offs. The current two-sentence motivation raises far more questions than it answers, like:
@yurishkuro happy to write an OTEP if there is interest in doing this.
As mentioned in the issue, it is hard to guarantee that the timestamp will be present and correctly set. This is a small optimization for backends that use the trace-id as a key in a store that performs better when data are "kind of sorted", such as Cassandra or HBase. Later, if we want to guarantee that the timestamp is present, we can propose a trace-flags bit that, when set, guarantees that the first 32 bits are a timestamp.
I don't want to get into backend design, but things like dropping old traces can be implemented once we know for sure that the timestamp is present.
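As an illustration of the retention idea, here is a minimal Python sketch that only makes sense once the timestamp is known to be present (for example via the trace-flags bit suggested above, which does not exist in any spec today). It assumes the first 32 bits are an unsigned big-endian count of seconds since the Unix epoch; `trace_id_timestamp`, `is_expired`, and the 7-day `RETENTION_SECONDS` are hypothetical.

```python
import time
from typing import Optional

RETENTION_SECONDS = 7 * 24 * 3600  # hypothetical retention window


def trace_id_timestamp(trace_id_hex: str) -> int:
    """Read the assumed 32-bit seconds-since-epoch prefix of a hex trace-id."""
    if len(trace_id_hex) != 32:
        raise ValueError("trace-id must be 32 hex characters")
    return int(trace_id_hex[:8], 16)


def is_expired(trace_id_hex: str, now: Optional[float] = None) -> bool:
    """True if the trace started more than RETENTION_SECONDS before `now`."""
    if now is None:
        now = time.time()
    return now - trace_id_timestamp(trace_id_hex) > RETENTION_SECONDS
```

Without a guarantee, a backend applying this to an arbitrary opaque ID could drop fresh traces whose leading bits just happen to decode to an old time.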
I would prefer a clear description or open spec of those optimizations in the OTEP. Compare the sampling OTEPs from @jmacd, which honestly discuss costs and benefits without references to or implications of vendor-only "secret sauce".
FYI, ULIDs are an open standard for 128-bit globally unique IDs, with the first 48 bits being a timestamp and 80 bits of randomness (ULIDs are implemented in most languages, so they should be easy to use in our SDKs):
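To make the ULID layout concrete, here is a minimal Python sketch that packs a 48-bit millisecond timestamp and 80 random bits into the 16 bytes of a trace-id. It mirrors only the ULID bit layout: it skips ULID's Crockford base32 text encoding and its monotonic-within-a-millisecond rule, and `ulid_style_trace_id` is a hypothetical name, not a library function.

```python
import os
import time
from typing import Optional


def ulid_style_trace_id(now_ms: Optional[int] = None) -> bytes:
    """16 bytes: 48-bit big-endian Unix time in milliseconds + 80 random bits."""
    if now_ms is None:
        now_ms = time.time_ns() // 1_000_000
    if not 0 <= now_ms < 2 ** 48:
        raise ValueError("timestamp does not fit in 48 bits")
    return now_ms.to_bytes(6, "big") + os.urandom(10)


tid = ulid_style_trace_id()
print(tid.hex())  # 32 lowercase hex characters, the W3C trace-id text form
```

Compared with a 32-bit seconds prefix, this trades 16 more bits of randomness for millisecond ordering and a range that does not run out in 2106.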
Given that spans are guaranteed to include a timestamp field, I assume this is a technical limitation in Cassandra, etc., but it would be helpful to have it spelled out in the proposal. One thing I will point out: this will be a permanent change, and we give up the possibility of ~128 bits of randomness in the process. So when deciding this, please consider what people will want in ten years, not just today.
This requires the DB to support secondary indexes, and there is a cost to maintaining those. Having a timestamp in the ID gives backends more flexibility, especially for non-relational DBs. For example, AFAIK object store providers generally provide a way to list objects after a given prefix. A dirt-simple ingestion backend could probably be implemented in just a few lines of code: save spans to a file. Just throwing out there what could open up when the primary ID used during ingestion is sortable by time.
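The object-store idea can be sketched in Python with an in-memory sorted list standing in for a "list keys after X" call (S3's `ListObjectsV2` `StartAfter` parameter is one real example). It assumes keys are lowercase hex trace-ids whose first 8 characters are 32-bit epoch seconds; because the prefix is fixed-width lowercase hex, lexicographic key order matches time order at one-second granularity. `keys_since` is a hypothetical helper.

```python
import bisect
from typing import List


def keys_since(sorted_keys: List[str], start_epoch_s: int) -> List[str]:
    """Return keys whose assumed 8-hex-char timestamp prefix is >= start_epoch_s."""
    start_key = f"{start_epoch_s:08x}"  # fixed width keeps string order == numeric order
    i = bisect.bisect_left(sorted_keys, start_key)
    return sorted_keys[i:]


# Hypothetical trace-ids created at t=100, 200 and 300 seconds.
keys = sorted(f"{t:08x}" + "ab" * 12 for t in (100, 200, 300))
print(keys_since(keys, 200))  # the two keys created at t >= 200
```

With fully random IDs the same query would need a separate time index, which is the secondary-index cost mentioned above.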
@tigrannajaryan that format sounds very good; I would be interested in comparing both options (X-Ray format vs. ULIDs). I would prefer not to reinvent the wheel when it comes to formats, so I think these are the two options on the table for now.
This is not about a single span: the trace-id will include the "start_time" of the root span, not the start_time of individual spans, which makes a big difference for indexing and querying. This is also very similar to the reason @jmacd proposes propagating the "p" value; a similar argument could be made there, namely that if every span records its own "p" value, propagation is unnecessary.
Did I not say that?
No, it is not. There is no guarantee (unless we mark some trace flag, trace state, or other field) that the ID has any specific format (it is an opaque value). OpenTelemetry still allows the ID generator to be replaced, so any backend/vendor either has to support any opaque value or has to ask customers to ensure all IDs follow a specific format. This proposal does not ask to change the trace-id format, which is still a 16-byte array (opaque value); it only asks to change the default implementation of the ID generators to follow a format with some sort of ordering. Since the trace-id is still an opaque 16-byte array, this change can be considered backwards compatible: the only property required by the W3C spec is global uniqueness, which this change does not affect.
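The "opaque value" point can be illustrated with the format check a backend can apply to a W3C trace-id: 32 lowercase hex characters, not all zeros. The Python sketch below (`is_valid_w3c_trace_id` is a hypothetical name) shows that a timestamp-prefixed ID passes exactly like a random one, so nothing about the wire format changes.

```python
import re

_TRACE_ID_RE = re.compile(r"[0-9a-f]{32}")


def is_valid_w3c_trace_id(trace_id: str) -> bool:
    """W3C Trace Context format check: 32 lowercase hex chars, not all zeros.

    It says nothing about how the ID was generated (random, timestamped, ...).
    """
    return _TRACE_ID_RE.fullmatch(trace_id) is not None and trace_id != "0" * 32


# An ID whose leftmost 8 hex characters are a Unix timestamp is still just valid hex.
print(is_valid_w3c_trace_id("5759e988bd862e3fe1be46a994272793"))
```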
What are you trying to achieve?
The motivation for this change is to help backends and the processing of trace data. If we add a timestamp as the first 32 bits of the generated trace-id, trace data (full traces) will be generated (sent to the backend) in a "pseudo" order. The order cannot be guaranteed, but writing events in a roughly time-ordered sequence rather than a completely random one is still an improvement for many backends (stores).
What did you expect to see?
A change to the default trace-id generator specifying that the first 32 bits of the trace-id should be a timestamp.
Additional context.
See the AWS X-Ray definition of the trace-id:
https://docs.aws.amazon.com/xray/latest/devguide/xray-api-sendingdata.html
The suggestion is not to follow that model exactly, but just to encode the same timestamp as the leftmost 8 hex characters (4 bytes) of the trace-id.
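A minimal Python sketch of the proposed default generator, assuming the timestamp is Unix time in whole seconds as in the X-Ray trace ID: the leftmost 8 hex characters hold the 32-bit time and the remaining 24 hex characters (96 bits) are random. Unlike X-Ray's own text form, it omits the `1-` version prefix and the dashes, so the result stays a plain W3C trace-id. `xray_style_trace_id` is a hypothetical name, not an OpenTelemetry SDK API.

```python
import secrets
import time
from typing import Optional


def xray_style_trace_id(now_s: Optional[int] = None) -> str:
    """32 hex chars: 32-bit Unix time in seconds, then 96 random bits."""
    if now_s is None:
        now_s = int(time.time())
    if not 0 <= now_s < 2 ** 32:  # an unsigned 32-bit seconds field overflows in 2106
        raise ValueError("timestamp does not fit in 32 bits")
    return f"{now_s:08x}" + secrets.token_bytes(12).hex()


print(xray_style_trace_id(0x5759E988))  # starts with "5759e988", rest is random
```

Because the prefix is fixed-width hex, IDs from later seconds sort after earlier ones; IDs generated within the same second remain in random order.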