Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Decide if we need log record ID field #597

Closed
tigrannajaryan opened this issue May 12, 2020 · 8 comments · Fixed by #3047
Closed

Decide if we need log record ID field #597

tigrannajaryan opened this issue May 12, 2020 · 8 comments · Fixed by #3047
Labels
area:sdk Related to the SDK release:after-ga Not required before GA release, and not going to work on before GA spec:logs Related to the specification/logs directory

Comments

@tigrannajaryan
Copy link
Member

See discussion here open-telemetry/oteps#97 (comment)

Log Data model currently does not contain a field that identifies the log record, however having such field may be desirable.

We need to decide if we want such a field and if yes how it will be represented.

This can be added to the Data Model as an optional field, which will ensure we are not breaking the data model.

@tigrannajaryan tigrannajaryan added the spec:logs Related to the specification/logs directory label May 12, 2020
@tigrannajaryan
Copy link
Member Author

ULIDs may be a good fit for purpose: https://github.com/ulid/spec (but we probably should only recommend, not mandate them if we decide to use them).

@reyang reyang added the area:sdk Related to the SDK label Jun 26, 2020
@reyang reyang added the release:after-ga Not required before GA release, and not going to work on before GA label Jul 10, 2020
@djaglowski
Copy link
Member

This was discussed in the Log SIG today, where the consensus was that we do not need to make a decision on this prior to declaring the log data model stable.

@tigrannajaryan tigrannajaryan added release:required-logdatamodel-ga Required for declaring log data model stable and removed release:required-logdatamodel-ga Required for declaring log data model stable labels Nov 4, 2021
@pmm-sumo
Copy link
Contributor

pmm-sumo commented Mar 2, 2022

I think ULID would be very helpful to have as we could retain order of messages that way

@tigrannajaryan
Copy link
Member Author

We can define a semantic convention for an attribute with ULID value.

@svrnm
Copy link
Member

svrnm commented Dec 14, 2022

Copying over my use cases from the duplicate issue #2989:

  1. I have two observability tools where I ingest my log records, each one is "better" in doing one thing vs the other or they are used by different end-users (devs, ops, security, etc.). Now I'd like to provide a connection between the two for a logRecord, i.e. show logRecord A in both tools, or an end-user wants to share some log records with someone else so they can look it up in their tool etc.
  2. I want to use logRecords for metric exemplars
  3. I want to have a many-to-many relationship between logs and spans, this is very likely a corner case but worth considering (example: a fatal exception is thrown, bringing down the app, I log that exception but also want to link the exception with all spans affected by that crash)
  4. Additionally, from the previous discussions I understand that the initial use case for this ticket was around deduplication of log records reported multiple times.

Based on the conversation during the SIG call, I understand that there are a few questions that need to be answered:

  1. Does the ID need to be mandatory or is optional good enough? (e.g. via a semantic convention)
  2. How global does the uniqueness of the ID need to be? (universal or resource-level)
  3. What to do about existing IDs? (e.g. with windows events)

I probably miss something, but it should be possible to do (1). If I understand it correctly, there are 2 cases where a LogRecord gets created: by calling the Logs API or at the collector, when a log line gets ingested:

  • For the API it would be possible to have the ID generated when logRecord is called
  • For the collector it is also possible to generate that ID on creation of the record in OpenTelemetry format.

For (2) I don't know enough about different algorithms to compute unique IDs, what does ULID provide?

Finally (3) is a "nice-to-have" but probably not needed, the existing ID could be stored separately

@tigrannajaryan
Copy link
Member Author

Does the ID need to be mandatory or is optional good enough? (e.g. via a semantic convention)

I believe it needs to be optional since we must be able to represent external log records and not all log sources have an equivalent field. Yes, we can generate one when converting from external source to Otel, but I think it the id is a lot less useful in that case since we can't guarantee we don't read an external log record twice and assign it different ids, which defeats the purpose.

How global does the uniqueness of the ID need to be? (universal or resource-level)

I think global uniqueness is more valuable and is not very difficult to implement (generating a 128 bit UUID or ULID should be enough?).

What to do about existing IDs? (e.g. with windows events)

If we go for global uniqueness these won't work. AFAIK, windows events generate host-uniquie ids. These can still be recorded as windows-event-specific id attribute in addition to the global log record id.

For the API it would be possible to have the ID generated when logRecord is called

Correct.

For the collector it is also possible to generate that ID on creation of the record in OpenTelemetry format.

I think this is also correct with the caveat that if you use the Collector to read some external log source and happen to do it twice ("I have two observability tools where I ingest my log records" may be interpreted this way) then the same log record will get assigned different ids. I think this is ok and it is a likely usage of a Collector.

For (2) I don't know enough about different algorithms to compute unique IDs, what does ULID provide?

ULIDs are like UUIDs, they are 128 bit, with the added benefits of (from Otel's perspective):

  • Shorter text representation (smaller payload since we use strings for attributes).
  • Monotonically increase - friendlier to the backends when used as an index in a database (UUIDs are random which may cause perf problems). Note that there are new UUID versions that try to address this problem, but they are new and I don't know how well they are supported in various languages.

@svrnm
Copy link
Member

svrnm commented Dec 19, 2022

Does the ID need to be mandatory or is optional good enough? (e.g. via a semantic convention)

I believe it needs to be optional since we must be able to represent external log records and not all log sources have an equivalent field. Yes, we can generate one when converting from external source to Otel, but I think it the id is a lot less useful in that case since we can't guarantee we don't read an external log record twice and assign it different ids, which defeats the purpose.

For deduplication yes, it will loose it's value, but for the other use cases (e.g. linking across backends) it still would be good to have some certainty that the ID is available, although it still works with an optional ID created at the SDK/collector.

Let me take some time to write a proposal for the semantic convention, so we have something tangible to discuss.

@svrnm
Copy link
Member

svrnm commented Dec 19, 2022

I created #3047 for this

tigrannajaryan added a commit that referenced this issue Apr 6, 2023
Fixes #597

## Changes

- Add a section for "generic attributes" to the log semconv
- Add an attribute `log_record.id` making use of ULID as discussed in
#597

Some additional notes:
- I kept the PR small, so I left out some other potential attributes,
e.g. something for pre-existing ID (like windows event logs) or for
storing the used logging/eventing system or even something like a
"signature" that might be worth discussing, etc.
- I followed the structure of "generic attributes" from the spans
semconv
- I took some of the existing wording from #597 &
open-telemetry/oteps#97 (comment) to
describe the field

---------

Signed-off-by: svrnm <neumanns@cisco.com>
Co-authored-by: Joao Grassi <joao@joaograssi.com>
Co-authored-by: jack-berg <34418638+jack-berg@users.noreply.github.com>
Co-authored-by: Tigran Najaryan <4194920+tigrannajaryan@users.noreply.github.com>
carlosalberto pushed a commit to carlosalberto/opentelemetry-specification that referenced this issue Oct 31, 2024
Fixes open-telemetry#597

## Changes

- Add a section for "generic attributes" to the log semconv
- Add an attribute `log_record.id` making use of ULID as discussed in
open-telemetry#597

Some additional notes:
- I kept the PR small, so I left out some other potential attributes,
e.g. something for pre-existing ID (like windows event logs) or for
storing the used logging/eventing system or even something like a
"signature" that might be worth discussing, etc.
- I followed the structure of "generic attributes" from the spans
semconv
- I took some of the existing wording from open-telemetry#597 &
open-telemetry/oteps#97 (comment) to
describe the field

---------

Signed-off-by: svrnm <neumanns@cisco.com>
Co-authored-by: Joao Grassi <joao@joaograssi.com>
Co-authored-by: jack-berg <34418638+jack-berg@users.noreply.github.com>
Co-authored-by: Tigran Najaryan <4194920+tigrannajaryan@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:sdk Related to the SDK release:after-ga Not required before GA release, and not going to work on before GA spec:logs Related to the specification/logs directory
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants