[FEATURE] correct discrepancies between the current log schema and the OTel guidelines #217

Open
YANG-DB opened this issue Feb 5, 2025 · 4 comments
Labels
enhancement New feature or request untriaged

Comments

@YANG-DB
Member

YANG-DB commented Feb 5, 2025

Is your feature request related to a problem?
Log Schema Alignment: Identify and correct discrepancies between the current log schema and the OTel guidelines.

What solution would you like?
A clear and concise description of what you want to happen.

What alternatives have you considered?
A clear and concise description of any alternative solutions or features you've considered.

Do you have any additional context?
Add any other context or screenshots about the feature request here.

@YANG-DB YANG-DB added enhancement New feature or request untriaged labels Feb 5, 2025
@paulstn

paulstn commented Feb 11, 2025

| Field Name from Log Record (OTEL) | Description | Simple Schema | DataPrepper Schema | Comments |
| --- | --- | --- | --- | --- |
| Timestamp | Time when the event occurred. | @timestamp (date) | @timestamp (date_nanos) | Why is one of them date_nanos and the other simply date? |
| ObservedTimestamp | Time when the event was observed. | observedTimestamp (date) | observedTimestamp (date_nanos) | The same issue as above. |
| | | `"observerTime": { "type": "alias", "path": "observedTimestamp" }` | `"observedTime": { "type": "alias", "path": "observedTimestamp" }` | There are two problems: the rationale behind this aliasing is unclear, and the field name itself differs across the schemas. |
| TraceId | Request trace id. | traceId (keyword) | traceId (keyword) | |
| SpanId | Request span id. | spanId (keyword) | spanId (keyword) | |
| TraceFlags | W3C trace flag. | ❌ Not Present | ❌ Not Present | In the OpenTelemetry (OTel) convention, the TraceFlags field is a byte-sized value that represents trace flags as defined in the W3C Trace Context specification. Currently, the specification defines only one flag: the SAMPLED flag. TraceFlags are typically used in conjunction with TraceId and SpanId to provide complete tracing context. We don't have this. |
| SeverityText | The severity text (also known as log level). | severity.text (text, keyword) | severity.text (keyword) | We have keyword and text in the simple schema but only keyword in Data Prepper. Do we need both? |
| SeverityNumber | Numerical value of the severity. | severity.number (long) | severity.number (long) | Do we need long here? Currently it only ranges up to 24. |
| Body | The body of the log record. | body (text) | body (text) | |
| | | `"@message": { "type": "alias", "path": "body" },` | | We have aliasing for body in the simple schema but not in the Data Prepper schema. Let's remove this. |
| ❌ Not Present | ❌ Not Present | schemaUrl (text, keyword) | schemaUrl (keyword) | Move this under resource. |
| Resource | Describes the source of the log. | ❌ Not Present | resource.attributes.* (keyword) | Why doesn't the simple schema have resource attributes? Should we use a flat object? Componentize each semantic convention. |
| InstrumentationScope | Describes the scope that emitted the log. | instrumentationScope (object) | instrumentationScope (object) | |
| | | instrumentationScope.schemaUrl (text, keyword) | ❌ Not Present | Should we add schemaUrl in Data Prepper? |
| | | instrumentationScope.version (text, keyword) | instrumentationScope.version (keyword) | |
| | | instrumentationScope.name (text, keyword) | instrumentationScope.name (keyword) | |
| | | instrumentationScope.dropped_attributes_count (integer) | ❌ Not Present | We should add dropped attributes count in |
| | | No attributes under scope | No attributes under scope | No attributes under scope. |
| Attributes | Additional information about the event. | attributes (object) | log.attributes.* (keyword) | We are moving this to the top-level attributes field in Data Prepper. We will make this optional as part of the de-dotting changes. |
| | | attributes.data_stream.dataset (keyword) | ❌ Not Present | Why do we need data_stream attributes here when they are not present in the Data Prepper schema? https://github.com/opensearch-project/opensearch-catalog/blob/main/docs/schema/observability/Naming-convention.md [Lior Perry pointed to this] This should ideally be constant_keyword, which is stored only once for an index. [https://opensearch.org/docs/latest/field-types/supported-field-types/constant-keyword/] Should we use this? Having these fields helps with multi-index queries and filtering based on attribute types. |
| | | attributes.data_stream.namespace (keyword) | ❌ Not Present | |
| | | attributes.data_stream.type (keyword) | ❌ Not Present | |
| EventName | Name that identifies the class / type of event. | event.name (keyword) | ❌ Not Present | We should add event name. |
| ❌ Not Present | ❌ Not Present | event.type (keyword) | event.type (keyword) | Why do we need a separate event object when OTel recommends that an event be modelled as a type of LogRecord? At most, we should add a few more fields to the log mapping to repurpose it for events. Link: https://opentelemetry.io/docs/specs/otel/logs/data-model/#events https://opentelemetry.io/docs/specs/semconv/general/events/ |
| ❌ Not Present | ❌ Not Present | ❌ Not Present | event.result (keyword) | |
| ❌ Not Present | ❌ Not Present | event.source (keyword) | ❌ Not Present | |
| ❌ Not Present | ❌ Not Present | event.exception.type (keyword) | event.exception.type (keyword) | |
| ❌ Not Present | ❌ Not Present | event.exception.message (keyword) | event.exception.message (text) | |
| ❌ Not Present | ❌ Not Present | event.exception.stacktrace (text) | event.exception.stacktrace (text) | |
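
A comparison like the one above could also be produced mechanically instead of by hand. The following is only a rough sketch: the field types are copied from a subset of the rows listed here, while the flat dict layout and the `diff_schemas` helper are hypothetical and not part of either schema or of Data Prepper.

```python
# Field types copied from a subset of the comparison rows above.
# The flat {field: type} layout is an illustrative assumption; the real
# schemas are nested OpenSearch mappings.
SIMPLE_SCHEMA = {
    "@timestamp": "date",
    "observedTimestamp": "date",
    "traceId": "keyword",
    "spanId": "keyword",
    "severity.text": "text, keyword",
    "severity.number": "long",
    "body": "text",
    "instrumentationScope.schemaUrl": "text, keyword",
    "instrumentationScope.dropped_attributes_count": "integer",
    "event.name": "keyword",
    "event.exception.message": "keyword",
}

DATA_PREPPER_SCHEMA = {
    "@timestamp": "date_nanos",
    "observedTimestamp": "date_nanos",
    "traceId": "keyword",
    "spanId": "keyword",
    "severity.text": "keyword",
    "severity.number": "long",
    "body": "text",
    "event.result": "keyword",
    "event.exception.message": "text",
}


def diff_schemas(left, right):
    """Return {field: (left_type, right_type)} for every field that differs.

    A type of None means the field is not present on that side.
    """
    diffs = {}
    for field in sorted(set(left) | set(right)):
        left_type, right_type = left.get(field), right.get(field)
        if left_type != right_type:
            diffs[field] = (left_type, right_type)
    return diffs


if __name__ == "__main__":
    diffs = diff_schemas(SIMPLE_SCHEMA, DATA_PREPPER_SCHEMA)
    for field, (simple, prepper) in diffs.items():
        print(
            f"{field}: simple={simple or 'not present'}, "
            f"data_prepper={prepper or 'not present'}"
        )
```

Fields that match on both sides (for example `traceId`) are left out of the result, so the output only lists rows that still need a decision.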

@YANG-DB
Member Author

YANG-DB commented Feb 11, 2025

Thanks @paulstn for the detailed review.
How would we like to distribute these tasks between us?

@paulstn

paulstn commented Feb 11, 2025

Hi @YANG-DB, I'm currently writing up a schema proposal for this feature (logs) first, and then I'll move to traces. Were there any other specific sub-tasks I should be aware of? I'll follow your lead on distribution.

@paulstn

paulstn commented Feb 12, 2025

Draft proposed schema (currently changing):

| Proposed Field | Type | Analysis |
| --- | --- | --- |
| Timestamp | date_nanos | OTel uses timestamp in nanos |
| ObservedTimestamp | date_nanos | Can be linked with the alias observedTime |
| TraceId | keyword | |
| SpanId | keyword | |
| TraceFlags | numeric-bytes | Start using this flag to comply with OTel standards, reasons listed here: https://www.w3.org/TR/trace-context/#trace-flags |
| severity | object | Contains text (keyword) and number (numeric-long) |
| body | text | |
| resource | object | Would need to contain schemaUrl in some form, within the OTel Resource schema |
| InstrumentationScope | object | Contain name, version, schemaUrl, droppedAttributeCount |
| Attributes | object | under data_stream; dataset, namespace, type |
| EventName | text | All of the event fields used within ss4o and data prepper could be moved under OTel's resource |
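
To make the draft easier to review, here is one possible rendering of the table above as an OpenSearch index mapping, written as a Python dict. It is a sketch under several assumptions, none of which are settled in this thread: "numeric-bytes" is taken to mean the `byte` field type and "numeric-long" the `long` type, field names follow the camelCase style of the existing schemas, the scope and data_stream sub-field types are borrowed from the current schemas, and `resource` is left as an open object because its sub-fields are not yet defined. The `is_sampled` helper is hypothetical and only illustrates how the single W3C SAMPLED bit would be read from a byte-sized TraceFlags value.

```python
import json

# Hypothetical mapping derived from the draft table. All concrete type
# choices marked "assumed" are not specified in the draft.
PROPOSED_LOG_MAPPING = {
    "properties": {
        "@timestamp": {"type": "date_nanos"},
        "observedTimestamp": {"type": "date_nanos"},
        "observedTime": {"type": "alias", "path": "observedTimestamp"},
        "traceId": {"type": "keyword"},
        "spanId": {"type": "keyword"},
        "traceFlags": {"type": "byte"},  # assumed reading of "numeric-bytes"
        "severity": {
            "properties": {
                "text": {"type": "keyword"},
                "number": {"type": "long"},
            }
        },
        "body": {"type": "text"},
        "resource": {"type": "object"},  # sub-fields (e.g. schemaUrl) still open
        "instrumentationScope": {
            "properties": {
                "name": {"type": "keyword"},  # assumed
                "version": {"type": "keyword"},  # assumed
                "schemaUrl": {"type": "keyword"},  # assumed
                "droppedAttributeCount": {"type": "integer"},  # assumed
            }
        },
        "attributes": {
            "properties": {
                "data_stream": {
                    "properties": {
                        "dataset": {"type": "keyword"},  # assumed
                        "namespace": {"type": "keyword"},  # assumed
                        "type": {"type": "keyword"},  # assumed
                    }
                }
            }
        },
        "eventName": {"type": "text"},
    }
}

SAMPLED_FLAG = 0x01  # least significant bit of trace-flags in W3C Trace Context


def is_sampled(trace_flags):
    """Return True if the SAMPLED bit is set in a byte-sized trace-flags value."""
    return bool(trace_flags & SAMPLED_FLAG)


if __name__ == "__main__":
    print(json.dumps(PROPOSED_LOG_MAPPING, indent=2))
```

Writing the draft this way makes the open questions visible as concrete choices, for example whether `eventName` should stay `text` or become `keyword` like the existing `event.name`.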
