Proposal: Inclusion of Trace Object & Profile (#1243)
---

## Proposal: Inclusion of Trace Object & Profile

**Description**: This proposal introduces the concept of traces to the
OCSF schema. The goal is to enhance OCSF to cover observability data for
distributed traces.

**UPDATES**:
- Removing status_code from trace (only needed in span)
- Removing start_time, end_time, duration from span. (Already present in
base event)
- Removing parent_span object, only parent_span_id will be known to a
given span. Prevents deep nesting too.
- Maintaining start_time, end_time, duration in span and trace object.
These are often normalized to the software such as appdynamics which can
differ from event times.
- Added flags to trace 
- Added message to span
- Updated descriptions of attributes and objects for clarity

Traces contain vital information about the flow of requests through a
distributed system, including a unique trace ID, individual span IDs,
timestamps for start and end, duration, and metadata such as service
names and error details. This data provides a comprehensive view of how
requests are processed, revealing performance metrics and service
dependencies. Traces are useful for performance monitoring, as they help
identify bottlenecks and slow operations. They also facilitate root
cause analysis by allowing developers to pinpoint issues and optimize
the overall system for improved reliability and user experience.

**To support the proposal, here's how the modeled example would look
when applied to a purchase transaction trace. This illustrates how each
span and event would be structured, using OCSF:**

### Example Trace: Purchase Transaction Trace

#### 1. User Service Span
- **Span Name**: User Authentication  
- **Service**: User Service  
- **Duration**: 10ms  
- **Events**:
  - `start_auth`: Marks when authentication started
  - `db_query`: Records time spent querying the user database
  - `auth_success`: Indicates successful authentication

#### 2. Order Service Span
- **Span Name**: Create Order  
- **Service**: Order Service  
- **Parent Span**: User Authentication (order creation requires user
authentication)
- **Duration**: 50ms  
- **Events**:
  - `validate_cart`: Checks if all items in the cart are available
  - `calculate_total`: Calculates the total price
  - `order_created`: Confirms that the order was created in the system

#### 3. Payment Service Span
- **Span Name**: Process Payment  
- **Service**: Payment Service  
- **Parent Span**: Create Order  
- **Duration**: 100ms  
- **Events**:
  - `start_payment`: Marks the initiation of the payment process
  - `payment_gateway_call`: Time spent calling an external payment gateway
  - `payment_success`: Confirms successful payment processing

#### 4. Inventory Service Span
- **Span Name**: Update Inventory  
- **Service**: Inventory Service  
- **Parent Span**: Create Order  
- **Duration**: 30ms  
- **Events**:
  - `inventory_lock`: Temporarily locks inventory items
  - `update_db`: Updates inventory database to reflect items sold
  - `inventory_release`: Releases inventory lock

#### 5. Notification Service Span
- **Span Name**: Send Confirmation Email  
- **Service**: Notification Service  
- **Parent Span**: Create Order  
- **Duration**: 20ms  
- **Events**:
  - `email_generated`: Generates the email content
  - `email_sent`: Confirms the email was sent to the user

### Summary of Trace
- **Trace**: Purchase Item  
- **Flow**: `User Authentication` → `Create Order` → `Process Payment` →
`Update Inventory` → `Send Confirmation Email`
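
The flow line above is linear, but the parent links in the span descriptions form a tree: three spans share `Create Order` as their parent. A minimal sketch of rebuilding that hierarchy from parent references follows. It uses span names in place of span IDs for readability, and the helper names (`build_children`, `render`) are hypothetical, not part of OCSF.

```python
from collections import defaultdict

# Spans from the example trace: (span name, parent span name or None).
# Names stand in for span IDs here purely for readability.
SPANS = [
    ("User Authentication", None),
    ("Create Order", "User Authentication"),
    ("Process Payment", "Create Order"),
    ("Update Inventory", "Create Order"),
    ("Send Confirmation Email", "Create Order"),
]


def build_children(spans):
    """Map each parent name to the list of its child span names."""
    children = defaultdict(list)
    for name, parent in spans:
        children[parent].append(name)
    return children


def render(children, parent=None, depth=0):
    """Return the span tree as indented lines, depth-first."""
    lines = []
    for name in children.get(parent, []):
        lines.append("  " * depth + name)
        lines.extend(render(children, name, depth + 1))
    return lines


tree_lines = render(build_children(SPANS))
print("\n".join(tree_lines))
```

Because each span only knows its parent's ID, a consumer has to collect all spans of a trace before it can render the hierarchy this way.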

### OCSF Model (Table)
| *Action* | *Description* | *Event Class* | *Profile Type* | *Trace ID* | *Span ID* |
|---|---|---|---|---|---|
| *start_auth* | Marks when authentication started | 3002 | Trace Profile | Trace_001 | Span_001 |
| *db_query* | Records time spent querying the user database | 6005 | Trace Profile | Trace_001 | Span_002 |
| *auth_success* | Indicates successful authentication | 3002 | Trace Profile | Trace_001 | Span_003 |
| *validate_cart* | Checks if all items in the cart are available | 6009 (New Application Execution Activity) | Trace Profile | Trace_001 | Span_004 |
| *calculate_total* | Calculates the total price | 6009 (New Application Execution Activity) | Trace Profile | Trace_001 | Span_005 |
| *order_created* | Confirms that the order was created in the system | 6009 (New Application Execution Activity) | Trace Profile | Trace_001 | Span_006 |
| *start_payment* | Marks the initiation of the payment process | 6009 (New Application Execution Activity) | Trace Profile | Trace_001 | Span_007 |
| *payment_gateway_call* | Time spent calling an external payment gateway | 6003 | Trace Profile | Trace_001 | Span_008 |
| *payment_success* | Confirms successful payment processing | 6003 | Trace Profile | Trace_001 | Span_009 |
| *inventory_lock* | Temporarily locks inventory items | 6009 (New Application Execution Activity) | Trace Profile | Trace_001 | Span_010 |
| *update_db* | Updates inventory database to reflect items sold | 6005 | Trace Profile | Trace_001 | Span_011 |
| *inventory_release* | Releases inventory lock | 6009 (New Application Execution Activity) | Trace Profile | Trace_001 | Span_012 |
| *email_generated* | Generates the email content | 4009 | Trace Profile | Trace_001 | Span_013 |
| *email_sent* | Confirms the email was sent to the user | 4009 | Trace Profile | Trace_001 | Span_014 |
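
To make the table concrete, here is a rough sketch of how a few of its rows might look as event payloads that carry the proposed `trace` object. The events are deliberately minimal and not schema-valid (a real OCSF event needs more attributes, such as a time and a severity). Carrying the action name in `message` is an assumption made for illustration, and `to_event` and `spans_for_trace` are hypothetical helpers.

```python
# A few rows from the table above: (action, class_uid, span_uid).
ROWS = [
    ("start_auth", 3002, "Span_001"),
    ("db_query", 6005, "Span_002"),
    ("payment_gateway_call", 6003, "Span_008"),
    ("email_sent", 4009, "Span_014"),
]


def to_event(action, class_uid, span_uid, trace_uid="Trace_001"):
    """Build a minimal, illustrative event dict with the trace profile applied."""
    return {
        "class_uid": class_uid,
        "message": action,  # assumption: the action name is carried as the message
        "metadata": {"profiles": ["trace"]},
        "trace": {"uid": trace_uid, "span": {"uid": span_uid}},
    }


def spans_for_trace(events, trace_uid):
    """Collect the span IDs of every event that belongs to one trace."""
    return [
        e["trace"]["span"]["uid"]
        for e in events
        if e["trace"]["uid"] == trace_uid
    ]


events = [to_event(*row) for row in ROWS]
print(spans_for_trace(events, "Trace_001"))
```

The point of the shared `trace.uid` is that events of different classes (authentication, datastore, API, email) can be correlated into one request path.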

---

### New `trace` Object & Profile

**Trace Object**: Defines key application Trace Information for trace
events. (Included Via `trace` profile)

    {
      "caption": "Trace",
      "description": "The trace object contains information about distributed traces which are critical to observability and describe how requests move through a system, capturing each step's timing and status.",
      "extends": "object",
      "name": "trace",
      "attributes": {
        "uid": {
          "description": "The unique identifier of the trace used in distributed systems and microservices architecture to track and correlate requests across various components of an application.",
          "requirement": "required"
        },
        "span": {
          "description": "The attributes associated with a span within a distributed trace.",
          "requirement": "optional"
        },
        "service": {
          "description": "Identifies the service or component generating the trace.",
          "requirement": "optional"
        },
        "status_code": {
          "description": "Indicates whether the operations in the trace were successful, failed, or had an error, aiding in pinpointing issues.",
          "requirement": "optional"
        },
        "start_time": {
          "description": "The start timestamp of the trace, essential for identifying latency and performance bottlenecks.",
          "requirement": "optional"
        },
        "end_time": {
          "description": "The end timestamp of the trace, essential for identifying latency and performance bottlenecks.",
          "requirement": "optional"
        },
        "duration": {
          "description": "The trace duration, the amount of time the trace covers from <code>start_time</code> to <code>end_time</code> in milliseconds.",
          "requirement": "optional"
        }
      }
    }
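
As a rough illustration of what the `requirement` values in the trace object above imply for a producer, the sketch below checks a candidate trace dict against a trimmed copy of those requirements. `TRACE_REQUIREMENTS` simply mirrors the block above, and `check_object` is a hypothetical helper, not an OCSF validator.

```python
# Trimmed copy of the trace object above: attribute name -> requirement.
TRACE_REQUIREMENTS = {
    "uid": "required",
    "span": "optional",
    "service": "optional",
    "status_code": "optional",
    "start_time": "optional",
    "end_time": "optional",
    "duration": "optional",
}


def check_object(obj, requirements):
    """Return (missing_required, unknown) attribute name lists, both sorted."""
    missing = sorted(
        name
        for name, requirement in requirements.items()
        if requirement == "required" and name not in obj
    )
    unknown = sorted(name for name in obj if name not in requirements)
    return missing, unknown


# A trace with only its required identifier passes cleanly.
print(check_object({"uid": "Trace_001"}, TRACE_REQUIREMENTS))
# A trace without uid, carrying an attribute this block does not list, is flagged.
print(check_object({"sampled": True}, TRACE_REQUIREMENTS))
```

Only `uid` is required here, so a producer that knows nothing but the trace ID can still emit a conforming trace object.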


---

**New Trace Attributes**: Enum of key application Trace Information for
trace events.

    "trace": {
      "caption": "Trace",
      "description": "The attributes associated with an event containing trace data.",
      "type": "trace"
    },
    "span": {
      "caption": "Span",
      "description": "The attributes associated with an event containing span data.",
      "type": "span"
    },

---

**New Span Object Attributes**: Enum of key application Trace
Information for trace events.

    {
      "caption": "Span",
      "description": "The attributes associated with an event containing span data.",
      "extends": "object",
      "name": "span",
      "attributes": {
        "uid": {
          "description": "The unique identifier of the span used in distributed systems and microservices architecture to track and correlate requests across various components of an application.",
          "requirement": "required"
        },
        "service": {
          "description": "Identifies the service or component creating the span, which helps track its path through a distributed system.",
          "requirement": "optional"
        },
        "operation": {
          "description": "Describes an action performed in a span, such as API requests, database queries, or computations.",
          "requirement": "optional",
          "is_array": true
        },
        "parent_span": {
          "description": "The parent span of this span object. It is recommended to only populate this field for the first span object, to prevent deep nesting.",
          "requirement": "optional"
        },
        "start_time": {
          "description": "The start timestamp of the span, essential for identifying latency and performance bottlenecks.",
          "requirement": "optional"
        },
        "end_time": {
          "description": "The end timestamp of the span, essential for identifying latency and performance bottlenecks.",
          "requirement": "optional"
        },
        "duration": {
          "description": "The span duration, the amount of time the span covers from <code>start_time</code> to <code>end_time</code> in milliseconds.",
          "requirement": "optional"
        },
        "status_code": {
          "description": "Indicates whether the operations in the span were successful, failed, or had an error, aiding in pinpointing issues.",
          "requirement": "optional"
        }
      }
    }
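
The span's `duration` is described above as the time from `start_time` to `end_time` in milliseconds. A minimal sketch of deriving it follows, assuming both timestamps are integer milliseconds since the Unix epoch. The timestamp values are invented to match the 10ms `User Authentication` span from the example, and `span_duration_ms` is a hypothetical helper.

```python
def span_duration_ms(span):
    """Return end_time - start_time in milliseconds, rejecting inverted spans."""
    start, end = span["start_time"], span["end_time"]
    if end < start:
        raise ValueError("end_time precedes start_time")
    return end - start


# Illustrative span; the timestamps are made-up epoch milliseconds.
auth_span = {
    "uid": "Span_001",
    "operation": ["start_auth", "db_query", "auth_success"],
    "start_time": 1732622400000,
    "end_time": 1732622400010,
}

# Populate the optional duration only if the producer did not supply one.
auth_span.setdefault("duration", span_duration_ms(auth_span))
print(auth_span["duration"])
```

Keeping `duration` optional lets a consumer derive it this way when a tracing tool reports only the two timestamps.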


## Traces profile

    {
      "description": "The Traces Profile extends the OCSF framework to capture and standardize observability events, specifically targeting trace-level data. This profile enables integration and normalization of distributed tracing information, allowing OCSF events to retain essential trace context such as trace IDs, span relationships, and service dependencies.",
      "meta": "profile",
      "caption": "Traces",
      "name": "traces",
      "annotations": {
        "group": "primary"
      },
      "attributes": {
        "trace": {
          "description": "The trace object contains information about distributed traces which are critical to observability and describe how requests move through a system, capturing each step's timing and status.",
          "requirement": "recommended"
        }
      }
    }
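
One way to read the profile definition above: an event that declares the profile is expected, though not forced, to carry a `trace` attribute, and any `trace` that is present must have a `uid`. The sketch below turns that reading into a small consistency check. It assumes applied profiles are listed under `metadata.profiles`, takes the profile name as a parameter (defaulting to `trace`, the name used by `profiles/trace.json` in this commit), and `trace_profile_warnings` is a hypothetical helper.

```python
def trace_profile_warnings(event, profile_name="trace"):
    """Return human-readable warnings about trace profile usage in an event."""
    profiles = event.get("metadata", {}).get("profiles", [])
    warnings = []
    if profile_name in profiles and "trace" not in event:
        # The attribute is only recommended, so this is a warning, not an error.
        warnings.append("profile declared but recommended 'trace' is missing")
    if "trace" in event and profile_name not in profiles:
        warnings.append("'trace' present but profile not declared")
    if "trace" in event and "uid" not in event["trace"]:
        warnings.append("trace.uid is required")
    return warnings


ok_event = {
    "metadata": {"profiles": ["trace"]},
    "trace": {"uid": "Trace_001"},
}
print(trace_profile_warnings(ok_event))
print(trace_profile_warnings({"metadata": {"profiles": ["trace"]}}))
```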

---

---------

Signed-off-by: Adam Gregory <pladamgregory@gmail.com>
Co-authored-by: Paul Agbabian <pagbabian@splunk.com>
Co-authored-by: Jonathan Rau <139361268+jonrau-at-queryai@users.noreply.github.com>
3 people authored Nov 26, 2024
1 parent 692d615 commit a7afa67
Showing 6 changed files with 122 additions and 5 deletions.
12 changes: 11 additions & 1 deletion dictionary.json
@@ -4507,6 +4507,11 @@
"description": "The version number of the latest Service Pack.",
"type": "integer_t"
},
"span": {
"caption": "Span",
"description": "The information about the span. See specific usage.",
"type": "span"
},
"speed": {
"caption": "Speed",
"description": "Ground speed of flight. This value is provided in meters per second with a minimum resolution of 0.25 m/s. Special Values: <code>Invalid</code>, <code>No Value</code>, or <code>Unknown: 255 m/s</code>.",
@@ -4910,6 +4915,11 @@
"description": "The event transmission time from one device to another. See specific usage.",
"type": "timestamp_t"
},
"trace": {
"caption": "Trace",
"description": "The information about the trace. See specific usage.",
"type": "trace"
},
"tree_uid": {
"caption": "Tree UID",
"description": "The tree id is a unique SMB identifier which represents an open connection to a share.",
@@ -5374,4 +5384,4 @@
}
}
}
}
}
10 changes: 8 additions & 2 deletions events/application/api_activity.json
@@ -5,6 +5,9 @@
"extends": "application",
"name": "api_activity",
"attributes": {
"$include": [
"profiles/trace.json"
],
"activity_id": {
"enum": {
"1": {
@@ -58,5 +61,8 @@
"group": "primary",
"requirement": "required"
}
}
}
},
"profiles": [
"trace"
]
}
10 changes: 8 additions & 2 deletions events/network/http_activity.json
@@ -5,6 +5,9 @@
"extends": "network",
"name": "http_activity",
"attributes": {
"$include": [
"profiles/trace.json"
],
"activity_id": {
"enum": {
"1": {
@@ -62,5 +65,8 @@
"group": "primary",
"requirement": "recommended"
}
}
}
},
"profiles": [
"trace"
]
}
44 changes: 44 additions & 0 deletions objects/span.json
@@ -0,0 +1,44 @@
{
"caption": "Span",
"description": "Represents a single unit of work or operation within a distributed trace. A span typically tracks the execution of a request across a service, capturing important details such as the operation, timestamps, and status. Spans help break down the overall trace into smaller, manageable parts, enabling detailed analysis of the performance and behavior of specific operations within the system. They are crucial for understanding latency, dependencies, and bottlenecks in complex distributed systems.",
"extends": "object",
"name": "span",
"attributes": {
"duration": {
"description": "The total time, in seconds, that the span represents, calculated as the difference between start_time and end_time. It reflects the operation's performance and latency, independent of event timestamps, and accounts for normalized times used by observability tools to ensure consistency across distributed systems.",
"requirement": "optional"
},
"end_time": {
"description": "The end timestamp of the span, essential for identifying latency and performance bottlenecks. Like the start time, this timestamp is normalized across the observability system to ensure consistency, even when events are recorded across distributed services with unsynchronized clocks. Normalized time allows for accurate duration calculations and helps observability tools track performance across services, regardless of the individual system time settings.",
"requirement": "required"
},
"message": {
"description": "The message in a span (often referred to as a span event) serves as a way to record significant moments or occurrences during the span's lifecycle. This content typically manifests as log entries, annotations, or semi-structured events as a string, providing additional granularity and context about what happens at specific points during the execution of an operation.",
"requirement": "optional"
},
"operation": {
"description": "Describes an action performed in a span, such as API requests, database queries, or computations.",
"requirement": "optional"
},
"parent_uid": {
"description": "The ID of the parent span for this span object, establishing its relationship in the trace hierarchy.",
"requirement": "optional"
},
"service": {
"description": "Identifies the service or component that generates the span, helping trace its path through the distributed system.",
"requirement": "optional"
},
"start_time": {
"description": "The start timestamp of the span, essential for identifying latency and performance bottlenecks. This timestamp is normalized across the observability system, ensuring consistency even when events occur across distributed services with potentially unsynchronized clocks. By using normalized time, observability tools can provide accurate, uniform measurements of operation performance and latency, regardless of where or when the events actually occur.",
"requirement": "required"
},
"status_code": {
"description": "Indicates the outcome of the operation in the span, such as success, failure, or error. Issues in a span typically refer to problems such as failed operations, timeouts, service unavailability, or errors in processing that can negatively impact the performance or reliability of the system. Tracking the `status_code` helps pinpoint these issues, enabling quicker identification and resolution of system inefficiencies or faults.",
"requirement": "optional"
},
"uid": {
"description": "The unique identifier for the span, used in distributed systems and microservices architectures to track and correlate requests across different components of an application. It enables tracing the flow of a request through various services.",
"requirement": "required"
}
}
}
36 changes: 36 additions & 0 deletions objects/trace.json
@@ -0,0 +1,36 @@
{
"caption": "Trace",
"description": "The trace object contains information about a distributed trace, which is crucial for observability. Traces are made up of one or more spans, which are individual units of work in application activity. Traces track the journey of a request as it moves through various services in a system, capturing key details like timing, status, and dependencies at each step. Traces provide insights into system performance, helping to identify latency, bottlenecks, and issues in complex, distributed environments.",
"extends": "object",
"name": "trace",
"attributes": {
"duration": {
"description": "The total time, in seconds, that the trace covers, calculated as the difference between start_time and end_time. This duration helps assess the overall performance of a request as it travels across various services, and is essential for identifying latency and potential bottlenecks within the distributed system. The trace duration may differ from individual span durations due to the propagation and processing times of the trace as it spans multiple components.",
"requirement": "optional"
},
"end_time": {
"description": "The end timestamp of the trace, essential for identifying latency and performance bottlenecks. Like the start time, this timestamp is normalized across the trace system to ensure consistency, even when events are recorded across distributed services with unsynchronized clocks. Normalized time allows for accurate trace duration calculations and helps observability tools track overall performance across services, regardless of the individual system time settings.",
"requirement": "optional"
},
"flags": {
"description": "The flags associated with the trace, used to indicate specific properties or behaviors, such as whether the trace is sampled or if it has special handling. Flags help control how traces are processed, logged, and analyzed, providing valuable context for tracing and observability tools in identifying trace characteristics or specific tracking requirements.",
"requirement": "optional"
},
"service": {
"description": "Identifies the service or component generating the trace, helping to track and correlate the flow of requests through various parts of a distributed system. This information is essential for understanding the role and performance of specific services within the broader context of system operations and for diagnosing issues across different components.",
"requirement": "optional"
},
"span": {
"description": "Represents a single unit of work or operation within a distributed trace. A span typically tracks the execution of a request across a service, capturing important details such as the operation, timestamps, and status. Spans help break down the overall trace into smaller, manageable parts, enabling detailed analysis of the performance and behavior of specific operations within the system. They are crucial for understanding latency, dependencies, and bottlenecks in complex distributed systems.",
"requirement": "optional"
},
"start_time": {
"description": "The start timestamp of the trace, essential for identifying latency and performance bottlenecks. Like the end time, this timestamp is normalized across the trace system to ensure consistency, even when events are recorded across distributed services with unsynchronized clocks. Normalized time enables accurate trace duration calculations and helps observability tools track performance across services, regardless of the individual system time settings.",
"requirement": "optional"
},
"uid": {
"description": "The unique identifier of the trace used in distributed systems and microservices architecture to track and correlate requests across various components of an application.",
"requirement": "required"
}
}
}
15 changes: 15 additions & 0 deletions profiles/trace.json
@@ -0,0 +1,15 @@
{
"description": "The Trace Profile extends the OCSF framework to capture and standardize observability events, specifically targeting trace-level data. This profile enables integration and normalization of distributed tracing information, allowing OCSF events to retain essential trace context such as trace IDs, span relationships, and service dependencies.",
"meta": "profile",
"caption": "Trace",
"name": "trace",
"annotations": {
"group": "primary"
},
"attributes": {
"trace": {
"description": "The trace object contains information about distributed traces which are critical to observability and describe how requests move through a system, capturing each step's timing and status.",
"requirement": "recommended"
}
}
}
