You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
This is a proposed design for adding trace span support to the FVM's execution logs. The FVM already has some rudimentary tracing support (logging calls and gas consumption) but lacks metadata to allow fine after-the-fact analysis. This design is informed by prior efforts that used formatted Log events to simulate spans (implemented here) in order to do the kind of analysis shown in filecoin-project/builtin-actors#1020.
While we could keep going with merely conventional span infrastructure via formatted messages, we'll probably want to build a fair bit of tooling for doing analysis of the subsequent traces. Settling on a data model will allow many developers to benefit from tooling we might build e.g. for built-in actor development, and build mutually compatible tooling themselves.
Related context: #1080. This design might answer the core need underlying #696 and #1498, although there are other ideas there that are independently valuable.
Span data model
The FVM’s existing execution trace is a list of events, such as GasCharge, Call, CallReturn, and Log. Extent this with a pair of new event types that demarcate spans: SpanBegin and SpanEnd.
A span comprises:
ID: an integer unique to the instance of the VM (i.e a top-level message execution).
Begin/End time: timestamps (see below) recording the span’s place in time.
Code Context: the code CID and method number of the currently executing method
Label: a string provided by the programmer
Tag (optional): a string provided by the programmer (with conventionally formatted content)
Parent (optional): an ID of another span to be denoted as the parent
All these items except the end timestamp are recorded with the SpanBegin event, and the end timestamp with the SpanEnd.
Span timestamps do not need to use a real-world reference of time, they need only record the sequencing of events. The host machine clock nanoseconds may suffice, but equally so would any monotonic counter like WASM instruction count, or syscall count, or even a simple logical clock producing sequence numbers. It would be most helpful if this execution timestamp was attached to all trace entries, not just the spans.
The span’s label is intended to be a string constant, so that each execution of the “same” span in code carries the same label. The tag on the other hand may vary between executions of the same code span, e.g. carrying a parameter value or counter. We should immediately establish a convention for the format of its contents as a comma-separated list of key=value pairs (but this won’t be enforced by the VM).
During execution, the FVM only needs to record these span events in the execution trace. It does not need to maintain state or associate any other events with the span. The determination of relationships like parent/child or containment is done with subsequent analysis.
Alternatives
Other span data models (e.g. in traditional distributed systems) provide for an arbitrary key/value mapping of metadata to be associated with a span. This model is reduced to just an opaque tag here in order to minimise complexity and cost at the FVM syscall interface.
Tracing syscalls
Actor code can mark spans with two new syscalls. Like the log syscall, these take effect only when VM debugging is enabled. Also like use of the log syscall, actor code should wrap these calls in a macro that does nothing if VM debugging is disabled, avoiding any argument formatting etc.
/// Records a span start event and returns an identifier for it./// The current execution time is recorded as the start time./// The currently-executing actor's code CID and method number are recorded as context/// The parent is recorded as the spans parent (but not checked for validity).
sys::debug::span_begin(label:*const u8, label_len:u32, tag:*const u8, tag_len:u32, parent:u64) -> u64/// Records an end event for the identified span./// The ID is not checked for validity.
sys::debug::span_end(id: u64)
Implicit spans for sends
The VM should automatically generate span events for calls (in addition to the current Call events).
The code context is the code CID and method of the caller
The label is “Send”
The tag is the code CID and method number of the callee
The parent is the current call’s span ID, with the top-level call having no parent.
Analysis
Analysis of execution trace spans is performed from the execution trace events after execution. Many analysis methods are possible, here we just want to sketch out some likely methods to ensure that the data we are capturing is sufficient.
Note that:
Nothing requires that spans are properly nested in time: spans can overlap arbitrarily.
Nothing constrain s a child span to be nested within its parent – a child could complete after a parent does.
Programmer error may lead to spans that are unclosed, or closed twice
However, the spans created automatically for sends will be properly nested.
Count gas usage in sends
A traversal of the trace can maintain a stack of spans with label Send, and associate gas charge events with the topmost one. A call tree can be computed from the span parent metadata. This produces “self” gas (i.e. excluding subcalls), and the total (transitive) gas may be computed by propagating totals up the call tree.
Summing over span labels
If the same code is executed repeatedly during one top-level execution, there may be multiple spans with the same label (possibly with different tags). E.g. a span could wrap the common read-from-HAMT code. These repeated executions could be summed by matching on label.
The text was updated successfully, but these errors were encountered:
This is a proposed design for adding trace span support to the FVM's execution logs. The FVM already has some rudimentary tracing support (logging calls and gas consumption) but lacks metadata to allow fine after-the-fact analysis. This design is informed by prior efforts that used formatted
Log
events to simulate spans (implemented here) in order to do the kind of analysis shown in filecoin-project/builtin-actors#1020.While we could keep going with merely conventional span infrastructure via formatted messages, we'll probably want to build a fair bit of tooling for doing analysis of the subsequent traces. Settling on a data model will allow many developers to benefit from tooling we might build e.g. for built-in actor development, and build mutually compatible tooling themselves.
Related context: #1080. This design might answer the core need underlying #696 and #1498, although there are other ideas there that are independently valuable.
Span data model
The FVM’s existing execution trace is a list of events, such as
GasCharge
,Call
,CallReturn
, andLog
. Extent this with a pair of new event types that demarcate spans:SpanBegin
andSpanEnd
.A span comprises:
All these items except the end timestamp are recorded with the
SpanBegin
event, and the end timestamp with theSpanEnd
.Span timestamps do not need to use a real-world reference of time, they need only record the sequencing of events. The host machine clock nanoseconds may suffice, but equally so would any monotonic counter like WASM instruction count, or syscall count, or even a simple logical clock producing sequence numbers. It would be most helpful if this execution timestamp was attached to all trace entries, not just the spans.
The span’s label is intended to be a string constant, so that each execution of the “same” span in code carries the same label. The tag on the other hand may vary between executions of the same code span, e.g. carrying a parameter value or counter. We should immediately establish a convention for the format of its contents as a comma-separated list of key=value pairs (but this won’t be enforced by the VM).
During execution, the FVM only needs to record these span events in the execution trace. It does not need to maintain state or associate any other events with the span. The determination of relationships like parent/child or containment is done with subsequent analysis.
Alternatives
Other span data models (e.g. in traditional distributed systems) provide for an arbitrary key/value mapping of metadata to be associated with a span. This model is reduced to just an opaque tag here in order to minimise complexity and cost at the FVM syscall interface.
Tracing syscalls
Actor code can mark spans with two new syscalls. Like the
log
syscall, these take effect only when VM debugging is enabled. Also like use of thelog
syscall, actor code should wrap these calls in a macro that does nothing if VM debugging is disabled, avoiding any argument formatting etc.Implicit spans for sends
The VM should automatically generate span events for calls (in addition to the current
Call
events).Analysis
Analysis of execution trace spans is performed from the execution trace events after execution. Many analysis methods are possible, here we just want to sketch out some likely methods to ensure that the data we are capturing is sufficient.
Note that:
However, the spans created automatically for sends will be properly nested.
Count gas usage in sends
A traversal of the trace can maintain a stack of spans with label
Send
, and associate gas charge events with the topmost one. A call tree can be computed from the span parent metadata. This produces “self” gas (i.e. excluding subcalls), and the total (transitive) gas may be computed by propagating totals up the call tree.Summing over span labels
If the same code is executed repeatedly during one top-level execution, there may be multiple spans with the same label (possibly with different tags). E.g. a span could wrap the common read-from-HAMT code. These repeated executions could be summed by matching on label.
The text was updated successfully, but these errors were encountered: