-
Notifications
You must be signed in to change notification settings - Fork 208
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
xsnap message timestamp logging/reporting #5152
Comments
@warner Is this needed for MN-1 or can it go into the Product Backlog? |
This does not change the engine behavior, but it augments the metering results with a "timestamps" field (an array of Numbers), which captures the times at which: * the primary delivery message was received * each syscall ("issueCommand" to parent) was sent * the response from each syscall was received * the delivery response was sent (actually just before the metering results are serialized) The parent process can correlate these timestamps with similar ones collected on the parent side, to determine how long it took to send messages over the pipe. If an unexpected stall is observed, this can help distinguish between the stall happening on the worker process or on the parent process. refs Agoric/agoric-sdk#5152
This moves to the version of `xsnap-worker.c` which includes receipt/transmit timestamps in the metering results. Tests are updated to expect 'timestamps' in xsnap metering results, and to assert that they correctly interleave with the times of upstream/downstream commands as seen by the parent process. refs #5152
This moves to the version of `xsnap-worker.c` which includes receipt/transmit timestamps in the metering results. Tests are updated to expect 'timestamps' in xsnap metering results, and to assert that they correctly interleave with the times of upstream/downstream commands as seen by the parent process. refs #5152
Unix provides multiple time APIs. `gettimeofday()` reports seconds-since-epoch with microsecond resolution, and can jump around, speed up, or even go backwards if/when someone changes the clock on the host computer. `clock_gettime(CLOCK_MONOTONIC)` never goes backwards, has nanosecond resolution, and in practice reports seconds-since-last-reboot (of the host computer). `CLOCK_MONOTONIC` is implemented (at least on linux) by initializing an offset (to the current time) at boot, and reporting `gettimeofday() - offset`. If someone calls `settimeofday()`, the kernel measures the delta, and subtracts it from the offset, so the `CLOCK_MONOTONIC` value does change even though the `gettimeofday()` does. However both `gettimeofday()` and `CLOCK_MONOTONIC` will reflect the gentler frequency adjustments that NTP makes (through `adjtime()`) to keep the clock in sync without timequakes. Previously, our slogfile's `.time` property recorded `performance.timeOrigin + performance.now()`, which is effectively the sum of `gettimeofday()` at process start, plus the delta in `CLOCK_MONOTONIC` between one sample at process start and another when `now()` is called, plus a tiny offset because `gettimeofday()` and the initial sample cannot happen exactly simultaneously. This has the benefit of never decreasing, and preserves the accuracy of intervals (timestamp subtractions) that span a `settimeofday()` timequake. But has the downside of becoming unique per-process when timequakes happen: if you compare this value against a similar one built in a different process, they will be different by the sum of any timequakes that occurred between the first process launching and the second one launching. If a process restarts, the timestamps before and after the restart will be offset by a similar amount. This changes the slogfile to use `gettimeofday()`, via a small C module named `microtime`. This matches what #5152 changes in `xsnap-worker.c` to record delivery/syscall send+receive timestamps. By using `gettimeofday()` on both sides, the timestamps remain comparable independent of when the processes are started or restarted. The drawback is that spans which cross a timequake will be corrupted, even becoming negative if `settimeofday()` causes the system clock to go backwards far enough. However we only really pay attention to one span at a time (a single block, or delivery, or syscall), so the damage is limited. I care more about comparability of timestamps across long-running processes (potentially across multiple hopefully-NTP-synced machines) than I'm worried about the occasional timequake-corrupted span. If we manage to make swingset fast enough that the microsecond resolution of `gettimeofday()` feels insufficient, we should probably change both the slogger and `xsnap-worker.c` to use the earlier approach: `timeOrigin + CLOCK_MONOTONIC` (which will involve writing new code on the `xsnap-worker.c` side). refs #5152
This moves to the version of `xsnap-worker.c` which includes receipt/transmit timestamps in the metering results. Tests are updated to expect 'timestamps' in xsnap metering results, and to assert that they correctly interleave with the times of upstream/downstream commands as seen by the parent process. refs #5152
This moves to the version of `xsnap-worker.c` which includes receipt/transmit timestamps in the metering results. Tests are updated to expect 'timestamps' in xsnap metering results, and to assert that they correctly interleave with the times of upstream/downstream commands as seen by the parent process. refs #5152
This moves to the version of `xsnap-worker.c` which includes receipt/transmit timestamps in the metering results. Tests are updated to expect 'timestamps' in xsnap metering results, and to assert that they correctly interleave with the times of upstream/downstream commands as seen by the parent process. refs #5152
This moves to the version of `xsnap-worker.c` which includes receipt/transmit timestamps in the metering results. Tests are updated to expect 'timestamps' in xsnap metering results, and to assert that they correctly interleave with the times of upstream/downstream commands as seen by the parent process. refs #5152
Unix provides multiple time APIs. `gettimeofday()` reports seconds-since-epoch with microsecond resolution, and can jump around, speed up, or even go backwards if/when someone changes the clock on the host computer. `clock_gettime(CLOCK_MONOTONIC)` never goes backwards, has nanosecond resolution, and in practice reports seconds-since-last-reboot (of the host computer). `CLOCK_MONOTONIC` is implemented (at least on linux) by initializing an offset (to the current time) at boot, and reporting `gettimeofday() - offset`. If someone calls `settimeofday()`, the kernel measures the delta, and subtracts it from the offset, so the `CLOCK_MONOTONIC` value does change even though the `gettimeofday()` does. However both `gettimeofday()` and `CLOCK_MONOTONIC` will reflect the gentler frequency adjustments that NTP makes (through `adjtime()`) to keep the clock in sync without timequakes. Previously, our slogfile's `.time` property recorded `performance.timeOrigin + performance.now()`, which is effectively the sum of `gettimeofday()` at process start, plus the delta in `CLOCK_MONOTONIC` between one sample at process start and another when `now()` is called, plus a tiny offset because `gettimeofday()` and the initial sample cannot happen exactly simultaneously. This has the benefit of never decreasing, and preserves the accuracy of intervals (timestamp subtractions) that span a `settimeofday()` timequake. But has the downside of becoming unique per-process when timequakes happen: if you compare this value against a similar one built in a different process, they will be different by the sum of any timequakes that occurred between the first process launching and the second one launching. If a process restarts, the timestamps before and after the restart will be offset by a similar amount. This changes the slogfile to use `gettimeofday()` for the `.time` property, via a small C module named `microtime`. This matches what issue #5152 changes in `xsnap-worker.c` to record delivery/syscall send+receive timestamps. By using `gettimeofday()` on both sides, the timestamps remain comparable independent of when the processes are started or restarted. The drawback is that spans which cross a timequake will be corrupted, even becoming negative if `settimeofday()` causes the system clock to go backwards far enough. However we only really pay attention to one span at a time (a single block, or delivery, or syscall), so the damage is limited. I care more about comparability of timestamps across long-running processes (potentially across multiple hopefully-NTP-synced machines) than I'm worried about the occasional timequake-corrupted span. A new `.monotime` property records just `performance.now()`, which is CLOCK_MONOTONIC relative to process start, and should have nanosecond resolution (despite returning a number that absurdly claims femtoseconds, probably because of base-2 representation limits). This is good for higher-resolution interval computation, and is immune to timequakes, but cannot be easily compared to anything outside the process. The slogfile lines were also rearranged, to put `type` at the front, and the two timestamps at the end. This should make the file more legible to humans who are just trying to eyeball the start/end of each delivery. If, some day, we manage to make swingset fast enough that the microsecond resolution of `gettimeofday()` feels insufficient, we should probably change both the slogger and `xsnap-worker.c` to use the earlier approach: `timeOrigin + CLOCK_MONOTONIC` (which will involve writing new code on the `xsnap-worker.c` side). refs #5152
Unix provides multiple time APIs. `gettimeofday()` reports seconds-since-epoch with microsecond resolution, and can jump around, speed up, or even go backwards if/when someone changes the clock on the host computer. `clock_gettime(CLOCK_MONOTONIC)` never goes backwards, has nanosecond resolution, and in practice reports seconds-since-last-reboot (of the host computer). `CLOCK_MONOTONIC` is implemented (at least on linux) by initializing an offset (to the current time) at boot, and reporting `gettimeofday() - offset`. If someone calls `settimeofday()`, the kernel measures the delta, and subtracts it from the offset, so the `CLOCK_MONOTONIC` value does change even though the `gettimeofday()` does. However both `gettimeofday()` and `CLOCK_MONOTONIC` will reflect the gentler frequency adjustments that NTP makes (through `adjtime()`) to keep the clock in sync without timequakes. Previously, our slogfile's `.time` property recorded `performance.timeOrigin + performance.now()`, which is effectively the sum of `gettimeofday()` at process start, plus the delta in `CLOCK_MONOTONIC` between one sample at process start and another when `now()` is called, plus a tiny offset because `gettimeofday()` and the initial sample cannot happen exactly simultaneously. This has the benefit of never decreasing, and preserves the accuracy of intervals (timestamp subtractions) that span a `settimeofday()` timequake. But has the downside of becoming unique per-process when timequakes happen: if you compare this value against a similar one built in a different process, they will be different by the sum of any timequakes that occurred between the first process launching and the second one launching. If a process restarts, the timestamps before and after the restart will be offset by a similar amount. This changes the slogfile to use `gettimeofday()` for the `.time` property, via a small C module named `microtime`. This matches what issue #5152 changes in `xsnap-worker.c` to record delivery/syscall send+receive timestamps. By using `gettimeofday()` on both sides, the timestamps remain comparable independent of when the processes are started or restarted. The drawback is that spans which cross a timequake will be corrupted, even becoming negative if `settimeofday()` causes the system clock to go backwards far enough. However we only really pay attention to one span at a time (a single block, or delivery, or syscall), so the damage is limited. I care more about comparability of timestamps across long-running processes (potentially across multiple hopefully-NTP-synced machines) than I'm worried about the occasional timequake-corrupted span. A new `.monotime` property records just `performance.now()`, which is CLOCK_MONOTONIC relative to process start, and should have nanosecond resolution (despite returning a number that absurdly claims femtoseconds, probably because of base-2 representation limits). This is good for higher-resolution interval computation, and is immune to timequakes, but cannot be easily compared to anything outside the process. The slogfile lines were also rearranged, to put `type` at the front, and the two timestamps at the end. This should make the file more legible to humans who are just trying to eyeball the start/end of each delivery. If, some day, we manage to make swingset fast enough that the microsecond resolution of `gettimeofday()` feels insufficient, we should probably change both the slogger and `xsnap-worker.c` to use the earlier approach: `timeOrigin + CLOCK_MONOTONIC` (which will involve writing new code on the `xsnap-worker.c` side). refs #5152
Here's a (python) tool to scan a slogfile with the new import sys, json, gzip, re, time
from collections import defaultdict
from pprint import pprint
class Syscall:
def __init__(self, vsc, num, rx_request):
self.num = num
self.vsc = vsc
self.tx_request = None
self.rx_request = rx_request
self.tx_result = None
self.rx_result = None
if vsc[0] == "send":
target = vsc[1]
msg = vsc[2]
methargsCD = msg["methargs"]
method = json.loads(methargsCD["body"])[0]
result = msg.get("result")
self.description = "(%s).%s -> %s" % (target, method, result)
elif vsc[0] == "callNow":
target = vsc[1]
method = vsc[2]
self.description = "D(%s).%s" % (target, method)
elif vsc[0] == "subscribe":
vpid = vsc[1]
self.description = "subscribe(%s)" % vpid
elif vsc[0] == "resolve":
vpid_strings = []
for [vpid, reject, data] in vsc[1]:
r = "!" if reject else ""
vpid_strings.append(r + vpid)
self.description = "notify %s" % ",".join(vpid_strings)
elif vsc[0] in ["dropImports", "retireImports", "retireExports"]:
self.description = "%s(%s)" % (vsc[0], ",".join(vsc[1]))
else:
self.description = vsc[0]
def got_result(self, when):
self.tx_result = when
def set_tx_request(self, when):
self.tx_request = when
def set_rx_result(self, when):
self.rx_result = when
class Delivery:
def __init__(self, cranknum, vd, when):
self.cranknum = cranknum
self.syscalls = []
self.tx_delivery = when
self.rx_delivery = None
self.tx_result = None
self.rx_result = None
self.vd = vd
if vd[0] == "message":
target = vd[1]
msg = vd[2]
methargsCD = msg["methargs"]
method = json.loads(methargsCD["body"])[0]
result = msg.get("result")
self.description = "(%s).%s -> %s" % (target, method, result)
elif vd[0] == "notify":
vpid_strings = []
for [vpid, reject, data] in vd[1]:
r = "!" if reject else ""
vpid_strings.append(r + vpid)
self.description = "notify %s" % ",".join(vpid_strings)
else:
self.description = vd[0]
def add_syscall(self, vsc, when):
num = len(self.syscalls)
s = Syscall(vsc, num, when)
self.syscalls.append(s)
def num_syscalls(self):
return len(self.syscalls)
def get_syscall(self, which):
return self.syscalls[which]
def set_rx_result(self, when):
self.rx_result = when
def set_rx_delivery(self, when):
self.rx_delivery = when
def set_tx_result(self, when):
self.tx_result = when
fn = sys.argv[1]
vatID = sys.argv[2] if len(sys.argv) > 2 else None
deliveries = []
opener = gzip.open if fn.endswith(".gz") else open
with opener(sys.argv[1]) as f:
for line in f:
if isinstance(line, bytes):
line = line.decode("utf-8")
if not line.strip():
continue
data = json.loads(line.strip())
if vatID and data["vatID"] != vat:
continue
type = data["type"]
when = data["time"]
if type == 'deliver':
cranknum = data["crankNum"]
deliverynum = data["deliveryNum"]
d = Delivery(cranknum, data["vd"], when)
deliveries.append(d)
if type == 'syscall':
deliveries[-1].add_syscall(data['vsc'], when)
if type == 'syscall-result':
deliveries[-1].get_syscall(-1).got_result(when)
if type == 'deliver-result':
deliveries[-1].set_rx_result(when)
dr = data["dr"]
mr = dr[2]
if mr and "timestamps" in dr[2]:
timestamps = mr["timestamps"]
d = deliveries[-1]
if len(timestamps):
d.set_rx_delivery(timestamps.pop(0))
for syscallnum in range(d.num_syscalls()):
s = d.get_syscall(syscallnum)
if len(timestamps):
s.set_tx_request(timestamps.pop(0))
if len(timestamps):
s.set_rx_result(timestamps.pop(0))
if len(timestamps):
d.set_tx_result(timestamps.pop(0))
fill = " "
prev_d = None
for d in deliveries:
total_worker = 0
total_kernel = 0
total_pipe = 0
if prev_d:
kerneltime = d.tx_delivery - prev_d.rx_result
print(" %.6f kern between-cranks" % kerneltime)
prev_d = d
print("c%d %s" % (d.cranknum, d.description))
if d.rx_delivery:
print(" k -> %.6f -> w (send delivery)" % (d.rx_delivery - d.tx_delivery))
#d.tx_result - d.rx_delivery,
total_pipe += (d.rx_delivery - d.tx_delivery)
else:
print(" k -> w (send delivery)")
prev_s = None
for s in d.syscalls:
if not prev_s and d.rx_delivery and s.tx_request:
print("%s%.6f worker" % (fill, s.tx_request - d.rx_delivery))
total_worker += (s.tx_request - d.rx_delivery)
if prev_s and prev_s.rx_result and s.tx_request:
print("%s%.6f worker" % (fill, s.tx_request - prev_s.rx_result))
total_worker += (s.tx_request - prev_s.rx_result)
prev_s = s
if s.tx_request and s.rx_result:
print(" w -> %.6f -> k, %.6f kern, k -> %.6f -> w : %s" % (
s.rx_request - s.tx_request,
s.tx_result - s.rx_request,
s.rx_result - s.tx_result,
s.description))
total_pipe += (s.rx_request - s.tx_request)
total_kernel += (s.tx_result - s.rx_request)
total_pipe += (s.rx_result - s.tx_result)
else:
print(" w -> k, %.6f kern, k -> w : %s" % (
s.tx_result - s.rx_request,
s.description))
if prev_s and d.tx_result:
print("%s%.6f worker" % (fill, d.tx_result - prev_s.rx_result))
total_worker += (d.tx_result - prev_s.rx_result)
if not prev_s and d.tx_result:
print("%s%.6f worker" % (fill, d.tx_result - d.rx_delivery))
total_worker += (d.tx_result - d.rx_delivery)
if d.tx_result:
print(" w -> %.6f -> k (return result)" % (d.rx_result - d.tx_result))
total_pipe += (d.rx_result - d.tx_result)
else:
print(" w -> k (return result)")
k_to_k = d.rx_result - d.tx_delivery
if d.tx_result:
print(" pipe %.6f, worker %.6f, kernel %.6f total (k->k %.6f)" % (
total_pipe, total_worker, total_kernel, k_to_k))
else:
print(" total (k->k %.6f)" % k_to_k)
A sample report looks like:
which shows two deliveries ( This dataset came from a unit test, so it's pretty quiet (there's no chain activity going on). The next step is to adapt this code into |
@mhofman estimate please? |
The |
What is the Problem Being Solved?
Our "slogfile" currently records kernel-side timestamps of messages going to and from vats: deliveries, syscall requests, syscall responses, and delivery results.
To support investigations like #5139, as well as to collect data for performance improvements, it would be nice if we also had worker-side timestamps for those same messages. We expect the cross-process messaging to be quick, but we don't actually have enough data to prove it. And we expect the kernel to be idle and thus react immediately to a worker's syscall request, but if the kernel were stuck for some reason (VM thrashing, waiting for some IO write), then a late timestamp on the receipt of a syscall request would look identical to the worker spending a long time before emitting the syscall.
Description of the Design
I'm thinking that
xsnap.c
should have an array (maybe pre-allocated) of timestamps. Whenever it receives the end of a netstring frame, it should append agettimeofday
(ideally as 64-bit seconds-plus-microseconds-since-epoch) to the array. When it sends a string to the parent process, it should also append a timestamp. We might need to include a tiny bit (maybe just one character) of the message so we can distinguish between console messages and syscalls.When the delivery is complete and the worker is serializing the results, it should fetch one last timestamp, then serialize the entire array and include it in the metering results. Then it should reset the array length to zero in preparation for the next delivery to begin.
On the kernel side, we already include the full metering record in the slogfile. As long as the kernel doesn't attempt to interpret the additional data, we need no additional modifications.
With this information in the slogfile, we can write post-mortem tools that correlate the sends and receives of each syscall request and response, and build a picture of how long the worker spent working, as opposed to how long it spent waiting for the kernel.
Security Considerations
None, the additional metering data is not shared elsewhere in the kernel, and is not included in consensus.
Test Plan
Unit tests in
packages/xsnap/
asserting the presence of the timetable and its correct length, and that the values are sorted and bounded by parent-side timestamps captured before and after the worker interaction.The text was updated successfully, but these errors were encountered: