feat: add sent/received timestamps to the metering record #28
Conversation
This does not change the engine behavior, but it augments the metering results with a "timestamps" field (an array of Numbers), which captures the times at which:

* the primary delivery message was received
* each syscall ("issueCommand" to parent) was sent
* the response from each syscall was received
* the delivery response was sent (actually just before the metering results are serialized)

The parent process can correlate these timestamps with similar ones collected on the parent side, to determine how long it took to send messages over the pipe. If an unexpected stall is observed, this can help distinguish between the stall happening on the worker process or on the parent process.

refs Agoric/agoric-sdk#5152
The compiler flagged this as "case label value is less than minimum value for type", since `command` is a `char`.
cc @phoddie you might eyeball the docs I wrote and see if I got anything wrong
```c
}
static void recordTimestamp() {
    if (num_timestamps < MAX_TIMESTAMPS) {
        gettimeofday(&(timestamps[num_timestamps]), NULL);
```
What is the resolution of this syscall? Is it monotonic? Which Node API do we use on the other side, and do we know which syscall that corresponds to?
According to `man gettimeofday`, it fills in a `struct timeval`, which holds a `time_t` of seconds and a `suseconds_t` of microseconds. Experimentally I observe both `sizeof(time_t)` and `sizeof(suseconds_t)` reporting 8 bytes, so a 64-bit integer each (on current macOS, M1 Pro CPU). The resolution is platform-dependent, but I think everything modern provides actual microseconds. There are syscalls that do better, but this one is super-standard/portable, and I'm not currently concerned about anything smaller than microseconds (although I'll be delighted if we get everything else running fast enough to make us care about nanoseconds).
In our application (swingset), the Node API on the other side uses `perf_hooks.performance`, and adds the constant `performance.timeOrigin` to the current value of `performance.now()`.
I spent the afternoon tracking down how these are managed within Node (and I'm reminded that Time is one of the three deadly problems in computer science: it seems like it should be trivial, but in fact the complexity wants to kill you. Strings and money are the other two on my list).
Node bottoms out into two POSIX functions. The first is the same `gettimeofday()` I used in `xsnap.c`. The second is `clock_gettime(CLOCK_MONOTONIC)`, which POSIX specifies to never decrease, but which is still subject to frequency adjustments. Experimentally, CLOCK_MONOTONIC appears to return a structure with seconds/nanoseconds since boot.
At startup, every Node.js process samples `clock_gettime(CLOCK_MONOTONIC)` followed immediately by `gettimeofday()`: the two calls are right next to each other in `src/node_perf.cc`, but there's still probably a delay of a few ns between them (call this the "sampling delta"). The CLOCK_MONOTONIC value is stored in a variable named `timeOrigin`, and is used as the offset for calls to `performance.now()`: every time you call `now()`, it returns `clock_gettime(CLOCK_MONOTONIC) - timeOrigin`, so you get time-since-process-start. This protects `now()` from timequakes (when something like NTP changes the host's clock, either slowly or drastically).
By adding that value to `performance.timeOrigin`, we get time-since-epoch, plus the sampling delta. I think this is the recommended way to get high-resolution time-since-epoch in Node. `process.hrtime.bigint()` returns the CLOCK_MONOTONIC value directly, which is not since-epoch. `Date.now()` has only millisecond resolution. But the sum will drift from a fresh sampling of `gettimeofday()` with each timequake, and won't line up again until the process is restarted (or you use …).
So we're comparing repeated calls to `gettimeofday()` (on the worker side) against a single `gettimeofday()` plus a delta of `CLOCK_MONOTONIC` (on the kernel side). One systematic offset is the sampling delta. Another potential offset appears if/when `gettimeofday()` has a timequake (something like NTP changes the host's clock). Those offsets may get larger and larger over time, until the kernel is restarted and gets a fresh sample of `gettimeofday()` (there's a `performance.refreshTimeOrigin()`, but it doesn't cause a new call to `gettimeofday()`, so I think it's broken/irrelevant).
If I care enough to avoid that, I think we have two options:
- 1: change the `xsnap.c` side to perform the same double-sampling at startup, and build seconds-since-epoch by addition just like Node.js does.
  - This would be additionally vulnerable to drift between the two processes when either one restarts (and does a new `gettimeofday()`).
  - On the plus side, the API would continue to be in seconds-since-epoch, which is the most meaningful thing to report.
- 2: change the interface to report CLOCK_MONOTONIC values in the metering data, ignoring the epoch, and have the kernel side do the addition so it can record an absolute time in the slogfile.
  - It's safe to compare `CLOCK_MONOTONIC` against `CLOCK_MONOTONIC`, but the Node.js API doesn't actually expose that: only deltas between `CLOCK_MONOTONIC` values.
  - The worker process could do its own subtraction, to build a value that behaves like `performance.now()`, but that'd be worse: you can't compare seconds-since-process-1-start against seconds-since-process-2-start.
So really, the question is whether to report each `gettimeofday()`, or to add a single `gettimeofday()` to a delta between `CLOCK_MONOTONIC` values.
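For concreteness, option 1's double-sampling can be sketched in Node terms (the names `startWallMs` and `startMonoNs` are illustrative, not from `xsnap.c`):

```javascript
// Option 1 sketch: sample the wall clock and the monotonic clock once at
// startup, then build epoch timestamps by addition, the way Node builds
// performance.timeOrigin + performance.now().
const startWallMs = Date.now();              // one gettimeofday()-style sample
const startMonoNs = process.hrtime.bigint(); // one CLOCK_MONOTONIC sample

function epochNowMs() {
  // monotonic delta since startup, converted to ms, added to the startup epoch
  const deltaMs = Number(process.hrtime.bigint() - startMonoNs) / 1e6;
  return startWallMs + deltaMs;
}

// This drifts from a fresh Date.now() only if a timequake occurs after startup.
console.log(Math.abs(epochNowMs() - Date.now()) < 50);
```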
FWIW, my current tests aren't showing me any surprises: the timestamps I get from the worker line up in the middle of the timestamps I'm seeing on the kernel process. But I'm only looking at the timestamps from a single unit test. I should find a way to modify the system time in the middle of a test and see how bad it can get.
I'm going to do the simple and most-stable thing on the `xsnap` side: just report `gettimeofday()`. I'll investigate how to match this correctly on the swingset side as I make the corresponding swingset changes.
I read a bit more about `CLOCK_MONOTONIC`: Linux implements it by returning the normal `gettimeofday()` time plus an offset, which is initialized to zero at kernel boot. If someone calls `settimeofday()`, the kernel calculates the delta being applied to the main clock, and reduces the offset by that delta, so the net `CLOCK_MONOTONIC` value doesn't change. NTP generally only jumps the time once, when you first install it and it finds that it needs to make a change of more than 500ms. That jump will get included in the offset. After that, NTP usually manages to limit itself to gentle slews by adjusting the tick update frequency, and since those don't use `settimeofday()`, the offset isn't changed. So unless you're unlucky enough to observe a leap second or a time source flapping, or you're on a laptop that sleeps, or you've got a time-synchronization tool that uses `settimeofday()` a lot instead of `adjtime()`, then `gettimeofday()` and `timeOrigin + CLOCK_MONOTONIC` should remain within a tiny offset of each other.
Still, I found an NPM module (`microtime`) that uses `gettimeofday()` on the swingset side, and I'll change the slogger to use it. I'm more interested in maintaining the comparability of slog timestamps to other processes and computers than I am worried about the occasional timequake causing an outlier in a few spans. We do lots of analysis that subtracts e.g. `deliver-result - deliver`, or `syscall-result - syscall`, and if a timequake occurs during a delivery or syscall, that one value will be corrupted. By using `gettimeofday()` on both sides, we'll limit the damage to a single delivery, whereas if we used `timeOrigin + CLOCK_MONOTONIC`, we'd see the offset present in every single timestamp until the process restarted.
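As an illustration of that subtraction, here's a sketch of computing span durations from timestamped entries. The entry shapes and field names here are hypothetical, not the real slogger schema:

```javascript
// Hypothetical slog entries: timestamps in seconds-since-epoch, one request
// event and one matching '-result' event per span.
const slog = [
  { type: 'deliver',        time: 10.000 },
  { type: 'syscall',        time: 10.002 },
  { type: 'syscall-result', time: 10.005 },
  { type: 'deliver-result', time: 10.010 },
];

// Pair each '-result' entry with its request and subtract to get durations.
// A timequake between a request and its result corrupts only that one span.
function spans(entries) {
  const open = new Map();
  const out = [];
  for (const { type, time } of entries) {
    if (type.endsWith('-result')) {
      const base = type.slice(0, -'-result'.length);
      out.push({ type: base, duration: time - open.get(base) });
    } else {
      open.set(type, time);
    }
  }
  return out;
}

console.log(JSON.stringify(spans(slog)));
```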
I thought we already documented the worker protocol. I'm on a small screen so it's hard for me to confirm. Maybe @kriskowal is a better reviewer?
xsnap/documentation/xsnap-worker.md

`sources/xsnap-worker.c` contains a variant of `xsnap` which accepts execution commands over a file descriptor. It is designed to function as a "JS coprocessor", driven by a parent process (which can be written in any language). The parent does a fork+exec of `xsnap-worker`, then writes netstring-formatted commands to the child. The child executes those commands (evaluating JS or delivering the command to a handler function), possibly emitting one or more requests to the parent during execution, then finally finishes the command and writes a status (including metering information) to the parent.

By default, the process starts from an empty JS environment (not even SES or `lockdown`). If the child is started with a `-r SNAPSHOTFILENAME` argument, it will start from a previously-written heap snapshot instead.
"not even SES or lockdown" - this is true, but shouldn't be for much longer. Not sure if that matters here, but as written the intended direction is lost.
aye, I'll add a note
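The netstring framing mentioned in the quoted doc is simple enough to sketch. This is illustrative only: the `'?'` command byte in the example payload is a guess at the protocol's shape, not taken from the actual command set.

```javascript
// Netstring framing for parent->child commands: the payload is prefixed with
// its decimal byte length and a colon, and suffixed with a comma.
function encodeNetstring(payload) {
  const body = Buffer.from(payload);
  return Buffer.concat([Buffer.from(`${body.length}:`), body, Buffer.from(',')]);
}

function decodeNetstring(buf) {
  const colon = buf.indexOf(0x3a); // ':'
  const len = Number(buf.slice(0, colon).toString());
  return buf.slice(colon + 1, colon + 1 + len).toString();
}

const framed = encodeNetstring('?{"method":"deliver"}'); // hypothetical command
console.log(framed.toString()); // 21:?{"method":"deliver"},
```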
"from a previously-written heap snapshot" - the snapshot is more than the heap (stack, symbol table, etc). Maybe "from a previously a saved virtual machine snapshot"?
I'll use "JS engine snapshot". I'll also update the w
command description to something like "and then the JS engine state (heap, stack, symbol table) is written..".
xsnap/documentation/xsnap-worker.md

The launch arguments are:

* `-h`: print this help message
* `-i <interval>`: set the metering interval (TODO: what does this mean?)
The details are tricky (and likely to evolve). The idea is straightforward. Checking the meter for overflow takes some time. The interval is a way to check less frequently. A larger interval means code is likely to exceed its computron budget by more, but will execute more quickly.
xsnap/documentation/xsnap-worker.md

* `s` (run script): the body is treated as the filename of a program to run (`xsRunProgramFile`)
* `m` (load module): the body is treated as the name of a module to load (`xsRunModuleFile`). The module must already be defined, perhaps pre-compiled into the `xsnap` executable.
* for both `s` and `m`, an error writes a terse `!` to fd4, and success writes `.${meterObj}\1` (the same success response as for `e`/`?` but with an empty message: just the metering data)
* `w`: the body is treated as a filename. A GC collection is triggered, and then the JS heap is written to the given filename. The response is `!` or `.${meterObj}\1` as with `s`/`m`
As above, "heap" here might be better described as something like "virtual machine snapshot."
xsnap/documentation/xsnap-worker.md

If at any point the computation exceeds one of the following limits, the process will exit with a non-zero (and non-negative) exit code:

* `E_NOT_ENOUGH_MEMORY` (11): when memory allocation uses more than `allocationLimit` bytes (hard-coded to 2GiB in `xsnapPlatform.c`/`fxCreateMachinePlatform()`)
* `E_STACK_OVERFLOW` (12): when the stack exceeds the configured limit (hard-coded in `xsnap-worker.c` as `stackCount` to 4096)
This XS abort value can be generated for two reasons: JavaScript stack overflow and native stack overflow. The doc refers only to the JavaScript stack case. We may want to create another error code for native stack overflow to eliminate the ambiguity (that's non-trivial as it will have to touch every XS host/port...)
I'll mention both. Is `stackCount` specifically limiting the JS stack? Is there another variable I should cite which defines the native stack size limit?
`stackCount` is the size of the JavaScript stack in slots. XS does not provide a way to manage the native stack size. If that is needed, it must be done externally.
Actually, now I'm not sure I understand. If `E_STACK_OVERFLOW` could be provoked by either JS or C stacks, that implies something within XS (or within the platform, or the application wrapping the XS engine) is paying attention to the C stack, at least sometimes.

I see an `fxCheckCStack()` which compares the C stack pointer to `the->stackLimit` and can do `fxAbort(the, XS_STACK_OVERFLOW_EXIT)`. Looks like it's called during the GC mark phase, and maybe others (via `mxCheckCStack()`), but only if `mxBoundsCheck` is enabled (it is enabled for our build). And I see `the->stackLimit` using `fxCStackLimit`, which might return 128 KiB plus some margin, but only if `pthreads` cooperates?

I guess all I really care about is how to document this exit code. What I think I'm hearing is that some configurations might notice when the C stack grows too big, and if so, `E_STACK_OVERFLOW` is how it gets expressed. But there isn't a simple way to control what the limit is, and XS doesn't provide any guarantees that the C stack will be checked (some configurations might trigger a SIGSEGV or ENOMEM or something, instead of an `fxAbort()`-signalled error exit).

Is that close enough? Does my new "also, at least for now, when the native stack exceeds a limit" messaging capture it? We don't have to name the limit if it's not something that can be configured, but if `E_STACK_OVERFLOW` might be talking about a C stack instead of a JS stack, I should mention that in the docs.
XS can be built to check for native stack overflows or not. But xsnap (and xsnap-worker) should always have the native stack check enabled. That being the case, the documentation for xsnap-worker should indicate that `E_STACK_OVERFLOW` may be an overflow of the JavaScript stack or the native stack.
gotcha, thanks
https://github.com/agoric-labs/xsnap-pub/blob/master/xsnap/readme.md describes the JS environment (e.g. "…"). (Mostly I wanted to have some docs to change as this PR makes changes to the protocol return values.)
I was thinking of https://github.com/Agoric/agoric-sdk/blob/master/packages/xsnap/README.md#xsnap but that documents only the api state machine, not the pipe protocol.
* the meterObj is separated from the result by a 0x01 byte (i.e. U+0001 if the body is parsed as UTF-8)
* the `result` field is the ArrayBuffer
* `s` (run script): the body is treated as the filename of a program to run (`xsRunProgramFile`)
* `m` (load module): the body is treated as the name of a module to load (`xsRunModuleFile`). The module must already be defined, perhaps pre-compiled into the `xsnap` executable.
The m and s features are probably best used via xsnap.c . We copied them over and haven't deleted them, but we probably should. Maybe mark them deprecated?
```diff
@@ -147,6 +147,67 @@ typedef enum {
 	E_TOO_MUCH_COMPUTATION = 17,
 } ExitCode;

+#define MAX_TIMESTAMPS 100
+static struct timeval timestamps[MAX_TIMESTAMPS];
```
Global mutable state. Harrumph.
I recently watched a talk by Joe Armstrong ... Erlang processes cost just a few hundred bytes each.... It reminded me of our aspiration to put more than one vat in a process.
But that's a looooong way from the top of our list, so I won't insist on passing this around explicitly today.
Fair point. For the sake of practice, I'll sketch out what it would take to avoid this global state:

- define a `struct timestamps`
- declare an instance of it on the stack of `main()`, next to the `xsMachine *`
  - (in general I'm scared of declaring large structures on the stack, especially ones that get written to in variable amounts: if I had to choose between overwriting the stack and overwriting the heap, I'll pick the heap)
- define `resetTimestamps`, `recordTimestamp`, and `renderTimestamps` to accept a `struct timestamps *`
- change `fxWriteOkay` to accept a `struct timestamps *`
  - also accept a pointer to a buffer to fill, and change all callers of `fxWriteOkay` (all are in `main()`) to reference a rendering buffer declared on the stack of `main()`
    - (same concern about overwriting the stack)
    - or include the rendering buffer in `struct timestamps`
- change `xs_issueCommand` to accept a `struct timestamps *` because it's needed by `recordTimestamp`
  - .. but `xs_issueCommand` is a callback invoked from the engine, so I can't really change its signature
  - so really, I'd need to modify `mxMachinePlatform` or something to jam the `struct timestamps *` into the `xsMachine` structure

Yep, I think that's not practical for today.
FWIW – you can add your own fields to the xsMachine structure (that's what `mxMachinePlatform` does in `xsnapPlatform.h`). That would bind the timestamps to each VM rather than making it global state. I'm not sure if that's desirable here though.