feat: add sent/received timestamps to the metering record #28

Merged

warner merged 5 commits into master from 5152-xsnap-timestamps on Jun 15, 2022

Conversation

warner
Collaborator

@warner warner commented Jun 10, 2022

feat: add sent/received timestamps to the metering record

This does not change the engine behavior, but it augments the metering
results with a "timestamps" field (an array of Numbers), which
captures the times at which:

  • the primary delivery message was received
  • each syscall ("issueCommand" to parent) was sent
  • the response from each syscall was received
  • the delivery response was sent (actually just before the metering
    results are serialized)

The parent process can correlate these timestamps with similar ones
collected on the parent side, to determine how long it took to send
messages over the pipe. If an unexpected stall is observed, this can
help distinguish between the stall happening on the worker process or
on the parent process.

refs Agoric/agoric-sdk#5152
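For illustration, a minimal sketch of how one recorded timeval could become one of those Numbers; the helper name and the seconds-with-microsecond-fraction unit are assumptions, not necessarily what this PR emits:

#include <sys/time.h>

/* Hypothetical helper: convert one recorded struct timeval into the kind of
 * Number that would appear in the "timestamps" array (unit assumed to be
 * seconds with a microsecond fraction). */
static double timestampAsNumber(const struct timeval *tv) {
    return (double)tv->tv_sec + (double)tv->tv_usec / 1e6;
}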

The compiler flagged this as "case label value is less than minimum
value for type", since `command` is a `char`.
@warner warner requested a review from dckc June 10, 2022 17:29
@warner warner self-assigned this Jun 10, 2022
@warner warner requested a review from phoddie June 10, 2022 17:32
@warner
Collaborator Author

warner commented Jun 10, 2022

cc @phoddie you might eyeball the docs I wrote and see if I got anything wrong

static void recordTimestamp() {
  if (num_timestamps < MAX_TIMESTAMPS) {
    gettimeofday(&(timestamps[num_timestamps]), NULL);
    /* the diff excerpt is cut off here; the counter presumably advances next */
    num_timestamps++;
  }
}
Contributor

What is the resolution of this syscall? Is it monotonic? Which Node API do we use on the other side, and do we know which syscall that corresponds to?

Collaborator Author

According to man gettimeofday, it returns a struct timeval, which holds a time_t of seconds and a suseconds_t of microseconds. Experimentally I observe both sizeof time_t and sizeof suseconds_t reporting 8 bytes, so a 64-bit integer each (on current macOS, M1 Pro CPU). The resolution is platform dependent, but I think everything modern provides actual microseconds. There are syscalls to do better, but this one is super-standard/portable, and I'm not currently concerned about anything smaller than microseconds (although I'll be delighted if we get everything else running fast enough to make us care about nanoseconds).
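For concreteness, a small standalone check of what gettimeofday() hands back (illustrative only; as noted above, the field sizes and resolution are platform dependent):

#include <stdio.h>
#include <sys/time.h>

int main(void) {
    struct timeval tv;
    gettimeofday(&tv, NULL);
    /* tv_sec is a time_t (seconds since the epoch), tv_usec a suseconds_t
     * (microseconds); both were observed as 8 bytes in the discussion above. */
    printf("sizeof(time_t)=%zu sizeof(suseconds_t)=%zu\n",
           sizeof(time_t), sizeof(suseconds_t));
    printf("now = %lld.%06ld\n", (long long)tv.tv_sec, (long)tv.tv_usec);
    return 0;
}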

In our application (swingset), the Node API on the other side uses perf_hooks.performance, and adds the constant performance.timeOrigin to the current value of performance.now().

I spent the afternoon tracking down how these are managed within Node (and I'm reminded that Time is one of the three deadly problems in computer science: it seems like it should be trivial, but in fact the complexity wants to kill you; strings and money are the other two on my list).

Node bottoms out into two POSIX functions. The first is the same gettimeofday() I used in xsnap.c. The second is clock_gettime(CLOCK_MONOTONIC), which POSIX specifies must not decrease, but which is still subject to frequency adjustments. Experimentally, CLOCK_MONOTONIC appears to return a structure with seconds/nanoseconds since boot.

At startup, every Node.js process samples clock_gettime(CLOCK_MONOTONIC) followed immediately by gettimeofday(): the two calls are right next to each other in src/node_perf.cc, but there's still probably a delay of a few ns between them (call this the "sampling delta"). The CLOCK_MONOTONIC value is stored in a variable named timeOrigin, and is used as the offset for calls to performance.now(): every time you call now(), it returns clock_gettime(CLOCK_MONOTONIC) - timeOrigin, so you get time-since-process-start. This protects now() from timequakes (when something like NTP changes the host's clock, either slowly or drastically).

By adding that value to performance.timeOrigin, we get time-since-epoch, plus the sampling delta. I think this is the recommended way to get high-resolution time-since-epoch in Node. process.hrtime.bigint() returns the CLOCK_MONOTONIC value directly, which is not since-epoch. Date.now() has only millisecond resolution. But the sum will drift from a fresh sampling of gettimeofday() with each timequake, and won't line up again until the process is restarted (or you use performance.refreshTimeOrigin(); see below).
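Expressed in C, here is the scheme as I understand it (a sketch of the mechanism, not Node's actual code): sample both clocks once at startup, then derive epoch time from monotonic deltas.

#include <sys/time.h>
#include <time.h>

/* Sketch of the timeOrigin scheme described above; names are illustrative. */
static double monotonic_at_origin; /* CLOCK_MONOTONIC at process start, in seconds */
static double epoch_at_origin;     /* gettimeofday() sampled at (nearly) the same moment */

static double monoSeconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

static void sampleTimeOrigin(void) {
    struct timeval tv;
    monotonic_at_origin = monoSeconds(); /* like Node's timeOrigin */
    gettimeofday(&tv, NULL);             /* the gap between these two calls is the "sampling delta" */
    epoch_at_origin = tv.tv_sec + tv.tv_usec / 1e6;
}

/* Equivalent of performance.timeOrigin + performance.now(), in seconds:
 * epoch-at-start plus monotonic time elapsed since start. Timequakes move
 * gettimeofday() but not this value, which is the drift discussed below. */
static double nowSinceEpoch(void) {
    return epoch_at_origin + (monoSeconds() - monotonic_at_origin);
}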

So we're comparing repeated calls to gettimeofday() (on the worker side), against a single gettimeofday() plus a delta of CLOCK_MONOTONIC (on the kernel side). One systemic offset is the sampling delta. Another potential offset is if/when gettimeofday() has a timequake (something like NTP changes the host's clock). Those offsets may get larger and larger over time, until the kernel is restarted and gets a fresh sample of gettimeofday() (there's a performance.refreshTimeOrigin(), but it doesn't cause a new call to gettimeofday(), so I think it's broken/irrelevant).

If I care enough to avoid that, I think we have two options:

  • 1: change the xsnap.c side to perform the same double-sampling at startup, and build seconds-since-epoch by addition just like Node.js does.
    • This would be additionally vulnerable to drift between the two processes when either one restarts (and does a new gettimeofday()).
    • On the plus side, the API would continue to be in seconds-since-epoch, which is the most meaningful thing to report
  • 2: change the interface to report CLOCK_MONOTONIC values in the metering data, ignoring the epoch, and have the kernel side do the addition so it can record an absolute time in the slogfile
    • it's safe to compare CLOCK_MONOTONIC against CLOCK_MONOTONIC, but the Node.js API doesn't actually expose that: only deltas between CLOCK_MONOTONIC values
    • the worker process could do its own subtraction, to build a value that behaves like performance.now(), but that'd be worse: you can't compare seconds-since-process-1-start against seconds-since-process-2-start

So really, the question is whether to report each gettimeofday(), or to add a single gettimeofday() to a delta between CLOCK_MONOTONIC values.

FWIW, my current tests aren't showing me any surprises: the timestamps I get from the worker line up in the middle of the timestamps I'm seeing on the kernel process. But I'm only looking at the timestamps from a single unit test. I should find a way to modify the system time in the middle of a test and see how bad it can get.

Collaborator Author

I'm going to do the simple and most-stable thing on the xsnap side: just report gettimeofday(). I'll investigate how to match this correctly on the swingset side as I make the corresponding swingset changes.

Collaborator Author

I read a bit more about CLOCK_MONOTONIC: Linux implements it by returning the normal gettimeofday() time plus an offset, which is initialized to zero at kernel boot. If someone calls settimeofday(), the kernel calculates the delta being applied to the main clock and reduces the offset by that delta, so the net CLOCK_MONOTONIC value doesn't change. NTP generally only jumps the time once, when you first install it and it finds that it needs to make a change of more than 500ms. That jump will get included in the offset. After that, NTP usually manages to limit itself to gentle slews by adjusting the tick update frequency, and since those don't use settimeofday(), the offset isn't changed. So unless you're unlucky enough to observe a leap second or a time source flapping, or you're on a laptop that sleeps, or you've got a time synchronization tool that uses settimeofday() a lot instead of adjtime(), gettimeofday() and timeOrigin + CLOCK_MONOTONIC should remain within a tiny offset of each other.

Still, I found an NPM module (microtime) to use gettimeofday() on the swingset side, and I'll change the slogger to use it. I'm more interested in maintaining the comparability of slog timestamps to other processes and computers than I am worried about the occasional timequake causing an outlier in a few spans. We do lots of analysis that subtracts e.g. deliver-result - deliver, or syscall-result - syscall, and if a timequake occurs during a delivery or syscall, that one value will be corrupted. By using gettimeofday() on both sides, we'll limit the damage to a single delivery, whereas if we used timeOrigin + CLOCK_MONOTONIC, we'd see the offset present in every single timestamp until the process restarted.

@dckc
Collaborator

dckc commented Jun 10, 2022

I thought we already documented the worker protocol. I'm on a small screen so it's hard for me to confirm. Maybe @kriskowal is a better reviewer?


`sources/xsnap-worker.c` contains a variant of `xsnap` which accepts execution commands over a file descriptor. It is designed to function as a "JS coprocessor", driven by a parent process (which can be written in any language). The parent does a fork+exec of `xsnap-worker`, then writes netstring-formatted commands to the child. The child executes those commands (evaluating JS or delivering the command to a handler function), possibly emitting one or more requests to the parent during execution, then finally finishes the command and writes a status to the parent (including metering information).

By default, the process starts from an empty JS environment (not even SES or `lockdown`). If the child is started with a `-r SNAPSHOTFILENAME` argument, it will start from a previously-written heap snapshot instead.
Collaborator

"not even SES or lockdown" - this is true, but shouldn't be for much longer. Not sure if that matters here, but as written the intended direction is lost.

Collaborator Author

aye, I'll add a note


`sources/xsnap-worker.c` contains a variant of `xsnap` which accepts execution commands over a file descriptor. It is designed to function as a "JS coprocessor", driven by a parent process (which can be written in any language). The parent does a fork+exec of `xsnap-worker`, then writes netstring-formatted commands to the child. The child executes those commands (evaluating JS or delivering the command to a handler function), possibly emitting one or more requests to the parent during execution, then finally finishes the command and writes a status to the parent (including metering information).

By default, the process starts from an empty JS environment (not even SES or `lockdown`). If the child is started with a `-r SNAPSHOTFILENAME` argument, it will start from a previously-written heap snapshot instead.
Collaborator

"from a previously-written heap snapshot" - the snapshot is more than the heap (stack, symbol table, etc). Maybe "from a previously a saved virtual machine snapshot"?

Collaborator Author

I'll use "JS engine snapshot". I'll also update the w command description to something like "and then the JS engine state (heap, stack, symbol table) is written..".

The launch arguments are:

* `-h`: print this help message
* `-i <interval>`: set the metering interval (TODO: what does this mean?)
Collaborator

The details are tricky (and likely to evolve). The idea is straightforward. Checking the meter for overflow takes some time. The interval is a way to check less frequently. A larger interval means code is likely to exceed its computron budget by more, but will execute more quickly.
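A hedged sketch of that idea (hypothetical names, not XS's actual meter implementation): accumulate computrons cheaply and only compare against the budget once per interval.

/* Illustrative only; XS's real metering hooks differ in detail. */
typedef struct {
    unsigned long long used;       /* computrons consumed so far */
    unsigned long long budget;     /* computron limit for this delivery */
    unsigned long interval;        /* computrons between budget checks (-i <interval>) */
    unsigned long untilNextCheck;  /* countdown to the next comparison */
} Meter;

/* Returns 1 once the budget is found to be exceeded (possibly by up to
 * `interval` extra computrons, which is the trade-off described above). */
static int meterTick(Meter *m, unsigned long cost) {
    m->used += cost;
    if (m->untilNextCheck > cost) {
        m->untilNextCheck -= cost; /* skip the comparatively expensive check */
        return 0;
    }
    m->untilNextCheck = m->interval;
    return m->used > m->budget;
}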

* `s` (run script): the body is treated as the filename of a program to run (`xsRunProgramFile`)
* `m` (load module): the body is treated as the name of a module to load (`xsRunModuleFile`). The module must already be defined, perhaps pre-compiled into the `xsnap` executable.
* for both `s` and `m`, an error writes a terse `!` to fd4, and success writes `.${meterObj}\1` (the same success response as for `e`/`?` but with an empty message: just the metering data)
* `w`: the body is treated as a filename. A garbage collection is triggered, and then the JS heap is written to the given filename. The response is `!` or `.${meterObj}\1` as with `s`/`m`
Collaborator

As above, "heap" here might be better described as something like "virtual machine snapshot."

If at any point the computation exceeds one of the following limits, the process will exit with a non-zero (and non-negative) exit code:

* `E_NOT_ENOUGH_MEMORY` (11): when memory allocation uses more than `allocationLimit` bytes (hard-coded to 2GiB in `xsnapPlatform.c`/`fxCreateMachinePlatform()`)
* `E_STACK_OVERFLOW` (12): when the stack exceeds the configured limit (hard-coded in `xsnap-worker.c` as `stackCount` to 4096)
Collaborator

This XS abort value can be generated for two reasons: JavaScript stack overflow and native stack overflow. The doc refers only to the JavaScript stack case. We may want to create another error code for native stack overflow to eliminate the ambiguity (that's non-trivial as it will have to touch every XS host/port...)

Collaborator Author

I'll mention both. Is stackCount specifically limiting the JS stack? Is there another variable I should cite which defines the native stack size limit?

Collaborator

stackCount is the size of the JavaScript stack in slots. XS does not provide a way to manage the native stack size. If that is needed, it must be done externally.

Collaborator Author

Actually, now I'm not sure I understand. If E_STACK_OVERFLOW could be provoked by either JS or C stacks, that implies something within XS (or within the platform, or the application wrapping the XS engine) is paying attention to the C stack. At least sometimes.

I see an fxCheckCStack() which compares the C stack pointer to the->stackLimit and can do fxAbort(the, XS_STACK_OVERFLOW_EXIT). Looks like it's called during the GC mark phase, and maybe elsewhere (via mxCheckCStack()), but only if mxBoundsCheck is enabled (it is enabled for our build). And I see the->stackLimit being set via fxCStackLimit(), which might return 128 KiB plus some margin, but only if pthreads cooperates?
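To make that concrete, the check has roughly this shape as I read it (a hypothetical simplification, not XS's fxCheckCStack()):

/* Hypothetical simplification: compare the address of a local against a
 * precomputed limit; a downward-growing stack has overflowed once the
 * address drops below the limit. The real code then calls
 * fxAbort(the, XS_STACK_OVERFLOW_EXIT). */
static char *c_stack_limit; /* e.g. stack base minus ~128 KiB plus margin, per fxCStackLimit() */

static void checkCStack(void) {
    char probe;
    if (&probe < c_stack_limit) {
        /* abort: reported to the parent as E_STACK_OVERFLOW */
    }
}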

I guess all I really care about is how to document this exit code. What I think I'm hearing is that some configurations might notice when the C stack grows too big, and when that happens, E_STACK_OVERFLOW is how it gets expressed. But there isn't a simple way to control what the limit is, and XS doesn't provide any guarantees that the C stack will be checked (some configurations might trigger a SIGSEGV or ENOMEM or something, instead of an fxAbort()-signalled error exit).

Is that close enough? Does my new "also, at least for now, when the native stack exceeds a limit" messaging capture it? We don't have to name the limit if it's not something that can be configured, but if E_STACK_OVERFLOW might be talking about a C stack instead of a JS stack, I should mention that in the docs.

Collaborator

XS can be built to check for native stack overflows or not. But, xsnap (and xsnap-worker) should always have the native stack check enabled. That being the case, the documentation for xsnap-worker should indicate that E_STACK_OVERFLOW may be an overflow of the JavaScript stack or the native stack.

Collaborator Author

gotcha, thanks

@warner
Collaborator Author

warner commented Jun 11, 2022

I thought we already documented the worker protocol. I'm on a small screen so it's hard for me to confirm. Maybe @kriskowal is a better reviewer?

https://github.com/agoric-labs/xsnap-pub/blob/master/xsnap/readme.md describes the JS environment (e.g. "TextEncoder is added") and the CLI args for the xsnap program, but doesn't mention our xsnap-worker at all. The other docs in https://github.com/agoric-labs/xsnap-pub/tree/master/xsnap/documentation provide a nice description of specific features (snapshots, metering), but also nothing about xsnap-worker or the command protocol. I didn't find any other .md files in the repo.

(mostly I wanted to have some docs to change as this PR makes changes to the protocol return values)
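For orientation while reading those docs, a minimal sketch of the framing they describe (netstrings over a file descriptor); the function name is illustrative, and the command bytes themselves are defined by xsnap-worker's protocol, not shown here:

#include <stdio.h>
#include <unistd.h>

/* Frame a command as a netstring ("<length>:<body>,") and write it to the
 * worker's command file descriptor. Illustrative only. */
static int writeNetstring(int fd, const char *body, size_t len) {
    char header[32];
    int n = snprintf(header, sizeof(header), "%zu:", len);
    if (write(fd, header, (size_t)n) != n) return -1;
    if (write(fd, body, len) != (ssize_t)len) return -1;
    if (write(fd, ",", 1) != 1) return -1;
    return 0;
}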

@warner warner requested review from kriskowal and removed request for dckc June 11, 2022 00:44
@dckc
Collaborator

dckc commented Jun 11, 2022

I was thinking of https://github.com/Agoric/agoric-sdk/blob/master/packages/xsnap/README.md#xsnap but that documents only the API state machine, not the pipe protocol.

* the meterObj is separated from the result by a 0x01 byte (i.e. U+0001 if the body is parsed as UTF-8)
* the `result` field is the ArrayBuffer
* `s` (run script): the body is treated as the filename of a program to run (`xsRunProgramFile`)
* `m` (load module): the body is treated as the name of a module to load (`xsRunModuleFile`). The module must already be defined, perhaps pre-compiled into the `xsnap` executable.
Collaborator

The m and s features are probably best used via xsnap.c. We copied them over and haven't deleted them, but we probably should. Maybe mark them deprecated?

@@ -147,6 +147,67 @@ typedef enum {
E_TOO_MUCH_COMPUTATION = 17,
} ExitCode;

#define MAX_TIMESTAMPS 100
static struct timeval timestamps[MAX_TIMESTAMPS];
Collaborator

Global mutable state. Harrumph.

Collaborator

@dckc dckc Jun 15, 2022

I recently watched a talk by Joe Armstrong ... Erlang processes cost just a few hundred bytes each.... It reminded me of our aspiration to put more than one vat in a process.

But that's a looooong way from the top of our list, so I won't insist on passing this around explicitly today.

Collaborator Author

Fair point. For the sake of practice, I'll sketch out what it would take to avoid this global state:

  • define a struct timestamps
  • declare an instance of it on the stack of main(), next to the xsMachine *
    • (in general I'm scared of declaring large structures on the stack, especially ones that get written to in variable amounts: if I had to choose between overwriting the stack and overwriting the heap, I'll pick the heap)
  • define resetTimestamps, recordTimestamp, and renderTimestamps to accept a struct timestamps *
  • change fxWriteOkay to accept a struct timestamps *
    • also accept a pointer to a buffer to fill, and change all callers of fxWriteOkay (all are in main()) to reference a rendering buffer declared on the stack of main()
    • (same concern about overwriting the stack)
    • or include the rendering buffer in struct timestamps
  • change xs_issueCommand to accept struct timestamps * because it's needed by recordTimestamp
    • .. but xs_issueCommand is a callback invoked from the engine, so I can't really change its signature
    • so really, I'd need to modify mxMachinePlatform or something to jam the struct timestamps * into the xsMachine structure

Yep, I think that's not practical for today.
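As a concrete (hypothetical) rendering of the refactor sketched above — roughly the signatures it would require, not code from this PR:

#include <sys/time.h>

#define MAX_TIMESTAMPS 100

struct timestamps {
    struct timeval values[MAX_TIMESTAMPS];
    unsigned int count;
};

static void resetTimestamps(struct timestamps *ts) {
    ts->count = 0;
}

static void recordTimestamp(struct timestamps *ts) {
    if (ts->count < MAX_TIMESTAMPS) {
        gettimeofday(&ts->values[ts->count], NULL);
        ts->count++;
    }
}

/* main() would declare one instance next to the xsMachine*, and fxWriteOkay()
 * and renderTimestamps() would take a `struct timestamps *`; the sticking
 * point is xs_issueCommand(), whose callback signature is fixed, so the
 * pointer would have to ride on the xsMachine (e.g. via mxMachinePlatform). */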

Collaborator

FWIW – you can add your own fields to the xsMachine structure (that's what mxMachinePlatform does in xsnapPlatform.h). That would bind the timestamps to each VM rather than making it global state. I'm not sure if that's desirable here though.

@warner warner merged commit 3e67165 into master Jun 15, 2022
@warner warner deleted the 5152-xsnap-timestamps branch June 15, 2022 07:09