Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

xsnap-worker: define and implement command pipe-related error handling policy and mechanisms #10098

Open
siarhei-agoric opened this issue Sep 17, 2024 · 2 comments
Labels
enhancement New feature or request hygiene Tidying up around the house needs-design question Further information is requested technical-debt xsnap the XS execution tool

Comments

@siarhei-agoric
Copy link
Contributor

siarhei-agoric commented Sep 17, 2024

A clear and consistent error handling policy has to be developed, documented, and implemented to provide a solid architectural foundation for further development of xsnap-worker and JS that it hosts.

This is a spin-off from #4103 and agoric-labs/xsnap-pub PR#53.

TL;DR:
xsnap-worker's internal state machine is driven by the control loop which reads commands from the parent (controlling process) via one pipe and sends the responses back to the parent via another pipe, all of which implemented in C. JavaScript has no knowledge of the state machine, the pipes, nor the protocol. Thus, the question arises regarding what to do when an error is encountered by a C code while handling command-pipe IO related upcoming JS invocation or the results of there of.

There are two obvious extremes here:

A. On one end of the spectrum there is a strict separation policy which dictates that since JS has no knowledge of the protocol, lacks access to the pipes, and has no agency with respect to establishing or correcting any of the comms-pipe-related issues, the JS side should not get involved with the error handling at all. This implies that any pipe-comms related error leads to an immediate death of the JS container, since C side of the business lacks any understanding of what the actual JS code is doing.

B. Another extreme is to push all of the protocol into JS and let it figure our what to do on a case-by-case basis, depending on specific location in the code and the error type.

Current state of affairs is somewhere in between, but much closer to A above:
In some cases the worker takes a direct exit option xsnap-worker.c:531, while in others attempts to throw an error back into JS world xsnap-worker.c:927. PR#53 attempts to bring it even closer to A by causing and explicit exit on EOF/SIGPIPE.

@siarhei-agoric siarhei-agoric added enhancement New feature or request question Further information is requested hygiene Tidying up around the house needs-design technical-debt xsnap the XS execution tool labels Sep 17, 2024
@siarhei-agoric
Copy link
Contributor Author

siarhei-agoric commented Sep 17, 2024

Personally, I strongly prefer option A as it greatly simplifies C<->JS interface and minimizes attack/bug surface. However, I do recognize that some time in the future there may be a case where it would be beneficial for the JS to take some action when an error is detected. Thus, I would be open to have some sort of a limited interface which would let JS know regarding the upcoming imminent doom. I can see a model where C side invokes shutdown() in JS when it gets q (orderly shutdown while JS is quiescent) from the command pipe, and panic() when any pipe/protocol error is detected while JS is not quiescent.

Things to keep in mind:

  • the protocol does not allow for comms recovery, so the worker shutdown and restart is the only remedy
  • the I/O errors may happen before any JS is invoked or after it had completed all of the work and returned the results to the xs host for transmission.
  • the errors can also happen when JS is active and attempting to send a message to the parent.
  • further, the errors may be encountered during operations not directly involving the JS, such as R or w.
  • regardless of the error handling policy the fact of the error has to be logged in a helpful way.
  • any issues with control pipes are a likely indication of parent's inability/unwillingness to process any further input/output, so the worker has to handle and log the error on its own.

@mhofman
Copy link
Member

mhofman commented Sep 17, 2024

The approach taken should consider potential future improvements such as moving the vat store into the worker (#6254). Some local worker logic (in C or JS) may want to cleanly abort any transaction before exiting, or cleanup any temporary files created for an export, all of which does not involve communication with the parent.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request hygiene Tidying up around the house needs-design question Further information is requested technical-debt xsnap the XS execution tool
Projects
None yet
Development

No branches or pull requests

2 participants