-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RPC Raw Streams for Binary Data #2
Comments
As just discussed. we need to add some changes to the RPC system here.
Either of these points might be a new issues. TBD. |
Also this one before extracting out to Polykey CLI? |
I'm only going to get to this at the trailing end of the agent migration. In fact, I may have to complete the agent migration PR before the CLI change just so we have a clean history to start from. But maybe not, CLI is reasonably separate from agent migration.. |
Some notes for binary data formats for encapsulation:
The protobuf-es appears to support dynamic messages that don't require creating a proto file. The creation of a proto file is sort of unnecessary in most of our API due to our usage of JSON RPC. It would be rare for us to need protobuf. As per the comment in MatrixAI/Polykey#495 (comment) the protobuf is only needed for mixed messages (and in most cases we will use JSON with base64url) and/or chunked processing (that isn't already governed by underlying protocols like git and filesystem). Furthermore CBOR is pure JS, it runs in the browser too. So CBOR is likely the better solution since it would be used in the DB with MatrixAI/js-db#58 and then the same dependency can be shared. CBOR though is a less popular format... so third party clients may find it more difficult. |
We can't be sure that endianness actually matters here. Because we would just be sending bytes. See the discussion here: https://chat.openai.com/share/8786aa4f-53c2-4b25-95a0-c29e7223da12 This basically means for file transfers, we should not need to care about endianness. This should apply to when we are doing vault cloning and pulling, and uploading/downloading files to the vault. It would only matter if we are sending "structured" binary data that we expect to be interpreted by a program. But even then, this structured binary data would have to be as if we were to just serialise a C struct (literally raw memory) and send it over. Most binary formats now specify a specific endianness. |
So yes, if we were to send |
Maybe the terminology is a bit weird? We don't technically downgrade the stream. What we do is take the raw binary stream and pipe it into a JSON message parser and read out a single message. When then cancel the parser stream allowing the raw stream to be used again. That raw stream is then passed to the raw handler along with the initial message that was read. Note that we only have 1 kind of raw stream which is a duplex stream. Client, server or unary behaviour has to be re-implemented within the raw handler. All the normal style handlers are built on-top of the raw handler. It takes the raw stream and pipes it through the middle-ware and parsing streams before providing it to the handlers re-inserting the initial message into the stream while doing so. We need to add the feature where a raw stream can respond with leading metadata. Ideally the middle-ware is used for leading metadata. In all cases the raw handler should respond immediately with leading metadata before using the stream. This may have to be done callback style. |
Oh I must have forgotten then. Only the duplex stream is raw. Well in that case, duplex stream should allow the client and server to both send a header message. However I think the leading header message should be required... before you continue using it in binary style. One question is that I assume that ordering constraints don't need to exist. That is the client can immediately start sending binary data after sending the header without waiting for the response. Or the client can wait for the response. It's up to the client. The question is whether we should force the existence of the leading JSON message first from both ends? |
Ok we can force the server stream to send a JSON message as the leading message before it can send anything else. If it has no header to send, it can send |
One issue with this idea is that if an error occurs and we are already at the raw stream for duplex, then there's no way to propagate errors anymore. Unless there was also a possibility of trailing messages. But that's more difficult to do since it's not possible to know when the raw stream data finishes and the trailing message starts. Because as a raw stream all possible bytes are legal. Unless the raw stream data were to be encapsulated in cbor/protobuf or other message framing constructs. |
Regarding errors. For regular JSON streams, errors are captured and converted to JSON and the other side will get it as an exception. For raw streams, errors are not captured and converted to JSON. Instead it depends on the underlying stream transport on how these errors are propagated like So I think we can go with this. But it is necessary to document and test the error behaviour of raw streams vs JSON streams for both QUIC streams and WebSocket streams. In particular I'd like have some common wrapper for stream errors that allows us to distinguish errors that occur at the RPC level, vs errors that are on the underlying stream (possibly caused by using raw streams). |
Further notes show that when using a stream. The underlying stream errors can only have a code. Well in the case of web sockets there can also be a data buffer. But to align the lowest common denominator between quic streams and web socket connections, it would have to be just a regular code. Then we need to distinguish between stream level errors and RPC level errors. That can be done with a different exception class. Then we need to consider UI behaviour. Suppose a user runs @tegefaulkes can you add some context on what needs to be changed for streams to deal with this. I think the websocket connection currently doesn't actually encode any error code. And in the case of quic stream, you have to use The spec for this should be updated above! |
I think we can copy the |
Can you reference the part of the code that needs to be changed to support the error codes for web sockets? |
I think we need further elaboration on what the shared types will be for all of this. |
The only handlers that would make use of this currently are the two agent git handlers |
There will also be uploading/downloading files from filesystem into the Polykey Agent. That will be necessary for client service. |
@tegefaulkes while playing around with But if we are using JSON RPC, then we should use text mode. Is it possible to then switch to binary mode when we drop down to raw streams?
|
We should make sure that we are always using binary mode for simplicity then. |
I think the only thing that needs to be done in PK right now is to ensure that there are leading messages from both ends even when dropping to binary protocol. However further development of this depends on the error management between connection-errors, stream-errors and RPC-call-errors: #3. This extra development would make sense to be done in |
Because of that, I'm moving this to |
Vault RPC was updated to use the proper raw streaming. Which means header message protocol is updated. However there aren't any tests on edge cases and corner cases where Any JSON value is a valid header message. From the handlers point of view any valid JSON can be set to the On the stream level, the header messages are always JSONRPC records with the initial one as a JSONRPC request, and reverse is a JSONRPC response. |
Currently we don't have tests that test badly implemented clients/servers. Where they are not sending a proper @amydevs there should be tests added to this repo, so that we get automatic closure of the stream in case of a bad message. If the application level protocol is not followed, it should be an application protocol error. That might just be an error code plus message on the stream abstraction provided by the transport level. Alternatively we could provide a |
This is connected to the #3 and MatrixAI/js-ws#4. |
More handlers in Polykey-CLI needs to be checked if they are using raw streams. For example uploading secrets from local disk to the PK agent via the PK CLI. It should be using raw streams but that can be reviewed soon on the PK CLI. |
This was done in MatrixAI/Polykey#535. |
# This is the 1st commit message: fix: RPCServer.start is now no longer a factory function fix: fixed RPC tests after async-init change # This is the commit message #2: chore: updated inline documentation according to async-init changes # This is the commit message #3: wip: fixing imports # This is the commit message #4: wip: RPCServer deconstruction # This is the commit message #5: fix: RPCServer start stop order # This is the commit message #6: fix: added back createDestroy back to RPCClient # This is the commit message #7: chore: lintfix # This is the commit message #8: wip: completely removed create destroy from rpcclient # This is the commit message #9: fix: RPCClient tests after async-init changes # This is the commit message #10: wip # This is the commit message #11: fix: idgen is optional in constructor # This is the commit message #12: fix: type import in ./types # This is the commit message #13: chore: test for RPCServer force stopping # This is the commit message #14: fix: proper implementation of fromError toError, clientOutputTransformStream no longer outputs error.data, but rather error directly # This is the commit message #15: fix: jest fix for ErrorRPCRemote # This is the commit message #16: wip toError fromError fix: proper implementation of fromError toError, clientOutputTransformStream no longer outputs error.data, but rather error directly fix: changed rpcClient toError implementation fix: changing ErrorRPCRemote constructor to be in-line with toError error creation constructor fix: jest fix for ErrorRPCRemote fix: fixing exports fix: RPC errors now correctly extend AbstractError fix: removed old events fix: cleaned up imports feat: client has toError as parameter fix: removed type from JSONError obj fix: startStop constructor correctly used fix: infinity is definitely not == 1 minute chore: rename variable
Specification
RPC methods that need to send binary data such as secrets data for the client RPC or GIT pack data for the agent RPC. This needs to make use of the raw streams in the RPC system.
Raw streams take a header message and then act as a normal binary stream. The header can include any metadata you need. The binary data can then be streamed. To ensure there are no problems with Endianness we can chunk the data into protobuf messages.
Some things to note
See #2 for clarification on the usage of binary data formats like CBOR or Protobuf. It's only needed in limited usecases, and can be done as part of the RPC caller and handler.
When dropping to raw streams there should still be an exchange protocol between client and server.
The client sends a "header" message, and then should expect the server to send back a "header" message. Both should be in JSON. Only then the client can drop down to binary. However this can be just one of the ways of using it. This primarily applies to streaming: client streaming, server streaming and duplex streaming.
For example:
It seems we should enforce that there be a header message sent, even if it is empty like a
{}
.Additional context
Tasks
The text was updated successfully, but these errors were encountered: