Explicit Framing Protocol Proposal

emdash edited this page Jun 1, 2019 · 5 revisions

Key idea: allow Oil / OSH to manipulate chunks of arbitrary data via streams.

Internally, this uses an explicit framing format that can handle arbitrary bytes, similar to the WebSocket data framing in RFC 6455, section 5: https://tools.ietf.org/html/rfc6455#section-5.

A message (a.k.a. "record", a.k.a. "packet") is a variable-length chunk of arbitrary bytes. It can contain newlines, nulls, etc. Data sent in framed packets needs no quoting; escaping is done through filters at the edges of a pipeline.
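To make the idea of a framed message concrete, here is a minimal Python sketch. It assumes a 4-byte big-endian length prefix per message, which is much simpler than the RFC 6455 frame layout and is not a format this proposal commits to. The names encode_frame and read_frame are hypothetical.

```python
import io
import struct

def encode_frame(payload: bytes) -> bytes:
    """Prefix the payload with its length as a 4-byte big-endian integer."""
    return struct.pack(">I", len(payload)) + payload

def read_frame(stream):
    """Return the next payload from a binary stream, or None at a clean end of stream."""
    header = stream.read(4)
    if not header:
        return None
    if len(header) < 4:
        raise EOFError("truncated frame header")
    (length,) = struct.unpack(">I", header)
    payload = stream.read(length)
    if len(payload) < length:
        raise EOFError("truncated frame payload")
    return payload

# Payloads with newlines, NULs, or no bytes at all keep their boundaries.
messages = [b"line1\nline2", b"\x00\x01\x02", b""]
wire = io.BytesIO(b"".join(encode_frame(m) for m in messages))

decoded = []
while True:
    msg = read_frame(wire)
    if msg is None:
        break
    decoded.append(msg)
```

Because the reader learns the length before it reads the payload, no byte value needs to be reserved as a delimiter, which is why quoting is unnecessary inside the stream.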

Interface is via primitives:

  • put: analogous to echo, but packs its arguments into a single message.
  • get: analogous to read, but unpacks a single record.
  • escape: reads messages from input and outputs quoted / escaped text.
  • unescape: lifts escaped input into messages of raw strings.
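The put / get pair above can be modeled in Python as follows. This is a sketch under stated assumptions, not the proposed implementation: the wire format is an assumed 4-byte big-endian length prefix, and put joins its arguments with spaces the way echo does, which the proposal does not specify.

```python
import io
import struct

def put(stream, *args: bytes) -> None:
    """Write one framed message; arguments are joined with spaces (an assumption)."""
    payload = b" ".join(args)
    stream.write(struct.pack(">I", len(payload)) + payload)

def get(stream):
    """Read exactly one framed message, or return None at end of stream."""
    header = stream.read(4)
    if len(header) < 4:
        return None
    (length,) = struct.unpack(">I", header)
    return stream.read(length)

# An in-memory buffer stands in for a pipe between two stages.
pipe = io.BytesIO()
put(pipe, b"hello", b"world")
put(pipe, b"two\nlines\x00with a NUL")
pipe.seek(0)

first = get(pipe)    # one record per call, regardless of its contents
second = get(pipe)
done = get(pipe)     # None once the stream is exhausted
```

Each get call consumes exactly one record, so a newline or NUL inside the second message does not split it.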

Use Cases

  • Safely converting between different formats, with different quoting rules.
  • Allow OSH / Oil to route data coming from disparate sources, preserving original message boundaries.
  • Possible strategy for / complement to Structured Data Over Pipes.
  • Distributed computing.
  • Message brokers / queues / event streams.

Note: by itself, this is not a format for structured data. It just makes it easier and safer to manipulate streams of records from the shell. Quoting can be handled once at the pipeline endpoints, rather than being managed by the programmer at each stage.

Examples

Trivial example

Safely output random 64-bit values.

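rand64() {
  while true; do
    # Bytes may contain embedded nulls, newlines, etc.
    # -N reads exactly 8 characters without stopping at a newline;
    # -r keeps backslashes literal.
    # Sketch only: put is the proposed builtin, and in bash today a
    # variable cannot hold NUL bytes (see Questions).
    read -r -N 8 bytes < /dev/random
    put "${bytes}"
  done
}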

rand64 | escape --python -d'\n'

In the above example, rand64's stdout is framed, so escape must be used to obtain plain text. Here Python escaping is used, with records delimited by newlines. By default, escape would use some sensible shell-quoting dialect. Other formats might include:

  • json
  • c / c++
  • tsv2
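As a rough illustration of what the escape and unescape filters could do, the Python sketch below renders raw byte messages as one quoted record per line and lifts them back. The function names, the dialect parameter, and the latin-1 mapping used for the JSON dialect are all assumptions made for this example. It operates on already-unpacked messages rather than on a framed stream.

```python
import ast
import json

def escape(messages, dialect="python", delimiter="\n"):
    """Render raw byte messages as quoted text, one record per delimiter."""
    if dialect == "python":
        quoted = [repr(m) for m in messages]
    elif dialect == "json":
        # JSON strings cannot hold arbitrary bytes, so each byte is mapped
        # to one code point via latin-1 (an assumption of this sketch).
        quoted = [json.dumps(m.decode("latin-1")) for m in messages]
    else:
        raise ValueError(f"unknown dialect: {dialect}")
    return "".join(q + delimiter for q in quoted)

def unescape(text, dialect="python", delimiter="\n"):
    """Inverse of escape: lift quoted text back into raw byte messages."""
    records = [r for r in text.split(delimiter) if r]
    if dialect == "python":
        return [ast.literal_eval(r) for r in records]
    if dialect == "json":
        return [json.loads(r).encode("latin-1") for r in records]
    raise ValueError(f"unknown dialect: {dialect}")

messages = [b"\x00\xff\n\t", b"plain text"]
text = escape(messages)               # two lines of printable text
restored = unescape(text)             # the original bytes again
as_json = escape(messages, "json")    # same records, JSON string quoting
```

Both dialects escape the newline inside the first message, so a newline delimiter is unambiguous in the text output.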

TODO: Decode JSON, output msgpack to serial port in custom envelope.

JSON is "plain text" and msgpack is binary, but otherwise the two are very similar. Both are "document-oriented": neither spec defines how to pack multiple documents into a bytestream. JSON can be comfortably newline-delimited, but only if each document is packed onto one line. With msgpack, one needs an explicit framing protocol. With framing, we can use existing tools to safely handle streams of complete documents.

Advantages to demonstrate:

  • msgpack output can contain embedded nulls, but this is no problem for framed channels.
  • input JSON documents can contain newlines! (i.e. can handle pretty-printed input!)
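Both advantages can be sketched in Python without the real msgpack library. The example below assumes a 4-byte big-endian length prefix as the frame format, and it uses compact JSON encoded as UTF-16-LE purely as a stand-in for a binary encoding that contains NUL bytes. It does not produce msgpack and does not cover the serial port or custom envelope from the TODO.

```python
import io
import json
import struct

def write_frame(stream, payload: bytes) -> None:
    """Write one payload with an assumed 4-byte big-endian length prefix."""
    stream.write(struct.pack(">I", len(payload)) + payload)

def read_frames(stream):
    """Yield each payload until the stream ends."""
    while True:
        header = stream.read(4)
        if len(header) < 4:
            return
        (length,) = struct.unpack(">I", header)
        yield stream.read(length)

docs = [{"id": 1, "tags": ["a", "b"]}, {"id": 2, "note": "line one\nline two"}]

# Stage 1: pretty-printed JSON spans several lines, yet each document is one frame.
json_pipe = io.BytesIO()
for doc in docs:
    write_frame(json_pipe, json.dumps(doc, indent=2).encode("utf-8"))
json_pipe.seek(0)

# Stage 2: re-encode each document into a binary form that contains NUL bytes
# (UTF-16-LE here, as a stand-in for msgpack).
binary_pipe = io.BytesIO()
for payload in read_frames(json_pipe):
    doc = json.loads(payload)
    write_frame(binary_pipe, json.dumps(doc).encode("utf-16-le"))
binary_pipe.seek(0)

binary_frames = list(read_frames(binary_pipe))
round_tripped = [json.loads(frame.decode("utf-16-le")) for frame in binary_frames]
```

Neither stage has to scan for a delimiter or care what bytes the other stage emits, since document boundaries travel with the frames.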

TODO: Re-write Git log example.

Questions:

  • Can we use strings as byte buffers, or does there need to be a byte buffer type?
    • i.e. what happens if an Oil string contains an embedded null?
  • Is it better if get / put work more like read, and interpret their argument as the name of a variable?
    • i.e. put foo > socket vs. put "${foo}" > socket.
  • Would get / put collide with any existing builtins / commands?
    • If so, they could perhaps be recast as flags for read, echo, or printf, or as a "mode" on a file descriptor, accessible via dup. Or some combination.