
Upgradeable ArrayBuffers #368

Open
sffc opened this issue Nov 30, 2022 · 5 comments

Comments

@sffc

sffc commented Nov 30, 2022

Possibly related: #134, #218

A potential direction this proposal could take, which would solve a new set of problems, could be "upgradeable ArrayBuffers":

  • The new language primitive is a byte array (ArrayBuffer by value).
  • The primitive can be "upgraded" by applying a schema to access data.

Something sort-of like this...

let input = { a: 1, b: 2 };

// Convert the object to its immutable value type
let primitive = Record.create(input);

// Convert the immutable value type back to an object using a pattern schema
let output = primitive.upgrade({ a: Number, b: Number });

// Access a field without upgrading (for efficiency)
primitive.get({ a: Number, b: Number }, `.a`)

What this achieves:

  1. Immutability
  2. Value Types
  3. Equality is trivial (a byte comparison)
  4. Hopefully easier to implement
  5. Step toward zero-copy deserialization
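For intuition, the byte-comparison claim in (3) can be sketched with today's ArrayBuffer and DataView. The `pack` and `bytesEqual` helpers below are hypothetical illustrations of a fixed `{ a: Number, b: Number }` layout, not proposed API:

```javascript
// Pack a fixed schema { a: Number, b: Number } into 16 bytes (two float64s).
function pack(obj) {
  const buf = new ArrayBuffer(16);
  const view = new DataView(buf);
  view.setFloat64(0, obj.a); // field "a" at offset 0
  view.setFloat64(8, obj.b); // field "b" at offset 8
  return buf;
}

// "Trivial" equality: compare the raw bytes of the two buffers.
function bytesEqual(x, y) {
  if (x.byteLength !== y.byteLength) return false;
  const a = new Uint8Array(x), b = new Uint8Array(y);
  return a.every((byte, i) => byte === b[i]);
}

console.log(bytesEqual(pack({ a: 1, b: 2 }), pack({ a: 1, b: 2 }))); // true
console.log(bytesEqual(pack({ a: 1, b: 2 }), pack({ a: 1, b: 3 }))); // false
```

Because the layout is fully determined by the schema, two records with equal fields produce identical bytes, which is what makes equality a memcmp.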

Some initial open questions:

  1. How to handle variable-length types in the record, like a String or a BigInt
    • Could allow one of these only in the final position (like Rust DSTs)
    • Or, variable-length fields could require a length prefix (but how is that represented?)
  2. Syntax for how to access fields
    • Call site needs the whole schema; how can this be expressed ergonomically?
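On question 2, random access without a full upgrade could work by having the engine derive a byte offset from the schema at the call site. A rough sketch, assuming only fixed-width Number fields laid out in schema key order (`getField` is a hypothetical stand-in for `primitive.get`):

```javascript
// Read one field directly from the buffer, without materializing an object.
function getField(buffer, schema, name) {
  let offset = 0;
  for (const key of Object.keys(schema)) { // insertion order = layout order
    if (key === name) return new DataView(buffer).getFloat64(offset);
    offset += 8; // each Number field is a float64
  }
  throw new TypeError(`no field ${name} in schema`);
}

// Buffer holding { a: 1, b: 2 } under the schema { a: Number, b: Number }:
const buf = new ArrayBuffer(16);
const view = new DataView(buf);
view.setFloat64(0, 1); // a
view.setFloat64(8, 2); // b

console.log(getField(buf, { a: Number, b: Number }, "a")); // 1
```

The ergonomics problem is visible here: every access repeats the schema, which is why some sugar (or a bound accessor object) would likely be needed.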
@mhofman
Member

mhofman commented Nov 30, 2022

How would this handle unique symbols?

Is there a fundamental difference with JSON serialization and string comparison? What would happen if you use a different schema between serialization and deserialization?

@sffc
Author

sffc commented Nov 30, 2022

How would this handle unique symbols?

Good question; don't have an immediate answer to that. It could be that symbols are not permitted as values, but that would limit functionality (such as storing WeakMap keys).

Is there a fundamental difference with JSON serialization and string comparison?

The ArrayBuffer representation wouldn't be JSON; it would be a sequence of values.

This raises a question of whether a more ergonomic solution would be to say that the primitive can contain its context such that it can be upgraded without a schema. It would be much more flexible this way, but you'd lose the ability to perform random access.

What would happen if you use a different schema between serialization and deserialization?

Same type of thing as if you created a protobuf with one schema and deserialized it with a different schema.

@sffc
Author

sffc commented Nov 30, 2022

Maybe a better solution in this vein is to say that the primitive is a CBOR buffer (RFC 8949), and then we can support fully ergonomic data access operations, with the catch that they may need to walk the CBOR buffer.

let input = { hello: "world", x: 100, y: true };

// Serialize to a primitive ArrayBuffer
let primitive = input.toCBOR();

// Deep equality
console.assert(primitive === input.toCBOR());

// Access a field (may require walking the CBOR :/)
console.log(primitive.hello) // "world"

There are dozens of attempts at binary object representation formats; maybe we could choose one with more efficient random access.
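To make the CBOR idea concrete, here is a minimal encoder sketch covering just the example object (small maps with string keys, unsigned ints under 256, booleans, short strings); the proposal's `toCBOR()` and `===` semantics are still speculative, this only shows the byte layout such a primitive could carry:

```javascript
// Minimal CBOR (RFC 8949) encoder for a tiny subset of values.
function encodeCBOR(value) {
  const out = [];
  const push = (...bytes) => out.push(...bytes);
  const encode = (v) => {
    if (typeof v === "boolean") {
      push(v ? 0xf5 : 0xf4); // CBOR simple values true/false
    } else if (Number.isInteger(v) && v >= 0) {
      if (v < 24) push(v); // major type 0, argument in initial byte
      else if (v < 256) push(0x18, v); // major type 0, 1-byte argument
      else throw new RangeError("unsupported int");
    } else if (typeof v === "string") {
      const bytes = new TextEncoder().encode(v);
      if (bytes.length >= 24) throw new RangeError("string too long");
      push(0x60 | bytes.length, ...bytes); // major type 3, short text string
    } else if (typeof v === "object" && v !== null) {
      const keys = Object.keys(v);
      if (keys.length >= 24) throw new RangeError("map too large");
      push(0xa0 | keys.length); // major type 5, small map
      for (const k of keys) { encode(k); encode(v[k]); }
    } else {
      throw new TypeError("unsupported value");
    }
  };
  encode(value);
  return Uint8Array.from(out);
}

const bytes = encodeCBOR({ hello: "world", x: 100, y: true });
const hex = Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");
console.log(hex); // "a36568656c6c6f65776f726c64617818646179f5"
```

Deep equality then reduces to comparing these bytes, but note the downside mentioned above: field access requires walking the buffer, since CBOR items are variable-length.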

@lin72h

lin72h commented Nov 30, 2022

Please support Symbol in the initial proposal; it's very important in our custom serialization format.

@Maxdamantus

Equality is trivial (a byte comparison)

I don't think reducing equality to byte comparison will ever be desirable. It only works if string and bigint values are either copied into the structure (presumably as in this suggestion) or interned on insertion.

I don't think copying strings is desirable. If an application has a very large string value and it decides to make a record from it, it shouldn't duplicate that memory use. Implementations of string concatenation even tend to avoid copying (instead producing "ropes" consisting of the source strings).

(As for the interning option, I think this is usually considered to be more of a performance burden than a gain, since it involves doing equality + hashing + global map manipulation on construction/free (lots of unconditional things) rather than simply doing equality on comparison).
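The interning trade-off described above can be sketched in a few lines. This is an illustrative model only (`internTuple` and the JSON-keyed table are hypothetical; a real engine would use content hashing, not JSON):

```javascript
// Global table mapping contents to one canonical frozen copy.
const internTable = new Map();

function internTuple(values) {
  const key = JSON.stringify(values); // stand-in for hashing the contents
  let canonical = internTable.get(key); // hash + lookup paid on EVERY construction
  if (!canonical) {
    canonical = Object.freeze([...values]);
    internTable.set(key, canonical);
  }
  return canonical;
}

// Equal contents now yield the same object, so equality is a pointer compare:
console.log(internTuple([1, "a"]) === internTuple([1, "a"])); // true
```

The unconditional cost on construction (and the bookkeeping to free entries) is exactly why interning is usually a net loss compared to paying for structural equality only when a comparison actually happens.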
