Shared structs vs Serializable/Transferable objects #8

Closed
Jamesernator opened this issue Sep 20, 2021 · 6 comments

@Jamesernator

This is an issue covering quite a large idea, however I think it would be considerably more ergonomic than the much lower-level shared struct idea, while still perfectly permitting high-efficiency shared structs where no code sharing is used.

So the idea is as the title suggests, instead of having highly basic "shared structs" we expose the notion of serializable/transferable as a first-class concept within the JS language itself. This would allow authors to implement a rich class of objects rather than the rather painful status quo of marshalling such objects into lower-level serializable/transferable values.
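
To make the "painful status quo" concrete, here is a minimal sketch of the manual marshalling needed today. `Counter`, `toWire` and `fromWire` are hypothetical names used only for illustration, and `structuredClone` stands in for a `postMessage` hop between threads (it assumes a runtime that provides `structuredClone`, such as a recent browser or Node.js 17+).

```javascript
// Hypothetical class, used only to illustrate manual marshalling.
class Counter {
  #count;

  constructor(count = 0) {
    this.#count = count;
  }

  increment() {
    this.#count += 1;
    return this;
  }

  get count() {
    return this.#count;
  }

  // Manual marshalling: flatten the instance into a structured-cloneable record.
  toWire() {
    return { type: "Counter", count: this.#count };
  }

  // Manual unmarshalling on the receiving side.
  static fromWire(record) {
    return new Counter(record.count);
  }
}

const original = new Counter(2).increment();

// Cloning the instance directly keeps only own enumerable properties,
// so the prototype, the getter and the private field are all lost.
const naive = structuredClone(original);

// Going through the hand-written record restores a working instance.
const revived = Counter.fromWire(structuredClone(original.toWire()));
```

Every class that wants to cross a thread boundary needs its own pair of helpers like these, which is the boilerplate a first-class serializable concept would aim to remove.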

As an example, consider a host object such as the OffscreenCanvas returned from a call to .transferControlToOffscreen(). Such an object has an implicit shared memory buffer which the owning thread can write to through the OffscreenCanvas abstraction, while the renderer can read it from an entirely different thread.

Now to give an example of how such an API could potentially look, I give below an example of a theoretical version of AbortSignal that is transferable, involves shared state, but is otherwise compatible with the existing API:

// NOTE: In this example I am going to use ${Name} to indicate
// places where free variables are initialized based on the
// thread

// We declare the class as serializable struct, this gives it all
// the following super-powers that enable it to be cloned
// across threads, NOTE that we do require this class to also be a struct
// as we cannot dynamically add properties to the class
// 
// One of the first things to note is that this serializable declaration
// applies to the WHOLE class, it causes the class, all prototype methods, all
// static methods, the constructor and beyond to inherit serializable semantics,
// the meaning of these semantics will become clearer as you follow the example
// 
// Now by serializable extending to the whole class it means many things are
// well founded, for example suppose we received an AbortSignal as defined below:
// self.onmessage = (event) => {
//   const abortSignal = event.data.signal;
//   // This class is available and fully operational
//   // because of the shared semantics the entire class
//   // can be cloned into this thread
//   const AbortSignal = abortSignal.constructor;
//   const newSignal = new AbortSignal();
// }
// 
// Now one of the first things to notice about this declaration itself is that
// we can subclass ANY objects, not just transferable ones, when the AbortSignal
// class is transferred into a thread (either directly, or indirectly via an instance)
// we lookup the free variable in the new thread and initialize it as such
// i.e. the ${EventTarget} acts as a kind of template, which is filled in by
// values on the thread this value has been received on
serializable struct class AbortSignal extends ${EventTarget} {
    // Upon serialize to another thread, this property is also structured serialized
    // in this case as it's just a shared memory, it becomes usable as normal
    // 
    // Ideally we would be able to sugar this up somehow, like instead of
    // explicitly making a SharedArrayBuffer and manipulating it, we could just declare
    // shared #aborted = false;
    // however then we need to extend Atomics to private fields somehow, as this is just
    // sugar it does not change the conceptual design of this example so I'm omitting it
    #aborted = new Int32Array(new SharedArrayBuffer(4));

    // No AbortController in this example, so we'll just use the revealing constructor
    // pattern to provide abort capability, note that the callback we pass into
    // the start function is not in any way shared, in fact it lives only on
    // the thread where we actually created the AbortSignal and is not serialized
    // or transferred in any way as basic closures are not transferred
    //
    // The constructor here is quite special in that on deserialization of such objects
    // on another thread the constructor will be called on an ALREADY INITIALIZED instance
    // of the class
    constructor(start) {
        // We have a new meta-property (or something) available in the constructor
        // of serializable classes, if we are deserializing this value from another thread
        // then the value of "this" will already be partially defined, upon calling
        // super() the "this" value in the constructor will become new.serializedThis
        // exactly, the reason this is available now is so that we can access fields with
        // data that are needed to be passed up into any superclass
        // in this case I'm just logging it for illustrative purposes as EventTarget
        // accepts no arguments so we don't have anything to do here
        console.log(new.serializedThis);
    
        // super() behaves fairly specially here as well
        // in particular if Superclass is also serializable
        // it will ALSO be called in deserialization mode rather
        // than being called as a constructor normally, when this happens
        // field initializers DO NOT RUN, as the data is already available
        // on the new.serializedThis
        super();
        
        // If we already had an instance from another thread, then the constructor
        // has already been called there, so we don't need to initialize it again
        if (!new.serializedThis) {
            // This function simply closes over the object in the usual way
            // this is fine as this function isn't transferred over the thread
            const abort = () => this.#abort();
            start(abort);
        }
        // Regardless of thread, we need to observe the #aborted flag
        this.#listenForAbort();
    }
    
    // This is an ordinary method, it is simply cloned by definition to other threads,
    // note that it won't actually be called on other threads unless they create
    // new abortSignals (i.e. using signal.constructor)
    #abort() {
        // This logic isn't overly defensive as writing this
        // value is only done on a single thread
        if (Atomics.load(this.#aborted, 0) === 1) {
            // Already aborted so do nothing
            return;
        }
        // We set the abort on the signal
        Atomics.store(this.#aborted, 0, 1);
        // Notify all threads that their abort signal needs to fire an event
        Atomics.notify(this.#aborted, 0);
    }
    
    // This is on the whole, a regular method that simply returns
    // the value stored in this.#aborted, it is cloned purely by
    // its definition so doesn't need any special treatment
    get aborted() {
        return Boolean(Atomics.load(this.#aborted, 0));
    }
    
    // Now this is the MOST IMPORTANT magic of what serializable enables, essentially
    // this is how we are able to initialize our objects when they are received on
    // other threads, essentially this "function"-block-thing is called when a thread
    // deserializes an AbortSignal object
    deserialize {
        // On other threads, the constructor was never called for this object
        // so our post deserialization steps are simply to register to listen
        // out for aborts on our #aborted field
        this.#listenForAbort();
    }
    
    // Again, just another bog-standard method, this is cloned simply by redefining
    // this function at the destination
    #listenForAbort() {
        const { async, value } = Atomics.waitAsync(
            this.#aborted,
            0,
            // If the abort signal has been aborted already then this will cause
            // waitAsync to return synchronously
            0,                
        );
        // If the signal is not already aborted, we'll fire an event when it
        // eventually does become aborted
        if (async === true) {
            value.then(() => this.dispatchEvent(new Event("abort")));
        }
    }
}

// Nothing special about this really, the inner function creates a local
// closure to the local value of abortSignal
const abortSignal = new AbortSignal(abort => {
    setTimeout(abort, 5000);
});

// We can send the signal to another thread
worker.postMessage({ abortSignal });

self.addEventListener("message", (event) => {
    // An abort signal from another thread, entirely up and ready to go, prior
    // to firing this event the object was entirely deserialized into the current thread,
    // future events won't need to repeatedly deserialize the whole AbortSignal class
    // however as it can just be cached
    const abortSignal = event.data.abortSignal;

    // We can call all methods and such as per normal, as super() was called to initialize
    // abortSignal as an EventTarget in this thread, it has become an EventTarget
    // also in this thread
    abortSignal.addEventListener("abort", () => {
        console.log("Aborted!");
    });
});
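
For comparison, the shared-state core of the example above can be approximated with primitives that exist today, without any of the proposed syntax. This is only a sketch: `SharedAbortFlag`, `onAbort` and the returned status strings are hypothetical names, two views over one buffer stand in for two threads, and `Atomics.waitAsync` is assumed to be available in the runtime.

```javascript
// Hypothetical helper, not part of any proposal: a cross-thread abort flag
// built from SharedArrayBuffer and Atomics.
class SharedAbortFlag {
  constructor(buffer = new SharedArrayBuffer(4)) {
    // The buffer is the only thing that would be posted to another thread.
    this.buffer = buffer;
    this.view = new Int32Array(buffer);
  }

  get aborted() {
    return Atomics.load(this.view, 0) === 1;
  }

  // Returns true only for the call that actually flipped the flag.
  abort() {
    if (Atomics.compareExchange(this.view, 0, 0, 1) === 0) {
      // Wake every waiter registered through Atomics.waitAsync.
      Atomics.notify(this.view, 0);
      return true;
    }
    return false;
  }

  // Runs the callback once the flag is set. If it is already set,
  // waitAsync reports "not-equal" synchronously and the callback runs now.
  onAbort(callback) {
    const result = Atomics.waitAsync(this.view, 0, 0);
    if (result.async) {
      result.value.then(() => callback());
      return "pending";
    }
    callback();
    return "already-aborted";
  }
}

// Two wrappers over the same memory, standing in for two threads.
const owner = new SharedAbortFlag();
const receiver = new SharedAbortFlag(owner.buffer);

const firstAbort = owner.abort();
const secondAbort = owner.abort();
```

Unlike the proposed class, the receiving side here has to rebuild the wrapper by hand from the posted buffer, and calling the callback immediately for an already-aborted flag is a choice of this sketch rather than something the example above does.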
@Jamesernator changed the title from "Shared structs vs Serializable/Transferable objects with shared properties" to "Shared structs vs Serializable/Transferable objects" on Sep 20, 2021
@syg
Collaborator

syg commented Sep 20, 2021

IIUC there's nothing shared as part of a serializable struct idea, and it's up to each struct to carry along some SAB if it wants to share data?

That doesn't seem like an adequate replacement to me as the only sharing primitive. Having to create per-thread wrappers, even if those wrappers were easier to set up than plain objects, is a big performance issue for large object graphs. Additionally, do you have to postMessage each object individually if you had a large object graph, as you do now? I've called the latter problem "graph discovery" in the past, and to do an O(n) discovery of the object closure when you want to transfer an object graph is also a big performance issue.


Efficient transferables are something I've thought about in the past, and my conclusion was that they're better solved in a separate proposal. This proposal is narrowly scoped and is about providing a necessary-evil, escape-hatch kind of low-level shared memory primitive. I want it to be expressive enough to write lock-free code in userland, for instance, and I want it to be usable by mega-apps or frameworks willing to take on the maintenance burden of something that's harder to use (but still way easier than SABs!) in exchange for the performance and memory savings.

But since I think this is an escape-hatch-y power feature, I wouldn't recommend that apps without strict performance or memory requirements adopt it willy-nilly. For those kinds of apps, something like better transferables or a coarse-grained reader-writer lock kind of model without data races is a better answer. That's a different design space, and one I plan to tackle in the future. Most importantly though, I contend that we need both this proposal and the safer, higher-level thing.

@Jamesernator
Author

Jamesernator commented Sep 20, 2021

> IIUC there's nothing shared as part of a serializable struct idea, and it's up to each struct to carry along some SAB if it wants to share data?

The example doesn't use it because there was already a lot going on, but we'd want to include sugar around fields so that they can be set more directly, i.e. in the example we'd want something shorter like:

shared #aborted = false;

With this sugar, the object wrapper would become more of a fiction, in particular suppose we had something like the following:

// All fields are shared, so when transferring to another thread
// the object wrapper is entirely fictional, the memory can be accessed
// more directly
transferable struct class PureSharedStruct {    
    shared x = 10;
    shared y = 30;
}

// Because not all fields are shared on this MixedStruct there would
// be two layers to this object: the common shared part and a wrapper around it
transferable struct class MixedStruct {
    // Accessing these fields would essentially be direct shared memory access
    shared x = 10;
    shared y = 20;
    
    // The presence of a non-shared field means there's a virtual wrapper
    // object around the shared parts, however this is still kind of fiction
    // because depending on what data is here the implementation may well be able
    // to optimize it away with things like copy-on-write, or if we had some way to mark this
    // as non-writable then even just memcpy or something
    regularData = someStructuredSerializable;
}

// Here we have a pure-shared-struct like object, but with methods
// upon transferring the first of these objects onto another thread
// we clone the prototype and class definition, however the rest of
// the object is effectively just shared memory, and copying instances
// beyond the first should still be ideally efficient 
transferable struct class SharedStructWithMethods {
    shared x = 10;
    shared y = 20;
    
    updateX() {
        this.x = 30;
    }
}

> Additionally, do you have to postMessage each object individually if you had a large object graph, as you do now? I've called the latter problem "graph discovery" in the past, and to do an O(n) discovery of the object closure when you want to transfer an object graph is also a big performance issue.

So the main goal with this idea is that no, you don't need to post each object individually. Rather, the object is cloned basically as is, and regular properties get structured cloned. With the sugar syntax, though, engines could easily discover fields that can be shared directly, i.e. these fields aren't structured cloned at all (or at least they are cloned in the same sense "SharedArrayBuffer" is "cloned" onto another thread).

Now mutable object graphs would still work: we would simply require that the only kinds of values that can be stored in a shared field are (non-symbol) primitives and transferable objects. And yes, any transferable object could be stored in such a field. Prior to atomically committing the whole value, the structured serialize/deserialize would be applied to the object, and then the reference would be set atomically.

And even though the machinery seems like it would be more expensive than the current proposal, as long as engines check what is defined, it would still be pay for what you use, i.e. if you have a transferable that consists purely of shared fields, and superclasses that are also only pure-shared structs, then this behaves exactly like the shared struct proposed and can be optimized in essentially identical ways.

In fact even pure-shared + methods could still be made very efficient, as the prototype/class only ever needs to be "cloned" into the other thread once, and even then it could be computed lazily, since the idea above ensures a structured class that doesn't use ${LocalName} is essentially thread-invariant. For such classes the idea that they even live on a thread is itself spec fiction; in practice the entire transferable class PureStructWithMethods {} could live in one location of the agent and be shared amongst all threads that reference it.

@syg
Collaborator

syg commented Sep 21, 2021

> or at least they are cloned in the same sense "SharedArrayBuffer" is "cloned" onto another thread

I don't understand this part. SAB wrapper objects are actually cloned onto another thread. A core value add of the shared structs proposal is that there is no wrapper object cloning, the instance is shared directly.

Is identity of your transferable structs preserved in a roundtrip?
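
The wrapper-versus-memory distinction can be observed directly. This is a sketch under assumptions: `structuredClone` stands in for a single `postMessage` hop, and it assumes an environment where `SharedArrayBuffer` is available and cloneable (for example Node.js 17+, or a cross-origin isolated page). The variable names are illustrative only.

```javascript
const sab = new SharedArrayBuffer(4);
const localView = new Int32Array(sab);

// One hop through the structured clone algorithm, as postMessage would do.
const cloned = structuredClone(sab);
const clonedView = new Int32Array(cloned);

// A write through the original wrapper is visible through the clone,
// because both wrappers refer to the same underlying memory.
Atomics.store(localView, 0, 42);
const seenThroughClone = Atomics.load(clonedView, 0);

// The wrapper object itself is a fresh object, so identity is not preserved.
const sameWrapper = cloned === sab;
```

So the data block is shared while each hop allocates a new wrapper, which is the per-thread wrapper cost the shared structs proposal is trying to avoid.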

@Jamesernator
Author

> I don't understand this part. SAB wrapper objects are actually cloned onto another thread. A core value add of the shared structs proposal is that there is no wrapper object cloning, the instance is shared directly.

I seem to recall that one of the reasons SharedArrayBuffer (and other host transferables) doesn't survive a round-trip is that it needs to be able to participate in heaps both inside v8 and elsewhere in the browser. I dunno if this is still a concern in engines, as ideally SharedArrayBuffer would actually survive round-trips. However, I dunno if this could be made efficient, as SharedArrayBuffer instances are not currently frozen, and changing them to frozen would probably be a web-compat issue.

> Is identity of your transferable structs preserved in a roundtrip?

Yeah, my intention was that it should, although I think my idea as described above doesn't quite work how I intended; there are a couple of points that would need tweaking for that to be the case. In particular, when non-shared fields get structured cloned would need to be considered properly so that round-tripping works as expected. In fact, thinking about it, it would probably be better that field values are not implicitly structured cloned by default and that we instead provide a way to initialize those fields properly.

Existing web objects don't survive round trips, but again this is something that probably came from the same reason SharedArrayBuffer doesn't survive round trips. I think that the idea could be tweaked so that we have explicit serialize/deserialize steps rather than overloading the constructor, with shared fields being auto initialized as part of super().

I'll have to have a think about exactly what tweaks would need to be made to my idea above to make it workable, but a core principle of my idea is that trivial objects without methods and without non-shared fields should work identically to how shared struct is currently proposed, i.e. this example should behave identically in every way to the shared struct proposal:

// Engines should be able to optimize this exactly the same way as
// shared struct, additional methods, accessors, non-shared fields and all that
// would be pay-for-what-you-use
transferable struct class MyClass {
  shared x = 10;
  shared y = 10;
}

The main divergence from the current proposal is that shared fields need to be explicitly declared rather than implicitly applying to all fields within the "shared struct", i.e.:

shared struct class MyClass {
  // Implicitly shared and updatable on all threads
  x = 10;
}

transferable struct class MyClass {
  // Explicitly declared as shared, however the semantics are
  // very similar to the current proposal
  shared x = 10;
  
  // The main difference is we can have non-shared fields, which aren't automagically shared,
  // if you don't use these fields you pay nothing
  // And thinking about this more thoroughly based on your feedback I actually 
  // think there should be NO implicit structured cloning of this field, instead
  // when the receiving thread calls `constructor()`/`super()` we'd reinitialize
  // this field in the new thread as if it were a regular property
  // in this regard shared fields would be special in that calling `super()` would NOT
  // reinitialize those fields
  y = someObject;
}

@Jamesernator
Author

Jamesernator commented Sep 21, 2021

Also just as a quick note, I think if we had shared field be required syntax for such fields in shared struct we could actually have everything I was thinking about above purely be extensions to shared struct as part of code-sharing i.e.:

// If we ship shared-struct like this, we can add non-shared fields and serialization
// methods and all that on top of this base at a later date as extensions
shared struct class MyClass {
  shared x;
  shared y;
}

I'll have a larger think later about what this layering might look like, but as a very initial idea:

shared struct class MyClass {
  // This field is shared, when constructor() is invoked
  // on the new thread this field is NOT INITIALIZED
  shared #sharedField = 20;
  // The FIRST TIME a thread receives a shared struct object it will
  // cause the constructor to run, these initializers will be run
  // in the usual way locally on that thread
  #nonShared;
  #nonSharedWithInitializer = new Map();

  constructor() {
    super();
    // We'd have some way to receive cloned data in the constructor
    console.log(new.clonedData);
  }
  
  // New special function-like block that allows sending some serialized data
  serialize {
    return structuredSerializableData;
  }
}

Effectively we'd have shared struct objects have a thread-shared-section containing all the shared fields and a thread-local section containing everything else, if there are no non-shared fields then the thread-local section would be empty and hence behaves exactly like what the current proposal does.
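
The two-section layout can be emulated with today's primitives to show the intended behaviour. This is only a sketch: `SplitObject`, `localCache` and the `x`/`y` accessors are hypothetical names, a `SharedArrayBuffer` plays the role of the thread-shared section, and building a second wrapper over the same buffer stands in for receiving the object on another thread.

```javascript
// Hypothetical emulation: the proposed `shared` fields live in a
// SharedArrayBuffer, everything else is ordinary per-thread state.
class SplitObject {
  #shared;                // view over the thread-shared section
  localCache = new Map(); // thread-local section, initialized per wrapper

  constructor(buffer = new SharedArrayBuffer(8)) {
    this.buffer = buffer;
    this.#shared = new Int32Array(buffer);
  }

  get x() {
    return Atomics.load(this.#shared, 0);
  }
  set x(value) {
    Atomics.store(this.#shared, 0, value);
  }

  get y() {
    return Atomics.load(this.#shared, 1);
  }
  set y(value) {
    Atomics.store(this.#shared, 1, value);
  }
}

const onThreadA = new SplitObject();
onThreadA.x = 10;
onThreadA.localCache.set("note", "only visible on thread A");

// Stand-in for the first receipt on another thread: the constructor runs
// again, local initializers run again, and the shared section is untouched.
const onThreadB = new SplitObject(onThreadA.buffer);
onThreadB.y = 20;
```

Note that this emulation still allocates a wrapper per thread; in the layering described above, an object with no non-shared fields would have an empty thread-local section and could skip the wrapper entirely.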

The main thing I overlooked above, and that I will need to think heavily about is when exactly the constructor() gets run on the new thread. Running the constructor obviously can't be atomic as it runs arbitrary JS code so it can't happen immediately on setting the field i.e.:

const sharedObject = new SharedObject();

transferToAnotherThread(sharedObject);

const otherSharedObject = new SharedObject();
// We can't atomically call the constructor on the other thread,
// however we might be able to schedule it to be called when it is
// first "observed" on the other thread, i.e.
// when `sharedObject.sharedField` is invoked on the other thread, if
// the object has a constructor and isn't already initialized then
// it will trigger the constructor, this will require some careful thought
// though to ensure it behaves nicely
sharedObject.sharedField = otherSharedObject;

@Jamesernator
Author

Jamesernator commented Jun 16, 2022

So after some reflection, I think that while the concept I had above might be able to be made technically feasible, it is probably still less appealing for both the shared structs use cases and the custom serializable/transferable use cases, and the two use cases are probably better served by this proposal and this issue/proposal separately. (In fact the latter would probably be able to utilize shared structs fairly effectively for certain serializable objects).


2 participants