
API simplification: context owns builder, graph becomes internal slot #303

Closed
zolkis opened this issue Dec 1, 2022 · 7 comments


zolkis commented Dec 1, 2022

Lifted from #298 comment.

Proposal: include graph builder, command encoder as attributes to MLContext, and make MLGraph an internal slot of MLContext

This would simplify the API considerably and make it more consistent: it avoids exposing the empty MLGraph interface in the spec, and keeping it as an internal slot would allow differentiating how it is used in different context types, as well as managing its lifecycle.

[SecureContext, Exposed=(Window, DedicatedWorker)]
interface MLContext {
  // graph is an internal slot (eventually in different forms during lifecycle) 
  // lifecycle state could also be an internal slot, e.g. 
  //    "building", "compiled", "encoding", "encoded", "compute" etc.

  readonly attribute MLGraphBuilder builder;  // dedicated builder operates on internal slots
  readonly attribute MLCommandEncoder? commandEncoder;  // only defined for "webgpu" context

  Promise<MLNamedArrayBufferViews> compute(MLNamedArrayBufferViews inputs);  
};
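To make the proposed shape concrete, here is a minimal mock in plain JavaScript (every name and behavior below is illustrative only, not spec text): the context owns its builder, and the graph and its lifecycle state live in internal slots.

```javascript
// Toy mock of the proposed shape: the context owns the builder, and the
// graph plus its lifecycle state are hidden behind internal slots.
class MockContext {
  #graph = { nodes: [] };   // stands in for the proposed [[graph]] slot
  #state = 'building';      // stands in for a lifecycle internal slot
  constructor() {
    const graph = this.#graph;
    // Dedicated builder operating on the context's internal slots.
    this.builder = {
      input(name) { graph.nodes.push({ op: 'input', name }); return name; },
      add(a, b)   { graph.nodes.push({ op: 'add', a, b }); return 'out'; },
    };
  }
  get state() { return this.#state; }
  // The real API would return a Promise; kept synchronous for brevity.
  compute(inputs) {
    this.#state = 'compute';
    return { out: inputs.a.map((v, i) => v + inputs.b[i]) };
  }
}

const ctx = new MockContext();
const a = ctx.builder.input('a');
const b = ctx.builder.input('b');
ctx.builder.add(a, b);
const { out } = ctx.compute({ a: [1, 2], b: [3, 4] });  // out = [4, 6]
```

No MLGraph object crosses the API boundary in this sketch; the context alone tracks what state its internal graph is in.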

Rationale for change

The spec contains certain constraints that are hard to describe and enforce via algorithms. For instance, this note from the [MLContext section](https://webmachinelearning.github.io/webnn/#api-mlcontext):

When the [[contextType]] is set to default with the MLContextOptions.deviceType set to gpu, the user agent is responsible for creating an internal GPU device that operates within the context and is capable of ML workload submission on behalf of the calling application. In this setting however, only ArrayBufferView inputs and outputs are allowed in and out of the graph execution since the application has no way to know what type of internal GPU device is being created on their behalf. In this case, the user agent is responsible for automatic uploads and downloads of the inputs and outputs to and from the GPU memory using this said internal device.

Or, from the MLCommandEncoder section,

Record the initialization of the MLGraph. This is a necessary step for optimal performance during graph execution as it gives the platform an opportunity to prepare and optimize constant input data for the subsequent execution of the graph. This method should only be called once per graph.

To achieve that, it should be possible to bind an MLGraph and an MLCommandEncoder to an MLContext of type "webgpu".

Therefore I would add an internal slot [[model]] to MLContext that represents a compute graph bound to a context. If that context is of type "webgpu", then it will have MLCommandEncoder-specific initialization, dispatch and finish (e.g. the MLCommandEncoder interface could be exposed in the context as an attribute).

Also, the discussion in #149 reveals a possible use case for distinguishing between a compute graph (as built by a builder, which could be executed in multiple contexts) and a graph that is initialized for execution in a given context.
A builder could be generic (initialized without a context) or bound to a context (in which case adaptation to the context could already happen during the build).
A context-bound builder's build() produces an MLGraph that is already bound to that context.
A generic builder's build() could take a parameter specifying the context for which the MLGraph is built.

In summary: a builder's output, i.e. an MLGraph, is always bound to an MLContext, so it could just as well be (part of) an internal slot of MLContext, and the builder could be an attribute of MLContext.
As noted previously, MLCommandEncoder could also be an attribute of MLContext.

Related to #302.


wacky6 commented Jun 19, 2023

I think having explicit stages of a Graph's lifecycle is desirable because it matches more closely to what would happen in ML frameworks. If we can unify WebGPU and CPU API design, that's even better.

Perhaps CPU and GPU could both use a compile stage (conceptually a just-in-time compile) to mitigate cold starts?

// Define the graph
const builder = new MLGraphBuilder();
const input1 = builder.input('a', {type: 'float32', dimensions: [2,2]})
const input2 = builder.input('b', {type: 'float32', dimensions: [2,2]})
const output = builder.add(input1, input2)

// It's possible to compile the same graph on different contexts
const cpuContext = navigator.ml.createContext('cpu', {maxThreads: 4})
const gpuContext = navigator.ml.createContext('gpu', {webgpuAdapter: adapter})

// Build MLGraph<Context>, returned graph is ready for inferencing
const cpuGraph = await cpuContext.build({output})  // => MLGraphCpu
const gpuGraph = await gpuContext.build({output})  // => MLGraphGpu

// Generic compute method
await cpuGraph.compute(/* inputs */)
await gpuGraph.compute(/* inputs */)  // Maybe this should be replaced with WebGPU interop methods?

// WebGPU interop methods are only available on gpuGraph
await gpuGraph.encodeGpuCommand(/* inputs */)

Making the builder part of MLContext is (probably?) necessary, because some backends might not support an operation or its arguments (e.g. XNNPack's concat doesn't support an arbitrary number of args)

Making the graph an internal slot seems strange. I thought we wanted to allow reusing an MLContext for multiple graphs.


anssiko commented Jun 19, 2023

@wacky6 thanks for this proposal!

If I parsed this right, the proposal moves build() and compute() around (return semantics remain the same):

MLGraphBuilder.build() -> MLContext.build() // => Promise<MLGraph(Cpu|Gpu)>
MLContext.compute()    -> MLGraph.compute() // => Promise<MLComputeResult>

Other changes:

  • createContext() by device type explicitly
  • introduce MLGraphCpu : MLGraph and MLGraphGpu : MLGraph
  • compute() bound to MLGraph, thus no need to pass MLGraph(Cpu|Gpu) as an argument
  • Other things I missed :)

@wchao1115 would these design changes help with #322?

@huningxin for comments.

(The original issue was labeled as v2 due to a breaking change, and that applies to this proposal too. But I'd like to solicit input on it, because this proposal suggests many ideas worth discussing and intersects with other discussions.)


zolkis commented Jun 20, 2023

@wacky6

Making the graph an internal slot seems strange. I thought we wanted to allow reusing an MLContext for multiple graphs.

True, multiple graphs can be built with the same context+builder pair.
The thinking above, IIRC, was that the API context objects were "cheap": they connect to backends and represent a sort of "session" with given compute capabilities. In my understanding, the building process can already be compute-optimized, so I thought even the graph structures were context- and build-specific. In that thinking, when I wanted a new graph, I instantiated a new context object (a good question is why not call that a graph to start with, owning or linking to the context and builder). I could also imagine the graph being the main user-facing object: create a new graph with a given context, build it using context.builder, then use compute methods exposed on the graph (which may internally map to the ones in the context), as you showed in the example.

It is up to the editors to judge, but to me the flow in your example code looks good -- if feasible. It seems to assume that we have a graph structure that can be built in a generic way by a builder, independently of the underlying backend/compute capabilities (but still using the builder methods), and that can be "instantiated" for a given context+builder pair by the build() method, becoming an internal "optimized" or "operational" graph (MLGraph) that can be executed.

You mentioned that some ops might not be supported by some backends to start with, which would mean we cannot build generic graph structures. But AFAICT we could defer reporting that kind of error until the graph is built. So I think your idea is feasible and gets a +1 from me -- but I leave this decision to the editors and implementers.
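Deferred error reporting of this kind could be prototyped along these lines (a minimal sketch; `GenericBuilder`, the backend capability sets, and the op names are all made up for illustration, not part of any spec):

```javascript
// Minimal sketch: a generic builder records ops without validating them;
// build(context) checks them against the target backend's capabilities,
// so unsupported ops surface at build time rather than at op-call time.
class GenericBuilder {
  constructor() { this.ops = []; }
  op(name, args = {}) { this.ops.push({ name, args }); return this; }
  build(context) {
    for (const { name } of this.ops) {
      if (!context.supportedOps.has(name)) {
        throw new Error(`op "${name}" not supported by ${context.backend}`);
      }
    }
    return { backend: context.backend, ops: this.ops.slice() }; // "bound" graph
  }
}

// Hypothetical backend capability descriptions.
const cpuContext = { backend: 'cpu', supportedOps: new Set(['add', 'concat']) };
const npuContext = { backend: 'npu', supportedOps: new Set(['add']) };

const builder = new GenericBuilder().op('add').op('concat');
const cpuGraph = builder.build(cpuContext);   // ok: both ops supported
let buildError = null;
try { builder.build(npuContext); } catch (e) { buildError = e; }
```

Here the unsupported concat surfaces only when binding the recorded graph to the npu context, while the same graph builds fine for the cpu context.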


wacky6 commented Jun 21, 2023

From a pure implementation perspective, I'm leaning towards context-specific builder and context-specific graph.

  • Makes early feature detection possible (it fails as soon as the graph builder method is called)
  • Resolves the issue of "how to handle ops that aren't supported by X backend"
  • Simple to implement: we can remove the MLGraphBuilder abstraction and just expose MLXBackendGraphBuilder
  • Lifting common features into a generic builder (assuming we can find a set of operations supported by all backends) is easier than polyfilling X op for Y backend

Whether the spec should handle backend variations is a different question.


wchao1115 commented Jun 21, 2023

+1 on @wacky6's comment on frameworks needing to manage the device graph's life cycle. This is indeed very important for caching of graphs within a browser's session.

The key difference between your proposed sample code (up to the build calls) and the current design is that the proposed builder is context-independent, so that a single builder session can be compiled into multiple device graphs. This is nice, and I believe it is what we considered at one point. However, the reality is that the process of constructing a graph is already context-dependent in most platform implementations. The implementation of a builder already requires access to the context state, and already needs to deal with the context-specific resource domain (as evident in the current API signature of one of the constant methods, which accepts a GPU buffer as a constant param to the graph). This is the reason the builder's ctor requires access to the context.

Once we accept that graph construction can't truly be context-independent, the choice of whether to hang the build method off the builder or the context (e.g. builder.build(output) vs. context.build(output)) becomes a matter of taste.

As to why the builder shouldn't be an attribute of the context, it is firstly because it's a one-to-many relationship and not one-to-one i.e. a context is at an adapter scope (in the GPU environment) while a builder state is per context per model. Additionally, even for a one-to-one relationship between any two concepts, making an instance of one a property of another implies ownership and lifetime coupling, which should be avoided unless explicitly necessary.

Lastly, is this line a typo? The gpuGraph appears twice within the line, both as the host interface and a param to its method. I'm assuming one of them is meant to be a WebGPU command buffer.

// WebGPU interop methods are only available on gpuGraph
await gpuGraph.encodeGpuCommand(gpuGraph, /* inputs */)

The idea behind the current MLCommandEncoder interface is that it separates the process of encoding a set of commands (in a WebGPU command buffer) needed to initialize the device graph from another set of commands needed to execute it. The former only needs to happen once per graph while the latter happens every time an inference call is made on the graph.

The graph initialization stage is very important for performance in the case of the GPU (and the NPU in the future), as it gives the underlying driver an opportunity to prepare the weights of the graph before its first execution. This is known as the weight reordering or weight preprocessing step. Depending on the hardware platform and the evolution of its system software at the driver level, these preparation steps may also include techniques such as weight compression and/or sparsity treatment.

The command encoder interface facilitates the orderly construction of the sets of commands needed to complete these two separate operations, which are sometimes completed together and sometimes independently. And since the lifetime of the command encoder may differ from that of the context itself, it makes sense to allow it to be created off the context only as needed, and destroyed independently of the context, when interop with WebGPU becomes necessary in the scenario.
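The init-once / dispatch-many split can be illustrated with a toy command recorder (the method names mirror the MLCommandEncoder discussion here, but the implementation is only a sketch; a real finish() would produce a GPUCommandBuffer, and the buffer names are placeholders):

```javascript
// Toy command recorder illustrating the two-phase encoding split:
// graph initialization is recorded once, a dispatch once per inference.
class MockCommandEncoder {
  constructor() { this.commands = []; this.initialized = new Set(); }
  initializeGraph(graph) {
    if (this.initialized.has(graph)) {
      throw new Error('initializeGraph should only be called once per graph');
    }
    this.initialized.add(graph);
    this.commands.push({ kind: 'init', graph });
  }
  dispatch(graph, inputs, outputs) {
    this.commands.push({ kind: 'dispatch', graph, inputs, outputs });
  }
  finish() { return this.commands.slice(); } // stands in for a GPUCommandBuffer
}

const encoder = new MockCommandEncoder();
const graph = { name: 'myGraph' };
encoder.initializeGraph(graph);                          // once: weight prep
encoder.dispatch(graph, { x: 'bufIn1' }, { y: 'bufOut1' });  // per inference
encoder.dispatch(graph, { x: 'bufIn2' }, { y: 'bufOut2' });
const commandBuffer = encoder.finish();
```

The recorder enforces the "once per graph" rule for initialization, while dispatches can be recorded any number of times before finish() hands everything off for submission.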


wacky6 commented Jun 22, 2023

As to why the builder shouldn't be an attribute of the context, it is firstly because it's a one-to-many relationship and not one-to-one i.e. a context is at an adapter scope (in the GPU environment) while a builder state is per context per model. Additionally, even for a one-to-one relationship between any two concepts, making an instance of one a property of another implies ownership and lifetime coupling, which should be avoided unless explicitly necessary.

Ack. I agree the 1:1 relationship part isn't ideal. I don't think the WG has decided yet whether MLGraphBuilder is stateless or stateful?

The reason I bring up making the builder an attribute of MLContext is that it would allow early feature detection when an op's arguments can't be supported.

Say (an imaginary scenario) we are building for a backend that can only do 3x3 conv2d; we can immediately throw an Error if the provided filter isn't 3x3.

const xpuContext = navigator.ml.createContext('some_accelerator')
const builder = xpuContext.builder;
const input1 = builder.input('a', {type: 'float32', dimensions: [1, 16, 16, 3]})
const input2 = builder.input('b', {type: 'float32', dimensions: [4, 5, 5, 1]})
const intermediate = builder.conv2d(input1, input2) // -> This can immediately throw an Error
// Developer can decide whether to polyfill or do something else.

// The current approach is to throw an Error at graph build time.
builder.buildSync(output) // -> Error

The idea behind the current MLCommandEncoder interface is that it separates the process of encoding a set of commands (in a WebGPU command buffer) needed to initialize the device graph from another set of commands needed to execute it.

I think it's reasonable to expose multiple methods on gpuGraph.

gpuGraph.encodeInitializeCommand()
gpuGraph.encodeComputeCommand()
// Do we need a way to deallocate the resources?

Lastly, is this line a typo? The gpuGraph appears twice within the line, both as the host interface and a param to its method. I'm assuming one of them is meant to be a WebGPU command buffer.

Yep. My typo, I fixed the example.


zolkis commented Mar 25, 2024

As the discussion has become more specific and moved to the issues mentioned above, closing this.

zolkis closed this as completed Mar 25, 2024