API simplification: context owns builder, graph becomes internal slot #303
I think having explicit stages of a graph's lifecycle is desirable because it matches more closely what happens in ML frameworks. If we can unify the WebGPU and CPU API design, that's even better. Perhaps CPU and GPU could both use a compile stage (a conceptual just-in-time compile) to mitigate cold starts?

```js
// Define the graph
const builder = new MLGraphBuilder();
const input1 = builder.input('a', {type: 'float32', dimensions: [2, 2]});
const input2 = builder.input('b', {type: 'float32', dimensions: [2, 2]});
const output = builder.add(input1, input2);

// It's possible to compile the same graph on different contexts.
const cpuContext = navigator.ml.createContext('cpu', {maxThreads: 4});
const gpuContext = navigator.ml.createContext('gpu', {webgpuAdapter: adapter});

// Build MLGraph<Context>; the returned graph is ready for inferencing.
const cpuGraph = await cpuContext.build({output}); // => MLGraphCpu
const gpuGraph = await gpuContext.build({output}); // => MLGraphGpu

// Generic compute method
await cpuGraph.compute(/* inputs */);
await gpuGraph.compute(/* inputs */); // Maybe this should be replaced with WebGPU interop methods?

// WebGPU interop methods are only available on gpuGraph.
await gpuGraph.encodeGpuCommand(/* inputs */);
```

Making the builder part of `MLContext` is (probably?) necessary, because some backends might not support a given operation or its arguments (e.g. XNNPack's concat doesn't support an arbitrary number of arguments).

Making the graph an internal slot seems strange, though. I thought we want to allow reusing an `MLContext` for multiple graphs.
@wacky6 thanks for this proposal! If I parsed this right, the proposal moves:

```js
MLGraphBuilder.build() -> MLContext.build()  // => Promise<MLGraph(Cpu|Gpu)>
MLContext.compute()    -> MLGraph.compute()  // => Promise<MLComputeResult>
```

Other changes:

@wchao1115 would these design changes help with #322? @huningxin for comments.

(The original issue was labeled as v2 due to a breaking change, and that applies to this proposal too. But I'd like to solicit input on this, because this proposal suggests many ideas worth discussing and intersects with other discussions.)
True, multiple graphs can be built with the same context+builder pair. It is up to the editors to judge, but to me the flow in your example code looks good -- if feasible. It seems to assume that we have a graph structure that can be built in a generic way by a builder, independently of the underlying backend/compute capabilities (but still using the builder methods), and that can be "instantiated" for a given context+builder pair by the `build()` method.

You mentioned that some ops might not be supported by some backends to start with, which would mean we cannot build generic graph structures. But AFAICT we could defer reporting that kind of error until the time the graph is built. So I think your idea is feasible and gets a +1 from me -- but I trust this decision to the editors and implementers.
From a pure implementation perspective, I'm leaning towards a context-specific builder and a context-specific graph.
Whether the spec should handle backend variations is a different question.
+1 on @wacky6's comment on frameworks needing to manage the device graph's life cycle. This is indeed very important for caching of graphs within a browser session.

The key difference between your proposed sample code and the current API is where the `build()` step is anchored. Once we accept that graph construction can't truly be context-independent, the choice of whether to hoist `build()` from the builder to the context is largely a design judgment.

As to why the builder shouldn't be an attribute of the context: firstly, it is a one-to-many relationship and not one-to-one, i.e. a context is at adapter scope (in the GPU environment) while builder state is per context, per model. Additionally, even for a one-to-one relationship between any two concepts, making an instance of one a property of another implies ownership and lifetime coupling, which should be avoided unless explicitly necessary.

Lastly, is this line a typo? The graph shouldn't need to be passed to its own method:

```js
// WebGPU interop methods are only available on gpuGraph
await gpuGraph.encodeGpuCommand(gpuGraph, /* inputs */)
```

The idea behind the current `MLCommandEncoder` design is to support two separate operations: graph initialization and graph execution.

The graph initialization stage is very important for performance in the case of the GPU (and the NPU in the future), as it gives the underlying driver an opportunity to prepare the weights of the graph before its first execution. This is known as the weight reordering or weight preprocessing step. Depending on the hardware platform and the evolution of its system software at the driver level, these preparation steps may also include techniques such as weight compression and/or sparsity treatment.

The command encoder interface facilitates the orderly construction of the set of commands necessary to complete these two separate operations, sometimes together and sometimes independently. And since the lifetime of the command encoder may differ from that of the context itself, it makes sense to allow it to be created off the context only as needed, and destroyed independently of the context when interop with WebGPU becomes necessary in the scenario.
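The two-stage lifecycle described above (encode initialization once, then encode dispatches) can be sketched with stub objects, since no real `MLCommandEncoder` is available outside a browser. The method names `initializeGraph`, `dispatch` and `finish` follow the command encoder design discussed in this thread, but their exact signatures are an assumption here.

```js
// Stub command encoder that records the order of encoded operations.
// This is NOT the real MLCommandEncoder, only a model of its intended
// two-stage usage: initialization (weight preprocessing) before the
// first dispatch, then any number of dispatches.
class StubCommandEncoder {
  constructor() { this.ops = []; }
  initializeGraph(graph) { this.ops.push('initialize'); }         // driver prepares/reorders weights here
  dispatch(graph, inputs, outputs) { this.ops.push('dispatch'); } // one inference
  finish() { return { commands: this.ops.slice() }; }             // yields a command buffer
}

const encoder = new StubCommandEncoder();
const graph = {}; // stands in for an MLGraph built on a "webgpu" context

encoder.initializeGraph(graph); // encoded once, amortized over all later runs
encoder.dispatch(graph, { /* inputs */ }, { /* outputs */ });
encoder.dispatch(graph, { /* inputs */ }, { /* outputs */ });

const commandBuffer = encoder.finish();
console.log(commandBuffer.commands.join(',')); // → initialize,dispatch,dispatch
```

Separating the two stages lets the expensive initialization be encoded once while dispatch is encoded per inference, which is exactly the caching-friendly lifecycle frameworks want to manage.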
Ack. I agree the 1:1 relationship part isn't ideal. I don't think the WG has decided whether MLGraphBuilder is stateless or stateful yet?

The reason I bring up making the builder part of the context is eager error reporting. Say (in an imaginary scenario) we are building for a backend that can only do 3x3 conv2d; we could immediately throw an Error if the provided filter isn't 3x3.

```js
const xpuContext = navigator.ml.createContext('some_accelerator')
const builder = xpuContext.builder;
const input = builder.input('a', {type: 'float32', dimensions: [1, 16, 16, 3]})
const filter = builder.input('b', {type: 'float32', dimensions: [4, 5, 5, 1]})
const intermediate = builder.conv2d(input, filter) // -> This can immediately throw an Error
// Developer can decide whether to polyfill or do something else.

// The current approach is to throw an Error at graph build time instead:
builder.buildSync({output: intermediate}) // -> Error
```
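The "polyfill or do something else" path might look like the following sketch. The backend, its 3x3-only restriction, and the `StubBuilder`/`conv2dPolyfill` names are all hypothetical; a stub stands in for a real context-bound builder so the eager-throw behavior can be shown end to end.

```js
// Hypothetical context-bound builder for a backend that only supports
// 3x3 conv2d filters: it throws eagerly at op-construction time.
class StubBuilder {
  conv2d(input, filter) {
    const [, filterH, filterW] = filter.dimensions; // assumed [out, h, w, in] filter layout
    if (filterH !== 3 || filterW !== 3) {
      throw new Error(`backend only supports 3x3 conv2d, got ${filterH}x${filterW}`);
    }
    return { op: 'conv2d', input, filter };
  }
}

// Hypothetical fallback the developer supplies (e.g. decompose or
// emulate the op with supported primitives).
function conv2dPolyfill(input, filter) {
  return { op: 'conv2d-polyfill', input, filter };
}

const builder = new StubBuilder();
const input = { dimensions: [1, 16, 16, 3] };
const filter = { dimensions: [4, 5, 5, 1] }; // 5x5 filter: unsupported

let node;
try {
  node = builder.conv2d(input, filter); // throws eagerly
} catch (e) {
  node = conv2dPolyfill(input, filter); // developer decides the fallback
}
console.log(node.op); // → conv2d-polyfill
```

With build-time-only validation, the same failure would surface much later, after the whole graph is described, which makes targeted fallbacks like this harder to write.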
I think it's reasonable to expose multiple methods on gpuGraph:

```js
gpuGraph.encodeInitializeCommand()
gpuGraph.encodeComputeCommand()
// Do we need a way to deallocate the resources?
```

Yep. My typo, I fixed the example.
As the discussion has become more specific and moved to the issues mentioned above, closing this. |
Lifted from #298 comment.
Proposal: include the graph builder and the command encoder as attributes of `MLContext`, and make `MLGraph` an internal slot of `MLContext`.

This would simplify things a great deal and make the spec more consistent: it avoids exposing the empty `MLGraph` interface, and using it as an internal slot would allow differentiating how it can be used in different context types, as well as managing its lifecycle.
Rationale for change
The spec contains certain constraints that are hard to describe and enforce via algorithms. For instance, this note from the [MLContext section](https://webmachinelearning.github.io/webnn/#api-mlcontext):

Or this, from the `MLCommandEncoder` section:

To achieve that, it should be possible to bind an `MLGraph` and an `MLCommandEncoder` to an `MLContext` of type `"webgpu"`.
Therefore I would add an internal slot `[[model]]` to `MLContext` that represents a compute graph bound to a context. If that context is of type `"webgpu"`, then it will have `MLCommandEncoder`-specific initialization, dispatch and finish steps (e.g. the `MLCommandEncoder` interface could be exposed on the context as an attribute).

Also, the discussion in #149 reveals a possible use case for discerning between a compute graph (as built by a builder, which could be executed in multiple contexts) and a graph that is initialized for a given context for execution.
A builder could be generic (initialized without a context) or bound to a context (in which case adaptation to the context could already happen during the build).
A context-bound builder's `build()` produces an `MLGraph` that is already bound to that context. A generic builder's `build()` could take a parameter identifying the context for which the `MLGraph` is built.

In summary: a builder's output, i.e. an `MLGraph`, is always bound to an `MLContext`, so it could as well be (part of) an internal slot of `MLContext`, and the builder could likewise be an attribute of `MLContext`. As noted previously, `MLCommandEncoder` could also be an attribute of `MLContext`.

Related to #302.