Build and execute our own computation graph #137

philpax · 2023-04-13T11:05:44Z

At present, we are using GGML's computation graph. This works well, but it has a few flaws:

We're reliant on whatever support GGML has for threading; the Rust threading ecosystem is more versatile/OS-agnostic
Adding new operations requires patching GGML
We're coupled pretty tightly to GGML, so switching to an alternate backend would be quite difficult; this will only get worse as we support more models
Abstraction of shared pieces of functionality gets a little finicky with the exposed API

After reading ggerganov/llama.cpp#915, I had a flash of inspiration and realised we could address these problems by using our own computation graph.

The code would be fairly similar to what it is now - but instead of building up a GGML computation graph, we build up our own in Rust code with all of the usual strong-typing guarantees.

To begin with, this computation graph would then be "compiled" to a GGML computation graph, so that it works identically.
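As a rough illustration of that first step, here's a minimal sketch of a typed graph builder plus a lowering pass. Everything in it (`Graph`, `Op`, `NodeId`, the `Backend` trait and the `Recorder` toy backend) is a hypothetical name made up for this example, not existing llama-rs or GGML API; a real GGML-backed `Backend` would presumably create GGML tensors instead of log lines.

```rust
/// Handle to a node. The field is private, so handles only come from `Graph`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct NodeId(usize);

/// Hypothetical op set; a real model would need many more variants.
#[derive(Clone, Debug, PartialEq)]
pub enum Op {
    Input { name: String, shape: [usize; 2] },
    Add(NodeId, NodeId),
    MatMul(NodeId, NodeId),
    Silu(NodeId),
}

#[derive(Default)]
pub struct Graph {
    nodes: Vec<Op>,
}

impl Graph {
    fn push(&mut self, op: Op) -> NodeId {
        self.nodes.push(op);
        NodeId(self.nodes.len() - 1)
    }
    pub fn input(&mut self, name: &str, shape: [usize; 2]) -> NodeId {
        self.push(Op::Input { name: name.to_string(), shape })
    }
    pub fn add(&mut self, a: NodeId, b: NodeId) -> NodeId {
        self.push(Op::Add(a, b))
    }
    pub fn mat_mul(&mut self, a: NodeId, b: NodeId) -> NodeId {
        self.push(Op::MatMul(a, b))
    }
    pub fn silu(&mut self, a: NodeId) -> NodeId {
        self.push(Op::Silu(a))
    }
}

/// Stand-in for whatever the graph gets compiled to (GGML at first).
pub trait Backend {
    type Tensor;
    fn input(&mut self, name: &str, shape: [usize; 2]) -> Self::Tensor;
    fn add(&mut self, a: &Self::Tensor, b: &Self::Tensor) -> Self::Tensor;
    fn mat_mul(&mut self, a: &Self::Tensor, b: &Self::Tensor) -> Self::Tensor;
    fn silu(&mut self, a: &Self::Tensor) -> Self::Tensor;
}

/// Lowers every node in insertion order. A node can only refer to handles
/// that already existed when it was created, so (assuming handles are not
/// mixed between graphs) insertion order is already a topological order.
pub fn lower<B: Backend>(graph: &Graph, backend: &mut B) -> Vec<B::Tensor> {
    let mut out: Vec<B::Tensor> = Vec::with_capacity(graph.nodes.len());
    for op in &graph.nodes {
        let tensor = match op {
            Op::Input { name, shape } => backend.input(name, *shape),
            Op::Add(a, b) => backend.add(&out[a.0], &out[b.0]),
            Op::MatMul(a, b) => backend.mat_mul(&out[a.0], &out[b.0]),
            Op::Silu(a) => backend.silu(&out[a.0]),
        };
        out.push(tensor);
    }
    out
}

/// Toy backend that records the calls it receives; its "tensor" is a log index.
#[derive(Default)]
pub struct Recorder {
    pub log: Vec<String>,
}

impl Recorder {
    fn emit(&mut self, line: String) -> usize {
        self.log.push(line);
        self.log.len() - 1
    }
}

impl Backend for Recorder {
    type Tensor = usize;
    fn input(&mut self, name: &str, shape: [usize; 2]) -> usize {
        self.emit(format!("input {name} {shape:?}"))
    }
    fn add(&mut self, a: &usize, b: &usize) -> usize {
        self.emit(format!("add t{a} t{b}"))
    }
    fn mat_mul(&mut self, a: &usize, b: &usize) -> usize {
        self.emit(format!("mat_mul t{a} t{b}"))
    }
    fn silu(&mut self, a: &usize) -> usize {
        self.emit(format!("silu t{a}"))
    }
}
```

The model code would only ever talk to `Graph`, so swapping the backend later should not require touching model definitions.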

Once that's done, we would look at reimplementing the actual execution of the graph in Rust code and using GGML's operations to do so (e.g. we use its vec_dot_q4_0, etc).
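A sketch of what that second step might look like, under heavy assumptions: the `Kernels` trait below is a hypothetical seam where bindings to GGML routines could be plugged in, and `NaiveKernels` is a plain `f32` stand-in (no quantization, no FFI). `Node`, `Value` and `execute` are likewise made up for the example.

```rust
/// Low-level routines the executor borrows from a backend. In the plan
/// described above these would be GGML functions; here it is plain Rust.
pub trait Kernels {
    fn vec_dot(&self, a: &[f32], b: &[f32]) -> f32;
}

pub struct NaiveKernels;

impl Kernels for NaiveKernels {
    fn vec_dot(&self, a: &[f32], b: &[f32]) -> f32 {
        a.iter().zip(b).map(|(x, y)| x * y).sum()
    }
}

/// Nodes refer to earlier nodes by index, so the slice is already ordered.
pub enum Node {
    /// Row-major data; `cols` is the row length (a vector is a single row).
    Const { data: Vec<f32>, cols: usize },
    /// Matrix times vector.
    MatVec { mat: usize, vec: usize },
    /// Elementwise addition.
    Add { a: usize, b: usize },
}

pub struct Value {
    pub data: Vec<f32>,
    pub cols: usize,
}

/// Evaluates each node in order, keeping every intermediate result.
pub fn execute<K: Kernels>(nodes: &[Node], kernels: &K) -> Vec<Value> {
    let mut values: Vec<Value> = Vec::with_capacity(nodes.len());
    for node in nodes {
        let value = match node {
            Node::Const { data, cols } => Value { data: data.clone(), cols: *cols },
            Node::MatVec { mat, vec } => {
                let (m, x) = (&values[*mat], &values[*vec]);
                assert_eq!(m.cols, x.data.len(), "shape mismatch in MatVec");
                // One dot product per matrix row.
                let data: Vec<f32> = m
                    .data
                    .chunks(m.cols)
                    .map(|row| kernels.vec_dot(row, &x.data))
                    .collect();
                Value { cols: data.len(), data }
            }
            Node::Add { a, b } => {
                let (a, b) = (&values[*a], &values[*b]);
                assert_eq!(a.data.len(), b.data.len(), "shape mismatch in Add");
                let data = a.data.iter().zip(&b.data).map(|(x, y)| x + y).collect();
                Value { data, cols: a.cols }
            }
        };
        values.push(value);
    }
    values
}
```

Because the loop itself lives in Rust, the per-row work in `MatVec` is the kind of place where a Rust threading library could be used instead of GGML's own threading.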

This would allow us to decouple from GGML in the future (#3), and gives us freedom to implement new operations that aren't supported by GGML without having to maintain our own patched version.

Ideally, we would just use burn or something similar directly, but none of the existing libraries are in a position to serve our needs (GGML-like performance with quantization support). This lets us side-step that issue for now, and focus on describing models that could be executed by anything once support is available.

Constructing our own computation graph and compiling it to GGML should be fairly simple (this could be done with petgraph or our own graph implementation, it's not that difficult).

The main problem comes in the executor reimplementation - a lot of GGML's more complex operations are coupled to the executor, so we'd have to reimplement them (e.g. all the ggml_compute_forward_... functions). Additionally, a lot of the base operations are static void and not exposed to the outside world, so it's likely we'd have to patch GGML anyway.

An alternate approach to full graph reimplementation might be to add support for custom elementwise operations once (as @KerfuffleV2 has done in their fork), so that we can polyfill custom operations from our computation graph.
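To make the polyfill idea concrete, here's a small sketch. It assumes an imagined backend that natively supports only `Add` plus generic unary/binary maps taking plain function pointers; `Op`, `BackendOp`, `lower` and `run` are hypothetical names for illustration and do not reflect the actual API of the fork.

```rust
/// Ops our own graph wants to express.
#[derive(Clone, Copy)]
pub enum Op {
    Add,
    Relu,
    Square,
    Max,
}

/// What the imagined backend can execute: one native op plus generic maps.
pub enum BackendOp {
    Add,
    MapUnary(fn(f32) -> f32),
    MapBinary(fn(f32, f32) -> f32),
}

/// Ops the backend lacks are polyfilled with a map over a Rust function.
pub fn lower(op: Op) -> BackendOp {
    match op {
        Op::Add => BackendOp::Add,
        Op::Relu => BackendOp::MapUnary(|x: f32| x.max(0.0)),
        Op::Square => BackendOp::MapUnary(|x: f32| x * x),
        Op::Max => BackendOp::MapBinary(|a: f32, b: f32| a.max(b)),
    }
}

/// Reference execution of a lowered op on flat `f32` buffers.
/// Unary ops read `inputs[0]`; binary ops read `inputs[0]` and `inputs[1]`.
pub fn run(op: &BackendOp, inputs: &[&[f32]]) -> Vec<f32> {
    match op {
        BackendOp::Add => inputs[0]
            .iter()
            .zip(inputs[1])
            .map(|(x, y)| x + y)
            .collect(),
        BackendOp::MapUnary(f) => inputs[0].iter().map(|&x| f(x)).collect(),
        BackendOp::MapBinary(f) => inputs[0]
            .iter()
            .zip(inputs[1])
            .map(|(&x, &y)| f(x, y))
            .collect(),
    }
}
```

With something like this, adding a new elementwise op to our graph would only mean adding a variant and a lowering arm, with no backend patch.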


KerfuffleV2 · 2023-04-14T10:22:29Z

I think this is a great idea. Also, it's probably even more of a reason to decouple llama-rs from the GGML crates, and I would think what you're talking about should also be its own crate. (Using "crate" pretty much interchangeably with "repo" here.)

You'd also be able to do something like I mentioned in #130.

> This would allow us to decouple from GGML in the future (#3), and gives us freedom to implement new operations that aren't supported by GGML without having to maintain our own patched version.

It looks like my mapping operations stuff is likely to get merged ( ggerganov/llama.cpp#874 ), so at least for operations that work with unary/binary mapping it won't be necessary to do that. Maybe the only other thing missing would be fold or 3d operations (not sure what would even need the latter). You could emulate a fold (albeit inefficiently) using map + something like statics.
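For what it's worth, here's a minimal sketch of the "fold via map + statics" trick. It assumes a map that only accepts a plain function pointer (so the callback can't capture state) and that visits elements in order on a single thread; `map_unary`, `accumulate` and `sum_via_map` are hypothetical names, not the API from the PR.

```rust
use std::cell::Cell;

thread_local! {
    /// Accumulator living outside the callback, since a bare `fn` pointer
    /// cannot capture any state of its own.
    static ACC: Cell<f32> = Cell::new(0.0);
}

/// Stand-in for a backend map op that only takes a plain function pointer.
fn map_unary(src: &[f32], f: fn(f32) -> f32) -> Vec<f32> {
    src.iter().map(|&x| f(x)).collect()
}

/// Callback: adds `x` to the accumulator and returns the running total.
fn accumulate(x: f32) -> f32 {
    ACC.with(|acc| {
        let total = acc.get() + x;
        acc.set(total);
        total
    })
}

/// Emulates a sum fold: the last element of the mapped output is the total.
pub fn sum_via_map(src: &[f32]) -> f32 {
    ACC.with(|acc| acc.set(0.0)); // reset before each fold
    let running = map_unary(src, accumulate);
    running.last().copied().unwrap_or(0.0)
}
```

This is inefficient (it materialises every running total) and fragile: if the map were split across threads, each thread would see its own accumulator and the result would be wrong.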

KerfuffleV2 · 2023-04-16T15:28:24Z

I found this crate which looks pretty interesting: https://crates.io/crates/dagga

It's for scheduling directed acyclic graphs (like GGML's graph, and I assume other ML type graphs would be similar). You can do stuff like give the nodes semantics reflecting uses of resources, borrowing, dependencies, etc.

If nothing else, it might be useful for stealing ideas.

9876691 · 2023-05-19T07:49:37Z

Is using ONNX Runtime an option here?

There's a Rust binding here: https://github.com/microsoft/onnxruntime/tree/main/rust

The compute graph is basically formed from a protobuf definition, so using a Rust protoc compiler you would get a bunch of Rust structs auto-generated. Then, at runtime, you put the structs together into the compute graph and pass it to the runtime.

As far as I can see, ONNX Runtime supports:

Quantization https://onnxruntime.ai/docs/performance/model-optimizations/quantization.html
CPU https://onnxruntime.ai/docs/execution-providers/
GPU

It might also be possible in the future to swap in wonnx, the Rust version: https://github.com/webonnx/wonnx

philpax · 2023-05-20T22:34:36Z

We're already in talks with wonnx to see if we can use them as a computation backend: webonnx/wonnx#169

As for using onnxruntime directly... I don't know. Maybe, but we'd like to avoid having to synthesize an entire ONNX graph at runtime, especially as ONNX is quite an intricate format and has lots of details we don't care about.

9876691 · 2023-05-26T13:57:03Z

For reference, there's some ongoing work in ggml on graph support: ggerganov/ggml#108

These are initial steps towards GPU support via computation graph export.
Still figuring out the basics needed. Playing with the mnist example

philpax added the issue:enhancement (New feature or request) and meta:maintenance (Changes that will make it easier for us to maintain code) labels on Apr 13, 2023

This was referenced Apr 13, 2023

Change Rust-side GGML view ops to take element rather than byte arguments. #128

Closed

WIP: Bloom Inference #85

Closed

philpax referenced this issue Apr 27, 2023

Sync to GGML version as of 2023-04-23 02:17 UTC

6b2765d

philpax mentioned this issue May 10, 2023

Non-ggml backend #31

Open

philpax mentioned this issue Jun 13, 2023

Support WebGPU acceleration? #312

Open

philpax added the topic:backend-support (Support for alternate non-GGML backends, or for particular GGML backend features) label on Jun 15, 2023
