
Doing math with Tensor can be a lot more ergonomic #130

Closed

KerfuffleV2 opened this issue Apr 12, 2023 · 4 comments

Comments

@KerfuffleV2
Contributor

It seems pretty straightforward to implement std::ops::{Add, Div, Mul}, etc.

impl Add for &Tensor {
    type Output = Tensor;

    fn add(self, rhs: Self) -> Self::Output {
        Context {
            ptr: self.ctx.upgrade().expect("Couldn't get context!"),
        }
        .op_add(self, rhs)
    }
}

It's a little awkward with the vanilla version of GGML, though, since it doesn't even support subtraction or division, so it kind of needs my map ops stuff: KerfuffleV2@7e2d3d0

I repurposed ^ (XOR) for mat_mul but it could be something else like & or |. I'm not really sure what operator makes the most sense here.
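
For reference, the mat_mul version would look something like this (just a sketch mirroring the Add impl above; I'm assuming the binding exposes the matrix multiply as op_mul_mat):

impl std::ops::BitXor for &Tensor {
    type Output = Tensor;

    // ^ repurposed as matrix multiplication; op_mul_mat is assumed to be the
    // binding for ggml's mat_mul here.
    fn bitxor(self, rhs: Self) -> Self::Output {
        Context {
            ptr: self.ctx.upgrade().expect("Couldn't get context!"),
        }
        .op_mul_mat(self, rhs)
    }
}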

Just for example, code can go from:

impl LayerNorm {
    pub fn norm_ops(&self, ctx: &Context, x: &Tensor) -> Tensor {
        ctx.op_add(&ctx.op_mul(&ctx.op_norm(x), &self.weight), &self.bias)
    }
}

impl Mix {
    pub fn mix_ops(&self, ctx: &Context, x: &Tensor, last_x: &Tensor) -> Tensor {
        ctx.op_add(
            &ctx.op_mul(x, &self.0),
            &ctx.op_mul(last_x, &map_ops::one_minus(ctx, &self.0)),
        )
    }
}

to

impl LayerNorm {
    pub fn norm_ops(&self, ctx: &Context, x: &Tensor) -> Tensor {
        &ctx.op_norm(x) * &self.weight + self.bias.share()
    }
}

impl Mix {
    pub fn mix_ops(&self, ctx: &Context, x: &Tensor, last_x: &Tensor) -> Tensor {
        x * &self.0 + last_x * &map_ops::one_minus(ctx, &self.0)
    }
}

It's actually really convenient that every Tensor has an embedded Context otherwise this wouldn't be possible. As far as I know, the temporary upgrade should be safe (obviously it'll die horribly if you try to perform operations with a dead Context.)

Note: This isn't quite as straightforward as it may appear for the operations that aren't currently supported because my OP_MAP_UNARY|BINARY ops currently only support f32 values. So you couldn't use those to divide/subtract quantized values currently.

@KerfuffleV2
Contributor Author

Need to do more investigation to make sure the temporary Context approach is safe. Seems like the pointers aren't valid when hitting C code currently.

@setzer22
Collaborator

I'm a bit hesitant about this.

On one hand, it reads much better, but on the other hand the ownership semantics for GGML stuff are already quite tricky. So making things more implicit can obscure important details about who owns the tensor data.

When you call ctx.op_add(a, b), this will implicitly allocate new tensor data inside of ctx to contain the result. However, a and b are not necessarily allocated inside ctx. This is why, when doing operations, we check if the context for the tensor is still alive using the Weak pointers. This can catch some mistakes, but it's actually not enough. The current API is unsound (and we're aware of it), because data for a tensor may be accessed even after the operation function is executed (ggml builds a computation graph, and only executes it when you tell it to execute at the end).

If we were to model this properly, a tensor operation would need to intertwine the lifetimes of ctx and the contexts that allocated a and b (a.ctx and b.ctx, which may or may not be ctx). The correct relation is that a.ctx and b.ctx must outlive ctx, because otherwise ctx will happily dereference freed pointers when running the computation graph that references a and b.
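
Roughly, the relation we'd want the type system to enforce is something like this (a sketch with hypothetical lifetime-carrying types, not the current API):

use std::marker::PhantomData;

// Hypothetical lifetime-carrying stand-ins, only to illustrate the relation.
struct Context;
struct Tensor<'ctx> {
    _ctx: PhantomData<&'ctx Context>,
}

// The result is allocated in `ctx`, but the computation graph built in `ctx`
// keeps pointing at `a` and `b`, so their contexts must outlive it:
// the bounds 'a: 'c and 'b: 'c say "a.ctx and b.ctx outlive ctx".
fn op_add<'c, 'a: 'c, 'b: 'c>(
    _ctx: &'c Context,
    _a: &Tensor<'a>,
    _b: &Tensor<'b>,
) -> Tensor<'c> {
    Tensor { _ctx: PhantomData }
}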

Changes to improve the soundness of the API are welcome, but so far making fully sound and idiomatic bindings to GGML hasn't been a priority.

This is not just a theoretical concern. Right now, we already have a situation where the context that stores the model's weights and the temporary context used for an inference step are different. It's not a problem because the temporary context is temporary, and the weights for the model have to be alive for as long as the temporary context is.

Anyway, what I meant to say with all this is that the problem with having tensors use their internal context pointer for anything other than validation is that the order of the operands changes the owning context. Writing a + b and b + a may allocate memory in different contexts, and this may have safety implications if one of the two contexts is short-lived.
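
To make that concrete, here's a tiny self-contained sketch (stand-in types, not the real bindings) of how the left operand silently decides where the result lives:

use std::rc::Rc;

// Minimal stand-ins, just to illustrate where the allocation lands.
struct Context { name: &'static str }
struct Tensor { ctx: Rc<Context> }

impl std::ops::Add for &Tensor {
    type Output = Tensor;
    fn add(self, _rhs: Self) -> Tensor {
        // The result is always allocated in the left operand's context.
        println!("allocating result in {}", self.ctx.name);
        Tensor { ctx: Rc::clone(&self.ctx) }
    }
}

fn main() {
    let weights_ctx = Rc::new(Context { name: "weights" });
    let scratch_ctx = Rc::new(Context { name: "scratch" });
    let w = Tensor { ctx: weights_ctx }; // long-lived model weight
    let x = Tensor { ctx: scratch_ctx }; // short-lived inference input
    let _y1 = &w + &x; // "allocating result in weights"
    let _y2 = &x + &w; // "allocating result in scratch"
}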

@KerfuffleV2
Contributor Author

That's fair (and I'm not sure if you noticed but I actually closed this myself already).

Maybe a better approach would be to have Tensor not be a real GGML tensor. Then it could be used to build the graph and a separate step could transform it into actual tensors. The first part wouldn't actually need to do anything except model the relationship between tensors.
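
Rough sketch of the kind of thing I'm imagining (names are made up, and it leans on the existing op_* methods plus an assumed op_mul_mat binding): the user-facing type just records the operation tree, and a separate lowering pass walks it and emits real GGML tensors.

use std::rc::Rc;

// Hypothetical lazy expression type: building one of these allocates nothing
// in any GGML context, it just records the graph structure.
enum Expr {
    Leaf(Rc<Tensor>),          // an existing GGML tensor (e.g. a weight)
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
    MatMul(Box<Expr>, Box<Expr>),
}

impl std::ops::Add for Expr {
    type Output = Expr;
    fn add(self, rhs: Expr) -> Expr {
        Expr::Add(Box::new(self), Box::new(rhs))
    }
}

impl Expr {
    // The separate lowering step: only here do we touch a real Context, so
    // there's a single, explicit owner for every allocation.
    fn lower(&self, ctx: &Context) -> Tensor {
        match self {
            Expr::Leaf(t) => t.share(),
            Expr::Add(a, b) => ctx.op_add(&a.lower(ctx), &b.lower(ctx)),
            Expr::Mul(a, b) => ctx.op_mul(&a.lower(ctx), &b.lower(ctx)),
            Expr::MatMul(a, b) => ctx.op_mul_mat(&a.lower(ctx), &b.lower(ctx)),
        }
    }
}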

Anyway, I need to do more research on this but I think there definitely should be a way to make building the graph much more ergonomic.

It could also be done in a way that's not a breaking change, by just adding a new type that those operations could be performed on.

@setzer22
Collaborator

Anyway, I need to do more research on this but I think there definitely should be a way to make building the graph much more ergonomic.

Agreed! Changes in this direction would be more than welcome :) Building a Rust API that ends up "compiling down to" GGML operations in a safe way sounds like a good abstraction.

Good news is, it will be easy to verify that the changes introduce no regressions since we have the existing code to validate against.
