
Feature request: dense-sparse-dense bilinear #137


Description

@Linux-cpp-lisp

General bilinear operations take the form out[o] = x^T @ A[o] @ y, where A has shape (outdim, xdim, ydim). (See https://pytorch.org/docs/stable/generated/torch.nn.Bilinear.html.)
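For concreteness, a small dense reference of this operation (shapes are hypothetical), checked against `torch.nn.functional.bilinear`:

```python
import torch

# out[o] = sum_{i,j} x[i] * A[o, i, j] * y[j]
outdim, xdim, ydim = 8, 16, 32
A = torch.randn(outdim, xdim, ydim)
x = torch.randn(xdim)
y = torch.randn(ydim)

out = torch.einsum('i,oij,j->o', x, A, y)

# torch.nn.functional.bilinear expects a leading batch dimension
ref = torch.nn.functional.bilinear(x.unsqueeze(0), y.unsqueeze(0), A).squeeze(0)
assert torch.allclose(out, ref, atol=1e-5)
```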

Currently, as far as I can tell, implementing this for a sparse A using pytorch_sparse is only possible as a sparse-dense matmul of A with the outer product of x and y. Materializing that outer product carries a huge memory cost and, since A is sparse, involves many unnecessary computations.
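A minimal sketch of that workaround, assuming torch_sparse's `SparseTensor.from_dense` constructor and its `@` operator for sparse-dense matmul (shapes and sparsity are made up for illustration):

```python
import torch
from torch_sparse import SparseTensor

outdim, xdim, ydim, batch = 8, 16, 32, 4
A = torch.randn(outdim, xdim, ydim) * (torch.rand(outdim, xdim, ydim) < 0.05)
x = torch.randn(batch, xdim)
y = torch.randn(batch, ydim)

# Flatten A into a 2D (outdim, xdim*ydim) sparse matrix ...
A_sp = SparseTensor.from_dense(A.reshape(outdim, xdim * ydim))

# ... and materialize the dense outer products, shape (batch, xdim*ydim):
# this is the costly intermediate the feature request wants to avoid.
outer = torch.einsum('bi,bj->bij', x, y).reshape(batch, xdim * ydim)

out = (A_sp @ outer.t()).t()   # sparse-dense matmul -> (batch, outdim)

assert torch.allclose(out, torch.nn.functional.bilinear(x, y, A), atol=1e-5)
```

The outer product alone is batch * xdim * ydim floats, regardless of how sparse A is.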

It seems to me like supporting this computation directly would not require dramatic changes to the existing CPU and CUDA matmul code — seemingly just a second outer loop over x in addition to y. But I'm not really familiar enough with the code to see if that's true.

Ideally, like existing sparse-dense multiplication, it would support a batch dimension in x and y:

x=[batch, dx], A=[dx, dy, dout], y=[batch, dy] --> [batch, dout]
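In einsum terms (as a dense reference for the requested semantics, not an existing torch_sparse call), that signature would be:

```python
import torch

batch, dx, dy, dout = 4, 16, 32, 8
x = torch.randn(batch, dx)
A = torch.randn(dx, dy, dout)   # would be sparse in the requested feature
y = torch.randn(batch, dy)

out = torch.einsum('bx,xyo,by->bo', x, A, y)   # shape: [batch, dout]
```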

I would be happy to help with implementing this if you think it is doable and something you'd be interested in having in the library!

Activity

rusty1s (Owner) commented on May 17, 2021

That's interesting. I think one current limitation is that SparseTensor needs to be two-dimensional. As such, I'm not really sure if we can directly re-use CPU and CUDA kernels for this. WDYT?

Linux-cpp-lisp (Author) commented on May 20, 2021

Hm, right. I think all such bilinearities can be expressed as a linear map applied to the outer product of x and y, i.e. as a matmul between a flattened (2D) version of A and the flattened outer product. As a result, the sparse tensor A can remain a 2D matrix, but you'd need the compute kernel to generate the appropriate elements of the outer product on the fly.

A quick look at the CPU kernel (https://github.com/rusty1s/pytorch_sparse/blob/master/csrc/cpu/spmm_cpu.cpp#L84) makes it seem like you're already doing random access on the dense matrix, so extending that to random access on two dense tensors doesn't seem like a big deal. OTOH, I don't know any CUDA and have no idea what that looks like in the CUDA kernels, so it may be much more complicated than for the CPU kernel.
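For illustration only, here is a pure-Python sketch (not the actual spmm_cpu.cpp code) of how a kernel could loop over the nonzeros of the flattened A and generate the needed outer-product entries on the fly; the function name and the COO layout of the flattened A are assumptions:

```python
import torch

def sparse_bilinear_coo(row, col, value, x, y, outdim):
    """out[b, o] = sum over nonzeros of A[o, i, j] * x[b, i] * y[b, j],
    where the k-th nonzero is stored as (row[k], col[k]) = (o, i * ydim + j)."""
    batch, ydim = x.size(0), y.size(1)
    out = torch.zeros(batch, outdim, dtype=value.dtype)
    for k in range(value.numel()):          # loop over stored nonzeros of A
        o = row[k].item()
        i, j = divmod(col[k].item(), ydim)  # decode the flattened column index
        # Random access into *two* dense tensors, analogous to the gather
        # spmm_cpu.cpp already does on its single dense matrix.
        out[:, o] += value[k] * x[:, i] * y[:, j]
    return out

# Quick check against the dense reference:
outdim, xdim, ydim, batch = 8, 16, 32, 4
A = torch.randn(outdim, xdim, ydim) * (torch.rand(outdim, xdim, ydim) < 0.05)
row, col = A.reshape(outdim, -1).nonzero(as_tuple=True)
value = A.reshape(outdim, -1)[row, col]
x, y = torch.randn(batch, xdim), torch.randn(batch, ydim)
assert torch.allclose(sparse_bilinear_coo(row, col, value, x, y, outdim),
                      torch.nn.functional.bilinear(x, y, A), atol=1e-5)
```

Each nonzero then costs one multiply-add per batch element, and neither the outer product nor a dense copy of A is ever materialized.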

I'm trying a lot of very different approaches to accelerating this strange computation. My A is very sparse but also has a lot of repeated structure, so it's not obvious that explicitly storing it in COO format is actually the best way to go, but I can't rule anything out either 😄

added a commit that references this issue on Apr 26, 2022
added a commit that references this issue on Aug 18, 2022

Metadata

Labels: enhancement (New feature or request)
Assignees: none
Milestone: none
Participants: @Linux-cpp-lisp, @rusty1s