Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Enhancement] Native Convolution Support #907

Open
airMeng opened this issue Aug 1, 2024 · 5 comments
Open

[Enhancement] Native Convolution Support #907

airMeng opened this issue Aug 1, 2024 · 5 comments

Comments

@airMeng
Copy link
Contributor

airMeng commented Aug 1, 2024

hi @ggerganov @slaren, current ggml_conv_2d is implemented only in img2col algorithms, which is quite slow for most of the cases and will introduce extra memory consumption*
* source: https://github.com/leejet/stable-diffusion.cpp/blob/4a6e36edc586779918535e12b4fbe0583044ee6f/README.md#L57

Shall I made the the following changes and leave the convolution implementation to the each backend which I believe there will be much more efficient implementation themselves:

struct ggml_tensor * ggml_conv_2d(
        struct ggml_context * ctx,
        struct ggml_tensor  * a,
        struct ggml_tensor  * b,
        int                  s0,
        int                  s1,
        int                  p0,
        int                  p1,
        int                  d0,
        int                  d1) {
-    struct ggml_tensor * im2col = ggml_im2col(ctx, a, b, s0, s1, p0, p1, d0, d1, true, GGML_TYPE_F16); // [N, OH, OW, IC * KH * KW]
-
-    struct ggml_tensor * result =
-        ggml_mul_mat(ctx,
-                ggml_reshape_2d(ctx, im2col, im2col->ne[0],  im2col->ne[3] * im2col->ne[2] * im2col->ne[1]), // [N, OH, OW, IC * KH * KW] => [N*OH*OW, IC * KH * KW]
-                ggml_reshape_2d(ctx, a, (a->ne[0] * a->ne[1] * a->ne[2]),  a->ne[3]));                       // [OC,IC, KH, KW] => [OC, IC * KH * KW]
-
-    result = ggml_reshape_4d(ctx, result, im2col->ne[1], im2col->ne[2], im2col->ne[3], a->ne[3]); // [OC, N, OH, OW]
-    result = ggml_cont(ctx, ggml_permute(ctx, result, 0, 1, 3, 2)); // [N, OC, OH, OW]

+    ne = ...
+    struct ggml_tensor * result = ggml_new_tensor(ctx, a->dtype, 4, ne);
+    int32_t params[] = { s0, s1, p0, p1, d0, d1};
+    ggml_set_op_params(result, params, sizeof(params));
+    result->op = GGML_OP_CONV_2D;
+    result->grad = is_node ? ggml_dup_tensor(ctx, result) : NULL;
+    result->src[0] = a;
+    result->src[1] = b;

    return result;
}

@luoyu-intel cc our performance expert for awareness

@slaren
Copy link
Collaborator

slaren commented Aug 1, 2024

It would be good to allow backends to implement convolutions without im2col, but if you make this change as you are suggesting you are going to break conv2d support in every backend. Do you intend to fix the implementations in the backends as well?

@airMeng
Copy link
Contributor Author

airMeng commented Aug 1, 2024

yes, I will split it into several PRs, the first PR will update the ggml_conv_2d as the above and keep the current im2col implementation in each backend itself, does it make sense?

@slaren
Copy link
Collaborator

slaren commented Aug 1, 2024

Sure, that would be good, but it may require more changes than you expect. For instance, the Metal backend does not have an internal memory pool to allocate memory for the im2col, and the CUDA backend may require significant changes to adapt ggml_cuda_mul_mat to work with a temporary buffer allocated from the pool.

@airMeng
Copy link
Contributor Author

airMeng commented Aug 5, 2024

Okay it takes me a weekend to try it on the Metal backend and find it really a lot of changes. Can I only dispatch on the SYCL backend? It will introduce additional macros in the common files

@slaren
Copy link
Collaborator

slaren commented Aug 5, 2024

Until we have a better way to do this, you could add a new function and let the applications choose which one to use depending on the backend they are using. This cannot be done in the common ggml files since ggml does not know what backend will be used at the time ggml_conv_2d is called.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants