Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multi-thread ggml_cpy() #782

Closed
ggerganov opened this issue Apr 5, 2023 · 5 comments
Closed

Multi-thread ggml_cpy() #782

ggerganov opened this issue Apr 5, 2023 · 5 comments
Labels
enhancement New feature or request good first issue Good for newcomers performance Speed related topics

Comments

@ggerganov
Copy link
Member

This is a task suitable for new contributors

See how we multi-threaded the ggml_rope() operator.
Do the same for the ggml_cpy() operator and see if there is any benefit.

Use the ggml profiler (GGML_PERF) to measure the benefit of multi-threaded vs non-multi-threaded ggml_cpy()

@ggerganov ggerganov added enhancement New feature or request good first issue Good for newcomers performance Speed related topics labels Apr 5, 2023
@Fs77X
Copy link

Fs77X commented Apr 6, 2023

Hi, this is my first contribution to a large project so forgive me for being a newbie! I attempted to multithread cpy following the code from the rope commit but I started getting garbage output given an initial prompt. Would appreciate any guidance on what I'm doing wrong! Fs77X@3c8a304

@ggerganov
Copy link
Member Author

So I have updated ggml_cpy() in the latest commit (c3ac702) since I backported some changes from whisper.cpp - we can significantly improve the performance when dst is contiguous. The code is currently quite messy, but I think it can be easily simplified in the future.

While at it, I tried to multi-thread it and didn't observed any measurable improvements, so I guess there is no point in multi-threading it.

I will close this issue now

@slaren
Copy link
Member

slaren commented Apr 17, 2023

@ggerganov Do you still have your multi-threaded implementation of ggml_cpy? I would like to build on that to parallelize quantization in ggml_cpy for LoRA.

@ggerganov
Copy link
Member Author

No, but it was pretty much the same as: https://github.com/ggerganov/llama.cpp/pull/824/files
Just use some shorter variable names like:

    const int ith = params->ith;
    const int nth = params->nth;

    int ir = 0;

@shkim-emily
Copy link

@ggerganov Hello, Does [ggml profiler (GGML_PERF)] not support now?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request good first issue Good for newcomers performance Speed related topics
Projects
None yet
Development

No branches or pull requests

4 participants