
Use latest ggml from upstream #177

Merged 13 commits into RWKV:master on Jul 28, 2024
Conversation

@MollySophia (Contributor) commented on Jul 16, 2024

What's changed:

  • Correctly compile and run inference with the latest ggml.
  • Regenerated expected-logits-*.bin using the old code base with GGML_SILU_FP16 turned off. (Upstream ggml now uses FP32 for SiLU, while the old code defaulted to FP16, which caused small logit differences in tests; see the SiLU sketch after this list.)
  • Slightly refactored CMakeLists.txt to follow llama.cpp, and added an option to enable the Metal backend.
  • Changed how layers are offloaded to the GPU, using the new ggml backends and scheduler (see the scheduler sketch after this list).
  • Use a completely unmodified ggml submodule.
  • Disabled the thread sanitizer; it's broken in llama.cpp too.
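
For context on the logits change, this is the exact FP32 SiLU that upstream ggml now computes; the old GGML_SILU_FP16 path approximated it with an FP16 lookup table, which is where the small test differences come from. A minimal illustrative sketch, not code from this PR:

```c
#include <math.h>

// FP32 SiLU: silu(x) = x * sigmoid(x) = x / (1 + exp(-x)).
// The old GGML_SILU_FP16 path evaluated this through an FP16 lookup
// table, so its outputs differ slightly from the exact FP32 value.
static inline float silu_f32(float x) {
    return x / (1.0f + expf(-x));
}
```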
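And a minimal sketch of the new offloading approach, assuming the ggml-backend scheduler API as it looked in mid-2024 (ggml_backend_sched_new taking backends, optional buffer types, a graph size, and a parallel flag); this is illustrative, not the PR's actual code:

```c
#include "ggml.h"
#include "ggml-backend.h"
#ifdef GGML_USE_CUDA
#include "ggml-cuda.h"
#endif

// Build a scheduler over the available backends. The scheduler splits
// the compute graph and assigns each op to a backend; the CPU backend
// goes last so it acts as the fallback for unsupported ops.
static ggml_backend_sched_t make_sched(void) {
    ggml_backend_t backends[2];
    int n_backends = 0;
#ifdef GGML_USE_CUDA
    backends[n_backends++] = ggml_backend_cuda_init(0 /* device */);
#endif
    backends[n_backends++] = ggml_backend_cpu_init();
    // NULL buffer types -> each backend's default; no parallel compute.
    return ggml_backend_sched_new(backends, NULL, n_backends,
                                  GGML_DEFAULT_GRAPH_SIZE, false);
}

// Per forward pass, the graph is then computed with:
//   ggml_backend_sched_graph_compute(sched, graph);
```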

TODOs:

  • Test basic inference (a smoke-test sketch follows this list)
  • Test gpu offloading with Apple Metal
  • Test gpu offloading with CUDA (and possibly fix any problems)
  • Test gpu offloading with other backends (e.g. HIP, OpenBLAS) (may not be possible for me to test myself :P)
  • Update docs and README.md
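
For the inference and offloading tests above, here is a minimal smoke-test sketch against the rwkv.cpp public API (rwkv_init_from_file, rwkv_gpu_offload_layers, rwkv_eval); the model path, thread count, and layer count below are made up for illustration:

```c
#include <stdio.h>
#include <stdlib.h>
#include "rwkv.h"

int main(void) {
    struct rwkv_context * ctx = rwkv_init_from_file("model.bin", 4 /* threads */);
    if (ctx == NULL) return 1;

    // Request 24 layers on the GPU backend (no-op on CPU-only builds).
    rwkv_gpu_offload_layers(ctx, 24);

    float * state  = malloc(rwkv_get_state_len(ctx)  * sizeof(float));
    float * logits = malloc(rwkv_get_logits_len(ctx) * sizeof(float));

    // NULL state_in starts from the model's initial state.
    rwkv_eval(ctx, 0 /* token id */, NULL, state, logits);
    printf("logits[0] = %f\n", logits[0]);

    free(state);
    free(logits);
    rwkv_free(ctx);
    return 0;
}
```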

Commit note (on disabling the thread sanitizer): It's broken in upstream llama.cpp too :P

Commit note: ...until ggml GroupNorm has the eps parameter
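
For context on that commit: at the time, ggml's ggml_group_norm did not take an eps argument (it was hardcoded internally), so a model needing a custom epsilon required a workaround. A plain-C reference of where eps enters GroupNorm, illustrative only and not ggml's kernel:

```c
#include <math.h>

// Normalize one group of n values in place; eps keeps the divisor
// away from zero when the variance is tiny.
void group_norm_one_group(float * x, int n, float eps) {
    float mean = 0.0f;
    for (int i = 0; i < n; i++) mean += x[i];
    mean /= n;

    float var = 0.0f;
    for (int i = 0; i < n; i++) {
        float d = x[i] - mean;
        var += d * d;
    }
    var /= n;

    float scale = 1.0f / sqrtf(var + eps);
    for (int i = 0; i < n; i++) x[i] = (x[i] - mean) * scale;
}
```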

@MollySophia (Contributor, Author) commented:

Update: offloading with CUDA doesn't work yet; I'm working on it.

@LaylBongers merged commit d622368 into RWKV:master on Jul 28, 2024. 13 checks passed.