Release v2.1.2 #209
Conversation
amakropoulos commented Aug 16, 2024 (edited)
- Closes #212: Hot-swap LoRA with updated llama.cpp
- Closes #171: The editor crashes when exiting playmode while it is creating the LLM service
I was trying to check that the adapters work using the test gguf files from llama.cpp (generated by running test-lora-conversion-inference.sh, or you can find the gguf files directly here). These models are overfitted to return the same sentence for the same initial word, but I am struggling to make them work in this branch, because the input is sent wrapped in a chat template as "<|user|>\nHello<|end|>\n<|assistant|>\n" instead of (as in the llama.cpp tests) "<bos>Hello". Would it be helpful if I train similarly small overfitted models to test that different adapters respond correctly in a chat? Or is there a mode where the user input is not sent within a chat template?
Yes, you can use …
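For illustration, here is a minimal sketch of the two modes, assuming LLMUnity's `LLMCharacter` exposes a raw-completion call named `Complete` alongside `Chat`; the names are taken from the library's public API and may differ in this branch:

```csharp
using LLMUnity;
using UnityEngine;

public class AdapterSmokeTest : MonoBehaviour
{
    public LLMCharacter llmCharacter;

    async void Start()
    {
        // Chat() wraps the input in the model's chat template,
        // producing e.g. "<|user|>\nHello<|end|>\n<|assistant|>\n".
        string chatReply = await llmCharacter.Chat("Hello");
        Debug.Log($"Chat: {chatReply}");

        // Complete() is assumed to send the prompt verbatim, so the
        // overfitted test model sees "Hello" as in the llama.cpp tests.
        string rawReply = await llmCharacter.Complete("Hello");
        Debug.Log($"Complete: {rawReply}");
    }
}
```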
Nice, I tested that the adapter is working correctly! I am planning to test what happens when multiple adapters are loaded, because in that case one probably should use the … param. Not sure what happens now if two LLMs use the same base but different adapters. Do two different servers spin up? Have you already looked into this?
I tried this branch in Unity via the GitHub URL and it loads Llama 3.1 and Gemma models fine, but only in CPU mode. Using CUDA via the numGPULayers variable crashes the Unity editor for me right now, whereas the asset store version does not. Using the latest Unity 6 preview (15f1) on Win10. I tried running without and with the full library installed via the Extras button.
@ElevenGameStudios thanks for sending.
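For reference, a minimal sketch of the setting the report mentions; `numGPULayers` is the field named above, and 0 keeps inference fully on the CPU (the mode reported to work):

```csharp
using LLMUnity;
using UnityEngine;

public class GpuFallback : MonoBehaviour
{
    public LLM llm;

    void Awake()
    {
        // 0 = CPU-only inference (reported stable); a positive value
        // offloads that many layers to the GPU via CUDA, which is what
        // triggered the editor crash described above.
        llm.numGPULayers = 0;
    }
}
```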
You can use multiple adapters at the same time; they are all initialised with scale 1.
Yes, each different LLM object starts a new LLM server.
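A minimal sketch of what loading two adapters on one base model could look like; `AddLora` and `SetLoraWeight` are assumed from later LLMUnity releases and the adapter paths are hypothetical, so treat the exact names as placeholders:

```csharp
using LLMUnity;
using UnityEngine;

public class MultiAdapterSetup : MonoBehaviour
{
    public LLM llm;  // one LLM object = one server = one base model

    void Awake()
    {
        // Both adapters attach to the same base model; per the note
        // above, each starts with scale 1 unless changed explicitly.
        llm.AddLora("adapters/style_a.gguf");  // hypothetical path
        llm.AddLora("adapters/style_b.gguf");  // hypothetical path

        // Down-weight the second adapter (method name assumed; it
        // may differ or not exist in this branch).
        llm.SetLoraWeight("adapters/style_b.gguf", 0.5f);
    }
}
```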
Closing in favor of #220 because it is not a minor release anymore :)