Release v2.1.2 #209
Conversation
amakropoulos commented Aug 16, 2024 (edited)
- Closes #212: Hot-swap LoRA with updated llama.cpp
- Closes #171: The editor crashes when exiting playmode while it is creating the LLM service
I was trying to check that the adapters work using the test gguf files from llama.cpp (generated by running test-lora-conversion-inference.sh, or you can find the gguf files directly here). These models are overfitted to return the same sentence for the same initial word, but I am struggling to make them work in this branch, because the input is sent wrapped in a chat template as "<|user|>\nHello<|end|>\n<|assistant|>\n" instead of (as in the llama.cpp tests) "<bos>Hello". Would it be helpful if I train similarly small overfitted models to test that different adapters respond correctly in a chat? Or is there a mode where the user input is not sent within a chat template?
Yes, you can use …
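For illustration, here is a minimal sketch of the two modes, assuming LLMUnity's `LLMCharacter` exposes a raw-completion call named `Complete` alongside `Chat`; the names are taken from the library's public API and may differ in this branch:

```csharp
using LLMUnity;
using UnityEngine;

public class AdapterSmokeTest : MonoBehaviour
{
    public LLMCharacter llmCharacter;

    async void Start()
    {
        // Chat() wraps the input in the model's chat template,
        // producing e.g. "<|user|>\nHello<|end|>\n<|assistant|>\n".
        string chatReply = await llmCharacter.Chat("Hello");
        Debug.Log($"Chat: {chatReply}");

        // Complete() is assumed to send the prompt verbatim, so the
        // overfitted test model sees "Hello" as in the llama.cpp tests.
        string rawReply = await llmCharacter.Complete("Hello");
        Debug.Log($"Complete: {rawReply}");
    }
}
```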
Nice, I tested that the adapter is working correctly! I am planning to test what happens when multiple adapters are loaded, because in that case one probably should use the … param. Not sure what happens now if two LLMs use the same base but different adapters. Do two different servers spin up? Have you already looked into this?
I tried this branch in Unity via the GitHub URL and it loads Llama 3.1 and Gemma models fine, but only in CPU mode. Using CUDA via the numGPULayers variable crashes the Unity editor for me right now, whereas the asset store version does not. Using the latest Unity 6 preview (15f1) on Win10. I tried running without and with the full library installed via the Extras button.
@ElevenGameStudios thanks for sending.
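For reference, a minimal sketch of the setting the report mentions; `numGPULayers` is the field named above, and 0 keeps inference fully on the CPU (the mode reported to work):

```csharp
using LLMUnity;
using UnityEngine;

public class GpuFallback : MonoBehaviour
{
    public LLM llm;

    void Awake()
    {
        // 0 = CPU-only inference (reported stable); a positive value
        // offloads that many layers to the GPU via CUDA, which is what
        // triggered the editor crash described above.
        llm.numGPULayers = 0;
    }
}
```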
You can use multiple adapters at the same time; they are all initialised with scale 1.
Yes, each different LLM object starts a new LLM server.
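A minimal sketch of what loading two adapters on one base model could look like; `AddLora` and `SetLoraWeight` are assumed from later LLMUnity releases and the adapter paths are hypothetical, so treat the exact names as placeholders:

```csharp
using LLMUnity;
using UnityEngine;

public class MultiAdapterSetup : MonoBehaviour
{
    public LLM llm;  // one LLM object = one server = one base model

    void Awake()
    {
        // Both adapters attach to the same base model; per the note
        // above, each starts with scale 1 unless changed explicitly.
        llm.AddLora("adapters/style_a.gguf");  // hypothetical path
        llm.AddLora("adapters/style_b.gguf");  // hypothetical path

        // Down-weight the second adapter (method name assumed; it
        // may differ or not exist in this branch).
        llm.SetLoraWeight("adapters/style_b.gguf", 0.5f);
    }
}
```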
Closing in favor of #220 because it is not a minor release anymore :)