Add copilot server example #23
base: master
Conversation
Signed-off-by: Hung-Han (Henry) Chen <chenhungh@gmail.com>
Wow, this is pretty cool 👍
I am having issues with
I made some minor modifications to the code so that the model is not downloaded from Hugging Face, and changed the path to match the one I used for the Docker config I added. Still, I don't think this was the cause. Was anyone else successful with the original source code in this PR?
I found the issue. Your code was missing
before the cache is reserved.
Thank you :)
I have fixed a few bugs; it's more or less in working state. It has been tested with cloud GPUs.
That is very cool. How do insertions work here? Copilot is trained to insert code "in the middle" — is that also possible with this endpoint, or does it only receive the previous code?
Only the previous code. |
This is great! Is there a plan to include code insertion / infilling? Code Llama has been trained with infilling, but it would need the special tokens <PRE>, <MID> and <SUF>. Does the Llama tokenizer implementation of exllamav2 support those tokens? Otherwise I might try to implement those, as that would be really useful to me.
I don't have plans, as this was just for fun. You can definitely add yours — enjoy the hack 😀
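For anyone picking this up: a minimal sketch of how an infilling prompt could be assembled from the special tokens mentioned above. The `<PRE> {prefix} <SUF>{suffix} <MID>` layout follows the Code Llama infilling format; the function name and spacing details here are assumptions, not part of this PR.

```python
def build_infill_prompt(prefix: str, suffix: str) -> str:
    """Assemble a prefix-suffix-middle (PSM) infilling prompt.

    The model is expected to generate the "middle" span that fits
    between prefix and suffix. Token spacing follows the Code Llama
    convention; verify against your tokenizer before relying on it.
    """
    return f"<PRE> {prefix} <SUF>{suffix} <MID>"

# Example: ask the model to fill in a function body.
prompt = build_infill_prompt(
    prefix="def add(a, b):\n    ",
    suffix="\n    return result",
)
```

The server would then send `prompt` to the generator and stop decoding at the end-of-infill token, returning only the generated middle to the editor.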
This PR adds an example HTTP server wrapping `exllamav2`, which can be used as a replacement for the GitHub Copilot backend.
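The overall shape of such a wrapper can be sketched as below. This is not the PR's actual code: the `generate` stub stands in for the real exllamav2 generator call (whose API differs), and the JSON payload shape is an assumption about what a Copilot-style client expects.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def generate(prompt: str) -> str:
    # Stub standing in for the exllamav2 generator; the real call,
    # its parameters, and sampling settings are assumptions.
    return "    return a + b"

def make_completion(prompt: str) -> dict:
    """Build a Copilot-style completion payload for a given prompt."""
    return {"choices": [{"text": generate(prompt)}]}

class CompletionHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON request body sent by the editor plugin.
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length) or b"{}")
        payload = json.dumps(make_completion(body.get("prompt", "")))
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload.encode())

if __name__ == "__main__":
    # Port and bind address are arbitrary choices for the sketch.
    HTTPServer(("127.0.0.1", 8000), CompletionHandler).serve_forever()
```

The real server additionally has to handle streaming responses and model/cache setup, which is where the bug fixes discussed above came in.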