Add copilot server example #23
base: master
Conversation
Signed-off-by: Hung-Han (Henry) Chen <chenhungh@gmail.com>
Wow, this is pretty cool 👍
I am having issues with
I made some minor modifications to the code so that the model is not downloaded from Hugging Face, and changed the path to match the one I used for the Docker config I added. Still, I don't think this was the cause. Was anyone else successful with the original source code in this PR?
I found the issue. Your code was missing
before the cache is reserved.
Thank you :)
I have fixed a few bugs; it's more or less in working state. It has been tested with cloud GPUs.
That is very cool. How do insertions work here? Copilot is trained to insert code "in the middle" — is that also possible with this endpoint, or does it only receive the previous code?
Only the previous code. |
This is great! Is there a plan to include code insertion / infilling? Code Llama has been trained with infilling, but it would need the special tokens <PRE>, <MID> and <SUF>. Does the Llama tokenizer implementation of exllamav2 support those tokens? Otherwise I might try to implement those, as that would be really useful to me.
I don't have plans, as this was just for fun. You can definitely add yours — enjoy the hack 😀
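For anyone picking this up: a minimal sketch of how an infilling prompt could be assembled from the special tokens mentioned above. The `<PRE> {prefix} <SUF>{suffix} <MID>` layout follows the Code Llama infilling format; the function name and spacing details here are assumptions, not part of this PR.

```python
def build_infill_prompt(prefix: str, suffix: str) -> str:
    """Assemble a prefix-suffix-middle (PSM) infilling prompt.

    The model is expected to generate the "middle" span that fits
    between prefix and suffix. Token spacing follows the Code Llama
    convention; verify against your tokenizer before relying on it.
    """
    return f"<PRE> {prefix} <SUF>{suffix} <MID>"

# Example: ask the model to fill in a function body.
prompt = build_infill_prompt(
    prefix="def add(a, b):\n    ",
    suffix="\n    return result",
)
```

The server would then send `prompt` to the generator and stop decoding at the end-of-infill token, returning only the generated middle to the editor.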
This PR adds an example HTTP server wrapping `exllamav2`, which can be used as a replacement for the GitHub Copilot backend.
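The overall shape of such a wrapper can be sketched as below. This is not the PR's actual code: the `generate` stub stands in for the real exllamav2 generator call (whose API differs), and the JSON payload shape is an assumption about what a Copilot-style client expects.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def generate(prompt: str) -> str:
    # Stub standing in for the exllamav2 generator; the real call,
    # its parameters, and sampling settings are assumptions.
    return "    return a + b"

def make_completion(prompt: str) -> dict:
    """Build a Copilot-style completion payload for a given prompt."""
    return {"choices": [{"text": generate(prompt)}]}

class CompletionHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON request body sent by the editor plugin.
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length) or b"{}")
        payload = json.dumps(make_completion(body.get("prompt", "")))
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload.encode())

if __name__ == "__main__":
    # Port and bind address are arbitrary choices for the sketch.
    HTTPServer(("127.0.0.1", 8000), CompletionHandler).serve_forever()
```

The real server additionally has to handle streaming responses and model/cache setup, which is where the bug fixes discussed above came in.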