
Commit 0f30e0f: update supported models information

rjmacarthy authored Aug 29, 2023 · 1 parent 1768e4d

Showing 1 changed file with 5 additions and 4 deletions.

README.md (9 changes: 5 additions & 4 deletions)
@@ -4,7 +4,7 @@

 **Supported models**

-- [CodeLlama](https://huggingface.co/codellama) (Huggingface and GPTQ versions)
+- [Code Llama](https://huggingface.co/codellama) (Hugging Face and GPTQ versions)
 - [StarCoder](https://huggingface.co/bigcode/starcoder) (Hugging Face and GPTQ versions)

 #### 📥 Usage
@@ -39,8 +39,9 @@ Enjoy personalized and private code completion. 🎉

 #### System requirements

-For a general idea a single nvidia 3090 can run [bigcode/starcoderbase-3b](https://huggingface.co/bigcode/starcoderbase-3b) in 8Bit comfortably.
+The models below have been tested and run comfortably on a single NVIDIA RTX 3090, with decent accuracy and speed, although in personal experience the GPTQ models run most efficiently.

-An nvidia 3090 can run [CodeLlama-7b-hf](https://huggingface.co/codellama/CodeLlama-7b-hf) in full, 8Bit or 4Bit.
+- [bigcode/starcoderbase-3b](https://huggingface.co/bigcode/starcoderbase-3b)
+- [CodeLlama-7b-hf](https://huggingface.co/codellama/CodeLlama-7b-hf)
+- [Code Llama 13b GPTQ](https://huggingface.co/TheBloke/CodeLlama-13B-GPTQ)

 All models below 3B that use the StarCoder tokenizer should work. The 1B models provide faster, more practically usable inference speed, depending on your hardware.

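For a concrete sense of what running one of these models in 8-bit looks like, here is a minimal sketch using the Hugging Face `transformers` API. It is not part of this commit; it assumes `transformers`, `accelerate`, and `bitsandbytes` are installed, and the VRAM figures in the comments are rough estimates rather than measurements.

```python
# Minimal sketch (not from this repository): load CodeLlama-7b-hf in 8-bit
# on a single 24 GB GPU such as an RTX 3090. Assumes transformers,
# accelerate, and bitsandbytes are installed; VRAM numbers are estimates.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "codellama/CodeLlama-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",   # let accelerate place the weights on the GPU
    load_in_8bit=True,   # ~7 GB of weights instead of ~14 GB in fp16
)

prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

GPTQ checkpoints such as TheBloke/CodeLlama-13B-GPTQ should load through the same `from_pretrained` call on recent `transformers` versions (with `optimum` and `auto-gptq` installed), since the quantization config ships with the checkpoint.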