|
6 | 6 | "source": [
|
7 | 7 | "# Llama.cpp\n",
|
8 | 8 | "\n",
|
9 |
| - "[llama-cpp-python](https://github.com/abetlen/llama-cpp-python) is a Python binding for [llama.cpp](https://github.com/ggerganov/llama.cpp). \n", |
| 9 | + "[llama-cpp-python](https://github.com/abetlen/llama-cpp-python) is a Python binding for [llama.cpp](https://github.com/ggerganov/llama.cpp).\n", |
10 | 10 | "\n",
|
11 |
| - "It supports inference for [many LLMs](https://github.com/ggerganov/llama.cpp), which can be accessed on [HuggingFace](https://huggingface.co/TheBloke).\n", |
| 11 | + "It supports inference for [many LLMs](https://github.com/ggerganov/llama.cpp#description), which can be accessed on [HuggingFace](https://huggingface.co/TheBloke).\n", |
12 | 12 | "\n",
|
13 | 13 | "This notebook goes over how to run `llama-cpp-python` within LangChain.\n",
|
14 | 14 | "\n",
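For context, here is a minimal sketch of what the underlying binding looks like when used directly, before LangChain wraps it; the model path is a hypothetical placeholder for a GGUF file you have already downloaded:

```python
# Minimal sketch of calling llama-cpp-python directly (assumes the package is
# installed and a GGUF model file has been downloaded locally).
from llama_cpp import Llama

llm = Llama(model_path="./llama-2-7b-chat.Q4_K_M.gguf")  # hypothetical local file
output = llm(
    "Q: Name the planets in the solar system. A: ",
    max_tokens=64,
    stop=["Q:", "\n"],
)
print(output["choices"][0]["text"])
```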
|
|
54 | 54 | "source": [
|
55 | 55 | "### Installation with OpenBLAS / cuBLAS / CLBlast\n",
|
56 | 56 | "\n",
|
57 |
| - "`lama.cpp` supports multiple BLAS backends for faster processing. Use the `FORCE_CMAKE=1` environment variable to force the use of cmake and install the pip package for the desired BLAS backend ([source](https://github.com/abetlen/llama-cpp-python#installation-with-openblas--cublas--clblast)).\n", |
| 57 | + "`llama.cpp` supports multiple BLAS backends for faster processing. Use the `FORCE_CMAKE=1` environment variable to force the use of cmake and install the pip package for the desired BLAS backend ([source](https://github.com/abetlen/llama-cpp-python#installation-with-openblas--cublas--clblast)).\n", |
58 | 58 | "\n",
|
59 | 59 | "Example installation with cuBLAS backend:"
|
60 | 60 | ]
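As a sketch of what that install cell typically looks like (assuming a CUDA toolkit is already available; the flags follow the llama-cpp-python README linked above):

```python
# Notebook cell: build llama-cpp-python against the cuBLAS backend.
# FORCE_CMAKE=1 forces a cmake build; CMAKE_ARGS selects the BLAS backend.
!CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python
```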
|
|
177 | 177 | "\n",
|
178 | 178 | "You don't need an `API_TOKEN` as you will run the LLM locally.\n",
|
179 | 179 | "\n",
|
180 |
| - "It is worth understanding which models are suitable to be used on the desired machine." |
| 180 | + "It is worth understanding which models are suitable to run on your machine.\n", |
| 181 | + "\n", |
| 182 | + "[TheBloke's](https://huggingface.co/TheBloke) Hugging Face models have a `Provided files` section that lists the RAM required to run models of different quantisation sizes and methods (e.g. [Llama-2-7B-Chat-GGUF](https://huggingface.co/TheBloke/Llama-2-7b-Chat-GGUF#provided-files)).\n", |
| 183 | + "\n", |
| 184 | + "This [GitHub issue](https://github.com/facebookresearch/llama/issues/425) is also relevant for finding the right model for your machine." |
181 | 185 | ]
|
182 | 186 | },
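Once a quantisation that fits your RAM has been chosen, loading it through LangChain might look like the following minimal sketch; the model path and parameter values are placeholders, not the notebook's own settings:

```python
# Minimal sketch: point LangChain's LlamaCpp wrapper at a locally downloaded GGUF file.
from langchain.llms import LlamaCpp

llm = LlamaCpp(
    model_path="./llama-2-7b-chat.Q4_K_M.gguf",  # hypothetical path to a downloaded quant
    n_ctx=2048,      # context window; raise it if the model and your RAM allow
    n_gpu_layers=0,  # increase only if built with a GPU backend (e.g. cuBLAS)
    verbose=False,
)
print(llm("Name one advantage of running an LLM locally."))
```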
|
183 | 187 | {
|
|