From d0308d4793165e1c89df152c3548a8fcb003ed42 Mon Sep 17 00:00:00 2001
From: John Morrison
Date: Sun, 11 Feb 2024 15:19:43 +0000
Subject: [PATCH] Add post on local llms

---
 content-org/local-llama-models.org  |  92 +++++++++++++++++++++++++
 content/posts/local-llama-models.md | 103 ++++++++++++++++++++++++++++
 2 files changed, 195 insertions(+)
 create mode 100644 content-org/local-llama-models.org
 create mode 100644 content/posts/local-llama-models.md

diff --git a/content-org/local-llama-models.org b/content-org/local-llama-models.org
new file mode 100644
index 0000000..c424225
--- /dev/null
+++ b/content-org/local-llama-models.org
@@ -0,0 +1,92 @@
#+hugo_base_dir: ~/development/web/jslmorrison.github.io
#+hugo_section: posts
#+options: author:nil

* Local Llama models
:PROPERTIES:
:EXPORT_FILE_NAME: local-llama-models
:EXPORT_DATE: 2024-02-11
:END:
How to install and run open-source LLMs locally using [[https://ollama.com/][Ollama]] and integrate them into the VSCode editor for assisted code completion and more.

#+hugo: more
#+begin_quote
Ollama is a tool that allows you to run open-source large language models (LLMs) locally on your machine to provide flexibility in working with different models.
#+end_quote
You can view a list of [[https://ollama.com/library][supported models here]].

** Running Ollama
For me, running Ollama locally is as simple as executing the following in a terminal:
#+begin_src bash :noeval
nix shell nixpkgs#ollama --command ollama serve
#+end_src
This will download Ollama and start the server. If all is well at this point, the terminal output will show that it is =Listening on 127.0.0.1:11434=. If you open that URL in a browser, you should see =Ollama is running=.
To run a specific model, browse the [[https://ollama.com/library][Ollama models library]] and pick one that suits your needs. For example:
#+begin_src bash :noeval
nix shell nixpkgs#ollama --command ollama run llama2
#+end_src
#+begin_quote
Llama 2 is released by Meta Platforms, Inc. This model is trained on 2 trillion tokens, and by default supports a context length of 4096. Llama 2 Chat models are fine-tuned on over 1 million human annotations, and are made for chat.
#+end_quote

** Interaction
After running the model as shown above and once the download finishes, you are dropped into a prompt where you can start chatting:
#+begin_src bash :noeval
>>> hello
Hello! It's nice to meet you. Is there something I can help you with or would you like
to chat?

>>> Send a message (/? for help)
#+end_src
There is also an API available that you can send requests to:
#+begin_src bash :noeval
curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Hello"
}'
#+end_src
You can [[https://github.com/ollama/ollama/blob/main/docs/api.md][view the API docs here]].

To install a different model, repeat the run command above, specifying the model you want:
#+begin_src bash :noeval
nix shell nixpkgs#ollama --command ollama run codellama "Write me a PHP function that outputs the fibonacci sequence"

Here is a PHP function that outputs the Fibonacci sequence:
```
function fibonacci($n) {
    if ($n <= 1) {
        return $n;
    } else {
        return fibonacci($n-1) + fibonacci($n-2);
    }
}
```
This function takes an integer `$n` as input and returns the `n`-th number in the
Fibonacci sequence. The function is based on the recurrence relation for the Fibonacci
sequence, which states that each number is equal to the sum of the previous two numbers.
The function uses a recursive approach, where it calls itself with the previous two
numbers as input until it reaches the desired output.

For example, if we call the function with `$n = 5`, it will return `8`, since `8` is the
fifth number in the Fibonacci sequence.
```
echo fibonacci(5); // Output: 8
```
Note that this function has a time complexity of O(`2^n`), which means that the running
time grows very quickly as the input increases. This is because each call to the
function creates a new stack frame, and the function calls itself with smaller inputs
until it reaches the base case. As a result, the function can become very slow for large
values of `n`.
#+end_src

To see a list of installed models:
#+begin_src bash :noeval
nix shell nixpkgs#ollama --command ollama list
#+end_src

** Integration with VSCode
Install and configure the =llama-coder= extension from the [[https://marketplace.visualstudio.com/items?itemName=ex3ndr.llama-coder][VSCode marketplace]].
#+begin_quote
Llama Coder is a better and self-hosted Github Copilot replacement for VS Studio Code. Llama Coder uses Ollama and codellama to provide autocomplete that runs on your hardware.
#+end_quote
I'll follow up with my findings once I've spent more time using it and compared it with other models.
diff --git a/content/posts/local-llama-models.md b/content/posts/local-llama-models.md
new file mode 100644
index 0000000..10643a3
--- /dev/null
+++ b/content/posts/local-llama-models.md
@@ -0,0 +1,103 @@
+++
title = "Local Llama models"
date = 2024-02-11
draft = false
+++

How to install and run open-source LLMs locally using [Ollama](https://ollama.com/) and integrate them into the VSCode editor for assisted code completion and more.

<!--more-->

> Ollama is a tool that allows you to run open-source large language models (LLMs) locally on your machine to provide flexibility in working with different models.

You can view a list of [supported models here](https://ollama.com/library).


## Running Ollama {#running-ollama}

For me, running Ollama locally is as simple as executing the following in a terminal:

```bash
nix shell nixpkgs#ollama --command ollama serve
```

This will download Ollama and start the server. If all is well at this point, the terminal output will show that it is `Listening on 127.0.0.1:11434`. If you open that URL in a browser, you should see `Ollama is running`.
To run a specific model, browse the [Ollama models library](https://ollama.com/library) and pick one that suits your needs. For example:

```bash
nix shell nixpkgs#ollama --command ollama run llama2
```

> Llama 2 is released by Meta Platforms, Inc. This model is trained on 2 trillion tokens, and by default supports a context length of 4096. Llama 2 Chat models are fine-tuned on over 1 million human annotations, and are made for chat.


## Interaction {#interaction}

After running the model as shown above and once the download finishes, you are dropped into a prompt where you can start chatting:

```bash
>>> hello
Hello! It's nice to meet you. Is there something I can help you with or would you like
to chat?

>>> Send a message (/? for help)
```

There is also an API available that you can send requests to:

```bash
curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Hello"
}'
```

You can [view the API docs here](https://github.com/ollama/ollama/blob/main/docs/api.md).
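
By default the generate endpoint streams its reply back as a series of JSON objects. As a rough sketch based on my reading of the API docs (and assuming `jq` is installed locally), you can ask for a single response instead and pull out just the generated text:

```bash
# Set "stream": false so the API returns a single JSON object,
# then extract the generated text from its "response" field with jq.
curl -s http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Hello",
  "stream": false
}' | jq -r '.response'
```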

To install a different model, repeat the run command above, specifying the model you want:

````bash
nix shell nixpkgs#ollama --command ollama run codellama "Write me a PHP function that outputs the fibonacci sequence"

Here is a PHP function that outputs the Fibonacci sequence:
```
function fibonacci($n) {
    if ($n <= 1) {
        return $n;
    } else {
        return fibonacci($n-1) + fibonacci($n-2);
    }
}
```
This function takes an integer `$n` as input and returns the `n`-th number in the
Fibonacci sequence. The function is based on the recurrence relation for the Fibonacci
sequence, which states that each number is equal to the sum of the previous two numbers.
The function uses a recursive approach, where it calls itself with the previous two
numbers as input until it reaches the desired output.

For example, if we call the function with `$n = 5`, it will return `8`, since `8` is the
fifth number in the Fibonacci sequence.
```
echo fibonacci(5); // Output: 8
```
Note that this function has a time complexity of O(`2^n`), which means that the running
time grows very quickly as the input increases. This is because each call to the
function creates a new stack frame, and the function calls itself with smaller inputs
until it reaches the base case. As a result, the function can become very slow for large
values of `n`.
````

To see a list of installed models:

````bash
nix shell nixpkgs#ollama --command ollama list
````


## Integration with VSCode {#integration-with-vscode}

Install and configure the `llama-coder` extension from the [VSCode marketplace](https://marketplace.visualstudio.com/items?itemName=ex3ndr.llama-coder).

> Llama Coder is a better and self-hosted Github Copilot replacement for VS Studio Code. Llama Coder uses Ollama and codellama to provide autocomplete that runs on your hardware.

I'll follow up with my findings once I've spent more time using it and compared it with other models.
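
In the meantime, one small tip: it is worth pulling the model the extension will use ahead of time, so the first completion is not stuck waiting on a download. Something like the following should do it (the plain `codellama` name here is an assumption; check the extension's settings for the exact model it expects):

```bash
# Pre-download a code completion model for llama-coder to use.
# "codellama" without an explicit tag is a guess; adjust it to
# whatever model the extension is configured for.
nix shell nixpkgs#ollama --command ollama pull codellama
```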