From d0308d4793165e1c89df152c3548a8fcb003ed42 Mon Sep 17 00:00:00 2001
From: John Morrison
Date: Sun, 11 Feb 2024 15:19:43 +0000
Subject: [PATCH] Add post on local llms

---
 content-org/local-llama-models.org  |  92 +++++++++++++++++++++++++
 content/posts/local-llama-models.md | 103 ++++++++++++++++++++++++++++
 2 files changed, 195 insertions(+)
 create mode 100644 content-org/local-llama-models.org
 create mode 100644 content/posts/local-llama-models.md

diff --git a/content-org/local-llama-models.org b/content-org/local-llama-models.org
new file mode 100644
index 0000000..c424225
--- /dev/null
+++ b/content-org/local-llama-models.org
@@ -0,0 +1,92 @@
#+hugo_base_dir: ~/development/web/jslmorrison.github.io
#+hugo_section: posts
#+options: author:nil

* Local Llama models
:PROPERTIES:
:EXPORT_FILE_NAME: local-llama-models
:EXPORT_DATE: 2024-02-11
:END:
How to install and run open-source LLMs locally using [[https://ollama.com/][Ollama]] and integrate them into the VSCode editor for assisted code completion and more.

#+hugo: more
#+begin_quote
Ollama is a tool that allows you to run open-source large language models (LLMs) locally on your machine to provide flexibility in working with different models.
#+end_quote
You can view a list of [[https://ollama.com/library][supported models here]].

** Running Ollama
For me, running Ollama locally is as simple as executing the following in a terminal:
#+begin_src bash :noeval
nix shell nixpkgs#ollama --command ollama serve
#+end_src
This will download Ollama and start the server. If all is well at this point, the terminal output will show that it is =Listening on 127.0.0.1:11434=. If you open that URL in a browser, you should see =Ollama is running=.
To run a specific model, browse the [[https://ollama.com/library][Ollama models library]] and pick one that suits your needs. For example:
#+begin_src bash :noeval
nix shell nixpkgs#ollama --command ollama run llama2
#+end_src
#+begin_quote
Llama 2 is released by Meta Platforms, Inc. This model is trained on 2 trillion tokens, and by default supports a context length of 4096. Llama 2 Chat models are fine-tuned on over 1 million human annotations, and are made for chat.
#+end_quote

** Interaction
After running the model as shown above and once the download finishes, you are dropped into a prompt where you can start chatting:
#+begin_src bash :noeval
>>> hello
Hello! It's nice to meet you. Is there something I can help you with or would you like
to chat?

>>> Send a message (/? for help)
#+end_src
There is also an API available that you can send requests to:
#+begin_src bash :noeval
curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Hello"
}'
#+end_src
You can [[https://github.com/ollama/ollama/blob/main/docs/api.md][view the API docs here]].

To install a different model, repeat the run command above, specifying the model you want:
#+begin_src bash :noeval
nix shell nixpkgs#ollama --command ollama run codellama "Write me a PHP function that outputs the fibonacci sequence"

Here is a PHP function that outputs the Fibonacci sequence:
```
function fibonacci($n) {
    if ($n <= 1) {
        return $n;
    } else {
        return fibonacci($n-1) + fibonacci($n-2);
    }
}
```
This function takes an integer `$n` as input and returns the `n`-th number in the
Fibonacci sequence. The function is based on the recurrence relation for the Fibonacci
sequence, which states that each number is equal to the sum of the previous two numbers.
The function uses a recursive approach, where it calls itself with the previous two
numbers as input until it reaches the desired output.

For example, if we call the function with `$n = 5`, it will return `8`, since `8` is the
fifth number in the Fibonacci sequence.
```
echo fibonacci(5); // Output: 8
```
Note that this function has a time complexity of O(`2^n`), which means that the running
time grows very quickly as the input increases. This is because each call to the
function creates a new stack frame, and the function calls itself with smaller inputs
until it reaches the base case. As a result, the function can become very slow for large
values of `n`.
#+end_src

To see a list of installed models:
#+begin_src bash :noeval
nix shell nixpkgs#ollama --command ollama list
#+end_src

** Integration with VSCode
Install and configure the =llama-coder= extension from the [[https://marketplace.visualstudio.com/items?itemName=ex3ndr.llama-coder][VSCode marketplace]].
#+begin_quote
Llama Coder is a better and self-hosted Github Copilot replacement for VS Studio Code. Llama Coder uses Ollama and codellama to provide autocomplete that runs on your hardware.
#+end_quote
I'll follow up with my findings once I've spent more time using it and compared it with other models.
diff --git a/content/posts/local-llama-models.md b/content/posts/local-llama-models.md
new file mode 100644
index 0000000..10643a3
--- /dev/null
+++ b/content/posts/local-llama-models.md
@@ -0,0 +1,103 @@
+++
title = "Local Llama models"
date = 2024-02-11
draft = false
+++

How to install and run open-source LLMs locally using [Ollama](https://ollama.com/) and integrate them into the VSCode editor for assisted code completion and more.

<!--more-->

> Ollama is a tool that allows you to run open-source large language models (LLMs) locally on your machine to provide flexibility in working with different models.

You can view a list of [supported models here](https://ollama.com/library).


## Running Ollama {#running-ollama}

For me, running Ollama locally is as simple as executing the following in a terminal:

```bash
nix shell nixpkgs#ollama --command ollama serve
```

This will download Ollama and start the server. If all is well at this point, the terminal output will show that it is `Listening on 127.0.0.1:11434`. If you open that URL in a browser, you should see `Ollama is running`.
To run a specific model, browse the [Ollama models library](https://ollama.com/library) and pick one that suits your needs. For example:

```bash
nix shell nixpkgs#ollama --command ollama run llama2
```

> Llama 2 is released by Meta Platforms, Inc. This model is trained on 2 trillion tokens, and by default supports a context length of 4096. Llama 2 Chat models are fine-tuned on over 1 million human annotations, and are made for chat.


## Interaction {#interaction}

After running the model as shown above and once the download finishes, you are dropped into a prompt where you can start chatting:

```bash
>>> hello
Hello! It's nice to meet you. Is there something I can help you with or would you like
to chat?

>>> Send a message (/? for help)
```

There is also an API available that you can send requests to:

```bash
curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Hello"
}'
```

You can [view the API docs here](https://github.com/ollama/ollama/blob/main/docs/api.md).
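
By default the generate endpoint streams its reply back as a series of JSON objects. As a rough sketch based on my reading of the API docs (and assuming `jq` is installed locally), you can ask for a single response instead and pull out just the generated text:

```bash
# Set "stream": false so the API returns a single JSON object,
# then extract the generated text from its "response" field with jq.
curl -s http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Hello",
  "stream": false
}' | jq -r '.response'
```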

To install a different model, repeat the run command above, specifying the model you want:

````bash
nix shell nixpkgs#ollama --command ollama run codellama "Write me a PHP function that outputs the fibonacci sequence"

Here is a PHP function that outputs the Fibonacci sequence:
```
function fibonacci($n) {
    if ($n <= 1) {
        return $n;
    } else {
        return fibonacci($n-1) + fibonacci($n-2);
    }
}
```
This function takes an integer `$n` as input and returns the `n`-th number in the
Fibonacci sequence. The function is based on the recurrence relation for the Fibonacci
sequence, which states that each number is equal to the sum of the previous two numbers.
The function uses a recursive approach, where it calls itself with the previous two
numbers as input until it reaches the desired output.

For example, if we call the function with `$n = 5`, it will return `8`, since `8` is the
fifth number in the Fibonacci sequence.
```
echo fibonacci(5); // Output: 8
```
Note that this function has a time complexity of O(`2^n`), which means that the running
time grows very quickly as the input increases. This is because each call to the
function creates a new stack frame, and the function calls itself with smaller inputs
until it reaches the base case. As a result, the function can become very slow for large
values of `n`.
````

To see a list of installed models:

````bash
nix shell nixpkgs#ollama --command ollama list
````


## Integration with VSCode {#integration-with-vscode}

Install and configure the `llama-coder` extension from the [VSCode marketplace](https://marketplace.visualstudio.com/items?itemName=ex3ndr.llama-coder).

> Llama Coder is a better and self-hosted Github Copilot replacement for VS Studio Code. Llama Coder uses Ollama and codellama to provide autocomplete that runs on your hardware.

I'll follow up with my findings once I've spent more time using it and compared it with other models.
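
In the meantime, one small tip: it is worth pulling the model the extension will use ahead of time, so the first completion is not stuck waiting on a download. Something like the following should do it (the plain `codellama` name here is an assumption; check the extension's settings for the exact model it expects):

```bash
# Pre-download a code completion model for llama-coder to use.
# "codellama" without an explicit tag is a guess; adjust it to
# whatever model the extension is configured for.
nix shell nixpkgs#ollama --command ollama pull codellama
```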