readme: add Azure deployment description + single Docker improvements and fixes (cohere-ai#76)

* readme: add Azure deploy description + single Docker improvements

* Add SQLite v3.45.3 for Chroma DB

deployment: add docker compose down command in Makefile. (cohere-ai#65)

Signed-off-by: ifuryst <ifuryst@gmail.com>

coral_web: Add is_available check to tools (cohere-ai#82)

* add is_available check to tools

* add tool error message as tooltip

* disable unavailable tools, show error message if description does not exist

Setup: fix key error (cohere-ai#84)

docs: Update README.md links (cohere-ai#83)

Update README.md links

Some links were still pointing to the old `cohere-ai/toolkit` repository, instead of `cohere-ai/cohere-toolkit`.

docs: clarify setup env for development. (cohere-ai#64)

Signed-off-by: ifuryst <ifuryst@gmail.com>

coral-web: update the starter card options (cohere-ai#73)

* add new start options

* set start option prompts

* clean up

* remove welcome message

* remove notification message

* visual nits

* center start options, fade out when convo is populated

* remove streaming message check

coral-web: include conversationId in file upload (cohere-ai#85)

include conversationId in file upload

Deployment: add local model deployment option (cohere-ai#77)

* Deployment: add local model deployment option

* lint

* add tests

* lint

* fix cohere prompts

Docs: add env setup instructions (cohere-ai#88)

Cli: add dummy tests (cohere-ai#89)

* Cli: add dummy tests

* move cli to backend

backend: Set up next.js to proxy requests to the API (cohere-ai#86)

Set up next.js to proxy requests to the API

tools: Update default NEXT_PUBLIC_API_HOSTNAME for the new api routing (cohere-ai#94)

* Update default NEXT_PUBLIC_API_HOSTNAME for the new api routing

* Also update NEXT_PUBLIC_API_HOSTNAME in README and .env-template

fix: broken backend URL in cli (cohere-ai#93)

Update main.py

Co-authored-by: Tianjing Li <tianjinglimail@gmail.com>

changed logo and page header

implemented openAI adapter

added env variable for oai key

implemented working chatgpt

1

fixed

fix

fixed conversation order bug

fixed bugs with incorrect chat history.

added extract script

other

changes to msgs

msgRow has text duplication bug

fix?

big update

pls

close!

cool

font change

big fixes for message row highlight selection; now working perfectly.

small fix for code selection (remaining bug in nodes textContent tooltip)

big commit

HUGE bug fix for laggy composer!

implemented annotation prompting

less spam

working annotation schema setup

WORKING annotation saving!

working annotation saving in db

huge update: cookies, UA, and many fixes for highlights
EugeneLightsOn authored and da03 committed Jul 22, 2024
1 parent 2764fc2 commit 2503714
Showing 83 changed files with 4,638 additions and 660 deletions.
3 changes: 2 additions & 1 deletion .env-template
@@ -1,5 +1,5 @@
# REQUIRED VARIABLES
NEXT_PUBLIC_API_HOSTNAME=http://localhost:8000
NEXT_PUBLIC_API_HOSTNAME=http://backend:8000
DATABASE_URL=postgresql+psycopg2://postgres:postgres@db:5432

# TOOLS
@@ -13,6 +13,7 @@ WOLFRAM_ALPHA_APP_ID=<APP_ID_HERE>

# 1 - Cohere Platform
COHERE_API_KEY=<API_KEY_HERE>
OPENAI_API_KEY=<OAI_KEY_HERE>

# 2 - SageMaker
SAGE_MAKER_PROFILE_NAME=<PROFILE NAME>
4 changes: 3 additions & 1 deletion Makefile
@@ -1,5 +1,7 @@
dev:
@docker compose watch
down:
@docker compose down
run-tests:
docker compose run --build backend poetry run pytest src/backend/tests/$(file)
run-community-tests:
@@ -19,7 +21,7 @@ reset-db:
docker volume rm cohere_toolkit_db
setup:
poetry install --only setup --verbose
poetry run python3 cli/main.py
poetry run python3 src/backend/cli/main.py
lint:
poetry run black .
poetry run isort .
110 changes: 106 additions & 4 deletions README.md
@@ -37,14 +37,98 @@ make first-run

Follow the instructions to configure the model (AWS SageMaker, Azure, or Cohere's platform). You can also run `make setup` (see Option 2 below), which will help generate the file for you, or manually create a `.env` file by copying the provided `.env-template` and replacing the values with the correct ones.

#### Detailed environment setup

<details>
<summary>Windows</summary>

1. Install [docker](https://docs.docker.com/desktop/install/windows-install/)
2. Install [git](https://git-scm.com/download/win)
3. In PowerShell (Terminal), install [scoop](https://scoop.sh/). After installing, run `scoop bucket add extras`
4. Install pipx
```bash
scoop install pipx
pipx ensurepath
```
5. Install poetry >= 1.7.1 using
```bash
pipx install poetry
```
6. Install miniconda using
```bash
scoop install miniconda3
conda init powershell
```
7. Restart PowerShell
8. Install the following:
```bash
scoop install postgresql
scoop install make
```
9. Create a new virtual environment with Python 3.11
```bash
conda create -n toolkit python=3.11
conda activate toolkit
```
10. Clone the repo
11. As an alternative to `make first-run` or `make setup`, run
```bash
poetry install --only setup --verbose
poetry run python src/backend/cli/main.py
make migrate
make dev
```
12. Navigate to http://localhost:4000 in your browser

</details>

<details>
<summary>MacOS</summary>

1. Install the Xcode command line tools. This can be done from the App Store (installing Xcode) or from the terminal
```bash
xcode-select --install
```
2. Install [docker desktop](https://docs.docker.com/desktop/install/mac-install/)
2. Install [docker desktop](https://docs.docker.com/desktop/install/mac-install/)
3. Install [homebrew](https://brew.sh/)
4. Install [pipx](https://github.com/pypa/pipx). This is useful for installing poetry later.
```bash
brew install pipx
pipx ensurepath
```
5. Install postgres with `brew install postgresql`
6. Install conda using [miniconda](https://docs.anaconda.com/free/miniconda/index.html)
7. Use your environment manager to create a new virtual environment with Python 3.11
```bash
conda create -n toolkit python=3.11
```
8. Install [poetry >= 1.7.1](https://python-poetry.org/docs/#installing-with-pipx)
```bash
pipx install poetry
```
To check that poetry has been installed correctly, run
```bash
conda activate toolkit
poetry --version
```
You should see the version of poetry (e.g. 1.8.2). If poetry is not found, try
```bash
export PATH="$HOME/.local/bin:$PATH"
```
And then retry `poetry --version`
9. Clone the repo and run `make first-run`
10. Navigate to http://localhost:4000 in your browser

</details>

<details>
<summary>Environment variables</summary>

### Cohere Platform

- `COHERE_API_KEY`: If your application will interface with Cohere's API, you will need to supply an API key. Not required if using AWS Sagemaker or Azure.
Sign up at https://dashboard.cohere.com/ to create an API key.
- `NEXT_PUBLIC_API_HOSTNAME`: The backend URL which the frontend will communicate with. Defaults to http://localhost:8000
- `NEXT_PUBLIC_API_HOSTNAME`: The backend URL which the frontend will communicate with. Defaults to http://backend:8000 for use with `docker compose`
- `DATABASE_URL`: Your PostgreSQL database connection string for SQLAlchemy, should follow the format `postgresql+psycopg2://USER:PASSWORD@HOST:PORT`.
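
For the Cohere Platform option, a minimal `.env` might look like the following (placeholder values; copy `.env-template` for the full list):

```
NEXT_PUBLIC_API_HOSTNAME=http://backend:8000
DATABASE_URL=postgresql+psycopg2://postgres:postgres@db:5432
COHERE_API_KEY=<API_KEY_HERE>
```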

### AWS Sagemaker
@@ -143,6 +227,15 @@ You can deploy Toolkit with one click to Microsoft Azure Platform:

[<img src="https://aka.ms/deploytoazurebutton" height="48px">](https://portal.azure.com/#create/Microsoft.Template/uri/https%3A%2F%2Fraw.githubusercontent.com%2Fcohere-ai%2Fcohere-toolkit%2Fmain%2Fazuredeploy.json)

This deployment type uses Azure Container Instances to host the Toolkit.
After your deployment is complete, click the "Go to resource" button.
1) Check the logs to see whether the container is running successfully:
- click the "Containers" button on the left side of the screen
- click the container name
- click the "Logs" tab to see the logs
2) Navigate to the "Overview" tab to find the FQDN of the container instance
3) Open `<FQDN>:4000` in your browser to access the Toolkit

## Setup for Development

### Setting up Poetry
@@ -162,6 +255,13 @@ poetry run black .
poetry run isort .
```

## Setting up the Environment Variables
**Make sure you have at least one configuration for the Cohere Platform, SageMaker, or Azure.**

There are two ways to set up the environment variables:
1. Run `make setup` and follow the instructions to configure it.
2. Run `cp .env-template .env` and adjust the values in the `.env` file for your setup.

### Setting up Your Local Database

The docker-compose file should spin up a local `db` container with a PostgreSQL server. The first time you setup this project, and whenever new migrations are added, you will need to run:
@@ -284,9 +384,11 @@ A model deployment is a running version of one of the Cohere command models. The
- This model deployment calls into your Azure deployment. To create an Azure deployment, [follow these steps](https://learn.microsoft.com/en-us/azure/ai-studio/how-to/deploy-models-cohere-command). Once you have a model deployed, get the endpoint URL and API key from Azure AI Studio (https://ai.azure.com/build/ -> Project -> Deployments -> click your deployment to see your URL and API key). Note that to use the Cohere SDK you need to append `/v1` to the end of the URL.
- SageMaker (model_deployments/sagemaker.py)
- This deployment option calls into your SageMaker deployment. To create a SageMaker endpoint, [follow the steps here](https://docs.cohere.com/docs/amazon-sagemaker-setup-guide) or, alternatively, [follow a command notebook here](https://github.com/cohere-ai/cohere-aws/tree/main/notebooks/sagemaker). Note your region and endpoint name when executing the notebook, as these will be needed in the environment variables.
- Local models with LlamaCPP (community/model_deployments/local_model.py)
- This deployment option calls into a local model. To use this deployment you will need to download a model. You can use Cohere command models or choose from a range of other models listed [here](https://github.com/ggerganov/llama.cpp). You will need to enable community features to use this deployment by setting `USE_COMMUNITY_FEATURES=True` in your .env file.
- To add your own deployment:
1. Create a deployment file, add it to [/community/model_deployments](https://github.com/cohere-ai/toolkit/tree/main/src/community/model_deployments) folder, implement the function calls from `BaseDeployment` similar to the other deployments.
2. Add the deployment to [src/community/config/deployments.py](https://github.com/cohere-ai/toolkit/blob/main/src/community/config/deployments.py)
1. Create a deployment file, add it to [/community/model_deployments](https://github.com/cohere-ai/cohere-toolkit/tree/main/src/community/model_deployments) folder, implement the function calls from `BaseDeployment` similar to the other deployments.
2. Add the deployment to [src/community/config/deployments.py](https://github.com/cohere-ai/cohere-toolkit/blob/main/src/community/config/deployments.py)
3. Add the environment variables required to the env template.
- To add a Cohere private deployment, follow the steps above: copy the Cohere Platform implementation, change the `base_url` to your private deployment, and add any custom auth steps.
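The three steps above can be sketched as follows. This is a minimal, hypothetical illustration: the real `BaseDeployment` interface lives under `src/backend` and may declare different method names, and `MyPrivateDeployment`, its URL, and its key are placeholders rather than toolkit code.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Generator


class BaseDeployment(ABC):
    """Stand-in for the toolkit's BaseDeployment so this sketch is
    self-contained; the real interface may differ."""

    @abstractmethod
    def invoke_chat_stream(
        self, chat_request: Dict[str, Any]
    ) -> Generator[Dict[str, Any], None, None]:
        ...


class MyPrivateDeployment(BaseDeployment):
    """Hypothetical deployment that would route chat requests to a
    private endpoint; base_url and api_key are placeholders."""

    def __init__(self, base_url: str, api_key: str):
        self.base_url = base_url
        self.api_key = api_key

    def invoke_chat_stream(self, chat_request):
        # A real implementation would POST chat_request to self.base_url
        # and stream tokens back; canned events show the expected shape.
        yield {"event_type": "text-generation", "text": "hello"}
        yield {"event_type": "stream-end", "finish_reason": "COMPLETE"}


deployment = MyPrivateDeployment("https://example.internal/v1", "fake-key")
events = list(deployment.invoke_chat_stream({"message": "hi"}))
```

After registering such a class in `src/community/config/deployments.py` and adding its environment variables to the template, it would appear alongside the built-in deployments.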

@@ -310,7 +412,7 @@ Currently the core chat interface is the Coral frontend. To add your own interfa

### How to add a connector to the Toolkit

If you have already created a [connector](https://docs.cohere.com/docs/connectors), it can be used in the toolkit with `ConnectorRetriever`. Add in your configuration and then add the definition in [community/config/tools.py](https://github.com/cohere-ai/toolkit/blob/main/src/community/config/tools.py) similar to `Arxiv` implementation with the category `Category.DataLoader`. You can now use the Coral frontend and API with the connector.
If you have already created a [connector](https://docs.cohere.com/docs/connectors), it can be used in the toolkit with `ConnectorRetriever`. Add in your configuration and then add the definition in [community/config/tools.py](https://github.com/cohere-ai/cohere-toolkit/blob/main/src/community/config/tools.py) similar to `Arxiv` implementation with the category `Category.DataLoader`. You can now use the Coral frontend and API with the connector.
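Under the hood, a [connector](https://docs.cohere.com/docs/connectors) is an HTTP service that answers `POST {url}/search` with JSON results for a query. A small sketch of assembling such a request, assuming a bearer-token connector (the helper function is illustrative, not part of the toolkit's API):

```python
import json


def build_connector_search_request(connector_url: str, query: str, api_key: str) -> dict:
    """Assemble the pieces of the HTTP call a connector expects:
    POST {url}/search with a JSON body containing the query. The dict
    layout here is for illustration; hand the parts to any HTTP client."""
    return {
        "method": "POST",
        "url": f"{connector_url.rstrip('/')}/search",
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        "body": json.dumps({"query": query}),
    }


req = build_connector_search_request("https://example.com/", "quantum computing", "secret")
print(req["url"])  # https://example.com/search
```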

### How to set up web search with the Toolkit

6 changes: 3 additions & 3 deletions docker_scripts/env-defaults
@@ -13,6 +13,6 @@ DB_TEMPLATE=${DB_TEMPLATE:-template1}
DB_EXTENSION=${DB_EXTENSION:-}

# Defaults for the toolkit
NEXT_PUBLIC_API_HOSTNAME=${NEXT_PUBLIC_API_HOSTNAME:-http://localhost:8000}
PYTHON_INTERPRETER_URL=${PYTHON_INTERPRETER_URL:-http://localhost:8080}
DATABASE_URL=${DATABASE_URL:-postgresql+psycopg2://postgre:postgre@localhost:5432/toolkit}
export NEXT_PUBLIC_API_HOSTNAME=${NEXT_PUBLIC_API_HOSTNAME:-http://localhost:8000}
export PYTHON_INTERPRETER_URL=${PYTHON_INTERPRETER_URL:-http://localhost:8080}
export DATABASE_URL=${DATABASE_URL:-postgresql+psycopg2://postgre:postgre@localhost:5432/toolkit}
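The change to prefix these defaults with `export` matters because the values must be inherited by the processes the container entrypoint launches, not only by the shell that sources `env-defaults`. A minimal sketch of the difference (variable names are illustrative):

```shell
#!/bin/sh
# Without `export`, a shell variable is local to the current shell and
# invisible to child processes (such as the toolkit's backend server).
DB_URL_PLAIN="postgresql+psycopg2://postgres:postgres@db:5432"
child_sees_plain=$(sh -c 'printf "%s" "$DB_URL_PLAIN"')

# With `export`, the variable is placed in the environment and inherited.
export DB_URL_EXPORTED="postgresql+psycopg2://postgres:postgres@db:5432"
child_sees_exported=$(sh -c 'printf "%s" "$DB_URL_EXPORTED"')

echo "plain:    '$child_sees_plain'"      # empty: the child never saw it
echo "exported: '$child_sees_exported'"   # the full URL
```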
11 changes: 10 additions & 1 deletion docs/deployment_guides/single_container.md
@@ -11,7 +11,16 @@ You can deploy Toolkit with one click in Microsoft Azure Platform:

[<img src="https://aka.ms/deploytoazurebutton" height="30px">](https://portal.azure.com/#create/Microsoft.Template/uri/https%3A%2F%2Fraw.githubusercontent.com%2Fcohere-ai%2Fcohere-toolkit%2Fmain%2Fazuredeploy.json)

### AWS ECS(Fargate) Deployment guide
This deployment type uses Azure Container Instances to host the Toolkit.
After your deployment is complete, click the "Go to resource" button.
1) Check the logs to see whether the container is running successfully:
- click the "Containers" button on the left side of the screen
- click the container name
- click the "Logs" tab to see the logs
2) Navigate to the "Overview" tab to find the FQDN of the container instance
3) Open `<FQDN>:4000` in your browser to access the Toolkit

### AWS ECS Deployment guides
- [AWS ECS Fargate Deployment](aws_ecs_single_container.md): Deploy the Toolkit single container to AWS ECS (Fargate).
- [AWS ECS EC2 Deployment](aws_ecs_single_container_ec2.md): Deploy the Toolkit single container to AWS ECS (EC2).

2 changes: 1 addition & 1 deletion docs/postman/Toolkit.postman_collection.json
@@ -455,7 +455,7 @@
{
"key": "file",
"type": "file",
"src": "/Users/luisa/Downloads/Aya_dataset__ACL_edition.pdf"
"src": "/Users/luisa/Downloads/Aya_dataset.pdf"
}
]
},
77 changes: 77 additions & 0 deletions extract.py
@@ -0,0 +1,77 @@
from src.backend.crud.conversation import extract_conversations, Conversation

from dotenv import load_dotenv
from sqlalchemy import create_engine
from sqlalchemy.orm import Session

import json

load_dotenv()

SQLALCHEMY_DATABASE_URL = "postgresql+psycopg2://postgres:postgres@localhost:5433"

engine = create_engine(SQLALCHEMY_DATABASE_URL, echo=False)

db = Session(autocommit=False, autoflush=False, bind=engine)


def run_script():
    """
    Saves all conversations in the database in the format:
    `conv_id`: {conversation attributes}
    """
    conversations = extract_conversations(db)

    file_path = "conversations.txt"

    data = {}

    # Format the data and assemble the new conversation dictionary
    for conv in conversations:
        conv_id, parsed = parse_conversation(conv)
        data[conv_id] = parsed

    if conversations:
        print(conversations[-1].description)
        print(conversations[-1].messages[-1].text)

    # Save it
    with open(file_path, "w") as file:
        json.dump(data, file)

    print(f"Successfully saved file at {file_path}! Saved {len(conversations)} conversations!")
    print("Checking if data can be successfully loaded . . .")

    # Check that the data can be loaded back without errors.
    try:
        with open(file_path, "r") as file:
            json.load(file)
        print("Success!")
    except Exception as e:
        print("We were unable to load the data; this means it isn't being saved properly and is corrupted.")
        print(f"Error message: {e}")


def parse_conversation(conv: Conversation) -> tuple[str, dict]:
    """
    Turns a conversation into something we can store: returns a
    conversation_id and a dictionary of all conversation data.
    """
    parsed_messages = [
        {
            "role": msg.agent,
            "text": msg.text,
            "m_id": msg.id,
            "annotations": [
                {"a_id": a.id, "htext": a.htext, "annotation": a.annotation,
                 "start": a.start, "end": a.end}
                for a in msg.annotations
            ],
            "position": msg.position,
        }
        for msg in conv.messages
    ]

    return conv.id, {
        "date": conv.created_at.strftime("%Y-%m-%d"),
        "user_id": conv.user_id,
        "messages": parsed_messages,
    }


if __name__ == "__main__":
    run_script()
35 changes: 34 additions & 1 deletion poetry.lock


1 change: 1 addition & 0 deletions pyproject.toml
@@ -62,6 +62,7 @@ llama-index = "^0.10.11"
wolframalpha = "^5.0.0"
transformers = "^4.40.1"
torch = "^2.3.0"
llama-cpp-python = "^0.2.67"

[build-system]
requires = ["poetry-core"]
