Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Run ruff, add to precommit #491

Merged
merged 2 commits into from
Aug 2, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions .github/workflows/python-test.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -26,5 +26,7 @@ jobs:
run: |
python -m pip install --upgrade pip
pip install -r requirements-dev.txt
- name: Lint with ruff
run: ruff .
- name: Run Python tests
run: python3 -m pytest
11 changes: 11 additions & 0 deletions .pre-commit-config.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.4.0
hooks:
- id: check-yaml
- id: end-of-file-fixer
- id: trailing-whitespace
- repo: https://github.com/astral-sh/ruff-pre-commit
rev: v0.0.282
hooks:
- id: ruff
8 changes: 7 additions & 1 deletion CONTRIBUTING.md
Original file line number Diff line number Diff line change
Expand Up @@ -76,6 +76,12 @@ Install the development dependencies:
python3 -m pip install -r requirements-dev.txt
```

Install the pre-commit hooks:

```
pre-commit install
```

Run the tests:

```
Expand Down Expand Up @@ -105,4 +111,4 @@ Run `black` to format a file:

```
python3 -m black <path-to-file>
```
```
6 changes: 3 additions & 3 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -29,15 +29,15 @@ The repo includes sample data so it's ready to try end to end. In this sample ap
#### To Run Locally

* [Azure Developer CLI](https://aka.ms/azure-dev/install)
* [Python 3+](https://www.python.org/downloads/)
* [Python 3.8+](https://www.python.org/downloads/)
* **Important**: Python and the pip package manager must be in the path in Windows for the setup scripts to work.
* **Important**: Ensure you can run `python --version` from console. On Ubuntu, you might need to run `sudo apt install python-is-python3` to link `python` to `python3`.
* [Node.js 14+](https://nodejs.org/en/download/)
* [Git](https://git-scm.com/downloads)
* [Powershell 7+ (pwsh)](https://github.com/powershell/powershell) - For Windows users only.
* **Important**: Ensure you can run `pwsh.exe` from a PowerShell command. If this fails, you likely need to upgrade PowerShell.

>NOTE: Your Azure Account must have `Microsoft.Authorization/roleAssignments/write` permissions, such as [User Access Administrator](https://learn.microsoft.com/azure/role-based-access-control/built-in-roles#user-access-administrator) or [Owner](https://learn.microsoft.com/azure/role-based-access-control/built-in-roles#owner).
>NOTE: Your Azure Account must have `Microsoft.Authorization/roleAssignments/write` permissions, such as [User Access Administrator](https://learn.microsoft.com/azure/role-based-access-control/built-in-roles#user-access-administrator) or [Owner](https://learn.microsoft.com/azure/role-based-access-control/built-in-roles#owner).

#### To Run in GitHub Codespaces or VS Code Remote Containers

Expand All @@ -61,7 +61,7 @@ Execute the following command, if you don't have any pre-existing Azure services

1. Run `azd up` - This will provision Azure resources and deploy this sample to those resources, including building the search index based on the files found in the `./data` folder.
* For the target location, the regions that currently support the models used in this sample are **East US**, **South Central US**, and **West Europe**. For an up-to-date list of regions and models, check [here](https://learn.microsoft.com/azure/cognitive-services/openai/concepts/models#model-summary-table-and-region-availability).
1. After the application has been successfully deployed you will see a URL printed to the console. Click that URL to interact with the application in your browser.
1. After the application has been successfully deployed you will see a URL printed to the console. Click that URL to interact with the application in your browser.

It will look like the following:

Expand Down
35 changes: 18 additions & 17 deletions app/backend/app.py
Original file line number Diff line number Diff line change
@@ -1,17 +1,18 @@
import os
import io
import logging
import mimetypes
import os
import time
import logging

import openai
from flask import Flask, request, jsonify, send_file, abort
from approaches.chatreadretrieveread import ChatReadRetrieveReadApproach
from approaches.readdecomposeask import ReadDecomposeAsk
from approaches.readretrieveread import ReadRetrieveReadApproach
from approaches.retrievethenread import RetrieveThenReadApproach
from azure.identity import DefaultAzureCredential
from azure.search.documents import SearchClient
from approaches.retrievethenread import RetrieveThenReadApproach
from approaches.readretrieveread import ReadRetrieveReadApproach
from approaches.readdecomposeask import ReadDecomposeAsk
from approaches.chatreadretrieveread import ChatReadRetrieveReadApproach
from azure.storage.blob import BlobServiceClient
from flask import Flask, abort, jsonify, request, send_file

# Replace these with your own values, either in environment variables or directly here
AZURE_STORAGE_ACCOUNT = os.environ.get("AZURE_STORAGE_ACCOUNT") or "mystorageaccount"
Expand All @@ -28,8 +29,8 @@
KB_FIELDS_CATEGORY = os.environ.get("KB_FIELDS_CATEGORY") or "category"
KB_FIELDS_SOURCEPAGE = os.environ.get("KB_FIELDS_SOURCEPAGE") or "sourcepage"

# Use the current user identity to authenticate with Azure OpenAI, Cognitive Search and Blob Storage (no secrets needed,
# just use 'az login' locally, and managed identity when deployed on Azure). If you need to use keys, use separate AzureKeyCredential instances with the
# Use the current user identity to authenticate with Azure OpenAI, Cognitive Search and Blob Storage (no secrets needed,
# just use 'az login' locally, and managed identity when deployed on Azure). If you need to use keys, use separate AzureKeyCredential instances with the
# keys for each service
# If you encounter a blocking error during a DefaultAzureCredntial resolution, you can exclude the problematic credential by using a parameter (ex. exclude_shared_token_cache_credential=True)
azure_credential = DefaultAzureCredential(exclude_shared_token_cache_credential = True)
Expand All @@ -50,7 +51,7 @@
index_name=AZURE_SEARCH_INDEX,
credential=azure_credential)
blob_client = BlobServiceClient(
account_url=f"https://{AZURE_STORAGE_ACCOUNT}.blob.core.windows.net",
account_url=f"https://{AZURE_STORAGE_ACCOUNT}.blob.core.windows.net",
credential=azure_credential)
blob_container = blob_client.get_container_client(AZURE_STORAGE_CONTAINER)

Expand All @@ -63,11 +64,11 @@
}

chat_approaches = {
"rrr": ChatReadRetrieveReadApproach(search_client,
"rrr": ChatReadRetrieveReadApproach(search_client,
AZURE_OPENAI_CHATGPT_DEPLOYMENT,
AZURE_OPENAI_CHATGPT_MODEL,
AZURE_OPENAI_CHATGPT_MODEL,
AZURE_OPENAI_EMB_DEPLOYMENT,
KB_FIELDS_SOURCEPAGE,
KB_FIELDS_SOURCEPAGE,
KB_FIELDS_CONTENT)
}

Expand All @@ -78,7 +79,7 @@
def static_file(path):
return app.send_static_file(path)

# Serve content files from blob storage from within the app to keep the example self-contained.
# Serve content files from blob storage from within the app to keep the example self-contained.
# *** NOTE *** this assumes that the content files are public, or at least that all users of the app
# can access all the files. This is also slow and memory hungry.
@app.route("/content/<path>")
Expand All @@ -93,7 +94,7 @@ def content_file(path):
blob.readinto(blob_file)
blob_file.seek(0)
return send_file(blob_file, mimetype=mime_type, as_attachment=False, download_name=path)

@app.route("/ask", methods=["POST"])
def ask():
if not request.json:
Expand All @@ -108,7 +109,7 @@ def ask():
except Exception as e:
logging.exception("Exception in /ask")
return jsonify({"error": str(e)}), 500

@app.route("/chat", methods=["POST"])
def chat():
if not request.json:
Expand All @@ -130,6 +131,6 @@ def ensure_openai_token():
if openai_token.expires_on < time.time() + 60:
openai_token = azure_credential.get_token("https://cognitiveservices.azure.com/.default")
openai.api_key = openai_token.token

if __name__ == "__main__":
app.run()
59 changes: 29 additions & 30 deletions app/backend/approaches/chatreadretrieveread.py
Original file line number Diff line number Diff line change
@@ -1,14 +1,13 @@
from typing import Any, Sequence

import openai
import tiktoken
from approaches.approach import Approach
from azure.search.documents import SearchClient
from azure.search.documents.models import QueryType
from approaches.approach import Approach
from text import nonewlines

from core.messagebuilder import MessageBuilder
from core.modelhelper import get_token_limit
from text import nonewlines


class ChatReadRetrieveReadApproach(Approach):
# Chat roles
Expand All @@ -28,13 +27,13 @@ class ChatReadRetrieveReadApproach(Approach):
{follow_up_questions_prompt}
{injected_prompt}
"""
follow_up_questions_prompt_content = """Generate three very brief follow-up questions that the user would likely ask next about their healthcare plan and employee handbook.
follow_up_questions_prompt_content = """Generate three very brief follow-up questions that the user would likely ask next about their healthcare plan and employee handbook.
Use double angle brackets to reference the questions, e.g. <<Are there exclusions for prescriptions?>>.
Try not to repeat questions that have already been asked.
Only generate questions and do not generate any text before or after the questions, such as 'Next Questions'"""

query_prompt_template = """Below is a history of the conversation so far, and a new question asked by the user that needs to be answered by searching in a knowledge base about employee healthcare plans and the employee handbook.
Generate a search query based on the conversation and the new question.
Generate a search query based on the conversation and the new question.
Do not include cited source filenames and document names e.g info.txt or doc.pdf in the search query terms.
Do not include any text inside [] or <<>> in the search query terms.
Do not include any special characters like '+'.
Expand Down Expand Up @@ -80,11 +79,11 @@ def run(self, history: Sequence[dict[str, str]], overrides: dict[str, Any]) -> A
chat_completion = openai.ChatCompletion.create(
deployment_id=self.chatgpt_deployment,
model=self.chatgpt_model,
messages=messages,
temperature=0.0,
max_tokens=32,
messages=messages,
temperature=0.0,
max_tokens=32,
n=1)

query_text = chat_completion.choices[0].message.content
if query_text.strip() == "0":
query_text = history[-1]["user"] # Use the last user input if we failed to generate a better query
Expand All @@ -103,23 +102,23 @@ def run(self, history: Sequence[dict[str, str]], overrides: dict[str, Any]) -> A

# Use semantic L2 reranker if requested and if retrieval mode is text or hybrid (vectors + text)
if overrides.get("semantic_ranker") and has_text:
r = self.search_client.search(query_text,
r = self.search_client.search(query_text,
filter=filter,
query_type=QueryType.SEMANTIC,
query_language="en-us",
query_speller="lexicon",
semantic_configuration_name="default",
top=top,
query_type=QueryType.SEMANTIC,
query_language="en-us",
query_speller="lexicon",
semantic_configuration_name="default",
top=top,
query_caption="extractive|highlight-false" if use_semantic_captions else None,
vector=query_vector,
vector=query_vector,
top_k=50 if query_vector else None,
vector_fields="embedding" if query_vector else None)
else:
r = self.search_client.search(query_text,
filter=filter,
top=top,
vector=query_vector,
top_k=50 if query_vector else None,
r = self.search_client.search(query_text,
filter=filter,
top=top,
vector=query_vector,
top_k=50 if query_vector else None,
vector_fields="embedding" if query_vector else None)
if use_semantic_captions:
results = [doc[self.sourcepage_field] + ": " + nonewlines(" . ".join([c.text for c in doc['@search.captions']])) for doc in r]
Expand All @@ -128,7 +127,7 @@ def run(self, history: Sequence[dict[str, str]], overrides: dict[str, Any]) -> A
content = "\n".join(results)

follow_up_questions_prompt = self.follow_up_questions_prompt_content if overrides.get("suggest_followup_questions") else ""

# STEP 3: Generate a contextual and content specific answer using the search results and chat history

# Allow client to replace the entire prompt, or to inject into the exiting prompt using >>>
Expand All @@ -139,7 +138,7 @@ def run(self, history: Sequence[dict[str, str]], overrides: dict[str, Any]) -> A
system_message = self.system_message_chat_conversation.format(injected_prompt=prompt_override[3:] + "\n", follow_up_questions_prompt=follow_up_questions_prompt)
else:
system_message = prompt_override.format(follow_up_questions_prompt=follow_up_questions_prompt)

messages = self.get_messages_from_history(
system_message + "\n\nSources:\n" + content,
self.chatgpt_model,
Expand All @@ -150,17 +149,17 @@ def run(self, history: Sequence[dict[str, str]], overrides: dict[str, Any]) -> A
chat_completion = openai.ChatCompletion.create(
deployment_id=self.chatgpt_deployment,
model=self.chatgpt_model,
messages=messages,
temperature=overrides.get("temperature") or 0.7,
max_tokens=1024,
messages=messages,
temperature=overrides.get("temperature") or 0.7,
max_tokens=1024,
n=1)

chat_content = chat_completion.choices[0].message.content

msg_to_display = '\n\n'.join([str(message) for message in messages])

return {"data_points": results, "answer": chat_content, "thoughts": f"Searched for:<br>{query_text}<br><br>Conversations:<br>" + msg_to_display.replace('\n', '<br>')}

def get_messages_from_history(self, system_prompt: str, model_id: str, history: Sequence[dict[str, str]], user_conv: str, few_shots = [], max_tokens: int = 4096) -> []:
message_builder = MessageBuilder(system_prompt, model_id)

Expand All @@ -179,6 +178,6 @@ def get_messages_from_history(self, system_prompt: str, model_id: str, history:
message_builder.append_message(self.USER, h.get('user'), index=append_index)
if message_builder.token_length > max_tokens:
break

messages = message_builder.messages
return messages
return messages
Loading
Loading