Azdevify chat-with-your-data-solution-accelerator. #2

Merged
merged 45 commits into from
Feb 6, 2024
Changes from 36 commits
24be759
azdevify this sample and add azd new feature in devcontainer
Nov 5, 2023
fbfd9b7
instead parameters.json file with .bicepparam
Nov 5, 2023
d32d977
processing format
Nov 7, 2023
552f254
keep consistent casing for params
Nov 7, 2023
c00f55a
move appsettings
Nov 7, 2023
b2e62b2
remove some invalid changes
Nov 9, 2023
b1a97b7
fix format issues
Nov 9, 2023
625487e
Remove depandsOn
Nov 9, 2023
65ac3c5
remove resources file
Nov 10, 2023
43b09e7
remov keys from output
Nov 10, 2023
bd29d31
remove existing reference in module
Nov 10, 2023
8587450
update ReadMe
Nov 15, 2023
afd3ca8
add rgName in listkeys
Nov 22, 2023
654762e
add use keyvault option
Nov 23, 2023
cc3e957
set default value in .bicepparam
Nov 23, 2023
9dc6a78
fix some bugs
Nov 28, 2023
9bcb2a7
update rbac
Dec 14, 2023
1a9212b
update rbac
Dec 21, 2023
eb11eab
fix some format and update infra core
Jan 3, 2024
a403d25
Update openai to v1
ChenxiJiang333 Jan 3, 2024
b88ee5f
use resourcegroup.name() instead pass in rhName
Jan 3, 2024
4bdf8c3
Merge pull request #1 from zedy-wj/openaiv1
zedy-wj Jan 3, 2024
1ecb9be
Revert "Update openai to v1"
zedy-wj Jan 3, 2024
75f230e
Merge pull request #2 from zedy-wj/revert-1-openaiv1
zedy-wj Jan 3, 2024
969aa25
add token provider
ChenxiJiang333 Jan 4, 2024
ffee3f8
Merge branch 'azdevify' into openaiv1
ChenxiJiang333 Jan 4, 2024
778e1d4
Merge pull request #3 from zedy-wj/openaiv1
zedy-wj Jan 4, 2024
5892d5d
Update requirements.txt
ChenxiJiang333 Jan 4, 2024
ce7d48d
Merge pull request #4 from zedy-wj/ChenxiJiang333-patch-1
zedy-wj Jan 4, 2024
2b683de
revert
Jan 4, 2024
397ab11
fix chat
ChenxiJiang333 Jan 4, 2024
1e21f2a
Update TextProcessingTool.py
ChenxiJiang333 Jan 4, 2024
9481a66
Merge pull request #5 from zedy-wj/newcommit
zedy-wj Jan 5, 2024
b25cce5
update KeyVault options
Jan 10, 2024
332402a
abstract auth type with define method
Jan 11, 2024
bd092df
remove invalid infra/core
Jan 11, 2024
5406c5a
update readme and storage.bicep in infra/core
Jan 12, 2024
d2145ec
Update output format
Jan 17, 2024
c83a409
default to use rbac
Jan 22, 2024
aa86081
add some comments
Jan 26, 2024
06bb879
solve conflicts
ChenxiJiang333 Jan 31, 2024
1a7303a
add readme and speech service related code in main.bicep
Feb 1, 2024
2c8abfb
fix azd deploy problem
Feb 2, 2024
5dcf10c
move code folder structure
Feb 5, 2024
779065f
fix conflict
Feb 6, 2024
31 changes: 31 additions & 0 deletions .devcontainer/devcontainer.json
@@ -0,0 +1,31 @@
{
Owner

Can you investigate this flow:

  1. Use RBAC
  2. Store and Use Keys in KeyVault. If the KV isn't specified, then try to read the values from the service with listKeys.

The above adds more complexity, but allows the user to store the keys in KV as an option.

Just linking this as we should revisit the Speech Service using this method too: - Azure-Samples#101 (not for this PR!)

Collaborator Author

Use of KeyVault has been added and is currently testing well.

Owner

Sorry for the back and forth on this. Let's do this:

Default to using rbac.

If AUTH_TYPE is rbac, then assume the developer wants to also use key vault. Put the keys in keyvault and then read them from keyvault.

If the user sets AUTH_TYPE to keys, then read the keys from the environment and set the keys using listKeys function.

I think that will be cleaner.

Collaborator Author

As we understand it, if AUTH_TYPE is rbac, the App Services and the user are assigned the corresponding permissions in the main.bicep file. Keys are then no longer needed in the code, because the services are accessed directly through something like DefaultAzureCredential().

So the statement "If AUTH_TYPE is rbac, then assume the developer wants to also use key vault" confuses us, because in our understanding one of the two is enough.

If we have misunderstood RBAC or your comments, please let us know. Thanks.

Owner

Yes, only use a key vault if there's no RBAC option for the service.

Collaborator Author

@zedy-wj zedy-wj Jan 22, 2024

That's the logic in our code so far. Please re-review it, thanks!

> Yes, only use a key vault if there's no RBAC option for the service.

Default to using rbac: https://github.com/jongio/chat-with-your-data-solution-accelerator/pull/2/files#diff-c274a6091f4ca06948f0fe1ab8681ec4eb6b4e98c4be09717a2e3cacd1344727R9-R12

If AUTH_TYPE is rbac, the developer must not enable Key Vault. If AUTH_TYPE is keys, the developer can either use Key Vault or use listKeys() directly.
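
The rule the thread converges on can be sketched as a small decision function. This is an illustration only, not code from the PR: `env_flag` and `resolve_key_source` are hypothetical helper names, and treating a missing `AUTH_TYPE` as `rbac` is an assumption that mirrors the "default to rbac" agreement above.

```python
import os
from typing import Mapping, Optional

# Strings treated as "true" for boolean-style environment variables (assumed set).
_TRUE_VALUES = {"1", "true", "yes", "on"}


def env_flag(value: Optional[str]) -> bool:
    """Interpret an environment-variable string as a boolean."""
    return value is not None and value.strip().lower() in _TRUE_VALUES


def resolve_key_source(env: Mapping[str, str] = os.environ) -> str:
    """Return where service credentials come from: 'rbac', 'keyvault', or 'env'.

    'env' means the key is read straight from the environment, where the
    deployment placed it (for example via listKeys in the Bicep templates).
    """
    auth_type = env.get("AUTH_TYPE", "rbac").lower()  # assumed default: rbac
    if auth_type == "rbac":
        return "rbac"  # token-based access, no keys and no Key Vault
    if env_flag(env.get("USE_KEY_VAULT")):
        return "keyvault"
    return "env"
```

Parsing the flag explicitly matters because an environment variable is always a string: with a plain truthiness test, a value such as `"false"` would still enable the Key Vault branch.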

  "name": "Chat with your data Solution Accelerator",
  "image": "mcr.microsoft.com/devcontainers/python:3.11",
  "features": {
    "ghcr.io/devcontainers/features/node:1": {
      "version": "16",
      "nodeGypDependencies": false
    },
    "ghcr.io/devcontainers/features/powershell:1.1.1": {},
    "ghcr.io/devcontainers/features/azure-cli:1.2.1": {},
    "ghcr.io/azure/azure-dev/azd:latest": {}
  },
  "customizations": {
    "vscode": {
      "extensions": [
        "ms-azuretools.azure-dev",
        "ms-azuretools.vscode-bicep",
        "ms-python.python",
        "esbenp.prettier-vscode"
      ]
    }
  },
  "forwardPorts": [
    50505
  ],
  "postCreateCommand": "",
  "remoteUser": "vscode",
  "hostRequirements": {
    "memory": "8gb"
  }
}
40 changes: 39 additions & 1 deletion README.md
@@ -104,8 +104,46 @@ Out-of-the-box, you can upload the following file types:

![A screenshot of the chat app.](./media/chat-app.png)

## Running the sample using the Azure Developer CLI (azd)

-## Development and run the accelerator locally
The Azure Developer CLI (`azd`) is a developer-centric command-line interface (CLI) tool for creating Azure applications.

Install `azd` before using it to run or deploy the sample.

### Windows

```powershell
powershell -ex AllSigned -c "Invoke-RestMethod 'https://aka.ms/install-azd.ps1' | Invoke-Expression"
```

### Linux/macOS

```
curl -fsSL https://aka.ms/install-azd.sh | bash
```

Log in with the following command. After that, you can use `azd` to provision and deploy the application.

```
azd auth login
```

Then, execute the `azd init` command to initialize the environment.
```
azd init -t chat-with-your-data-solution-accelerator
```
When prompted, enter an environment name.

Run `azd up` to provision all the resources to Azure and deploy the code to those resources.
```
azd up
```

When prompted, select a `subscription` and a `location`; both are required to create resources. Then choose an existing resource group or create a new one. Once the deployment completes, open the Website endpoint to see the web app page.

You can also run the sample locally (see below).

## Develop and run the accelerator locally

To customize the accelerator or run it locally, first, copy the `.env.sample` file to your development environment's `.env` file, and edit it according to [environment variable values table](#environment-variables) below.

30 changes: 21 additions & 9 deletions app.py
@@ -2,7 +2,7 @@
 import os
 import logging
 import requests
-import openai
+from openai import AzureOpenAI

 # Fixing MIME types for static files under Windows
 import mimetypes
@@ -13,6 +13,8 @@
 from flask import Flask, Response, request, jsonify
 from dotenv import load_dotenv
 from backend.utilities.QuestionHandler import QuestionHandler
+from azure.identity import DefaultAzureCredential, get_bearer_token_provider
+from azure.keyvault.secrets import SecretClient

 load_dotenv()

@@ -23,10 +25,20 @@
 def static_file(path):
     return app.send_static_file(path)

+AUTH_TYPE = os.environ.get("AUTH_TYPE")
+
+if not AUTH_TYPE == 'rbac' and os.environ.get("USE_KEY_VAULT"):
+    credential = DefaultAzureCredential()
+    secret_client = SecretClient(os.environ.get("AZURE_KEY_VAULT_ENDPOINT"), credential)
+    AZURE_SEARCH_KEY = secret_client.get_secret(os.environ.get("AZURE_SEARCH_KEY")).value
+    AZURE_OPENAI_KEY = secret_client.get_secret(os.environ.get("AZURE_OPENAI_KEY")).value
+else:
+    AZURE_SEARCH_KEY = None if AUTH_TYPE == 'rbac' else os.environ.get("AZURE_SEARCH_KEY")
+    AZURE_OPENAI_KEY = "" if AUTH_TYPE == 'rbac' else os.environ.get("AZURE_OPENAI_KEY")
+
 # ACS Integration Settings
 AZURE_SEARCH_SERVICE = os.environ.get("AZURE_SEARCH_SERVICE")
 AZURE_SEARCH_INDEX = os.environ.get("AZURE_SEARCH_INDEX")
-AZURE_SEARCH_KEY = os.environ.get("AZURE_SEARCH_KEY")
 AZURE_SEARCH_USE_SEMANTIC_SEARCH = os.environ.get("AZURE_SEARCH_USE_SEMANTIC_SEARCH", False)
 AZURE_SEARCH_SEMANTIC_SEARCH_CONFIG = os.environ.get("AZURE_SEARCH_SEMANTIC_SEARCH_CONFIG", "default")
 AZURE_SEARCH_TOP_K = os.environ.get("AZURE_SEARCH_TOP_K", 5)
@@ -39,7 +51,6 @@ def static_file(path):
 # AOAI Integration Settings
 AZURE_OPENAI_RESOURCE = os.environ.get("AZURE_OPENAI_RESOURCE")
 AZURE_OPENAI_MODEL = os.environ.get("AZURE_OPENAI_MODEL")
-AZURE_OPENAI_KEY = os.environ.get("AZURE_OPENAI_KEY")
 AZURE_OPENAI_TEMPERATURE = os.environ.get("AZURE_OPENAI_TEMPERATURE", 0)
 AZURE_OPENAI_TOP_P = os.environ.get("AZURE_OPENAI_TOP_P", 1.0)
 AZURE_OPENAI_MAX_TOKENS = os.environ.get("AZURE_OPENAI_MAX_TOKENS", 1000)
@@ -48,6 +59,7 @@ def static_file(path):
 AZURE_OPENAI_API_VERSION = os.environ.get("AZURE_OPENAI_API_VERSION", "2023-06-01-preview")
 AZURE_OPENAI_STREAM = os.environ.get("AZURE_OPENAI_STREAM", "true")
 AZURE_OPENAI_MODEL_NAME = os.environ.get("AZURE_OPENAI_MODEL_NAME", "gpt-35-turbo") # Name of the model, e.g. 'gpt-35-turbo' or 'gpt-4'
+AZURE_TOKEN_PROVIDER = get_bearer_token_provider(DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default")

 SHOULD_STREAM = True if AZURE_OPENAI_STREAM.lower() == "true" else False

@@ -191,10 +203,10 @@ def stream_without_data(response):


 def conversation_without_data(request):
-    openai.api_type = "azure"
-    openai.api_base = f"https://{AZURE_OPENAI_RESOURCE}.openai.azure.com/"
-    openai.api_version = "2023-03-15-preview"
-    openai.api_key = AZURE_OPENAI_KEY
+    if AUTH_TYPE == 'rbac':
+        openai_client = AzureOpenAI(azure_endpoint=f"https://{AZURE_OPENAI_RESOURCE}.openai.azure.com/", api_version=AZURE_OPENAI_API_VERSION, azure_ad_token_provider=AZURE_TOKEN_PROVIDER)
+    else:
+        openai_client = AzureOpenAI(azure_endpoint=f"https://{AZURE_OPENAI_RESOURCE}.openai.azure.com/", api_version=AZURE_OPENAI_API_VERSION, api_key=AZURE_OPENAI_KEY)

     request_messages = request.json["messages"]
     messages = [
@@ -210,8 +222,8 @@ def conversation_without_data(request):
             "content": message["content"]
         })

-    response = openai.ChatCompletion.create(
-        engine=AZURE_OPENAI_MODEL,
+    response = openai_client.chat.completions.create(
+        model=AZURE_OPENAI_MODEL,
         messages = messages,
         temperature=float(AZURE_OPENAI_TEMPERATURE),
         max_tokens=int(AZURE_OPENAI_MAX_TOKENS),
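
The two `AzureOpenAI(...)` calls in `conversation_without_data` differ only in the credential argument. One possible way to keep that choice in a single place is sketched below; `azure_openai_client_kwargs` is a hypothetical helper that does not exist in the PR, and it only builds the keyword arguments so it can be exercised without the `openai` package installed.

```python
from typing import Any, Callable, Dict, Optional


def azure_openai_client_kwargs(
    resource: str,
    api_version: str,
    auth_type: Optional[str],
    api_key: Optional[str] = None,
    token_provider: Optional[Callable[[], str]] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for openai.AzureOpenAI based on the auth type."""
    kwargs: Dict[str, Any] = {
        "azure_endpoint": f"https://{resource}.openai.azure.com/",
        "api_version": api_version,
    }
    if auth_type == "rbac":
        # Token-based access: the provider returns a bearer token on demand.
        if token_provider is None:
            raise ValueError("AUTH_TYPE 'rbac' requires a token provider")
        kwargs["azure_ad_token_provider"] = token_provider
    else:
        # Key-based access: the key comes from the environment or Key Vault.
        if not api_key:
            raise ValueError("key-based auth requires an API key")
        kwargs["api_key"] = api_key
    return kwargs
```

With the names from `app.py`, the client could then be created once as `AzureOpenAI(**azure_openai_client_kwargs(AZURE_OPENAI_RESOURCE, AZURE_OPENAI_API_VERSION, AUTH_TYPE, AZURE_OPENAI_KEY, AZURE_TOKEN_PROVIDER))`. Failing early on a missing credential is a design choice of this sketch, not behavior of the PR.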
27 changes: 27 additions & 0 deletions azure.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
# yaml-language-server: $schema=https://raw.githubusercontent.com/Azure/azure-dev/main/schemas/v1.0/azure.yaml.json

name: chat-with-your-data-solution-accelerator
metadata:
template: chat-with-your-data-solution-accelerator@0.0.1-beta
services:
web:
project: .
language: py
host: appservice
hooks:
prepackage:
windows:
shell: pwsh
run: cd ./frontend;npm install;npm run build
interactive: true
continueOnError: false

adminweb:
project: ./backend
language: py
host: appservice

function:
project: ./backend
language: py
host: function
53 changes: 40 additions & 13 deletions backend/pages/01_Ingest_Data.py
@@ -7,7 +7,9 @@
 from datetime import datetime, timedelta
 import logging
 import requests
-from azure.storage.blob import BlobServiceClient, generate_blob_sas, ContentSettings
+from azure.identity import DefaultAzureCredential
+from azure.keyvault.secrets import SecretClient
+from azure.storage.blob import BlobServiceClient, generate_blob_sas, ContentSettings, UserDelegationKey
 import urllib.parse
 from utilities.helpers.ConfigHelper import ConfigHelper
 from dotenv import load_dotenv
@@ -24,6 +26,18 @@
 """
 st.markdown(mod_page_style, unsafe_allow_html=True)

+def request_user_delegation_key(blob_service_client: BlobServiceClient) -> UserDelegationKey:
+    # Get a user delegation key that's valid for 1 day
+    delegation_key_start_time = datetime.utcnow()
+    delegation_key_expiry_time = delegation_key_start_time + timedelta(days=1)
+
+    user_delegation_key = blob_service_client.get_user_delegation_key(
+        key_start_time=delegation_key_start_time,
+        key_expiry_time=delegation_key_expiry_time
+    )
+
+    return user_delegation_key
+
 def remote_convert_files_and_add_embeddings(process_all=False):
     backend_url = urllib.parse.urljoin(os.getenv('BACKEND_URL','http://localhost:7071'), "/api/BatchStartProcessing")
     params = {}
@@ -65,18 +79,31 @@ def upload_file(bytes_data: bytes, file_name: str, content_type: Optional[str] =
     charset = f"; charset={chardet.detect(bytes_data)['encoding']}" if content_type == 'text/plain' else ''
     content_type = content_type if content_type != None else 'text/plain'
     account_name = os.getenv('AZURE_BLOB_ACCOUNT_NAME')
-    account_key = os.getenv('AZURE_BLOB_ACCOUNT_KEY')
-    container_name = os.getenv('AZURE_BLOB_CONTAINER_NAME')
-    if account_name == None or account_key == None or container_name == None:
-        raise ValueError("Please provide values for AZURE_BLOB_ACCOUNT_NAME, AZURE_BLOB_ACCOUNT_KEY and AZURE_BLOB_CONTAINER_NAME")
-    connect_str = f"DefaultEndpointsProtocol=https;AccountName={account_name};AccountKey={account_key};EndpointSuffix=core.windows.net"
-    blob_service_client : BlobServiceClient = BlobServiceClient.from_connection_string(connect_str)
-    # Create a blob client using the local file name as the name for the blob
-    blob_client = blob_service_client.get_blob_client(container=container_name, blob=file_name)
-    # Upload the created file
-    blob_client.upload_blob(bytes_data, overwrite=True, content_settings=ContentSettings(content_type=content_type+charset))
-    # Generate a SAS URL to the blob and return it
-    st.session_state['file_url'] = blob_client.url + '?' + generate_blob_sas(account_name, container_name, file_name,account_key=account_key, permission="r", expiry=datetime.utcnow() + timedelta(hours=3))
+    if os.environ.get("AUTH_TYPE") == 'rbac':
+        credential = DefaultAzureCredential()
+        account_url = f"https://{account_name}.blob.core.windows.net/"
+        blob_service_client = BlobServiceClient(account_url=account_url, credential=credential)
+        user_delegation_key = request_user_delegation_key(blob_service_client=blob_service_client)
+        container_name = os.getenv('AZURE_BLOB_CONTAINER_NAME')
+        blob_client = blob_service_client.get_blob_client(container=container_name, blob=file_name)
+        blob_client.upload_blob(bytes_data, overwrite=True, content_settings=ContentSettings(content_type=content_type+charset))
+        st.session_state['file_url'] = blob_client.url + '?' + generate_blob_sas(account_name, container_name, file_name,user_delegation_key=user_delegation_key, permission="r", expiry=datetime.utcnow() + timedelta(hours=3))
+    else:
+        if os.environ.get("USE_KEY_VAULT"):
+            credential = DefaultAzureCredential()
+            secret_client = SecretClient(os.environ.get("AZURE_KEY_VAULT_ENDPOINT"), credential)
+        account_key = secret_client.get_secret(os.getenv("AZURE_BLOB_ACCOUNT_KEY")).value if os.getenv("USE_KEY_VAULT") else os.getenv("AZURE_BLOB_ACCOUNT_KEY")
+        container_name = os.getenv('AZURE_BLOB_CONTAINER_NAME')
+        if account_name == None or account_key == None or container_name == None:
+            raise ValueError("Please provide values for AZURE_BLOB_ACCOUNT_NAME, AZURE_BLOB_ACCOUNT_KEY and AZURE_BLOB_CONTAINER_NAME")
+        connect_str = f"DefaultEndpointsProtocol=https;AccountName={account_name};AccountKey={account_key};EndpointSuffix=core.windows.net"
+        blob_service_client : BlobServiceClient = BlobServiceClient.from_connection_string(connect_str)
+        # Create a blob client using the local file name as the name for the blob
+        blob_client = blob_service_client.get_blob_client(container=container_name, blob=file_name)
+        # Upload the created file
+        blob_client.upload_blob(bytes_data, overwrite=True, content_settings=ContentSettings(content_type=content_type+charset))
+        # Generate a SAS URL to the blob and return it
+        st.session_state['file_url'] = blob_client.url + '?' + generate_blob_sas(account_name, container_name, file_name,account_key=account_key, permission="r", expiry=datetime.utcnow() + timedelta(hours=3))

 try:
     with st.expander("Add documents in Batch", expanded=True):
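
In the RBAC branch, the read-only SAS is signed with a user delegation key, so the SAS should expire no later than the key does. The sketch below computes both windows together. `sas_time_window` is a hypothetical helper, not part of the PR; it uses timezone-aware UTC values because `datetime.utcnow()` is deprecated as of Python 3.12. The 1-day and 3-hour defaults are taken from the diff above.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


def sas_time_window(
    key_lifetime: timedelta = timedelta(days=1),
    sas_lifetime: timedelta = timedelta(hours=3),
    now: Optional[datetime] = None,
) -> Tuple[datetime, datetime, datetime]:
    """Return (key_start, key_expiry, sas_expiry) as timezone-aware UTC datetimes."""
    if sas_lifetime > key_lifetime:
        # A SAS signed with a user delegation key stops working once the key expires.
        raise ValueError("SAS token would outlive the user delegation key")
    start = now if now is not None else datetime.now(timezone.utc)
    return start, start + key_lifetime, start + sas_lifetime
```

The first two values would go to `get_user_delegation_key(key_start_time=..., key_expiry_time=...)` and the third to the `expiry` argument of `generate_blob_sas`.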
11 changes: 6 additions & 5 deletions backend/requirements.txt
@@ -1,7 +1,7 @@
 azure-functions

 streamlit==1.24.0
-openai==0.27.8
+openai==1.6.1
 matplotlib==3.6.3
 plotly==5.12.0
 scipy==1.10.0
@@ -10,17 +10,18 @@ transformers==4.30.0
 python-dotenv==1.0.0
 azure-ai-formrecognizer==3.2.0
 azure-storage-blob==12.14.1
-azure-identity==1.12.0
-azure-ai-contentsafety==1.0.0b1
+azure-identity==1.15.0
+azure-ai-contentsafety==1.0.0
 requests==2.31.0
 tiktoken==0.2.0
 azure-storage-queue==12.5.0
-langchain==0.0.274
+langchain==0.0.354
 beautifulsoup4==4.12.0
 fake-useragent==1.1.3
 chardet==5.1.0
 --extra-index-url https://pkgs.dev.azure.com/azure-sdk/public/_packaging/azure-sdk-for-python/pypi/simple/
 azure-search-documents==11.4.0b8
 opencensus-ext-azure==1.1.9
 pandas==1.5.1
-python-docx==0.8.11
+python-docx==0.8.11
+azure-keyvault-secrets==4.4.*
1 change: 0 additions & 1 deletion backend/utilities/QuestionHandler.py
@@ -1,5 +1,4 @@
 import os
-import openai
 import logging
 import re
 import json