add screenshot (Azure-Samples#870)
pamelafox authored Oct 27, 2023
1 parent a64a12e commit c989048
Showing 4 changed files with 98 additions and 47 deletions.
35 changes: 2 additions & 33 deletions README.md
@@ -289,38 +289,7 @@ Once in the web app:

This sample is designed to be a starting point for your own production application,
but you should do a thorough review of the security and performance before deploying
-to production. Here are some things to consider:
-
-* **OpenAI Capacity**: The default TPM (tokens per minute) is set to 30K. That is equivalent
-to approximately 30 conversations per minute (assuming 1K per user message/response).
-You can increase the capacity by changing the `chatGptDeploymentCapacity` and `embeddingDeploymentCapacity`
-parameters in `infra/main.bicep` to your account's maximum capacity.
-You can also view the Quotas tab in [Azure OpenAI studio](https://oai.azure.com/)
-to understand how much capacity you have.
-* **Azure Storage**: The default storage account uses the `Standard_LRS` SKU.
-To improve your resiliency, we recommend using `Standard_ZRS` for production deployments,
-which you can specify using the `sku` property under the `storage` module in `infra/main.bicep`.
-* **Azure Cognitive Search**: The default search service uses the `Standard` SKU
-with the free semantic search option, which gives you 1000 free queries a month.
-Assuming your app will experience more than 1000 questions, you should either change `semanticSearch`
-to "standard" or disable semantic search entirely in the `/app/backend/approaches` files.
-If you see errors about search service capacity being exceeded, you may find it helpful to increase
-the number of replicas by changing `replicaCount` in `infra/core/search/search-services.bicep`
-or manually scaling it from the Azure Portal.
-* **Azure App Service**: The default app service plan uses the `Basic` SKU with 1 CPU core and 1.75 GB RAM.
-We recommend using a Premium level SKU, starting with 1 CPU core.
-You can use auto-scaling rules or scheduled scaling rules,
-and scale up the maximum/minimum based on load.
-* **Authentication**: By default, the deployed app is publicly accessible.
-We recommend restricting access to authenticated users.
-See [Enabling authentication](#enabling-authentication) above for how to enable authentication.
-* **Networking**: We recommend deploying inside a Virtual Network. If the app is only for
-internal enterprise use, use a private DNS zone. Also consider using Azure API Management (APIM)
-for firewalls and other forms of protection.
-For more details, read [Azure OpenAI Landing Zone reference architecture](https://techcommunity.microsoft.com/t5/azure-architecture-blog/azure-openai-landing-zone-reference-architecture/ba-p/3882102).
-* **Loadtesting**: We recommend running a loadtest for your expected number of users.
-You can use the [locust tool](https://docs.locust.io/) with the `locustfile.py` in this sample
-or set up a loadtest with Azure Load Testing.
+to production. Read through our [productionizing guide](docs/productionizing.md) for more details.


## Resources
@@ -355,7 +324,7 @@ Chunking allows us to limit the amount of information we send to OpenAI due to t
<details><a id="ingestion-more-pdfs"></a>
<summary>How can we upload additional PDFs without redeploying everything?</summary>

To upload more PDFs, put them in the data/ folder and run `./scripts/prepdocs.sh` or `./scripts/prepdocs.ps1`.
A [recent change](https://github.com/Azure-Samples/azure-search-openai-demo/pull/835) added checks to see what's been uploaded before. The prepdocs script now writes an .md5 file with an MD5 hash of each file that gets uploaded. Whenever the prepdocs script is re-run, that hash is checked against the current hash and the file is skipped if it hasn't changed.
</details>
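The hash check described above can be sketched in a few lines of Python (a simplified illustration of the idea, not the actual prepdocs code; the real script stores the hash in a sidecar `.md5` file next to each uploaded document):

```python
import hashlib
from pathlib import Path

def should_skip(path: Path) -> bool:
    """Skip re-uploading a file whose MD5 matches the hash from a prior run."""
    current = hashlib.md5(path.read_bytes()).hexdigest()
    hash_file = Path(str(path) + ".md5")  # sidecar file, e.g. doc.pdf.md5
    if hash_file.exists() and hash_file.read_text().strip() == current:
        return True  # unchanged since the last upload
    hash_file.write_text(current)  # record the hash for the next run
    return False
```

On the first run the sidecar file does not exist, so the file is processed and its hash recorded; subsequent runs skip it until its contents change.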

78 changes: 78 additions & 0 deletions docs/productionizing.md
@@ -0,0 +1,78 @@

# Productionizing the Chat App

This sample is designed to be a starting point for your own production application,
but you should do a thorough review of the security and performance before deploying
to production. Here are some things to consider:

## Azure resource configuration

* **OpenAI Capacity**: The default TPM (tokens per minute) is set to 30K, which is equivalent
to approximately 30 conversations per minute (assuming 1K tokens per user message/response pair).
You can increase the capacity by changing the `chatGptDeploymentCapacity` and `embeddingDeploymentCapacity`
parameters in `infra/main.bicep` up to your account's maximum capacity.
You can also view the Quotas tab in [Azure OpenAI Studio](https://oai.azure.com/)
to understand how much capacity you have available.
* **Azure Storage**: The default storage account uses the `Standard_LRS` SKU.
To improve your resiliency, we recommend using `Standard_ZRS` for production deployments,
which you can specify using the `sku` property under the `storage` module in `infra/main.bicep`.
* **Azure Cognitive Search**: The default search service uses the `Standard` SKU
with the free semantic search option, which gives you 1000 free queries a month.
If you expect your app to receive more than 1000 questions per month, you should either change `semanticSearch`
to "standard" or disable semantic search entirely in the `/app/backend/approaches` files.
If you see errors about search service capacity being exceeded, you may find it helpful to increase
the number of replicas by changing `replicaCount` in `infra/core/search/search-services.bicep`
or manually scaling it from the Azure Portal.
* **Azure App Service**: The default app service plan uses the `Basic` SKU with 1 CPU core and 1.75 GB RAM.
We recommend using a Premium-level SKU, starting with 1 CPU core.
You can use auto-scaling rules or scheduled scaling rules,
and scale the maximum/minimum instance counts based on load.
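The capacity estimate in the first bullet is simple arithmetic; a quick sketch, assuming roughly 1K tokens per user message/response exchange:

```python
def conversations_per_minute(tpm: int, tokens_per_exchange: int = 1000) -> int:
    """Rough throughput estimate: deployment TPM divided by tokens per exchange."""
    return tpm // tokens_per_exchange

# The default 30K TPM supports roughly 30 exchanges per minute;
# raising chatGptDeploymentCapacity to 120K would support roughly 120.
```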

## Additional security measures

* **Authentication**: By default, the deployed app is publicly accessible.
We recommend restricting access to authenticated users.
See [Enabling authentication](../README.md#enabling-authentication) in the README for instructions.
* **Networking**: We recommend deploying inside a Virtual Network. If the app is only for
internal enterprise use, use a private DNS zone. Also consider using Azure API Management (APIM)
for firewalls and other forms of protection.
For more details, read [Azure OpenAI Landing Zone reference architecture](https://techcommunity.microsoft.com/t5/azure-architecture-blog/azure-openai-landing-zone-reference-architecture/ba-p/3882102).

## Load testing

We recommend running a load test for your expected number of users.
You can use the [locust tool](https://docs.locust.io/) with the `locustfile.py` in this sample,
or set up a load test with Azure Load Testing.

To use locust, first install the dev requirements, which include locust:

```shell
python3 -m pip install -r requirements-dev.txt
```

Or manually install locust:

```shell
python3 -m pip install locust
```

Then run the locust command:

```shell
locust
```

Open the locust UI at http://localhost:8089/, the URL displayed in the terminal.

Start a new test with the URL of your website, e.g. `https://my-chat-app.azurewebsites.net`.
Do *not* end the URL with a slash. You can start by pointing at your localhost if you're concerned
more about load on OpenAI/Cognitive Search than on the host platform.

For the number of users and spawn rate, we recommend starting with 20 users and 1 user per second.
From there, you can keep increasing the number of users to simulate your expected load.
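If you prefer to script the run rather than use the web UI, locust can also be driven headless from the command line (flags per the locust documentation; substitute your own host and counts):

```shell
# 50 users, spawning 1 per second, for 5 minutes, without the web UI.
# Replace the host with your deployed app's URL (no trailing slash).
locust --headless --users 50 --spawn-rate 1 --run-time 5m \
    --host https://my-chat-app.azurewebsites.net
```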

Here's an example load test for 50 users and a spawn rate of 1 per second:

![Screenshot of Locust charts showing 5 requests per second](screenshot_locust.png)

After each test, check the local or App Service logs to see if there are any errors.
Binary file added docs/screenshot_locust.png
32 changes: 18 additions & 14 deletions locustfile.py
@@ -14,7 +14,7 @@ def ask_question(self):
self.client.post(
"/chat",
json={
-"history": [
+"messages": [
{
"content": random.choice(
[
@@ -27,33 +27,37 @@
"role": "user",
},
],
-"overrides": {
-"retrieval_mode": "hybrid",
-"semantic_ranker": True,
-"semantic_captions": False,
-"top": 3,
-"suggest_followup_questions": False,
+"context": {
+"overrides": {
+"retrieval_mode": "hybrid",
+"semantic_ranker": True,
+"semantic_captions": False,
+"top": 3,
+"suggest_followup_questions": False,
+},
+},
},
)
time.sleep(5)
self.client.post(
"/chat",
json={
-"history": [
+"messages": [
{"content": "What happens in a performance review?", "role": "user"},
{
"content": "During a performance review, employees will receive feedback on their performance over the past year, including both successes and areas for improvement. The feedback will be provided by the employee's supervisor and is intended to help the employee develop and grow in their role [employee_handbook-3.pdf]. The review is a two-way dialogue between the employee and their manager, so employees are encouraged to be honest and open during the process [employee_handbook-3.pdf]. The employee will also have the opportunity to discuss their goals and objectives for the upcoming year [employee_handbook-3.pdf]. A written summary of the performance review will be provided to the employee, which will include a rating of their performance, feedback, and goals and objectives for the upcoming year [employee_handbook-3.pdf].",
"role": "assistant",
},
{"content": "Does my plan cover eye exams?", "role": "user"},
],
-"overrides": {
-"retrieval_mode": "hybrid",
-"semantic_ranker": True,
-"semantic_captions": False,
-"top": 3,
-"suggest_followup_questions": False,
+"context": {
+"overrides": {
+"retrieval_mode": "hybrid",
+"semantic_ranker": True,
+"semantic_captions": False,
+"top": 3,
+"suggest_followup_questions": False,
+},
+},
},
)
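The changes in this file reflect the updated `/chat` request schema: the conversation now lives under `messages`, and the per-request options moved from a top-level `overrides` key into `context.overrides`. A minimal sketch of the new payload shape (the host name below is a placeholder):

```python
payload = {
    "messages": [
        {"content": "Does my plan cover eye exams?", "role": "user"},
    ],
    "context": {
        # options previously sent as a top-level "overrides" key
        "overrides": {
            "retrieval_mode": "hybrid",
            "semantic_ranker": True,
            "semantic_captions": False,
            "top": 3,
            "suggest_followup_questions": False,
        },
    },
}

# Posted with any HTTP client, e.g.:
# requests.post("https://my-chat-app.azurewebsites.net/chat", json=payload)
```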
