
Port to Quart #503

Merged
merged 8 commits into Azure-Samples:main on Aug 9, 2023
Conversation

@pamelafox (Collaborator):

Purpose

This PR ports the app to Quart, a Python ASGI web framework.
Flask is a WSGI framework, which is synchronous: when a request comes in, the server must finish processing it before it can handle another request. We have added support for multiple gunicorn workers/threads, but they can still easily be blocked by a long-running request. Each /chat request makes 4 API calls, which take from 0.2 to 7 seconds each.

Quart is designed with the same API as Flask, with a few changes needed for asynchronous support. Most of my changes are the addition of await/async. For the Azure SDK calls, I used the async versions of the clients (imported from .aio). For the OpenAI API calls, I used the async versions of the functions (acreate instead of create).
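
To illustrate the shape of these changes, here is a minimal sketch of a Quart route using the async clients. The route, index, credential, and deployment names are hypothetical, not the repo's actual code:

```python
import openai
from azure.core.credentials import AzureKeyCredential
from azure.search.documents.aio import SearchClient  # async client from the .aio package
from quart import Quart, jsonify, request

app = Quart(__name__)

# Hypothetical endpoint/index/key, for illustration only.
search_client = SearchClient(
    endpoint="https://example.search.windows.net",
    index_name="docs",
    credential=AzureKeyCredential("<search-key>"),
)

@app.post("/chat")
async def chat():
    body = await request.get_json()  # request parsing is awaited in Quart

    # Async search call; the aio client's results are iterated with `async for`.
    r = await search_client.search(body["question"], top=3)
    sources = [doc["content"] async for doc in r]

    # acreate instead of create: the call yields control while waiting on the
    # API, so the worker can serve other requests in the meantime.
    completion = await openai.ChatCompletion.acreate(
        engine="chat",  # hypothetical Azure OpenAI deployment name
        messages=[
            {"role": "system", "content": "Answer using only these sources:\n" + "\n".join(sources)},
            {"role": "user", "content": body["question"]},
        ],
    )
    return jsonify({"answer": completion["choices"][0]["message"]["content"]})
```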

For deployment on App Service, I use gunicorn with the uvicorn worker.
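
For reference, a hedged sketch of what that pairing can look like in a gunicorn config file (gunicorn.conf.py is plain Python); the specific values here are assumptions, not the repo's actual configuration:

```python
# gunicorn.conf.py (sketch): run the Quart ASGI app under gunicorn's
# process manager using uvicorn's ASGI-capable worker class.
import multiprocessing

bind = "0.0.0.0:8000"
worker_class = "uvicorn.workers.UvicornWorker"
# A common starting point; tune per App Service SKU.
workers = (multiprocessing.cpu_count() * 2) + 1
timeout = 230  # App Service cuts requests off at 230 seconds anyway
```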

Does this introduce a breaking change?

[X] Yes
[ ] No

Pull Request Type

What kind of change does this Pull Request introduce?

[ ] Bugfix
[X] Feature
[ ] Code style update (formatting, local variables)
[X] Refactoring (no functional changes, no api changes)
[ ] Documentation content changes
[ ] Other... Please describe:

How to Test

  • `app/start.sh` should start the app
  • `azd up` should deploy the app
  • `python3 -m pytest` should run the tests

@pamelafox pamelafox marked this pull request as ready for review August 6, 2023 00:06
```diff
  filter=filter,
- query_type=QueryType.SEMANTIC,
+ query_type=QueryType.SIMPLE,
```
Collaborator:

Why this change?

@pamelafox (Author):

Good catch; that was unintentionally committed. It's a change I make locally when I run out of semantic search credits for the month. Reverted.

@pamelafox (Author):

Btw, the creator of Quart took a look at this PR and says the Quart usage looks good to him:
https://fosstodon.org/@pgjones/110842739565496118

@srbalakr srbalakr requested a review from chuwik August 7, 2023 17:14
Collaborator:

Can this be removed?

@pamelafox (Author):

This was something I added for mypy; it was confused without it. Generally Python packages should have an __init__.py.
See https://docs.python.org/3/tutorial/modules.html#packages
Let me know if you think there's a reason not to have one.

Collaborator:

Got it. My impression was that it would be implicitly created at runtime, and the app worked previously without it. Let's keep it if that's the good practice.

```diff
  else:
-     results = [doc[self.sourcepage_field] + ": " + nonewlines(doc[self.content_field]) for doc in r]
+     results = [doc[self.sourcepage_field] + ": " + nonewlines(doc[self.content_field]) async for doc in r]
```
Collaborator:

Curious: does async iteration yield faster results? Our list size does not exceed 10 or 20 at most!

@pamelafox (Author):

I assume it's not a big change in performance, but I had to make the change for API compatibility reasons once I started using the aio client. I've done a little load testing so far but no deep profiling; I need to try @anthony's OpenTelemetry integration.
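
A toy illustration of that point: `async for` doesn't speed up the loop itself, it just lets the event loop do other work while each result is awaited. The fake result generator below is ours, not the SDK's:

```python
import asyncio

async def fake_search_results():
    # Stands in for the aio SearchClient's async paged results.
    for i in range(3):
        await asyncio.sleep(0.1)  # simulates a network round trip per item/page
        yield {"sourcepage": f"page{i}.pdf", "content": f"text {i}"}

async def main():
    r = fake_search_results()
    # Same comprehension shape as the diff above, with `async for`.
    results = [doc["sourcepage"] + ": " + doc["content"] async for doc in r]
    print(results)

asyncio.run(main())
```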

@srbalakr srbalakr self-requested a review August 7, 2023 19:15
@vrajroutu commented Aug 8, 2023:

@pamelafox Could you please merge the streaming feature into the quart branch? Streaming is significantly faster and begins generating responses immediately, as opposed to waiting for the entire response to be generated. Additionally, I've observed a few issues when enabling streaming alongside asynchronous processing.
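
For context, streaming out of a Quart route would look roughly like the sketch below with the pre-1.0 openai client. This is an assumption about the approach, not the PR's (or any later branch's) actual code, and the route and deployment names are hypothetical:

```python
import openai
from quart import Quart, request

app = Quart(__name__)

@app.post("/chat_stream")
async def chat_stream():
    body = await request.get_json()

    async def generate():
        # stream=True makes acreate return an async generator of delta chunks.
        chunks = await openai.ChatCompletion.acreate(
            engine="chat",  # hypothetical deployment name
            messages=body["messages"],
            stream=True,
        )
        async for chunk in chunks:
            delta = chunk["choices"][0]["delta"]
            if "content" in delta:
                yield delta["content"]

    # Quart can stream an async generator directly as the response body.
    return generate(), 200, {"Content-Type": "text/plain"}
```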

@BitcoinCode:

@pamelafox Thank you, this is really useful. Could you please shed some light on the following, as it's important for me to understand clearly:

1. What are the 4 API calls made when a user submits a question?
2. How many approaches does the application use? To the best of my understanding it uses the chatreadretrieveread.py file?
3. Is there any technical documentation that would help in understanding how the application works end to end?

Appreciate your help on this.

@BitcoinCode:

I've implemented the Quart changes by @pamelafox, but the performance is still the same: if two users are using the application, it answers some questions, then just gets stuck on generating an answer, and then shows the JSON error. The only thing I can think of is that the Azure subscription is in the EU but the repo is deployed to East US; would this contribute to the issue?

I have also upgraded the App Service plan from B1 to B3 and manually scaled out to 3 instances, as autoscale is greyed out.

Is there anything else I can do to improve performance, assuming around 30 users will be accessing the app at the same time?

```
Failed to forward request to http://000.000.000.00:8000.
Encountered a System.Threading.Tasks.TaskCanceledException
exception after 300221.453ms with message:
The request was canceled due to the configured
HttpClient.Timeout of 300 seconds elapsing.
Check application logs to verify the application is properly handling HTTP traffic.
```

Appreciate your help.

(Screenshots: request timeout and JSON error.)

@pamelafox (Author):

I've been doing additional load testing on this branch, and though I think it's an improvement, there are still some hurdles to getting 0% errors.

First I ran into the issue described here: openai/openai-python#371

There were two suggested workarounds there: setting the session yourself and using tenacity to retry. I implemented the second suggestion and re-ran the load tests. I no longer encountered the "Connection reset" error, but I still had errors with responses taking too long and timing out.
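
For anyone hitting the same error, a minimal sketch of that tenacity approach, assuming the pre-1.0 openai client; the wrapper name and retry policy are ours, not the PR's exact code:

```python
import openai
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_random_exponential

# Retry only on connection errors; back off randomly between attempts.
@retry(
    retry=retry_if_exception_type(openai.error.APIConnectionError),
    wait=wait_random_exponential(min=1, max=10),
    stop=stop_after_attempt(3),
)
async def chat_completion_with_retry(**kwargs):
    return await openai.ChatCompletion.acreate(**kwargs)
```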

I realized that App Service has a hard limit of 230 seconds, which corresponded with when errors occurred:
https://learn.microsoft.com/en-us/troubleshoot/azure/app-service/web-apps-performance-faqs#why-does-my-request-time-out-after-230-seconds

I then tried specifying a request_timeout parameter for the OpenAI API request, but that caused every request to time out, due to this issue:
openai/openai-python#387

My proposal for next steps is to implement retry on the client in JavaScript, but I am still discussing this with other Azure folks for ideas on optimal configuration.

@pamelafox (Author):

@BitcoinCode What region is your OpenAI instance deployed in? My load tests have been in South Central US so far, and I have been recommended to try one of these regions:
Australia East
Canada East
East United States 2
Japan East
United Kingdom South

I'll re-provision to one of those and re-run load tests on this branch.

@vrajroutu commented Aug 9, 2023:

@pamelafox I have also experimented with this branch. It performed satisfactorily for me, demonstrating noticeable improvements in performance. However, I am streaming the responses, which could pose challenges when using Quart and asynchronous requests.

@pamelafox (Author):

@vrajroutu I'm working on adding streaming (as an opt-in option) to Quart in another branch, as you requested. I'm curious what issues you foresee with adding streaming?
Also, glad to hear you've seen improvements with this branch!

@BitcoinCode:

> @BitcoinCode What region is your OpenAI instance deployed in? My load tests have been in South Central US so far, and I have been recommended to try one of these regions: Australia East, Canada East, East United States 2, Japan East, United Kingdom South.
>
> I'll re-provision to one of those and re-run load tests on this branch.

My OpenAI instance is in East US. This is interesting, as I have another OpenAI instance in West Europe, and that instance doesn't seem to have this issue even with Flask. I will implement Quart on it just in case.

Thank you.

@BitcoinCode:

> @pamelafox I have also experimented with this branch. It performed satisfactorily for me, demonstrating noticeable improvements in performance. However, I am streaming the responses, which could pose challenges when using Quart and asynchronous requests.

Would you please share the code for streaming the responses?

@pamelafox pamelafox merged commit 9da71ef into Azure-Samples:main Aug 9, 2023
@pamelafox (Author):

This is now merged. If you want to incorporate these changes in an existing branch, make sure you run `azd up`, since this requires re-provisioning to update the `appCommandLine`. To avoid re-indexing docs, you can delete the `postprovision` hooks section from `azure.yaml`.

@pamelafox pamelafox deleted the quart branch August 9, 2023 21:55
@pamelafox (Author):

@BitcoinCode I will create a PR when the streaming functionality is ready. You can watch this repo to be notified.

@BitcoinCode:

What worked for me as error mitigation and improved the user experience is the following:

1. I added request_timeout=15 to all acreate calls.
2. I added a function to click the retry button automatically before the answer error component appears.

By doing the above, users don't see the error and the retry happens automatically. It's still in the testing phase, but the results are a lot better: the request no longer takes 4 to 5 minutes to time out, and 15 seconds seems to be good enough to get a response.
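
For concreteness, a sketch of that first change with the pre-1.0 openai client; the function and deployment names are hypothetical:

```python
import openai

async def ask_with_timeout(messages):
    # request_timeout caps the call at 15 seconds, so a stuck request fails
    # fast and a client-side retry can kick in instead of waiting minutes.
    return await openai.ChatCompletion.acreate(
        engine="chat",  # hypothetical deployment name
        messages=messages,
        request_timeout=15,
    )
```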

@ShantanuNair:

> I've been doing additional load testing on this branch, and though I think it's an improvement, there are still some hurdles to getting 0% errors.
>
> First I ran into the issue described here: openai/openai-python#371
>
> There were two suggested workarounds there: setting the session yourself and using tenacity to retry. I implemented the second suggestion and re-ran the load tests. I no longer encountered the "Connection reset" error, but I still had errors with responses taking too long and timing out.
>
> I realized that App Service has a hard limit of 230 seconds, which corresponded with when errors occurred: https://learn.microsoft.com/en-us/troubleshoot/azure/app-service/web-apps-performance-faqs#why-does-my-request-time-out-after-230-seconds
>
> I then tried specifying a request_timeout parameter for the OpenAI API request, but that caused every request to time out, due to this issue: openai/openai-python#387
>
> My proposal for next steps is to implement retry on the client in JavaScript, but I am still discussing this with other Azure folks for ideas on optimal configuration.

This was super helpful, @pamelafox. I'm running into these issues via LangChain, which uses the openai-python client. Can I ask: why didn't you go with the first solution of setting the session yourself? And has this issue been solved in openai-python? (I ask because of this PR.)

@pamelafox (Author):

@ShantanuNair I did actually end up using the first approach of setting the session:

```python
async with aiohttp.ClientSession() as s:
```

I didn't add it to the stream endpoint yet, as it doesn't seem needed based on my load tests so far, but I will keep testing.
I do not believe there has been a resolution in openai-python, based on what I've read of their changelog.
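
For anyone following along, a fuller sketch of that session-setting approach, assuming the pre-1.0 openai client's aiosession contextvar; the wrapper and deployment name are ours, not the repo's exact code:

```python
import aiohttp
import openai

async def chat(messages):
    # A fresh aiohttp session per request avoids reusing connections from a
    # stale pool; openai-python (pre-1.0) picks it up via its aiosession contextvar.
    async with aiohttp.ClientSession() as session:
        openai.aiosession.set(session)
        try:
            return await openai.ChatCompletion.acreate(
                engine="chat",  # hypothetical deployment name
                messages=messages,
            )
        finally:
            openai.aiosession.set(None)  # don't leave a closed session behind
```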

@ShantanuNair:

@pamelafox Appreciate your help! I thought the PR I linked allows you to pass in a ClientSession, similar to what you're doing? You are setting a new session on every request, right?

> I then tried specifying a request_timeout parameter for the OpenAI API request, but that caused every request to time out, due to this issue: https://github.com/openai/openai-python/pull/387

It makes me wonder how people are using the library in long-running chains in production. Also, do you ever notice a ton of 502 Bad Gateway errors when you send multiple requests with a large prompt token count? I do with 11-12k token inputs on gpt-3.5-turbo-16k, and I assumed it's because of LangChain's tenacity retries failing due to openai issue #371.

HughRunyan pushed a commit to RMI/RMI_chatbot that referenced this pull request Mar 26, 2024
* Quart draft

* Fix ask and test

* Quart deploying now

* Use semantic

* Get tests working

* Revert simple

* Typing fixes

* dont use pipe
vuculescu pushed a commit to vuculescu/azure-search-openai-demo that referenced this pull request Jan 27, 2025