Add custom redis cache #33
Merged

krrishdholakia merged 28 commits into BerriAI:ishaan-allow-custom-vectorDBs from th33ngineers:ishaan-allow-custom-vectorDBs on Aug 2, 2023

Commits (28)
4a58274  azure model fallback - same resource (krrishdholakia)
9db094b  Merge branch 'main' of https://github.com/BerriAI/reliableGPT (krrishdholakia)
b50013e  Update README.md (krrishdholakia)
faa7fcc  updates (krrishdholakia)
5567c27  Merge branch 'main' of https://github.com/BerriAI/reliableGPT (krrishdholakia)
c1fae32  for testing purposes (krrishdholakia)
2d14e98  fix logging (ishaan-jaff)
f333468  adding cache decorator (krrishdholakia)
49e9520  adding cache decorator (krrishdholakia)
97852dd  Update README.md (krrishdholakia)
1b41439  Fix merge conflict (shiftsayan)
843dbe6  Keep higher version number for compatibility (shiftsayan)
9d7df55  Merge pull request #32 from shiftsayan/sayan/fix-setup (ishaan-jaff)
ba85e8b  chore(formatting): fix isort (grski)
da50f6f  feat(redis-cache): allow custom cache storage - redis (grski)
a4d64a7  feat(generic-cache): refactor current cache code and add generic cach… (grski)
619faaa  Add files via upload (krrishdholakia)
69e26e5  load balancer demo (ishaan-jaff)
1c9be73  Delete UseModelFallBacks.ipynb (ishaan-jaff)
1eecd27  Add files via upload (ishaan-jaff)
01f94d8  Rename UseModelFallBacks (1).ipynb to UseModelFallBacks.ipynb (ishaan-jaff)
5153979  [WIP] Key Management (krrishdholakia)
e180914  [WIP] key management logic (krrishdholakia)
c6a78f7  feat(envs): add env handling based on .env file (grski)
6e27f04  Merge pull request #34 from th33ngineers/feat/envs (ishaan-jaff)
5b161f9  fix(tests): fix the tests - missing import, etc. (grski)
90f0d5a  merge (grski)
e06ad1d  merge(main) (grski)
```diff
@@ -14,7 +14,6 @@ isort:
 lint:
 	make black
 	make ruff
 	make isort
-
 pre-commit:
 	pre-commit run --all-files
```
173 additions & 0 deletions: ...(in_memory)_+_hosted_cache_to_prevent_dropped_customer_requests_in_prod_(LLM_Apps)_.ipynb
# Using a hot (in-memory) + hosted cache to prevent dropped customer requests in prod (LLM Apps)

# Create a class to manage your caching logic

```python
import json
import traceback

import requests
from flask import Flask, request
from fuzzywuzzy import process


class reliableCache:
    def __init__(self, query_arg=None, customer_instance_arg=None, user_email=None,
                 similarity_threshold=0.65, max_threads=100, verbose=False) -> None:
        self.max_threads = max_threads
        self.verbose = verbose
        self.query_arg = query_arg
        self.customer_instance_arg = customer_instance_arg
        self.user_email = user_email
        self.threshold = similarity_threshold
        self.cache_wrapper_threads = {}  # tracks which wrapper threads are active
        self.hot_cache = {}  # local in-memory cache

    def print_verbose(self, print_statement):
        if self.verbose:
            print("Cached Request: " + str(print_statement))

    def add_cache(self, user_email, instance_id, input_prompt, response):
        # Persist the response to the hosted cache; failures are logged, not raised.
        try:
            self.print_verbose(f"result being stored in cache: {response}")
            url = "YOUR_HOSTED_CACHE_ENDPOINT/add_cache"
            querystring = {
                "customer_id": "temp5@xyz.com",
                "instance_id": instance_id,
                "user_email": user_email,
                "input_prompt": input_prompt,
                "response": json.dumps({"response": response})
            }
            requests.post(url, params=querystring)
        except Exception:
            self.print_verbose(traceback.format_exc())

    def try_cache_request(self, user_email, instance_id, query=None):
        # Look the query up in the hosted cache; returns None on a miss or error.
        try:
            url = "YOUR_HOSTED_CACHE_ENDPOINT/get_cache"
            querystring = {
                "customer_id": "temp5@xyz.com",
                "instance_id": instance_id,
                "user_email": user_email,
                "input_prompt": query,
                "threshold": self.threshold
            }
            response = requests.get(url, params=querystring)
            self.print_verbose(f"response: {response.text}")
            extracted_result = response.json()["response"]
            print(f"extracted_result: {extracted_result} \n\n original response: {response.json()}")
            return extracted_result
        except Exception:
            self.print_verbose(traceback.format_exc())
        self.print_verbose("cache miss!")
        return None

    def cache_wrapper(self, func):
        def wrapper(*args, **kwargs):
            query = request.args.get("query")  # the customer question
            # the unique instance to put that customer query/response in
            instance_id = request.args.get(self.customer_instance_arg)
            try:
                # check the local / hot cache first
                if (self.user_email, instance_id) in self.hot_cache:
                    choices = self.hot_cache[(self.user_email, instance_id)]
                    most_similar_query = process.extractOne(query, choices)
                    if most_similar_query[1] > 70:
                        return self.hot_cache[(self.user_email, instance_id, most_similar_query[0])]
                result = func(*args, **kwargs)
                # add response to the hosted cache
                self.add_cache(self.user_email, instance_id=instance_id,
                               input_prompt=query, response=result)
            except Exception as e:
                # the endpoint errored -> fall back to the hosted cache
                cache_result = self.try_cache_request(user_email=self.user_email,
                                                      instance_id=instance_id, query=query)
                if cache_result:
                    print("cache hit!")
                    # promote the result to the hot cache for future requests
                    self.hot_cache[(self.user_email, instance_id, query)] = cache_result
                    if (self.user_email, instance_id) not in self.hot_cache:
                        self.hot_cache[(self.user_email, instance_id)] = []
                    self.hot_cache[(self.user_email, instance_id)].append(query)
                    return cache_result
                else:
                    print("Cache miss!")
                    raise e
            self.print_verbose(f"final result: {result}")
            return result
        return wrapper

    def get_wrapper_thread_utilization(self):
        self.print_verbose(f"cache wrapper thread values: {self.cache_wrapper_threads.values()}")
        active_cache_threads = sum(1 for value in self.cache_wrapper_threads.values() if value)
        self.print_verbose(f"active_cache_threads: {active_cache_threads}")
        return active_cache_threads / self.max_threads
```

# Wrap your query endpoint with it

In our caching class, the cache is designed as a decorator (`cache_wrapper`). It wraps the `berri_query` endpoint: if the endpoint raises an error, the wrapper catches it and returns a cached response instead. Each request is also checked against a local / hot cache first, to reduce dropped requests during high-traffic scenarios.

```python
@app.route("/berri_query")
@cache.cache_wrapper
def berri_query():
    print('Request received: ', request)
    # your execution logic
    pass
```
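The hot-cache lookup above gates on a fuzzy-similarity score above 70, via fuzzywuzzy's `process.extractOne`. A dependency-free sketch of the same idea using only the standard library's `difflib` — `extract_one` and the 0–100 score scale here are illustrative stand-ins, not fuzzywuzzy's actual implementation:

```python
from difflib import SequenceMatcher

def extract_one(query, choices):
    """Return (best_choice, score) with score on a 0-100 scale,
    mimicking the shape of fuzzywuzzy's process.extractOne."""
    best, best_score = None, -1.0
    for choice in choices:
        # ratio() is 0.0-1.0; scale to 0-100 like fuzzywuzzy scores
        score = SequenceMatcher(None, query.lower(), choice.lower()).ratio() * 100
        if score > best_score:
            best, best_score = choice, score
    return best, best_score

choices = ["how do I reset my password?", "what is your refund policy?"]
match, score = extract_one("reset my password", choices)
print(match, round(score))
```

A paraphrased query scores well above the 70 cutoff against its original, while unrelated queries fall below it, which is what lets the hot cache serve near-duplicate questions without re-running the model.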
Review comment: in CI/CD, all extras should be included.
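The PR's headline feature, per the feat(redis-cache) and feat(generic-cache) commits, is pluggable cache storage with a Redis option. A minimal sketch of what such a storage interface might look like — the class names, `prefix` parameter, and `RedisCache` wiring are illustrative assumptions, not the PR's actual API (only the stdlib-backed `InMemoryCache` is exercised here; `RedisCache` expects a `redis.Redis`-style client):

```python
import json

class CacheStorage:
    """Abstract storage backend: swap implementations without touching callers."""
    def get(self, key):
        raise NotImplementedError
    def set(self, key, value):
        raise NotImplementedError

class InMemoryCache(CacheStorage):
    """Process-local dict-backed cache (the 'hot' tier)."""
    def __init__(self):
        self._store = {}
    def get(self, key):
        return self._store.get(key)  # None on miss
    def set(self, key, value):
        self._store[key] = value

class RedisCache(CacheStorage):
    """Redis-backed cache; values are JSON-serialized strings."""
    def __init__(self, client, prefix="reliablegpt:"):
        self.client = client  # e.g. redis.Redis(host=..., port=...)
        self.prefix = prefix  # namespaces keys within a shared Redis
    def get(self, key):
        raw = self.client.get(self.prefix + key)
        return json.loads(raw) if raw is not None else None
    def set(self, key, value):
        self.client.set(self.prefix + key, json.dumps(value))

cache = InMemoryCache()
cache.set("q1", {"response": "hello"})
print(cache.get("q1"))
```

With a shared interface like this, the decorator in the notebook could take a `CacheStorage` instance instead of hard-coding a hosted endpoint, which is the spirit of "allow custom cache storage".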