Docker Support and Improve RAG_Agent Implementation #17
Conversation
Force-pushed 9d86615 to 7ff3b9c.
@chimosky do review.
@chimosky Check this implementation. It has two endpoints, uses models we can decide on, and doesn't have dependency issues.
Force-pushed 4b95440 to 87f4b96.
Reviewed, not tested.
You also seem to be overloading PRs; your title says one thing and then you add other things to it. PRs are easy to open and close.
Also resolve the merge conflicts here.
I am not sure why it is causing merge conflicts on README and requirements.txt, as those need to be overwritten since this is a big change.
Sorry about the PRs thing; I was expecting this to be about the change in the RAG implementation only, but then we discussed adding two API endpoints, so I added those, and to test them more easily I added a UI for convenience.
I am trying out a slim Docker image to see if the size changes. Will let you know about the size on Matrix.
The size is 13 GB with the latest Dockerfile.
Force-pushed eac1713 to 2d9305a.
The forced pushes are because I had to fix a missing newline at EOF and didn't want to add an extra commit just for that.
rag_agent.py (outdated)

# Using HuggingFace embeddings
embeddings = HuggingFaceEmbeddings()
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
This will probably fail a flake8 check.
You were right, I will make it pass in the next commit.
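For reference, a minimal sketch of wrapping that call to stay under flake8's default 79-character line limit (assuming line length is the failing check):

embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)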
rag_agent.py (outdated)

| self.model
| StrOutputParser()
| self.prompt
| (lambda x: "\n".join(msg.content for msg in x.to_messages()) if hasattr(x, "to_messages") else str(x))
These one-liners are unnecessarily complex; it'd be great to go for readability rather than complexity, especially when we can.
Alright, I will make changes to these as well, although they are necessary for tokenizing properly.
I doubt the complexity I'm looking at is the only way to tokenize properly.
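One readable alternative, sketched here as a suggestion rather than the PR's final code, is to pull the lambda into a named helper and pipe that into the chain:

def prompt_value_to_text(x):
    # Flatten a LangChain prompt value into plain text for the tokenizer;
    # plain strings pass through unchanged.
    if hasattr(x, "to_messages"):
        return "\n".join(msg.content for msg in x.to_messages())
    return str(x)

The chain then reads | prompt_value_to_text in place of the inline lambda.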
rag_agent.py (outdated)

parser = argparse.ArgumentParser(description="Pippy's AI-Coding Assistant")
parser.add_argument('--model', type=str, choices=[
    'bigscience/bloom-1b1',
    'facebook/opt-350m',
    'EleutherAI/gpt-neo-1.3B'
], default='bigscience/bloom-1b1', help='Model name to use for text generation')
parser.add_argument('--docs', nargs='+', default=[
    './docs/Pygame Documentation.pdf',
    './docs/Python GTK+3 Documentation.pdf',
    './docs/Sugar Toolkit Documentation.pdf'
], help='List of document paths to load into the vector store')
args = parser.parse_args()
Seeing as there are going to be multiple lines here, it's best to aim for readability.
This seems a lot more readable.
parser = argparse.ArgumentParser(description="Pippy's AI-Coding Assistant")
parser.add_argument(
    '--model',
    type=str,
    choices=[
        'bigscience/bloom-1b1',
        'facebook/opt-350m',
        'EleutherAI/gpt-neo-1.3B'
    ],
    default='bigscience/bloom-1b1',
    help='Model name to use for text generation'
)
parser.add_argument(
    '--docs',
    nargs='+',
    default=[
        './docs/Pygame Documentation.pdf',
        './docs/Python GTK+3 Documentation.pdf',
        './docs/Sugar Toolkit Documentation.pdf'
    ],
    help='List of document paths to load into the vector store'
)
Alright, I will change it to this.
For this, please look at both endpoints and be sure we cover just the direct dependencies.
Yeah, I think we cover only the necessary ones. I didn't freeze the versions; would you like me to? Or I could create a conda env.
Nah, that's what got us the overload of dependencies we're experiencing now.
Yeah, fair!
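Purely as an illustration, a direct-dependencies-only requirements.txt for the code discussed in this PR might look like the following; the package list is inferred from the imports seen in this thread and is an assumption, not the PR's actual file:

fastapi
uvicorn
transformers
torch
langchain
sentence-transformers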
test_rag_agent.py (outdated)

from rag_agent import RAG_Agent

def test_rag_agent():
    agent = RAG_Agent(model="bigscience/bloom-1b1")

    document_paths = [
        './docs/Pygame Documentation.pdf',
        './docs/Python GTK+3 Documentation.pdf',
        './docs/Sugar Toolkit Documentation.pdf'
    ]
    agent.retriever = agent.setup_vectorstore(document_paths)

    question = "How do I create a Pygame window?"
    response = agent.run(question)

    print("Question:", question)
    print("Response:", response)

if __name__ == "__main__":
    test_rag_agent()
This is not a test; it just runs the agent and prints the output.
In my head it was just supposed to test that retrieval works.
It could've worked for testing locally, but it doesn't need to be here as it's not an actual test.
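For comparison, a minimal sketch of what an assertion-based test could look like, reusing the RAG_Agent interface from the snippet above (the specific assertions are illustrative assumptions):

from rag_agent import RAG_Agent

def test_rag_agent_returns_nonempty_answer():
    agent = RAG_Agent(model="bigscience/bloom-1b1")
    agent.retriever = agent.setup_vectorstore(['./docs/Pygame Documentation.pdf'])
    response = agent.run("How do I create a Pygame window?")
    # Assert on the result instead of printing it, so a test runner can fail it.
    assert isinstance(response, str)
    assert response.strip() != ""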
Removed this!
main.py (outdated)

# Initialize the RAG_Agent and its vector store (retriever).
agent = RAG_Agent()
agent.retriever = agent.setup_vectorstore([
    './docs/Pygame Documentation.pdf',
    './docs/Python GTK+3 Documentation.pdf',
    './docs/Sugar Toolkit Documentation.pdf'
])

prompt = '''
Your task is to answer children's questions using simple language.
Explain any difficult words in a way a 3-year-old can understand.
Keep responses under 60 words.
\n\nQuestion:
'''

@app.get("/")
def root():
    return {"message": "Welcome to Sugar-AI with FastAPI!"}

input_text = prompt + question

@app.post("/ask")
def ask_question(question: str):
    answer = agent.run(question)
    return {"answer": answer}

inputs = tokenizer.encode(input_text, return_tensors='pt')
outputs = model.generate(inputs, max_length=150, num_return_sequences=1)
answer = tokenizer.decode(outputs[0], skip_special_tokens=True)

return answer

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)
This again just focuses on one part of sugar-ai; there's some unmerged work in #9, and the reason routers seem appealing is that they encourage modularity: everything can be independent of everything else, which makes it easier to debug and fix issues.
It just seems to add unnecessary complexity... and I have added llm_only.py so the LLM can run separately. If we just want API endpoints, we don't really need routers, at least in this use case.
True, although I think there are some things we'll need from that PR.
Like what? We have both API endpoints, to run the LLM locally and with RAG functionality; I'm not sure what else was required. It's just in a single file rather than split up, at least the server, which kind of makes sense; I normally write Node servers, so I'm used to this style of writing.
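For context on the router idea being discussed, here is a minimal sketch of FastAPI's APIRouter approach; the file layout and names are hypothetical, not taken from #9:

# routers/rag.py (hypothetical module)
from fastapi import APIRouter
from rag_agent import RAG_Agent

router = APIRouter(tags=["rag"])
agent = RAG_Agent()

@router.post("/ask")
def ask_question(question: str):
    return {"answer": agent.run(question)}

# main.py then only wires the pieces together
from fastapi import FastAPI
from routers import rag

app = FastAPI()
app.include_router(rag.router)

Each router lives in its own module, so an endpoint can be debugged or replaced without touching the rest of the app.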
import streamlit as st
import requests

st.title("Sugar-AI Chat Interface")

use_rag = st.checkbox("Use RAG (Retrieval-Augmented Generation)", value=True)

st.subheader("Ask Sugar-AI")
question = st.text_input("Enter your question:")

if st.button("Submit"):
    if question:
        if use_rag:
            url = "http://localhost:8000/ask"
        else:
            url = "http://localhost:8000/ask-llm"
        params = {"question": question}
        try:
            response = requests.post(url, params=params)
            if response.status_code == 200:
                result = response.json()
                st.markdown("**Answer:** " + result["answer"])
            else:
                st.error(f"Error {response.status_code}: {response.text}")
        except Exception as e:
            st.error(f"Error contacting the API: {e}")
    else:
        st.warning("Please enter a question.")
Why are we doing this?
It's an easy way to test both endpoints; it gives us a UI to interact with them, just an easier way to test.
I disagree that it's easier to test, as it just adds an extra dependency when we truly don't need to; this should go into the README rather than here.
Alright, I shall remove this file then. Although I did not add it as a dependency; the user would have to install Streamlit if and only if they want to run the UI. As mentioned in the README:
Using the Streamlit App
Sugar-AI also provides a Streamlit-based interface for quick interactions and visualizations.
Running the Streamlit App
- Install Streamlit (if you haven't already):
  pip install streamlit
- Make sure the server is running:
  uvicorn main:app --host 0.0.0.0 --port 8000
- Start the app:
  streamlit run streamlit.py
- Using the app:
  - The app provides a simple UI to input coding questions and displays the response from Sugar-AI.
  - Use the sidebar options to configure settings if available.
  - The app communicates with the FastAPI backend to process and retrieve answers.
Enjoy exploring Sugar-AI through both the API endpoints and the interactive Streamlit interface!
This is way better.
Cool! Thanks.
Also, the first commit in this PR no longer exists, so you can update your branch.
I am sorry, how do I do that, by the way?
Do I just rebase and squash that commit?
You can rebase and drop that commit, then rebase master onto your branch. The commit shouldn't be here in the first place; it did exist on main, but was changed when its commit message was amended.
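A sketch of that workflow with git, assuming the remote is named origin:

git rebase -i origin/main      # mark the stale commit as "drop" in the todo list
git push --force-with-lease    # update the PR branch after rewriting history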
- Uses HF models.
- Fixes Ollama dep issue.
- Updated dependencies.
- Initialize the RAG Agent and vector store in main.py
- Add GET "/" endpoint with a welcome message
- Add POST "/ask" endpoint to process questions via the RAG Agent
- Update server startup using uvicorn
- Message handling functions written instead of lambdas.
Force-pushed adf4d67 to 7bb5a74.
Done.
I did not add the UI as a dependency because it's the user's choice, so it won't increase the dependency count, but it would make our lives easier: URL-encoding such as %20 is handled automatically, so the API calls can be tested easily.
Now we don't have to worry about its existence at all; we just say "Hey! You can do this too if you want to." Will test this again once I can.
Yes, and it's easier on the user end too!
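For contrast, calling the endpoint directly means encoding the query string by hand; a hypothetical curl call against the /ask endpoint from main.py above:

curl -X POST "http://localhost:8000/ask?question=How%20do%20I%20create%20a%20Pygame%20window%3F"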
Tried running and it threw this error:
tokenizer_config.json: 100%|████████████████████████████████████████| 3.07k/3.07k [00:00<00:00, 10.9MB/s]
tokenizer.json: 100%|███████████████████████████████████████████████| 7.03M/7.03M [00:01<00:00, 6.86MB/s]
config.json: 100%|██████████████████████████████████████████████████████| 679/679 [00:00<00:00, 3.32MB/s]
The `load_in_4bit` and `load_in_8bit` arguments are deprecated and will be removed in the future versions.
Please, pass a `BitsAndBytesConfig` object in `quantization_config` argument instead.
Could not load bitsandbytes native library: /lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.32' not found (required by /usr/local/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so)
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/bitsandbytes/cextension.py", line 85, in <module>
lib = get_native_library()
File "/usr/local/lib/python3.10/site-packages/bitsandbytes/cextension.py", line 72, in get_native_library
dll = ct.cdll.LoadLibrary(str(binary_path))
File "/usr/local/lib/python3.10/ctypes/__init__.py", line 452, in LoadLibrary
return self._dlltype(name)
File "/usr/local/lib/python3.10/ctypes/__init__.py", line 374, in __init__
self._handle = _dlopen(self._name, mode)
OSError: /lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.32' not found (required by /usr/local/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so)
CUDA is required but not available for bitsandbytes. Please consider installing the multi-platform enabled version of bitsandbytes, which is currently a work in progress. Please check currently supported platforms and installation instructions at https://huggingface.co/docs/bitsandbytes/main/en/installation#multi-backend
Traceback (most recent call last):
File "/usr/local/bin/uvicorn", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1161, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1082, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1443, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.10/site-packages/click/core.py", line 788, in invoke
return __callback(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/uvicorn/main.py", line 412, in main
run(
File "/usr/local/lib/python3.10/site-packages/uvicorn/main.py", line 579, in run
server.run()
File "/usr/local/lib/python3.10/site-packages/uvicorn/server.py", line 66, in run
return asyncio.run(self.serve(sockets=sockets))
File "/usr/local/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
File "/usr/local/lib/python3.10/site-packages/uvicorn/server.py", line 70, in serve
await self._serve(sockets)
File "/usr/local/lib/python3.10/site-packages/uvicorn/server.py", line 77, in _serve
config.load()
File "/usr/local/lib/python3.10/site-packages/uvicorn/config.py", line 435, in load
self.loaded_app = import_from_string(self.app)
File "/usr/local/lib/python3.10/site-packages/uvicorn/importer.py", line 19, in import_from_string
module = importlib.import_module(module_str)
File "/usr/local/lib/python3.10/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 883, in exec_module
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
File "/app/main.py", line 23, in <module>
agent = RAG_Agent()
File "/app/rag_agent.py", line 62, in __init__
model_obj = AutoModelForCausalLM.from_pretrained(
File "/usr/local/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 564, in from_pretrained
return model_class.from_pretrained(
File "/usr/local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 262, in _wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3698, in from_pretrained
hf_quantizer.validate_environment(
File "/usr/local/lib/python3.10/site-packages/transformers/quantizers/quantizer_bnb_4bit.py", line 83, in validate_environment
validate_bnb_backend_availability(raise_exception=True)
File "/usr/local/lib/python3.10/site-packages/transformers/integrations/bitsandbytes.py", line 559, in validate_bnb_backend_availability
return _validate_bnb_cuda_backend_availability(raise_exception)
File "/usr/local/lib/python3.10/site-packages/transformers/integrations/bitsandbytes.py", line 537, in _validate_bnb_cuda_backend_availability
raise RuntimeError(log_msg)
RuntimeError: CUDA is required but not available for bitsandbytes.
Please consider installing the multi-platform enabled version of bitsandbytes,
which is currently a work in progress.
Please check currently supported platforms and installation
instructions at https://huggingface.co/docs/bitsandbytes/main/en/installation#multi-backend
Seems you didn't see this error while testing.
Was this in Docker?
Were you trying to run this on CPU? CUDA being required here when it isn't even available seems a bit weird. We can turn off quantization for CPU.
I updated it to use quantization only when CUDA is available, because only then does it make sense, and added a config object so it no longer uses the deprecated arguments.
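A minimal sketch of that change, assuming the transformers and bitsandbytes APIs named in the traceback above (the model name and 4-bit choice are placeholders):

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quantization_config = None
if torch.cuda.is_available():
    # bitsandbytes quantization needs CUDA, so only enable it on GPU.
    quantization_config = BitsAndBytesConfig(load_in_4bit=True)

model_obj = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom-1b1",
    quantization_config=quantization_config,
)

On CPU the model loads unquantized, which avoids the RuntimeError above.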
Force-pushed bf6a2b0 to a5e1792.
I have pushed this latest image to Docker.
Trying to build a multi-stage Docker image as well to reduce the size.
@MostlyKIGuess can you add a logger? This will help in the future.
Trying to build a multi-stage Docker image as well to reduce the size.

Logging what exactly?
https://docs.python.org/3/library/logging.html, to save and provide logs.
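A minimal sketch of such a setup with the stdlib logging module linked above; the log file name and format are assumptions:

import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(name)s %(levelname)s: %(message)s",
    handlers=[
        logging.FileHandler("sugar_ai.log"),  # persist logs to disk
        logging.StreamHandler(),              # also echo to the console
    ],
)
logger = logging.getLogger("sugar-ai")
logger.info("Sugar-AI server starting")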
- Improved the size of the Docker image: goes down from 20 GB to 8.97 GB.
Force-pushed b873fb2 to 256f6a4.
Correct, though I don't see how it would be useful yet; maybe we can add server logs later on if we have high traffic and need them. Let's just focus on decreasing the Docker image size and on functionality for now?
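For reference, a hypothetical multi-stage sketch of the kind of Dockerfile being discussed; the base images and paths are assumptions, not the PR's actual file:

# Build stage: install dependencies into an isolated prefix.
FROM python:3.10-slim AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir --prefix=/install -r requirements.txt

# Runtime stage: copy only the installed packages and app code.
FROM python:3.10-slim
WORKDIR /app
COPY --from=builder /install /usr/local
COPY . .
EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]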
Yes, this was in Docker. I didn't think I had to point it out, as the PR is about Docker support and improvement.
Yes, trying to run on CPU.
Will try testing again.
Summary
This PR introduces a Dockerfile to containerize Sugar-AI, ensuring a consistent and reproducible environment. Additionally, it modifies the RAG_Agent by refactoring its code, addressing dependency issues, and adding unit tests.
Changes