
Docker Support and Improve RAG_Agent Implementation #17

Open · wants to merge 14 commits into main from ragagent

Conversation

MostlyKIGuess
Member

Summary

This PR introduces a Dockerfile to containerize Sugar-AI, ensuring a consistent and reproducible environment. Additionally, it modifies the RAG_Agent by refactoring its code, addressing dependency issues, and adding unit tests.

Changes

  • Added Dockerfile for containerization.
  • Updated README with instructions for running Sugar-AI using Docker.
  • Refactored RAG_Agent to:
    • Utilize Hugging Face (HF) models for better performance.
    • Fix Ollama dependency issues.
    • Update relevant dependencies for compatibility.
  • Added unit tests to validate RAG_Agent functionality.

@MostlyKIGuess
Member Author

@chimosky please review.

@MostlyKIGuess
Member Author

@chimosky Check this implementation. It has 2 endpoints, uses models which we can decide on, and doesn't have dependency issues.

@MostlyKIGuess MostlyKIGuess force-pushed the ragagent branch 2 times, most recently from 4b95440 to 87f4b96 Compare February 14, 2025 15:00
chimosky (Member) left a comment


Reviewed, not tested.

You also seem to be overloading PRs, your title says one thing and then you add other things to it, they're easy to open and close.

Also resolve the merge conflicts here.

@MostlyKIGuess
Member Author

> Reviewed, not tested.
>
> You also seem to be overloading PRs, your title says one thing and then you add other things to it, they're easy to open and close.
>
> Also resolve the merge conflicts here.

I am not sure why it is causing merge conflicts on README and requirements.txt, as those need to be overwritten since this is a big change.
What should I do in that case?

@MostlyKIGuess
Member Author

Sorry about overloading the PR; I expected it to be about the change in the RAG implementation only, but then we discussed adding 2 API endpoints, so I added those, and to test them better I added a UI just for convenience.

@MostlyKIGuess
Member Author

I am trying out a slim Docker base image to see if the size changes. Will let you know about the size on Matrix.

@MostlyKIGuess
Member Author

The size is 13 GB with the latest Dockerfile.

@MostlyKIGuess MostlyKIGuess force-pushed the ragagent branch 3 times, most recently from eac1713 to 2d9305a Compare February 14, 2025 18:10
@MostlyKIGuess
Member Author

The force pushes are because I had to fix a missing-newline-at-EOF error and didn't want to add an extra commit just for that.

@MostlyKIGuess
Member Author

[screenshot]
It has a size of 6.42 GB (compressed). Decompressed size:
[screenshot]

rag_agent.py Outdated

# Using HuggingFace embeddings
embeddings = HuggingFaceEmbeddings()
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
chimosky (Member)

This will probably fail a flake8 check.

MostlyKIGuess (Member Author)

You were right, I will make it pass in the next commit.
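
A hedged sketch of the kind of fix, assuming the flake8 complaint is line length (E501) on that assignment; the import path is an assumption and may differ in this repo:

from langchain_community.embeddings import HuggingFaceEmbeddings  # import path is an assumption

embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)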

rag_agent.py Outdated
| self.model
| StrOutputParser()
| self.prompt
| (lambda x: "\n".join(msg.content for msg in x.to_messages()) if hasattr(x, "to_messages") else str(x))
chimosky (Member)

These one liners are unnecessarily complex, and it'll be great to go for readability rather than complexity, especially when we can.

MostlyKIGuess (Member Author)

Alright, I will make changes to these as well, although they are necessary for tokenizing properly.

chimosky (Member)

I doubt the complexity I'm looking at is the only way to tokenize properly.
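
One way to keep that step readable is to pull the lambda out into a named helper. A minimal sketch, assuming LangChain-style runnable composition and that the lambda's only job is flattening a chat prompt value to plain text; the exact chain order in rag_agent.py may differ:

def flatten_prompt(value):
    """Turn a chat prompt value into plain text for a text-generation model."""
    # ChatPromptValue exposes .to_messages(); anything else is stringified as-is.
    if hasattr(value, "to_messages"):
        return "\n".join(msg.content for msg in value.to_messages())
    return str(value)

# Plain callables are coerced to runnables, so the chain stays declarative:
chain = self.prompt | flatten_prompt | self.model | StrOutputParser()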

rag_agent.py Outdated
Comment on lines 100 to 183
parser = argparse.ArgumentParser(description="Pippy's AI-Coding Assistant")
parser.add_argument('--model', type=str, choices=[
    'bigscience/bloom-1b1',
    'facebook/opt-350m',
    'EleutherAI/gpt-neo-1.3B'
], default='bigscience/bloom-1b1', help='Model name to use for text generation')
parser.add_argument('--docs', nargs='+', default=[
    './docs/Pygame Documentation.pdf',
    './docs/Python GTK+3 Documentation.pdf',
    './docs/Sugar Toolkit Documentation.pdf'
], help='List of document paths to load into the vector store')
args = parser.parse_args()
chimosky (Member)

Seeing as there are going to be multiple lines here, it's best to aim for readability.

This seems a lot more readable.

parser = argparse.ArgumentParser(description="Pippy's AI-Coding Assistant")
parser.add_argument(
    '--model',
    type=str,
    choices=[
        'bigscience/bloom-1b1',
        'facebook/opt-350m',
        'EleutherAI/gpt-neo-1.3B'
    ],
    default='bigscience/bloom-1b1',
    help='Model name to use for text generation'
)
parser.add_argument(
    '--docs',
    nargs='+',
    default=[
        './docs/Pygame Documentation.pdf',
        './docs/Python GTK+3 Documentation.pdf',
        './docs/Sugar Toolkit Documentation.pdf'
    ],
    help='List of document paths to load into the vector store'
)

MostlyKIGuess (Member Author)

Alright, I will change it to this.

chimosky (Member)

For this, please look at both endpoints and be sure we cover just the direct dependencies.

MostlyKIGuess (Member Author)

Yeah, I think we cover only the necessary ones. I didn't freeze the requirements; would you like me to? Or I could create a conda env.

chimosky (Member)

Nah, that's what got us the overload of dependencies we're experiencing now.

MostlyKIGuess (Member Author)

yeah fair!

Comment on lines 1 to 20
from rag_agent import RAG_Agent

def test_rag_agent():
    agent = RAG_Agent(model="bigscience/bloom-1b1")

    document_paths = [
        './docs/Pygame Documentation.pdf',
        './docs/Python GTK+3 Documentation.pdf',
        './docs/Sugar Toolkit Documentation.pdf'
    ]
    agent.retriever = agent.setup_vectorstore(document_paths)

    question = "How do I create a Pygame window?"
    response = agent.run(question)

    print("Question:", question)
    print("Response:", response)

if __name__ == "__main__":
    test_rag_agent()
chimosky (Member)

This is not a test; it just runs the agent and prints the output.

MostlyKIGuess (Member Author)

In my head it was just supposed to test the working of retrieval..

chimosky (Member) · Feb 21, 2025

It could've worked for testing locally, but it doesn't need to be here as it's not an actual test.

MostlyKIGuess (Member Author)

Removed this!
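
If an actual test is wanted later, a minimal pytest-style sketch might look like the following; it reuses only the RAG_Agent API shown above and would still be a slow, integration-style test since it builds the vector store and loads the model:

from rag_agent import RAG_Agent

DOCS = [
    './docs/Pygame Documentation.pdf',
    './docs/Python GTK+3 Documentation.pdf',
    './docs/Sugar Toolkit Documentation.pdf',
]

def test_run_returns_nonempty_answer():
    # Integration-style test: builds the vector store and loads the model.
    agent = RAG_Agent(model="bigscience/bloom-1b1")
    agent.retriever = agent.setup_vectorstore(DOCS)

    answer = agent.run("How do I create a Pygame window?")

    # Assertions instead of print(): pytest fails the test if retrieval breaks.
    assert isinstance(answer, str)
    assert answer.strip() != ""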

main.py Outdated
Comment on lines 24 to 52
# Initialize the RAG_Agent and its vector store (retriever).
agent = RAG_Agent()
agent.retriever = agent.setup_vectorstore([
    './docs/Pygame Documentation.pdf',
    './docs/Python GTK+3 Documentation.pdf',
    './docs/Sugar Toolkit Documentation.pdf'
])

prompt = '''
Your task is to answer children's questions using simple language.
Explain any difficult words in a way a 3-year-old can understand.
Keep responses under 60 words.
\n\nQuestion:
'''

@app.get("/")
def root():
    return {"message": "Welcome to Sugar-AI with FastAPI!"}

input_text = prompt + question

@app.post("/ask")
def ask_question(question: str):
    answer = agent.run(question)
    return {"answer": answer}

inputs = tokenizer.encode(input_text, return_tensors='pt')
outputs = model.generate(inputs, max_length=150, num_return_sequences=1)
answer = tokenizer.decode(outputs[0], skip_special_tokens=True)

return answer

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)
chimosky (Member)

This again just focuses on one part of sugar-ai; there's some unmerged work in #9. The reason routers seem appealing is that they encourage modularity: everything can be independent of everything else, which makes it easier to debug and fix issues.
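
For context, the router pattern being referred to looks roughly like this; a sketch only, not the code from #9, with a stub agent so it runs standalone:

from fastapi import APIRouter, FastAPI

class StubAgent:
    # Stand-in for the real RAG_Agent, just to keep the sketch self-contained.
    def run(self, question: str) -> str:
        return f"(answer to: {question})"

agent = StubAgent()
router = APIRouter(prefix="/rag", tags=["rag"])

@router.post("/ask")
def ask_question(question: str):
    # Each feature lives in its own router module; main.py only mounts it.
    return {"answer": agent.run(question)}

app = FastAPI()
app.include_router(router)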

MostlyKIGuess (Member Author)

It just seems to add unnecessary complexity... and I have added llm_only.py to run the LLM separately. If we just want API endpoints, we don't really need to route them, at least in this use case.

chimosky (Member)

True, although I think there are some things we'll need from that PR.

MostlyKIGuess (Member Author)

Like what? We have both API endpoints, to run the LLM locally and with RAG functionality; I'm not sure what else was required. It's just in a single file rather than split up, at least the server, which kind of makes sense, as I'm normally used to writing Node servers, so I'm used to this style of writing.

Comment on lines +1 to +28
import streamlit as st
import requests

st.title("Sugar-AI Chat Interface")

use_rag = st.checkbox("Use RAG (Retrieval-Augmented Generation)", value=True)

st.subheader("Ask Sugar-AI")
question = st.text_input("Enter your question:")

if st.button("Submit"):
    if question:
        if use_rag:
            url = "http://localhost:8000/ask"
        else:
            url = "http://localhost:8000/ask-llm"
        params = {"question": question}
        try:
            response = requests.post(url, params=params)
            if response.status_code == 200:
                result = response.json()
                st.markdown("**Answer:** " + result["answer"])
            else:
                st.error(f"Error {response.status_code}: {response.text}")
        except Exception as e:
            st.error(f"Error contacting the API: {e}")
    else:
        st.warning("Please enter a question.")
chimosky (Member)

Why are we doing this?

MostlyKIGuess (Member Author)

It's an easy way to test both endpoints; it gives us a UI to interact with both of them.

chimosky (Member)

I disagree that it's easier to test, as it just adds an extra dependency when we truly don't need to; this should go into the README rather than here.

MostlyKIGuess (Member Author)

Alright, I shall remove this file then. Although I did not add it as a dependency, the user would have to install Streamlit if and only if they want to run the UI. As mentioned in the README:

Using the Streamlit App

Sugar-AI also provides a Streamlit-based interface for quick interactions and visualizations.

Running the Streamlit App

  1. Install Streamlit:

    If you haven't already, install Streamlit:

    pip install streamlit
  2. Make sure the server is running using:

    uvicorn main:app --host 0.0.0.0 --port 8000
  3. Start the App:

    Launch the Streamlit app by running:

    streamlit run streamlit.py
  4. Using the App:

    • The app provides a simple UI to input coding questions and displays the response using Sugar-AI.
    • Use the sidebar options to configure settings if available.
    • The app communicates with the FastAPI backend to process and retrieve answers.

Streamlit UI

Enjoy exploring Sugar-AI through both API endpoints and the interactive Streamlit interface!

chimosky (Member)

This is way better.

MostlyKIGuess (Member Author)

cool! thanks

@chimosky
Member

Also, the first commit in this PR no longer exists, so you can update your branch.

@MostlyKIGuess
Member Author

> Also, the first commit in this PR no longer exists, so you can update your branch.

I am sorry, how do I do that btw?

@MostlyKIGuess
Member Author

Do I just rebase and squash that commit?

@chimosky
Member

> Also, the first commit in this PR no longer exists, so you can update your branch.
>
> I am sorry, how do I do that btw?

You can rebase and drop that commit, then rebase master onto your branch.

The commit shouldn't be here in the first place because it did exist on main, but was changed as the commit message was amended.

- uses HF models.
- fixes ollama dep issue.
- updated dependencies
- Initialize the RAG Agent and vector store in main.py
- Add GET "/" endpoint with a welcome message
- Add POST "/ask" endpoint to process questions via the RAG Agent
- Update server startup using uvicorn

@MostlyKIGuess
Member Author

> Also, the first commit in this PR no longer exists, so you can update your branch.
>
> I am sorry, how do I do that btw?
>
> You can rebase and drop that commit, then rebase master onto your branch.
>
> The commit shouldn't be here in the first place because it did exist on main, but was changed as the commit message was amended.

Done

@MostlyKIGuess
Member Author

I did not add the UI as a dependency because it's the user's choice, so it won't increase dependencies, but it would make our lives easier: things like writing %20 for spaces are handled automatically, so API calls can be tested easily.
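
To illustrate the encoding point, a small sketch (the question text is just an example): requests, like the Streamlit form, percent-encodes the query string for you, whereas a hand-written URL needs the %20s spelled out:

import requests

# requests encodes the params (spaces become %20) before sending the request.
resp = requests.post(
    "http://localhost:8000/ask",
    params={"question": "How do I create a Pygame window?"},
)
print(resp.json()["answer"])

# The equivalent hand-written call needs manual encoding, e.g.
# POST http://localhost:8000/ask?question=How%20do%20I%20create%20a%20Pygame%20window%3F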

@chimosky
Member

> I did not add the UI as a dependency because it's the user's choice, so it won't increase dependencies, but it would make our lives easier: things like writing %20 for spaces are handled automatically, so API calls can be tested easily.

Now we don't have to worry about its existence at all, we just say "Hey! You can do this too if you want to".

Will test this again once I can.

@MostlyKIGuess
Member Author

> Now we don't have to worry about its existence at all, we just say "Hey! You can do this too if you want to".
>
> Will test this again once I can.

Yes, and it's easier on the user end too!

@chimosky
Member

Tried running and it threw this error:

tokenizer_config.json: 100%|████████████████████████████████████████| 3.07k/3.07k [00:00<00:00, 10.9MB/s]
tokenizer.json: 100%|███████████████████████████████████████████████| 7.03M/7.03M [00:01<00:00, 6.86MB/s]
config.json: 100%|██████████████████████████████████████████████████████| 679/679 [00:00<00:00, 3.32MB/s]
The `load_in_4bit` and `load_in_8bit` arguments are deprecated and will be removed in the future versions. 
Please, pass a `BitsAndBytesConfig` object in `quantization_config` argument instead.
Could not load bitsandbytes native library: /lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.32' not found (required by /usr/local/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so)
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/bitsandbytes/cextension.py", line 85, in <module>
    lib = get_native_library()
  File "/usr/local/lib/python3.10/site-packages/bitsandbytes/cextension.py", line 72, in get_native_library
    dll = ct.cdll.LoadLibrary(str(binary_path))
  File "/usr/local/lib/python3.10/ctypes/__init__.py", line 452, in LoadLibrary
    return self._dlltype(name)
  File "/usr/local/lib/python3.10/ctypes/__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.32' not found (required by /usr/local/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so)
CUDA is required but not available for bitsandbytes. Please consider installing the multi-platform enabled version of bitsandbytes, which is currently a work in progress. Please check currently supported platforms and installation instructions at https://huggingface.co/docs/bitsandbytes/main/en/installation#multi-backend
Traceback (most recent call last):
  File "/usr/local/bin/uvicorn", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1161, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1082, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1443, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 788, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/uvicorn/main.py", line 412, in main
    run(
  File "/usr/local/lib/python3.10/site-packages/uvicorn/main.py", line 579, in run
    server.run()
  File "/usr/local/lib/python3.10/site-packages/uvicorn/server.py", line 66, in run
    return asyncio.run(self.serve(sockets=sockets))
  File "/usr/local/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
  File "/usr/local/lib/python3.10/site-packages/uvicorn/server.py", line 70, in serve
    await self._serve(sockets)
  File "/usr/local/lib/python3.10/site-packages/uvicorn/server.py", line 77, in _serve
    config.load()
  File "/usr/local/lib/python3.10/site-packages/uvicorn/config.py", line 435, in load
    self.loaded_app = import_from_string(self.app)
  File "/usr/local/lib/python3.10/site-packages/uvicorn/importer.py", line 19, in import_from_string
    module = importlib.import_module(module_str)
  File "/usr/local/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/app/main.py", line 23, in <module>
    agent = RAG_Agent()
  File "/app/rag_agent.py", line 62, in __init__
    model_obj = AutoModelForCausalLM.from_pretrained(
  File "/usr/local/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 564, in from_pretrained
    return model_class.from_pretrained(
  File "/usr/local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 262, in _wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3698, in from_pretrained
    hf_quantizer.validate_environment(
  File "/usr/local/lib/python3.10/site-packages/transformers/quantizers/quantizer_bnb_4bit.py", line 83, in validate_environment
    validate_bnb_backend_availability(raise_exception=True)
  File "/usr/local/lib/python3.10/site-packages/transformers/integrations/bitsandbytes.py", line 559, in validate_bnb_backend_availability
    return _validate_bnb_cuda_backend_availability(raise_exception)
  File "/usr/local/lib/python3.10/site-packages/transformers/integrations/bitsandbytes.py", line 537, in _validate_bnb_cuda_backend_availability
    raise RuntimeError(log_msg)
RuntimeError: CUDA is required but not available for bitsandbytes. 
Please consider installing the multi-platform enabled version of bitsandbytes,
which is currently a work in progress.
Please check currently supported platforms and installation
instructions at https://huggingface.co/docs/bitsandbytes/main/en/installation#multi-backend

Seems you didn't see this error while testing.

@MostlyKIGuess
Member Author

> Tried running and it threw this error:
>
> [traceback quoted above]
>
> Seems you didn't see this error while testing.

I just didn't get that error at all; it works for me like this:
[screenshot]

@MostlyKIGuess
Member Author

Was this in Docker?
I will try upgrading the base image to see if the error persists.

@MostlyKIGuess
Member Author

Were you trying to run this on CPU? CUDA is installed but not required seems a bit weird. We can turn off quantization for CPU.

@MostlyKIGuess
Member Author

I updated it to use quantization only when CUDA is available, because only then does it make sense, and added a config so it's no longer deprecated.
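
A minimal sketch of that kind of guard, assuming a transformers AutoModelForCausalLM load; the exact arguments in rag_agent.py may differ:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

def load_model(model_name):
    # Only request 4-bit quantization when a CUDA device is actually present;
    # on CPU, bitsandbytes fails at load time, so fall back to a plain load.
    if torch.cuda.is_available():
        quant_config = BitsAndBytesConfig(load_in_4bit=True)
        return AutoModelForCausalLM.from_pretrained(
            model_name,
            quantization_config=quant_config,
            device_map="auto",
        )
    return AutoModelForCausalLM.from_pretrained(model_name)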

@MostlyKIGuess
Member Author

Tested and checked the Docker image too (quantization works as well):

[screenshot]

@MostlyKIGuess
Member Author

I have pushed this latest image to Docker Hub.

@MostlyKIGuess
Member Author

Trying to build a multi-stage Docker image as well to reduce the size.

@harshkasat

@MostlyKIGuess can you add a logger? This will help in the future.

@MostlyKIGuess
Member Author

> Trying to build a multi-stage Docker image as well to reduce the size.
>
> @MostlyKIGuess can you add a logger? This will help in the future.

Logging what exactly?

@harshkasat

https://docs.python.org/3/library/logging.html, to save and emit logs.
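
For illustration, a minimal self-contained sketch of what basic request logging could look like in the FastAPI app; the logger name and messages are assumptions, and the real endpoint would call agent.run():

import logging

from fastapi import FastAPI

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
logger = logging.getLogger("sugar_ai")

app = FastAPI()

@app.post("/ask")
def ask_question(question: str):
    logger.info("Received question: %s", question)
    answer = f"(placeholder answer to: {question})"  # real app: agent.run(question)
    logger.info("Returning %d characters", len(answer))
    return {"answer": answer}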

- improved the size of the Docker image.
- goes down from 20 GB to 8.97 GB
@MostlyKIGuess
Member Author

So the current image is 8.97 GB, supports both LLM and LLM with RAG, and is uploaded to my Docker Hub.
[screenshot]

  • Image on my PC (uncompressed):
    [screenshot]

@MostlyKIGuess
Member Author

> https://docs.python.org/3/library/logging.html, to save and emit logs.

Correct, but I don't see how it would be useful right now; maybe we can add server logs later if we have high traffic and need them. Let's just focus on decreasing the Docker image size and on functionality for now?
Anyway, good suggestion, thanks!

@chimosky
Member

chimosky commented Feb 26, 2025

> Was this in Docker? I will try upgrading the base image to see if the error persists.

Yes, this was in Docker.

I didn't think I had to point it out, as the PR is about Docker support and improvement.

> Were you trying to run this on CPU? CUDA is installed but not required seems a bit weird. We can turn off quantization for CPU.

Yes, trying to run on CPU.

> I updated it to use quantization only when CUDA is available, because only then does it make sense, and added a config so it's no longer deprecated.

Will try testing again.
