
Docker Support and Improve RAG_Agent Implementation #17

Open · wants to merge 14 commits into main from ragagent

Conversation

MostlyKIGuess
Member

Summary

This PR introduces a Dockerfile to containerize Sugar-AI, ensuring a consistent and reproducible environment. Additionally, it modifies the RAG_Agent by refactoring its code, addressing dependency issues, and adding unit tests.

Changes

  • Added Dockerfile for containerization.
  • Updated README with instructions for running Sugar-AI using Docker.
  • Refactored RAG_Agent to:
    • Utilize Hugging Face (HF) models for better performance.
    • Fix Ollama dependency issues.
    • Update relevant dependencies for compatibility.
  • Added unit tests to validate RAG_Agent functionality.

@MostlyKIGuess
Member Author

@chimosky please review.

@MostlyKIGuess
Member Author

@chimosky Check this implementation. It has 2 endpoints, uses models which we can decide on, and doesn't have dependency issues.

@MostlyKIGuess MostlyKIGuess force-pushed the ragagent branch 2 times, most recently from 4b95440 to 87f4b96 Compare February 14, 2025 15:00
chimosky (Member) left a comment


Reviewed, not tested.

You also seem to be overloading PRs, your title says one thing and then you add other things to it, they're easy to open and close.

Also resolve the merge conflicts here.

@MostlyKIGuess
Member Author

> Reviewed, not tested.
>
> You also seem to be overloading PRs, your title says one thing and then you add other things to it, they're easy to open and close.
>
> Also resolve the merge conflicts here.

I am not sure why it is causing merge conflicts on README and requirements.txt, as those need to be overwritten since this is a big change.
What should I do in that case?

@MostlyKIGuess
Member Author

Sorry about overloading the PR; I expected it to be about the change in the RAG implementation only, but then we discussed adding 2 API endpoints, so I added those, and to test them better I added a UI just for convenience.

@MostlyKIGuess
Member Author

I am trying out a slim Docker base image to see if the size changes. Will let you know about the size on Matrix.

@MostlyKIGuess
Member Author

The size is 13 GB with the latest Dockerfile.

@MostlyKIGuess MostlyKIGuess force-pushed the ragagent branch 3 times, most recently from eac1713 to 2d9305a Compare February 14, 2025 18:10
@MostlyKIGuess
Member Author

The force pushes are because I had to fix a missing-newline-at-EOF error and didn't want to add an extra commit just for that.

@MostlyKIGuess
Member Author

[screenshot]
It has a size of 6.42 GB (compressed). Decompressed size:
[screenshot]

rag_agent.py Outdated

# Using HuggingFace embeddings
embeddings = HuggingFaceEmbeddings()
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
chimosky (Member)

This will probably fail a flake8 check.

MostlyKIGuess (Member Author)

You were right, I will make it pass in the next commit.
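
A hedged sketch of the kind of fix, assuming the flake8 complaint is line length (E501) on that assignment; the import path is an assumption and may differ in this repo:

from langchain_community.embeddings import HuggingFaceEmbeddings  # import path is an assumption

embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)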

rag_agent.py Outdated
| self.model
| StrOutputParser()
| self.prompt
| (lambda x: "\n".join(msg.content for msg in x.to_messages()) if hasattr(x, "to_messages") else str(x))
chimosky (Member)

These one liners are unnecessarily complex, and it'll be great to go for readability rather than complexity, especially when we can.

MostlyKIGuess (Member Author)

Alright, I will make changes to these as well, although they are necessary for tokenizing properly.

chimosky (Member)

I doubt the complexity I'm looking at is the only way to tokenize properly.
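
One way to keep that step readable is to pull the lambda out into a named helper. A minimal sketch, assuming LangChain-style runnable composition and that the lambda's only job is flattening a chat prompt value to plain text; the exact chain order in rag_agent.py may differ:

def flatten_prompt(value):
    """Turn a chat prompt value into plain text for a text-generation model."""
    # ChatPromptValue exposes .to_messages(); anything else is stringified as-is.
    if hasattr(value, "to_messages"):
        return "\n".join(msg.content for msg in value.to_messages())
    return str(value)

# Plain callables are coerced to runnables, so the chain stays declarative:
chain = self.prompt | flatten_prompt | self.model | StrOutputParser()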

rag_agent.py Outdated
Comment on lines 100 to 183
parser = argparse.ArgumentParser(description="Pippy's AI-Coding Assistant")
parser.add_argument('--model', type=str, choices=[
    'bigscience/bloom-1b1',
    'facebook/opt-350m',
    'EleutherAI/gpt-neo-1.3B'
], default='bigscience/bloom-1b1', help='Model name to use for text generation')
parser.add_argument('--docs', nargs='+', default=[
    './docs/Pygame Documentation.pdf',
    './docs/Python GTK+3 Documentation.pdf',
    './docs/Sugar Toolkit Documentation.pdf'
], help='List of document paths to load into the vector store')
args = parser.parse_args()
chimosky (Member)

Seeing as there are going to be multiple lines here, it's best to aim for readability.

This seems a lot more readable.

parser = argparse.ArgumentParser(description="Pippy's AI-Coding Assistant")
parser.add_argument(
    '--model',
    type=str,
    choices=[
        'bigscience/bloom-1b1',
        'facebook/opt-350m',
        'EleutherAI/gpt-neo-1.3B'
    ],
    default='bigscience/bloom-1b1',
    help='Model name to use for text generation'
)
parser.add_argument(
    '--docs',
    nargs='+',
    default=[
        './docs/Pygame Documentation.pdf',
        './docs/Python GTK+3 Documentation.pdf',
        './docs/Sugar Toolkit Documentation.pdf'
    ],
    help='List of document paths to load into the vector store'
)

MostlyKIGuess (Member Author)

Alright, I will change it to this.

chimosky (Member)

For this, please look at both endpoints and be sure we cover just the direct dependencies.

MostlyKIGuess (Member Author)

Yeah, I think we cover only the necessary ones. I didn't freeze the requirements; would you like me to? Or I could create a conda env.

chimosky (Member)

Nah, that's what got us the overload of dependencies we're experiencing now.

MostlyKIGuess (Member Author)

yeah fair!

Comment on lines 1 to 20
from rag_agent import RAG_Agent

def test_rag_agent():
    agent = RAG_Agent(model="bigscience/bloom-1b1")

    document_paths = [
        './docs/Pygame Documentation.pdf',
        './docs/Python GTK+3 Documentation.pdf',
        './docs/Sugar Toolkit Documentation.pdf'
    ]
    agent.retriever = agent.setup_vectorstore(document_paths)

    question = "How do I create a Pygame window?"
    response = agent.run(question)

    print("Question:", question)
    print("Response:", response)

if __name__ == "__main__":
    test_rag_agent()
chimosky (Member)

This is not a test; it just runs the agent and prints the output.

MostlyKIGuess (Member Author)

In my head it was just supposed to test the working of retrieval..

chimosky (Member) · Feb 21, 2025

It could've worked for testing locally, but it doesn't need to be here as it's not an actual test.

MostlyKIGuess (Member Author)

Removed this!
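
If an actual test is wanted later, a minimal pytest-style sketch might look like the following; it reuses only the RAG_Agent API shown above and would still be a slow, integration-style test since it builds the vector store and loads the model:

from rag_agent import RAG_Agent

DOCS = [
    './docs/Pygame Documentation.pdf',
    './docs/Python GTK+3 Documentation.pdf',
    './docs/Sugar Toolkit Documentation.pdf',
]

def test_run_returns_nonempty_answer():
    # Integration-style test: builds the vector store and loads the model.
    agent = RAG_Agent(model="bigscience/bloom-1b1")
    agent.retriever = agent.setup_vectorstore(DOCS)

    answer = agent.run("How do I create a Pygame window?")

    # Assertions instead of print(): pytest fails the test if retrieval breaks.
    assert isinstance(answer, str)
    assert answer.strip() != ""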

main.py Outdated
Comment on lines 24 to 52
# Initialize the RAG_Agent and its vector store (retriever).
agent = RAG_Agent()
agent.retriever = agent.setup_vectorstore([
    './docs/Pygame Documentation.pdf',
    './docs/Python GTK+3 Documentation.pdf',
    './docs/Sugar Toolkit Documentation.pdf'
])

prompt = '''
Your task is to answer children's questions using simple language.
Explain any difficult words in a way a 3-year-old can understand.
Keep responses under 60 words.
\n\nQuestion:
'''

@app.get("/")
def root():
    return {"message": "Welcome to Sugar-AI with FastAPI!"}

input_text = prompt + question

@app.post("/ask")
def ask_question(question: str):
    answer = agent.run(question)
    return {"answer": answer}

inputs = tokenizer.encode(input_text, return_tensors='pt')
outputs = model.generate(inputs, max_length=150, num_return_sequences=1)
answer = tokenizer.decode(outputs[0], skip_special_tokens=True)

return answer

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)
chimosky (Member)

This again just focuses on one part of sugar-ai; there's some unmerged work in #9. The reason routers seem appealing is that they encourage modularity: everything can be independent of everything else, which makes it easier to debug and fix issues.
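
For context, the router pattern being referred to looks roughly like this; a sketch only, not the code from #9, with a stub agent so it runs standalone:

from fastapi import APIRouter, FastAPI

class StubAgent:
    # Stand-in for the real RAG_Agent, just to keep the sketch self-contained.
    def run(self, question: str) -> str:
        return f"(answer to: {question})"

agent = StubAgent()
router = APIRouter(prefix="/rag", tags=["rag"])

@router.post("/ask")
def ask_question(question: str):
    # Each feature lives in its own router module; main.py only mounts it.
    return {"answer": agent.run(question)}

app = FastAPI()
app.include_router(router)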

MostlyKIGuess (Member Author)

It just seems to add unnecessary complexity... and I have added llm_only.py to run the LLM separately. If we just want API endpoints, we don't really need to route them, at least in this use case.

chimosky (Member)

True, although I think there are some things we'll need from that PR.

MostlyKIGuess (Member Author)

Like what? We have both API endpoints, to run the LLM locally and with RAG functionality; I'm not sure what else was required. It's just in a single file rather than split up, at least the server, which kind of makes sense, as I'm normally used to writing Node servers, so I'm used to this style of writing.

Comment on lines +1 to +28
import streamlit as st
import requests

st.title("Sugar-AI Chat Interface")

use_rag = st.checkbox("Use RAG (Retrieval-Augmented Generation)", value=True)

st.subheader("Ask Sugar-AI")
question = st.text_input("Enter your question:")

if st.button("Submit"):
    if question:
        if use_rag:
            url = "http://localhost:8000/ask"
        else:
            url = "http://localhost:8000/ask-llm"
        params = {"question": question}
        try:
            response = requests.post(url, params=params)
            if response.status_code == 200:
                result = response.json()
                st.markdown("**Answer:** " + result["answer"])
            else:
                st.error(f"Error {response.status_code}: {response.text}")
        except Exception as e:
            st.error(f"Error contacting the API: {e}")
    else:
        st.warning("Please enter a question.")
chimosky (Member)

Why are we doing this?

MostlyKIGuess (Member Author)

It's an easy way to test both endpoints; it gives us a UI to interact with both of them.

chimosky (Member)

I disagree that it's easier to test, as it just adds an extra dependency when we truly don't need to; this should go into the README rather than here.

MostlyKIGuess (Member Author)

Alright, I shall remove this file then. Although I did not add it as a dependency, the user would have to install Streamlit if and only if they want to run the UI. As mentioned in the README:

Using the Streamlit App

Sugar-AI also provides a Streamlit-based interface for quick interactions and visualizations.

Running the Streamlit App

  1. Install Streamlit:

    If you haven't already, install Streamlit:

    pip install streamlit
  2. Make sure the server is running using:

    uvicorn main:app --host 0.0.0.0 --port 8000
  3. Start the App:

    Launch the Streamlit app by running:

    streamlit run streamlit.py
  4. Using the App:

    • The app provides a simple UI to input coding questions and displays the response using Sugar-AI.
    • Use the sidebar options to configure settings if available.
    • The app communicates with the FastAPI backend to process and retrieve answers.

Streamlit UI

Enjoy exploring Sugar-AI through both API endpoints and the interactive Streamlit interface!

chimosky (Member)

This is way better.

MostlyKIGuess (Member Author)

cool! thanks

@chimosky
Member

Also, the first commit in this PR no longer exists, so you can update your branch.

@MostlyKIGuess
Member Author

> Also, the first commit in this PR no longer exists, so you can update your branch.

I am sorry, how do I do that btw?

@MostlyKIGuess
Member Author

Do I just rebase and squash that commit?

@chimosky
Member

> Also, the first commit in this PR no longer exists, so you can update your branch.
>
> I am sorry, how do I do that btw?

You can rebase and drop that commit, then rebase master onto your branch.

The commit shouldn't be here in the first place because it did exist on main, but was changed as the commit message was amended.

- uses HF models.
- fixes ollama dep issue.
- updated dependencies
- Initialize the RAG Agent and vector store in main.py
- Add GET "/" endpoint with a welcome message
- Add POST "/ask" endpoint to process questions via the RAG Agent
- Update server startup using uvicorn

@MostlyKIGuess
Member Author

> Also, the first commit in this PR no longer exists, so you can update your branch.
>
> I am sorry, how do I do that btw?
>
> You can rebase and drop that commit, then rebase master onto your branch.
>
> The commit shouldn't be here in the first place because it did exist on main, but was changed as the commit message was amended.

Done

@MostlyKIGuess
Member Author

I did not add the UI as a dependency because it's the user's choice, so it won't increase dependencies, but it would make our lives easier: things like writing %20 for spaces are handled automatically, so API calls can be tested easily.
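
To illustrate the encoding point, a small sketch (the question text is just an example): requests, like the Streamlit form, percent-encodes the query string for you, whereas a hand-written URL needs the %20s spelled out:

import requests

# requests encodes the params (spaces become %20) before sending the request.
resp = requests.post(
    "http://localhost:8000/ask",
    params={"question": "How do I create a Pygame window?"},
)
print(resp.json()["answer"])

# The equivalent hand-written call needs manual encoding, e.g.
# POST http://localhost:8000/ask?question=How%20do%20I%20create%20a%20Pygame%20window%3F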

@chimosky
Member

> I did not add the UI as a dependency because it's the user's choice, so it won't increase dependencies, but it would make our lives easier: things like writing %20 for spaces are handled automatically, so API calls can be tested easily.

Now we don't have to worry about its existence at all, we just say "Hey! You can do this too if you want to".

Will test this again once I can.

@MostlyKIGuess
Member Author

> Now we don't have to worry about its existence at all, we just say "Hey! You can do this too if you want to".
>
> Will test this again once I can.

Yes, and it's easier on the user end too!

@chimosky
Member

Tried running and it threw this error:

tokenizer_config.json: 100%|████████████████████████████████████████| 3.07k/3.07k [00:00<00:00, 10.9MB/s]
tokenizer.json: 100%|███████████████████████████████████████████████| 7.03M/7.03M [00:01<00:00, 6.86MB/s]
config.json: 100%|██████████████████████████████████████████████████████| 679/679 [00:00<00:00, 3.32MB/s]
The `load_in_4bit` and `load_in_8bit` arguments are deprecated and will be removed in the future versions. 
Please, pass a `BitsAndBytesConfig` object in `quantization_config` argument instead.
Could not load bitsandbytes native library: /lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.32' not found (required by /usr/local/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so)
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/bitsandbytes/cextension.py", line 85, in <module>
    lib = get_native_library()
  File "/usr/local/lib/python3.10/site-packages/bitsandbytes/cextension.py", line 72, in get_native_library
    dll = ct.cdll.LoadLibrary(str(binary_path))
  File "/usr/local/lib/python3.10/ctypes/__init__.py", line 452, in LoadLibrary
    return self._dlltype(name)
  File "/usr/local/lib/python3.10/ctypes/__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.32' not found (required by /usr/local/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so)
CUDA is required but not available for bitsandbytes. Please consider installing the multi-platform enabled version of bitsandbytes, which is currently a work in progress. Please check currently supported platforms and installation instructions at https://huggingface.co/docs/bitsandbytes/main/en/installation#multi-backend
Traceback (most recent call last):
  File "/usr/local/bin/uvicorn", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1161, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1082, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1443, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 788, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/uvicorn/main.py", line 412, in main
    run(
  File "/usr/local/lib/python3.10/site-packages/uvicorn/main.py", line 579, in run
    server.run()
  File "/usr/local/lib/python3.10/site-packages/uvicorn/server.py", line 66, in run
    return asyncio.run(self.serve(sockets=sockets))
  File "/usr/local/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
  File "/usr/local/lib/python3.10/site-packages/uvicorn/server.py", line 70, in serve
    await self._serve(sockets)
  File "/usr/local/lib/python3.10/site-packages/uvicorn/server.py", line 77, in _serve
    config.load()
  File "/usr/local/lib/python3.10/site-packages/uvicorn/config.py", line 435, in load
    self.loaded_app = import_from_string(self.app)
  File "/usr/local/lib/python3.10/site-packages/uvicorn/importer.py", line 19, in import_from_string
    module = importlib.import_module(module_str)
  File "/usr/local/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/app/main.py", line 23, in <module>
    agent = RAG_Agent()
  File "/app/rag_agent.py", line 62, in __init__
    model_obj = AutoModelForCausalLM.from_pretrained(
  File "/usr/local/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 564, in from_pretrained
    return model_class.from_pretrained(
  File "/usr/local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 262, in _wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3698, in from_pretrained
    hf_quantizer.validate_environment(
  File "/usr/local/lib/python3.10/site-packages/transformers/quantizers/quantizer_bnb_4bit.py", line 83, in validate_environment
    validate_bnb_backend_availability(raise_exception=True)
  File "/usr/local/lib/python3.10/site-packages/transformers/integrations/bitsandbytes.py", line 559, in validate_bnb_backend_availability
    return _validate_bnb_cuda_backend_availability(raise_exception)
  File "/usr/local/lib/python3.10/site-packages/transformers/integrations/bitsandbytes.py", line 537, in _validate_bnb_cuda_backend_availability
    raise RuntimeError(log_msg)
RuntimeError: CUDA is required but not available for bitsandbytes. 
Please consider installing the multi-platform enabled version of bitsandbytes,
which is currently a work in progress.
Please check currently supported platforms and installation
instructions at https://huggingface.co/docs/bitsandbytes/main/en/installation#multi-backend

Seems you didn't see this error while testing.

@MostlyKIGuess
Member Author

> Tried running and it threw this error:
>
> [traceback quoted above]
>
> Seems you didn't see this error while testing.

I just didn't get that error at all; it works for me like this:
[screenshot]

@MostlyKIGuess
Member Author

Was this in Docker?
I will try upgrading the base image to see if the error persists.

@MostlyKIGuess
Member Author

Were you trying to run this on CPU? CUDA is installed but not required seems a bit weird. We can turn off quantization for CPU.

@MostlyKIGuess
Member Author

I updated it to use quantization only when CUDA is available, because only then does it make sense, and added a config so it's no longer deprecated.
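
A minimal sketch of that kind of guard, assuming a transformers AutoModelForCausalLM load; the exact arguments in rag_agent.py may differ:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

def load_model(model_name):
    # Only request 4-bit quantization when a CUDA device is actually present;
    # on CPU, bitsandbytes fails at load time, so fall back to a plain load.
    if torch.cuda.is_available():
        quant_config = BitsAndBytesConfig(load_in_4bit=True)
        return AutoModelForCausalLM.from_pretrained(
            model_name,
            quantization_config=quant_config,
            device_map="auto",
        )
    return AutoModelForCausalLM.from_pretrained(model_name)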

@MostlyKIGuess
Member Author

Tested and checked the Docker image too (quantization works as well):

[screenshot]

@MostlyKIGuess
Member Author

I have pushed this latest image to Docker Hub.

@MostlyKIGuess
Member Author

Trying to build a multi-stage Docker image as well to reduce the size.

@harshkasat

@MostlyKIGuess can you add a logger? This will help in the future.

@MostlyKIGuess
Member Author

> Trying to build a multi-stage Docker image as well to reduce the size.
>
> @MostlyKIGuess can you add a logger? This will help in the future.

Logging what exactly?

@harshkasat

https://docs.python.org/3/library/logging.html, to save and emit logs.
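
For illustration, a minimal self-contained sketch of what basic request logging could look like in the FastAPI app; the logger name and messages are assumptions, and the real endpoint would call agent.run():

import logging

from fastapi import FastAPI

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
logger = logging.getLogger("sugar_ai")

app = FastAPI()

@app.post("/ask")
def ask_question(question: str):
    logger.info("Received question: %s", question)
    answer = f"(placeholder answer to: {question})"  # real app: agent.run(question)
    logger.info("Returning %d characters", len(answer))
    return {"answer": answer}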

- improved the size of the Docker image.
- goes down from 20 GB to 8.97 GB
@MostlyKIGuess
Member Author

So the current image is 8.97 GB, supports both LLM and LLM with RAG, and is uploaded to my Docker Hub.
[screenshot]

  • Image on my PC (uncompressed):
    [screenshot]

@MostlyKIGuess
Member Author

> https://docs.python.org/3/library/logging.html, to save and emit logs.

Correct, but I don't see how it would be useful right now; maybe we can add server logs later if we have high traffic and need them. Let's just focus on decreasing the Docker image size and on functionality for now?
Anyway, good suggestion, thanks!

@chimosky
Member

chimosky commented Feb 26, 2025

> Was this in Docker? I will try upgrading the base image to see if the error persists.

Yes, this was in Docker.

I didn't think I had to point it out, as the PR is about Docker support and improvement.

> Were you trying to run this on CPU? CUDA is installed but not required seems a bit weird. We can turn off quantization for CPU.

Yes, trying to run on CPU.

> I updated it to use quantization only when CUDA is available, because only then does it make sense, and added a config so it's no longer deprecated.

Will try testing again.
