RAG app example #118
examples/DocQA/app.py
Outdated
USE_GPU = os.getenv("USE_GPU", False)
MODEL_NAME = os.getenv("MODEL_NAME", "meta-llama/Llama-3.2-1B-Instruct")
# if use_gpu, then the documents will be processed to output folder
DOCS_DIR = "/root/rag_data/output" if USE_GPU else "/root/rag_data/"
why is this different depending on whether we use a GPU or not?
Our ingest pipeline will take the /root/rag_data/ data folder and save results to /root/rag_data/output when using GPU. Otherwise the data folder is just /root/rag_data/
@wukaixingxp I still don't understand why the output folder is different when using GPU vs. not. If you aren't using GPU, where does the ingest pipeline save the results?
The ingest pipeline is only for images: it takes everything in /root/rag_data, checks whether any images are embedded in the PDFs, splits out the images, and uses 11B to generate image descriptions. It then saves the original text together with the image descriptions into the /root/rag_data folder, so that everything is text data, ready to be used by the 3B RAG agent. Running 11B on CPU is far too slow, so we have not enabled this at the current stage. We believe this can be a P1 feature to have. For now we only support text data and ignore embedded images, so we just take everything in the /root/rag_data folder.
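A side note on the snippet under review: os.getenv returns a string whenever the variable is set, so exporting USE_GPU=False produces the non-empty string "False", which is truthy, and DOCS_DIR would then point at the output folder. A minimal sketch of one way to parse the flag is below. The helper names env_flag and docs_dir and the set of accepted spellings are assumptions made for illustration; they are not part of the PR.

```python
import os

# Spellings treated as "true"; this set is an assumption, not taken from the PR.
TRUE_VALUES = {"1", "true", "yes", "on"}


def env_flag(name: str, default: bool = False) -> bool:
    """Read an environment variable as a boolean instead of a raw string."""
    raw = os.getenv(name)
    if raw is None:
        return default
    return raw.strip().lower() in TRUE_VALUES


def docs_dir(use_gpu: bool) -> str:
    """Mirror the path selection from app.py: GPU runs read the ingested output."""
    return "/root/rag_data/output" if use_gpu else "/root/rag_data/"


USE_GPU = env_flag("USE_GPU")
DOCS_DIR = docs_dir(USE_GPU)
```

With this, USE_GPU=False, USE_GPU=0, and an unset variable all select /root/rag_data/, while USE_GPU=true selects /root/rag_data/output.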
# Print a message indicating the start of llama-stack server
echo "starting the llama-stack server"
# Run llama-stack server with specified config and disable ipv6
python -m llama_stack.distribution.server.server --yaml-config /root/my-run.yaml --disable-ipv6 &
I think this should be re-architected a bit. Why isn't the Docker setup running two services for this purpose?
- one for running the llama stack server
- one for running the RAG app; its entrypoint can just be python /root/DocQA/app.py
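Whichever layout is chosen (a backgrounded server in one container, or two separate services), the app needs the server to be accepting connections before it sends its first request. A minimal sketch of a readiness check using only the standard library follows. The function name wait_for_port and the timeout values are assumptions for illustration; the thread does not say which host or port the llama-stack server listens on, so both are left as parameters.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll until a TCP connection to (host, port) succeeds or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not listening yet (or unreachable); wait and try again.
            time.sleep(interval)
    return False
```

app.py could call this once at startup with the server's actual host and port and exit with a clear error if it returns False, which removes the ordering dependency on the shell script.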
I was hoping to use LlamaStackDirectClient, but I am not sure if LlamaStackDirectClient supports a MemoryBank connection to ChromaDB.
echo "-----starting to llama-stack docker now---------"
pip install gradio

if [ "$USE_GPU_FOR_DOC_INGESTION" = true ]; then
looks like if that variable is false, we just don't ingest at all?
Yes, the ingest pipeline is only for images: it takes a PDF, checks whether it contains any images, and uses 11B to generate image descriptions. It then saves the original text together with the image descriptions into the output folder, so that everything is text data, ready to be used by the 3B RAG agent. Running 11B on CPU is far too slow, so we have not enabled this at the current stage. We believe this can be a P1 feature to have.
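The flow described above (extract text and images, describe each image, write text-only output) can be sketched roughly as follows. This is not the PR's ingest code: extract and describe_image are hypothetical callables standing in for the PDF parsing step and the 11B model call, and the "[Image N]" marker and .txt output naming are assumptions made for the example.

```python
from pathlib import Path
from typing import Callable, Iterable, List, Tuple


def merge_text_and_descriptions(
    text: str,
    images: Iterable[bytes],
    describe_image: Callable[[bytes], str],
) -> str:
    """Return the original text followed by one description per embedded image."""
    parts = [text]
    for index, image in enumerate(images, start=1):
        parts.append(f"[Image {index}] {describe_image(image)}")
    return "\n\n".join(parts)


def ingest_folder(
    input_dir: Path,
    output_dir: Path,
    extract: Callable[[Path], Tuple[str, List[bytes]]],
    describe_image: Callable[[bytes], str],
) -> List[Path]:
    """Write one text file per PDF in input_dir, with image descriptions appended."""
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    # Non-recursive glob, so an output folder nested inside input_dir is not re-read.
    for pdf in sorted(input_dir.glob("*.pdf")):
        text, images = extract(pdf)
        out = output_dir / f"{pdf.stem}.txt"
        out.write_text(
            merge_text_and_descriptions(text, images, describe_image),
            encoding="utf-8",
        )
        written.append(out)
    return written
```

Passing the model call in as a parameter keeps the GPU-dependent part isolated, so the CPU path could later plug in a different describe_image or skip it entirely.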
Can you rebase and update
This reverts commit 7b00a1c.
… a ragservice template
What does this PR do?
Creates an E2E RAG example that can do retrieval on documents and answer user questions. Components included:
- Inference (with llama-stack)
- Memory (with llama-stack)
- Agent (with llama-stack)
- Frontend (with Gradio)
Feature/Issue validation/testing/test plan