
Conversation

@heyjustinai (Contributor) commented Nov 18, 2024

What does this PR do?

This PR creates an end-to-end (E2E) RAG example that retrieves from documents and answers user questions (a sketch of the wiring appears after this list). Components included:

  • Inference (with llama-stack)
  • Memory (with llama-stack)
  • Agent (with llama-stack)
  • Frontend (with Gradio)
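
A minimal sketch of how these components could be wired together (illustrative only: rag_answer is a hypothetical stand-in for the retrieval and agent calls served by the llama-stack server, and this is not the PR's actual app.py):

# Hypothetical Gradio frontend for the DocQA RAG app.
# Only the Gradio API here is real; rag_answer is a placeholder.
import gradio as gr

def rag_answer(message: str, history: list) -> str:
    # Placeholder: retrieve relevant chunks from the memory bank,
    # then ask the llama-stack agent to answer grounded in them.
    return "answer goes here"

demo = gr.ChatInterface(fn=rag_answer, title="DocQA RAG demo")
demo.launch(server_name="0.0.0.0", server_port=7860)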

Feature/Issue validation/testing/test plan

[screenshot]

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Thanks for contributing 🎉!

@facebook-github-bot added the CLA Signed label (managed by the Meta Open Source bot) Nov 18, 2024
@heyjustinai marked this pull request as ready for review November 20, 2024 21:11
@heyjustinai changed the title from [WIP] Rag app to RAG app example Nov 20, 2024
@heyjustinai requested a review from ashwinb December 3, 2024 06:58
import os

# Parse USE_GPU as a boolean; os.getenv returns a string, so compare explicitly
# (the original os.getenv("USE_GPU", False) would treat any set value as truthy).
USE_GPU = os.getenv("USE_GPU", "false").lower() == "true"
MODEL_NAME = os.getenv("MODEL_NAME", "meta-llama/Llama-3.2-1B-Instruct")
# With a GPU, documents are processed by the ingest pipeline into the output
# folder; otherwise the raw data folder is used directly.
DOCS_DIR = "/root/rag_data/output" if USE_GPU else "/root/rag_data/"
A reviewer commented:
why is this different depending on whether we use a GPU or not?

@wukaixingxp replied:

Our ingest pipeline will take the /root/rag_data/ data folder and save results to /root/rag_data/output when using a GPU. Otherwise the data folder is just /root/rag_data/.

The reviewer followed up:

@wukaixingxp I still don't see why the output folder is different when using a GPU vs. not. If you aren't using a GPU, where does the ingest pipeline save the results?

@wukaixingxp replied:

The ingest pipeline is only for images: it takes everything in /root/rag_data, finds any images embedded in the PDFs, splits them out, and uses the 11B vision model to generate image descriptions; it then saves the original text together with the image descriptions back into the /root/rag_data folder, so that everything is text data ready to be used by the 3B RAG agent. Running 11B on CPU is very slow, so we have not enabled this at the current stage; we believe it can be a P1 feature. For now we only support text data and ignore embedded images, so we just take everything in the /root/rag_data folder.
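
A rough sketch of what such an image-aware ingest pass could look like (an illustration only, not the PR's code: the PyMuPDF-based extraction and the describe_image placeholder for the 11B vision model are assumptions):

# Assumed ingest pass: extract text and embedded images from each PDF,
# replacing every image with a generated description (pip install pymupdf).
import os
import fitz  # PyMuPDF

DATA_DIR = "/root/rag_data"

def describe_image(image_bytes: bytes) -> str:
    # Placeholder for a call to a vision model (e.g. Llama 3.2 11B Vision).
    raise NotImplementedError

for name in os.listdir(DATA_DIR):
    if not name.endswith(".pdf"):
        continue
    doc = fitz.open(os.path.join(DATA_DIR, name))
    parts = []
    for page in doc:
        parts.append(page.get_text())  # the page's plain text
        for img in page.get_images(full=True):
            xref = img[0]  # cross-reference number of the embedded image
            image_bytes = doc.extract_image(xref)["image"]
            parts.append(f"[image] {describe_image(image_bytes)}")
    # Write the now text-only document next to the original PDF.
    with open(os.path.join(DATA_DIR, name[:-4] + ".txt"), "w") as f:
        f.write("\n".join(parts))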

# Announce the start of the llama-stack server
echo "starting the llama-stack server"
# Run the llama-stack server in the background with the given config and IPv6 disabled
python -m llama_stack.distribution.server.server --yaml-config /root/my-run.yaml --disable-ipv6 &
A reviewer commented:

I think this should be re-architected a bit. Why isn't the Docker container running two services for this purpose:

  • one for running the llama-stack server
  • one for running the RAG app? That entrypoint can just be python /root/DocQA/app.py

Reply:

I was hoping to use LlamaStackDirectClient, but I am not sure whether LlamaStackDirectClient supports a MemoryBank connection to ChromaDB.

echo "-----starting the llama-stack docker now---------"
pip install gradio

if [ "$USE_GPU_FOR_DOC_INGESTION" = true ]; then
A reviewer commented:

looks like if that variable is false, we just don't ingest at all?

Reply:

Yes, the ingest pipeline is only for images: it takes a PDF, finds any embedded images, and uses the 11B vision model to generate image descriptions; it then saves the original text with the image descriptions into the output folder, so that everything is text data ready to be used by the 3B RAG agent. Running 11B on CPU is very slow, so we have not enabled this at the current stage; we believe it can be a P1 feature.

@heyjustinai requested a review from ashwinb December 4, 2024 17:29
@ashwinb (Contributor) commented Dec 4, 2024:

Can you rebase and update requirements.txt so we can merge it in?

@ashwinb merged commit 64ee0f0 into main Dec 4, 2024 (1 check passed)
@ashwinb deleted the rag-app branch December 4, 2024 23:41