Add tutorial for PDF reader #334

Merged (8 commits) on Apr 15, 2024

365 changes: 365 additions & 0 deletions docs/How_To/Tutorials/document_reader.rst
@@ -0,0 +1,365 @@
Create a PDF Reader with Sherpa
===============================

In this tutorial, we will create a simple PDF reader using Sherpa. The PDF reader will be able to open a PDF file, load its content into a vector database, and then use a question-answering agent and a web search agent to answer questions about that content.


Overview
********

To create the PDF reader, we will define four components:

1. A text embedding tool to convert text and queries into vectors. We will use the SentenceTransformer library for this (see the short sketch after this list).
2. A vector database to store the text embeddings of the PDF file. We will use the Chroma in-memory vector database.
3. A customized Sherpa Action that enables a Sherpa agent to search the vector database.
4. A question answering agent to answer questions about the content of the PDF file. We will use Sherpa's built-in QAAgent for this.
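
If text embeddings are new to you, the short standalone sketch below (not part of the tutorial code) shows the basic idea: the SentenceTransformer library turns each piece of text into a fixed-length vector, and semantically similar texts end up with similar vectors. The model name is the same one we will use later in the agent configuration.

.. code-block:: python

    from sentence_transformers import SentenceTransformer, util

    # Example model; any sentence-transformers model name works here.
    model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")

    sentences = [
        "The paper proposes a new retrieval method.",
        "A novel approach to document retrieval is introduced.",
        "The weather was sunny all week.",
    ]

    # Encode each sentence into a fixed-length vector.
    embeddings = model.encode(sentences)

    # Related sentences score higher on cosine similarity than unrelated ones.
    print(util.cos_sim(embeddings[0], embeddings[1]))
    print(util.cos_sim(embeddings[0], embeddings[2]))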

Install dependencies
*********************

Step 1. Install Python **v3.9** using your preferred installation method.

Step 2. Create a folder for storing your tutorial code and PDF files:

.. code-block:: bash

    cd <your development directory>
    mkdir sherpa_tutorial
    cd sherpa_tutorial

Step 3. You may wish to create a virtual environment to isolate the Python libraries used
for this tutorial from your other Python code. This step is optional.

.. code-block:: bash

    python -m venv myvenv
    source myvenv/bin/activate

Step 4. Install the Sherpa library.

.. code-block:: bash

    pip install sherpa_ai

Step 5. Install pdfminer.six for extracting text from PDF files and sentence-transformers for creating text embeddings:

.. code-block:: bash

    pip install pdfminer.six sentence-transformers

Define the custom action
************************

An action is an interface to tools that the agent can use to interact with data. In this case, we will create a custom action to enable the agent to search the vector database containing the content of the PDF file.

In the tutorial folder, create a file called `actions.py` and add the following code:

.. code-block:: python

    from langchain.document_loaders import PDFMinerLoader
    from langchain.text_splitter import SentenceTransformersTokenTextSplitter
    from langchain.vectorstores.chroma import Chroma
    from loguru import logger

    from sherpa_ai.actions.base import BaseAction


    class DocumentSearch(BaseAction):
        def __init__(self, filename, embedding_function, k=4):
            # file name of the pdf
            self.filename = filename
            # the embedding function to use
            self.embedding_function = embedding_function
            # number of results to return in search
            self.k = k

            # load the pdf and create the vector store
            self.chroma = Chroma(embedding_function=embedding_function)
            documents = PDFMinerLoader(self.filename).load()
            documents = SentenceTransformersTokenTextSplitter(chunk_overlap=0).split_documents(documents)

            logger.info(f"Adding {len(documents)} documents to the vector store")
            self.chroma.add_documents(documents)
            logger.info("Finished adding documents to the vector store")

        def execute(self, query):
            """
            Execute the action by searching the document store for the query

            Args:
                query (str): The query to search for

            Returns:
                str: The search results combined into a single string
            """

            results = self.chroma.search(query, search_type="mmr", k=self.k)
            return "\n\n".join([result.page_content for result in results])

        @property
        def name(self) -> str:
            """
            The name of the action, used to describe the action to the agent.
            """
            return "DocumentSearch"

        @property
        def args(self) -> dict:
            """
            The arguments that the action takes, used to describe the action to the agent.
            """
            return {
                "query": "string"
            }

The action is a crucial part of Sherpa: it is what lets an agent interact with other systems. The action we are creating here searches the vector database of PDF content for a given query.

This action will be passed to the QAAgent so that the agent can search the vector database whenever it needs information from the PDF (a quick way to try the action on its own appears after the list below).

There are three main parts in this action class:

1. `__init__`: This method initializes the action by loading the PDF file, creating the vector database and adding the documents to the vector database.

2. `execute`: This method executes the action by searching the vector database for the query and returning the search results.

3. `name` and `args`: These properties describe the action to agents that want to use it.
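
Before wiring the action into an agent, you can try it on its own. The snippet below is a rough sketch, not part of the tutorial files; it assumes you have a PDF saved as `paper.pdf` next to `actions.py` (the next section explains where to get one) and uses the same embedding class as the agent configuration later in this tutorial.

.. code-block:: python

    from langchain.embeddings import SentenceTransformerEmbeddings

    from actions import DocumentSearch

    # Same embedding model as in the agent configuration below.
    embedding_func = SentenceTransformerEmbeddings(
        model_name="sentence-transformers/all-mpnet-base-v2"
    )

    # Build the vector store from the PDF and run a single search.
    search = DocumentSearch("paper.pdf", embedding_func, k=2)
    print(search.execute("What problem does the paper address?"))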


Find a PDF file
****************

Next, locate a PDF file to use with the PDF reader. For example, you could use a research
paper such as https://arxiv.org/pdf/2401.07930.pdf, or any other PDF file you have. Save the PDF file in the same folder as the `actions.py` file. For simplicity, this tutorial refers to your PDF file as `paper.pdf`.
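
Optionally, you can verify that text can be extracted from your PDF before moving on. This is just a quick sanity check using pdfminer.six (installed earlier), not part of the tutorial files:

.. code-block:: python

    from pdfminer.high_level import extract_text

    # Print the first few hundred characters of the extracted text.
    text = extract_text("paper.pdf")
    print(text[:500])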

Define the agent configuration
*********************************

Next, we will create a configuration file for the agent. Sherpa parses this configuration file directly to create the agent, so no additional code is required.

Create a file called `agent_config.yml` and add the following code:

.. code-block:: yaml

    shared_memory:
      _target_: sherpa_ai.memory.shared_memory.SharedMemory  # The absolute path to the shared memory class in the library
      objective: Answer the question  # Objective for the agent; since this is a question answering agent, the objective is to answer questions

    agent_config:  # For the demo, the default configuration is used. You can change the configuration as needed
      _target_: sherpa_ai.config.task_config.AgentConfig


    llm:  # Configuration for the LLM; here we are using the OpenAI GPT-3.5-turbo model
      _target_: langchain.chat_models.ChatOpenAI
      model_name: gpt-3.5-turbo
      temperature: 0

    embedding_func:
      _target_: langchain.embeddings.SentenceTransformerEmbeddings
      model_name: sentence-transformers/all-mpnet-base-v2

    doc_search:
      _target_: actions.DocumentSearch
      filename: paper.pdf
      embedding_function: ${embedding_func}
      k: 4

    qa_agent:
      _target_: sherpa_ai.agents.qa_agent.QAAgent
      llm: ${llm}
      shared_memory: ${shared_memory}
      name: QA Sherpa
      description: You are a question answering assistant helping users to find answers to their questions. Based on the input question, you will provide the answer from the text ONLY.
      agent_config: ${agent_config}
      num_runs: 1
      actions:
        - ${doc_search}


The `_target_` keys tell Sherpa which classes to instantiate for each object, and the `${...}` syntax references other entries defined in the same file.

The DocumentSearch action is defined in the `doc_search` field. You can change the `filename` attribute to the PDF file you want to use. The `embedding_function` attribute is set to the SentenceTransformerEmbeddings class, which is used to convert text into vectors. The `k` attribute is set to 4, which is the number of search results to return.
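
To see how Sherpa turns these YAML entries into Python objects, here is a small illustrative sketch of the underlying mechanism, Hydra's `instantiate` (the same function `main.py` uses below). The `splitter` entry is just an example:

.. code-block:: python

    from hydra.utils import instantiate
    from omegaconf import OmegaConf

    # A tiny config with a single _target_ entry, purely for illustration.
    cfg = OmegaConf.create(
        {
            "splitter": {
                "_target_": "langchain.text_splitter.SentenceTransformersTokenTextSplitter",
                "chunk_overlap": 0,
            }
        }
    )

    # Equivalent to SentenceTransformersTokenTextSplitter(chunk_overlap=0).
    splitter = instantiate(cfg.splitter)
    print(type(splitter))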

The last section of this configuration file defines the agent, called `qa_agent`. This section has the following parts:

1. `llm`: This is the language model used by the agent. In this case, we are using the OpenAI GPT-3.5-turbo model.

2. `shared_memory`: This is the shared memory used by the agent. The shared memory is used to store information that can be shared between different agents. Since we only have one agent in this tutorial, we can choose the default shared memory. There will be a separate tutorial on how to use shared memory.

3. `name` and `description`: These are used to describe the agent when it is executing the task.

4. `agent_config`: This is the configuration for the agent. The default configuration is used in this tutorial.

5. `num_runs`: This is the number of times the agent will execute an action. In this tutorial, the agent will execute only once.

6. `actions`: This is the list of actions that the agent can execute. In this case, the agent can execute the `doc_search` action.


Put it all together
********************

Now, let's put everything together to create the PDF reader. Create a file called `main.py` and add the following code:

.. code-block:: python

    from argparse import ArgumentParser

    from hydra.utils import instantiate
    from omegaconf import OmegaConf

    from sherpa_ai.agents import QAAgent
    from sherpa_ai.events import EventType


    def get_qa_agent_from_config_file(
        config_path: str,
    ) -> QAAgent:
        """
        Create a QAAgent from a config file.

        Args:
            config_path: Path to the config file

        Returns:
            QAAgent: A QAAgent instance
        """

        config = OmegaConf.load(config_path)

        agent_config = instantiate(config.agent_config)
        qa_agent: QAAgent = instantiate(config.qa_agent, agent_config=agent_config)

        return qa_agent


    if __name__ == "__main__":
        parser = ArgumentParser()
        parser.add_argument("--config", type=str, default="agent_config.yml")
        args = parser.parse_args()

        qa_agent = get_qa_agent_from_config_file(args.config)

        while True:
            question = input("Ask me a question: ")

            # Add the question to the shared memory. By default, the agent will take the last
            # message in the shared memory as the task.
            qa_agent.shared_memory.add(EventType.task, "human", question)
            result = qa_agent.run()
            print(result)

In this code, we define a function `get_qa_agent_from_config_file` that reads the configuration file and creates a QAAgent instance. We then use it to create the agent and run it in a loop: the program prompts you for a question, and the agent answers it based on the content of the PDF file.
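
If you prefer a one-shot run instead of the interactive loop, the following sketch (a hypothetical `one_shot.py`, reusing the same calls as `main.py`) answers a single question passed on the command line:

.. code-block:: python

    # one_shot.py -- hypothetical variant of main.py for a single question
    from argparse import ArgumentParser

    from sherpa_ai.events import EventType

    from main import get_qa_agent_from_config_file

    if __name__ == "__main__":
        parser = ArgumentParser()
        parser.add_argument("--config", type=str, default="agent_config.yml")
        parser.add_argument("--question", type=str, required=True)
        args = parser.parse_args()

        qa_agent = get_qa_agent_from_config_file(args.config)

        # The agent takes the last task message in shared memory as its task.
        qa_agent.shared_memory.add(EventType.task, "human", args.question)
        print(qa_agent.run())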


Run the PDF reader
******************

Before we can run the PDF reader, we need to add an environment variable for the OpenAI API key. You can get an API key from the OpenAI website. Once you have your key, create a file called `.env` and add the following line:

.. code-block:: bash

    OPENAI_API_KEY=<YOUR_API_KEY>

Now, you can run the PDF reader by running the following command:

.. code-block:: bash

    python main.py --config agent_config.yml

Sherpa should now print out several lines of debug information as it starts up. The first time it runs, Sherpa will also download and install several components for the AI models it is using.

Sherpa will then prompt you to ask a question. You can ask any question about the content of the PDF file. Sherpa uses your agent to answer the question based on the content of the PDF file you provided.

.. image:: imgs/pdf_reader.png
   :width: 800

Finally, to view more detailed logs, you can set the log level to debug by adding a `LOG_LEVEL` entry to the `.env` file:

.. code-block:: bash

    LOG_LEVEL=DEBUG


Add more components
********************

So far we have created a PDF reader (an "agent") that can answer our questions about the content of a PDF file. Now let's go a step further and add additional capabilities.

We can enhance our agent to use knowledge from the Internet via Google search. To add Google Search, we simply add the built-in Sherpa action called `GoogleSearch` to the configuration. Add the following code to the `agent_config.yml` file (before the `qa_agent` section):

.. code-block:: yaml

    google_search:
      _target_: sherpa_ai.actions.GoogleSearch
      role_description: Act as a question answering agent
      task: Question answering
      llm: ${llm}
      include_metadata: true
      config: ${agent_config}

Then, add the `google_search` action to the `qa_agent` section:

.. code-block:: yaml

    qa_agent:
      ...
      actions:
        - ${doc_search}
        - ${google_search}


We can also add a verification step to provide more reliable citations from the Google Search results. Add the following code to the `agent_config.yml` file (before the `qa_agent` section):

.. code-block:: yaml

    citation_validation:  # The tool used to validate and add citations to the answer
      _target_: sherpa_ai.output_parsers.citation_validation.CitationValidation
      sequence_threshold: 0.5
      jaccard_threshold: 0.5
      token_overlap: 0.5

Then, add `citation_validation` to the `validations` property in the `qa_agent` section, and change the number of runs to 2 so that both actions have a chance to be selected by the agent.

Finally, we need to modify the agent description to include the new capabilities.

The final `qa_agent` section should look like this:

.. code-block:: yaml

    qa_agent:
      _target_: sherpa_ai.agents.qa_agent.QAAgent
      llm: ${llm}
      shared_memory: ${shared_memory}
      name: QA Sherpa
      description: You are a question-answering assistant helping users to find answers based on the document. For each question, first try to collect relevant information by DocumentSearch. Then, use Google Search to find the answer in the next step.
      agent_config: ${agent_config}
      num_runs: 2
      validation_steps: 1
      actions:
        - ${doc_search}
        - ${google_search}
      validations:
        - ${citation_validation}

Before running the agent, you need to add a Serper API key to the environment to enable the Google Search action. You can get an API key from the Serper website: https://serper.dev/. Add the following line to the `.env` file:

.. code-block:: bash

    SERPER_API_KEY=<YOUR_API_KEY>


Now you can run the PDF reader with Google Search by running the following command:

.. code-block:: bash

    python main.py --config agent_config.yml

You should now be able to ask questions about the content of the PDF file and get answers drawn from both the PDF content and Google Search results.

.. image:: imgs/pdf_reader_plus.png
   :width: 800

Notice how the agent now provides citations for the answers from the Google Search results.


Conclusion
***********

In this tutorial, we created a simple PDF reader using Sherpa. We used the SentenceTransformer library to convert text into vectors, the Chroma in-memory vector database to store the text embeddings of the PDF file, and the QAAgent from Sherpa to answer questions about the content of the PDF file. We also added the Google Search action to the agent to enable the agent to search the Internet for answers to questions. Finally, we added a citation validation step to provide more reliable citations for the answers from the Google Search results.
Binary file added docs/How_To/Tutorials/imgs/pdf_reader.png
Binary file added docs/How_To/Tutorials/imgs/pdf_reader_plus.png
1 change: 1 addition & 0 deletions docs/How_To/index.rst
@@ -37,6 +37,7 @@ contribute changes to Sherpa.

Contribute/*
Slack Bot/*
Tutorials/*
Test/*

Codebase Overview:
2 changes: 1 addition & 1 deletion docs/index.rst
@@ -18,7 +18,7 @@ There is a lot of information that you can leverage about this project in the fo
- Step 1: skim through :doc:`About Sherpa<About_Sherpa/index>` to get a better sense of what the project is trying to achieve.
- Step 2: skim through / watch the :doc:`recent project updates<About_Sherpa/Updates/index>` to get a sense of what is currently happening in the project.
- Step 3: check out the instructions for :doc:`contributing via GitHub<How_To/Contribute/contribute>` and make your first pull request improving something, even very small, about these documentations.
- Step 4: :doc:`talk to Sherpa on our community slack<How_To/Slack Bot/0_slack_bot_intro>` and then :doc:`set it up in your own environment<How_To/Slack Bot/1_slackbot_workspace>` and :doc:`run it<How_To/Slack Bot/2_run_slackbot>`.
- Step 4: :doc:`talk to Sherpa on our community slack<How_To/Slack Bot/0_slack_bot_intro>` or :doc:`create your custom Sherpa locally<How_To/Tutorials/document_reader>`
- Step 5: `Join Slack <https://join.slack.com/t/aisc-to/shared_invite/zt-f5zq5l35-PSIJTFk4v60FML177PgsPg>`__, say hi in `#sherpa-dev channel <https://aisc-to.slack.com/archives/C056RU8PEJ0>`__, and add `sherpa dev weekly Meeting <https://lu.ma/sherpa-weekly>`__ to your calendar. See more :doc:`here<How_To/index>`.

.. toctree::