Add tutorial for PDF reader #334

Merged

Commits (8):

- 6ade01f Add tutorial for PDF reader (20001LastOrder)
- 9726e5e Add more content to the tutorial (20001LastOrder)
- dc6c8d5 Update tutorial (20001LastOrder)
- 18006b1 Add more details to tutorial text (oshoma)
- f4a50c4 Changes to the tutorial (20001LastOrder)
- fdb8523 Add tutorial in index (20001LastOrder)
- 9571d67 update onboarding checklist (20001LastOrder)
- 2f468cf Fix typo (oshoma)

@@ -0,0 +1,365 @@

Create a PDF Reader with Sherpa
===============================

In this tutorial we will create a simple PDF reader using Sherpa. The PDF reader will be able to open a PDF file, load its content into a vector database, and then use a question-answering agent, augmented with a web search action, to answer questions about the content of the PDF file.

Overview
********

To create the PDF reader we will define four components:

1. A text embedding tool to convert text and queries into vectors. We will use the SentenceTransformer library to meet this need (see the short sketch after this list).
2. A vector database to store the text embeddings of the PDF file. We will use the Chroma in-memory vector database.
3. A customized Sherpa Action to enable a Sherpa agent to search the vector database.
4. A question-answering agent to answer questions about the content of the PDF file. We will use Sherpa's built-in QAAgent to do this.
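
The sketch below illustrates what the embedding step does: it maps a piece of text to a fixed-length vector, so that similar texts end up close together in vector space. This is a minimal, standalone sketch assuming the `sentence-transformers` package and the `all-mpnet-base-v2` model used later in this tutorial; it is not part of the PDF reader code itself.

.. code-block:: python

    from sentence_transformers import SentenceTransformer

    # Load the same model that the embedding function in this tutorial uses
    model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")

    # Encode a query into a dense vector; all-mpnet-base-v2 produces 768 dimensions
    vector = model.encode("What problem does this paper try to solve?")
    print(vector.shape)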

Install dependencies
*********************

Step 1. Install Python **v3.9** using your preferred installation method.

Step 2. Create a folder for storing your tutorial code and PDF files:

.. code-block:: bash

    cd <your development directory>
    mkdir sherpa_tutorial
    cd sherpa_tutorial

Step 3. You may wish to create a virtual environment to isolate the Python libraries used for this tutorial from your other Python code. This step is optional.

.. code-block:: bash

    python -m venv myvenv
    source myvenv/bin/activate

Step 4. Install the Sherpa library:

.. code-block:: bash

    pip install sherpa_ai

Step 5. Install pdfminer.six for extracting text from PDF files and sentence-transformers for creating text embeddings:

.. code-block:: bash

    pip install pdfminer.six sentence-transformers
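
As a quick, optional check that the packages are importable, you can run a one-liner such as the following (not part of the tutorial code):

.. code-block:: bash

    python -c "import sherpa_ai, pdfminer, sentence_transformers; print('ok')"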

Define the custom action
************************

An action is an interface to tools that the agent can use to interact with data. In this case, we will create a custom action to enable the agent to search the vector database containing the content of the PDF file.

In the tutorial folder, create a file called `actions.py` and add the following code:

.. code-block:: python

    from langchain.document_loaders import PDFMinerLoader
    from langchain.text_splitter import SentenceTransformersTokenTextSplitter
    from langchain.vectorstores.chroma import Chroma
    from loguru import logger

    from sherpa_ai.actions.base import BaseAction


    class DocumentSearch(BaseAction):
        def __init__(self, filename, embedding_function, k=4):
            # file name of the pdf
            self.filename = filename
            # the embedding function to use
            self.embedding_function = embedding_function
            # number of results to return in search
            self.k = k

            # load the pdf and create the vector store
            self.chroma = Chroma(embedding_function=embedding_function)
            documents = PDFMinerLoader(self.filename).load()
            documents = SentenceTransformersTokenTextSplitter(chunk_overlap=0).split_documents(documents)

            logger.info(f"Adding {len(documents)} documents to the vector store")
            self.chroma.add_documents(documents)
            logger.info("Finished adding documents to the vector store")

        def execute(self, query):
            """
            Execute the action by searching the document store for the query

            Args:
                query (str): The query to search for

            Returns:
                str: The search results combined into a single string
            """
            results = self.chroma.search(query, search_type="mmr", k=self.k)
            return "\n\n".join([result.page_content for result in results])

        @property
        def name(self) -> str:
            """
            The name of the action, used to describe the action to the agent.
            """
            return "DocumentSearch"

        @property
        def args(self) -> dict:
            """
            The arguments that the action takes, used to describe the action to the agent.
            """
            return {
                "query": "string"
            }

Actions are a crucial part of Sherpa, enabling agents to interact with other systems. In this case, the action we are creating searches the vector database containing the PDF content for a given query.

This action will be passed to the QAAgent to enable the agent to search the vector database for the query.

There are three main parts in this action class:

1. `__init__`: This method initializes the action by loading the PDF file, creating the vector database and adding the documents to the vector database.

2. `execute`: This method executes the action by searching the vector database for the query and returning the search results.

3. `name` and `args`: These properties describe the action to agents that want to use it.


Find a PDF file
****************

Next, locate a PDF file to use with the PDF reader. For example, you could use a research paper like this one: https://arxiv.org/pdf/2401.07930.pdf, or any other PDF file you have. Save the PDF file in the same folder as the `actions.py` file. For simplicity, we will refer to your PDF file as `paper.pdf` throughout this tutorial.
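
Before wiring the action into an agent, you can exercise it directly as a quick sanity check. The following is a minimal sketch, assuming `actions.py` and `paper.pdf` are in the current folder and using the same SentenceTransformer embeddings configured later in this tutorial:

.. code-block:: python

    from langchain.embeddings import SentenceTransformerEmbeddings

    from actions import DocumentSearch

    # Build the embedding function and the action by hand
    embedding_func = SentenceTransformerEmbeddings(
        model_name="sentence-transformers/all-mpnet-base-v2"
    )
    doc_search = DocumentSearch("paper.pdf", embedding_func, k=4)

    # Query the vector store directly and print the combined results
    print(doc_search.execute("What is the main contribution of the paper?"))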

Defining the agent configuration
*********************************

Next, we will create a configuration file for the agent. This configuration file is parsed directly by Sherpa to create the agent, so no additional code is required.

Create a file called `agent_config.yml` and add the following code:

.. code-block:: yaml

    shared_memory:
      _target_: sherpa_ai.memory.shared_memory.SharedMemory  # The absolute path to the shared memory class in the library
      objective: Answer the question  # Objective for the agent; since this is a question-answering agent, the objective is to answer questions

    agent_config:  # For the demo, the default configuration is used. You can change the configuration as per your requirements
      _target_: sherpa_ai.config.task_config.AgentConfig

    llm:  # Configuration for the LLM; here we are using the OpenAI GPT-3.5-turbo model
      _target_: langchain.chat_models.ChatOpenAI
      model_name: gpt-3.5-turbo
      temperature: 0

    embedding_func:
      _target_: langchain.embeddings.SentenceTransformerEmbeddings
      model_name: sentence-transformers/all-mpnet-base-v2

    doc_search:
      _target_: actions.DocumentSearch
      filename: paper.pdf
      embedding_function: ${embedding_func}
      k: 4

    qa_agent:
      _target_: sherpa_ai.agents.qa_agent.QAAgent
      llm: ${llm}
      shared_memory: ${shared_memory}
      name: QA Sherpa
      description: You are a question answering assistant helping users to find answers to their questions. Based on the input question, you will provide the answer from the text ONLY.
      agent_config: ${agent_config}
      num_runs: 1
      actions:
        - ${doc_search}

The `_target_` keys tell Sherpa which classes to use to instantiate various objects.
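
Under the hood, this uses Hydra-style instantiation: each `_target_` value is a dotted import path, and `${...}` references are resolved by OmegaConf before the object is built. The snippet below is a small illustrative sketch of that mechanism (it is not part of the tutorial code), showing how the `llm` entry above could be instantiated by hand:

.. code-block:: python

    from hydra.utils import instantiate
    from omegaconf import OmegaConf

    cfg = OmegaConf.create(
        {
            "llm": {
                "_target_": "langchain.chat_models.ChatOpenAI",
                "model_name": "gpt-3.5-turbo",
                "temperature": 0,
            }
        }
    )

    # instantiate() imports the _target_ class and calls it with the remaining keys
    # (constructing ChatOpenAI requires OPENAI_API_KEY to be set in your environment)
    llm = instantiate(cfg.llm)
    print(type(llm))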

The DocumentSearch action is defined in the `doc_search` field. You can change the `filename` attribute to the PDF file you want to use. The `embedding_function` attribute is set to the SentenceTransformerEmbeddings class, which is used to convert text into vectors. The `k` attribute is set to 4, which is the number of search results to return.

The last section of this configuration file defines the agent, called `qa_agent`. This section has the following parts:

1. `llm`: This is the language model used by the agent. In this case, we are using the OpenAI GPT-3.5-turbo model.

2. `shared_memory`: This is the shared memory used by the agent. The shared memory is used to store information that can be shared between different agents. Since we only have one agent in this tutorial, we can use the default shared memory. There will be a separate tutorial on how to use shared memory.

3. `name` and `description`: These are used to describe the agent when it is executing the task.

4. `agent_config`: This is the configuration for the agent. The default configuration is used in this tutorial.

5. `num_runs`: This is the number of times the agent will execute an action. In this tutorial, the agent will execute only once.

6. `actions`: This is the list of actions that the agent can execute. In this case, the agent can execute the `doc_search` action.

Put it all together
********************

Now, let's put everything together to create the PDF reader. Create a file called `main.py` and add the following code:

.. code-block:: python

    from argparse import ArgumentParser

    from hydra.utils import instantiate
    from omegaconf import OmegaConf

    from sherpa_ai.agents import QAAgent
    from sherpa_ai.events import EventType


    def get_qa_agent_from_config_file(
        config_path: str,
    ) -> QAAgent:
        """
        Create a QAAgent from a config file.

        Args:
            config_path: Path to the config file

        Returns:
            QAAgent: A QAAgent instance
        """
        config = OmegaConf.load(config_path)

        agent_config = instantiate(config.agent_config)
        qa_agent: QAAgent = instantiate(config.qa_agent, agent_config=agent_config)

        return qa_agent


    if __name__ == "__main__":
        parser = ArgumentParser()
        parser.add_argument("--config", type=str, default="agent_config.yml")
        args = parser.parse_args()

        qa_agent = get_qa_agent_from_config_file(args.config)

        while True:
            question = input("Ask me a question: ")

            # Add the question to the shared memory. By default, the agent will take the last
            # message in the shared memory as the task.
            qa_agent.shared_memory.add(EventType.task, "human", question)
            result = qa_agent.run()
            print(result)

In this code, we define a function `get_qa_agent_from_config_file` that reads the configuration file and creates a QAAgent instance. We then use this function to create the agent and run it in a loop: the program prompts for a question, and the agent answers it based on the content of the PDF file.

Run the PDF reader
******************

Before we can run the PDF reader, we need to set an environment variable for the OpenAI API key. You can get an API key from the OpenAI website. Once you have your key, create a file called `.env` and add the following line:

.. code-block:: bash

    OPENAI_API_KEY=<YOUR_API_KEY>
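
If you prefer not to use a `.env` file, you should also be able to export the variable in your shell before running the reader (an alternative, assuming a Unix-like shell):

.. code-block:: bash

    export OPENAI_API_KEY=<YOUR_API_KEY>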

Now, you can run the PDF reader with the following command:

.. code-block:: bash

    python main.py --config agent_config.yml

Sherpa should now print out several lines of debug information as it starts up. The first time it runs, Sherpa will also download and install several components for the AI models it is using.

Sherpa will then prompt you to ask a question. You can ask any question about the content of the PDF file. Sherpa uses your agent to answer the question based on the content of the PDF file you provided.

.. image:: imgs/pdf_reader.png
   :width: 800

Finally, to view more detailed logs, you can set the log level to debug by setting the `LOG_LEVEL` environment variable in the `.env` file:

.. code-block:: bash

    LOG_LEVEL=DEBUG

Add more components
********************

So far we have created a PDF reader (an "agent") that can answer our questions about the content of a PDF file. Now let's go a step further and add additional capabilities.

We can enhance our agent to use knowledge from the Internet via Google search. To add Google Search, we simply add the built-in Sherpa action called `GoogleSearch` to the configuration. Add the following code to the `agent_config.yml` file (before the `qa_agent` section):

.. code-block:: yaml

    google_search:
      _target_: sherpa_ai.actions.GoogleSearch
      role_description: Act as a question answering agent
      task: Question answering
      llm: ${llm}
      include_metadata: true
      config: ${agent_config}

Then, add the `google_search` action to the `qa_agent` section:

.. code-block:: yaml

    qa_agent:
      ...
      actions:
        - ${doc_search}
        - ${google_search}

We can also add a verification step to provide more reliable citations from the Google Search results. Add the following code to the `agent_config.yml` file (before the `qa_agent` section):

.. code-block:: yaml

    citation_validation:  # The tool used to validate and add citations to the answer
      _target_: sherpa_ai.output_parsers.citation_validation.CitationValidation
      sequence_threshold: 0.5
      jaccard_threshold: 0.5
      token_overlap: 0.5

Then, add `citation_validation` to the `validations` property in the `qa_agent` section, and change the number of runs to 2 so that both actions have a chance to be selected by the agent.

Finally, we need to modify the agent description to include the new capabilities.

The final `qa_agent` section should look like this:

.. code-block:: yaml

    qa_agent:
      _target_: sherpa_ai.agents.qa_agent.QAAgent
      llm: ${llm}
      shared_memory: ${shared_memory}
      name: QA Sherpa
      description: You are a question-answering assistant helping users to find answers based on the document. For each question, first try to collect relevant information by DocumentSearch. Then, use Google Search to find the answer in the next step.
      agent_config: ${agent_config}
      num_runs: 2
      validation_steps: 1
      actions:
        - ${doc_search}
        - ${google_search}
      validations:
        - ${citation_validation}

Before running the agent, you need to add a Serper API key to the environment to enable the Google Search action. You can get an API key from the Serper website: https://serper.dev/. Add the following line to the `.env` file:

.. code-block:: bash

    SERPER_API_KEY=<YOUR_API_KEY>
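
At this point your `.env` file should contain something like the following (the `LOG_LEVEL` line is optional and only needed if you want debug logs):

.. code-block:: bash

    OPENAI_API_KEY=<YOUR_API_KEY>
    SERPER_API_KEY=<YOUR_API_KEY>
    LOG_LEVEL=DEBUG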

Now you can run the PDF reader with Google Search by running the following command:

.. code-block:: bash

    python main.py --config agent_config.yml

You should now be able to ask questions about the content of the PDF file and get answers drawn from both the PDF content and the Google Search results.

.. image:: imgs/pdf_reader_plus.png
   :width: 800

Notice how the agent now provides citations for the answers from the Google Search results.


Conclusion
***********

In this tutorial, we created a simple PDF reader using Sherpa. We used the SentenceTransformer library to convert text into vectors, the Chroma in-memory vector database to store the text embeddings of the PDF file, and the QAAgent from Sherpa to answer questions about the content of the PDF file. We also added the Google Search action to enable the agent to search the Internet for answers. Finally, we added a citation validation step to provide more reliable citations for the answers obtained from the Google Search results.

@@ -37,6 +37,7 @@ contribute changes to Sherpa.

   Contribute/*
   Slack Bot/*
   Tutorials/*
   Test/*

Codebase Overview: