I build this application with jupter notebook to learn how to build applications using GPT and embegging database, which is called as Retrieval Augmented Generation(RAG).
This is Q&A application answering the specific web blog contents.
A user query about the those web blog contents and then the app answer it.
-
Data Preparation
- Scrapt the Web Pages
We need to gather the data from Web Blog about the details, which Q&A refer it to answer the query.
** Please be careful about the copyright of the web page.
The file is here - (Optional) Format the texts by using ChatGPT.
- Scrapt the Web Pages
-
Build Application
The script is here- Import raw data from csv file prepared by step 1.
- Generate the embeddings(vector data) of the raw data by OpenAI API.
- Insert the embeddings(vector data) to Pinecone.
(Ready for this application to answer the query from a user) - A user input the query
- Generate the embeddings(vector data) of the query by OpenAI API.
- Query to Pinecone with the embeddings(vector data) of the query
- Call OpenAI API by providing the prompt including the raw data as context.
Application Archtecture
graph LR;
Local-->|API| Pinecone;
Local-->|API| OpenAI;
OpenAI
OpenAI
OpenAI Tutorials
OpenAI Models
OpenAI API Reference
OpenAI tokenizer
OpenAI Cookbook Question_answering_using_embeddings.ipynb
OpenAI Cookbook vector_databases pinecone
Vector Database Pinecone
Pinecone
What is a Vector Database & How Does it Work? Use Cases + Examples
The Missing WHERE Clause in Vector Search
Pinecone Docs