kurakurakuda/sandbox_rag_app

PoC Embedding Applications

Background

I built this application in a Jupyter notebook to learn how to build applications with GPT and an embedding database, an approach known as Retrieval-Augmented Generation (RAG).

What this application can do

This is a Q&A application that answers questions about the contents of specific web blog posts.
A user asks a question about those blog posts, and the app answers it.

How to build this application

  1. Data Preparation

    1. Scrape the Web Pages
      We need to gather the blog content that the Q&A application refers to when answering a query.
      ** Please be careful about the copyright of the web pages.
      The file is here
    2. (Optional) Format the text using ChatGPT.
  2. Build Application
    The script is here

    1. Import the raw data from the CSV file prepared in step 1.
    2. Generate embeddings (vector data) of the raw data with the OpenAI API.
    3. Insert the embeddings (vector data) into Pinecone.
      (The application is now ready to answer user queries.)
    4. A user inputs a query.
    5. Generate the embedding (vector data) of the query with the OpenAI API.
    6. Query Pinecone with the embedding (vector data) of the query.
    7. Call the OpenAI API with a prompt that includes the retrieved raw data as context.
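
The build steps above can be sketched end to end without any network access. This is a minimal illustration, not the notebook's actual code. `toy_embed` is a hypothetical stand-in for the OpenAI embeddings call, `InMemoryIndex` is a hypothetical stand-in for a Pinecone index, and the CSV column names `id` and `text` are assumptions. The final call to the chat model is left out, so the sketch stops at the prompt that would be sent.

```python
import csv
import io
import math
from typing import Callable, Dict, List, Sequence, TextIO, Tuple

Vector = List[float]
Embedder = Callable[[str], Vector]


def toy_embed(text: str, dim: int = 64) -> Vector:
    """Hypothetical stand-in for an embeddings API: hashed bag of words."""
    vec = [0.0] * dim
    for word in text.lower().split():
        # Sum of character codes gives a bucket that is stable across runs.
        vec[sum(ord(c) for c in word) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class InMemoryIndex:
    """Hypothetical stand-in for a vector database index."""

    def __init__(self) -> None:
        self._rows: List[Tuple[str, Vector, str]] = []

    def upsert(self, item_id: str, vector: Vector, text: str) -> None:
        self._rows.append((item_id, vector, text))

    def query(self, vector: Vector, top_k: int = 3) -> List[Tuple[float, str, str]]:
        # Score every stored vector, then return the best matches first.
        scored = [(cosine(vector, v), i, t) for i, v, t in self._rows]
        scored.sort(key=lambda row: row[0], reverse=True)
        return scored[:top_k]


def load_rows(handle: TextIO) -> List[Dict[str, str]]:
    """Step 1: read the prepared CSV (assumed columns: id, text)."""
    return list(csv.DictReader(handle))


def ingest(rows: List[Dict[str, str]], embed: Embedder, index: InMemoryIndex) -> None:
    """Steps 2-3: embed each row and store the vector with its raw text."""
    for row in rows:
        index.upsert(row["id"], embed(row["text"]), row["text"])


def retrieve(question: str, embed: Embedder, index: InMemoryIndex,
             top_k: int = 3) -> List[Tuple[float, str, str]]:
    """Steps 5-6: embed the query and look up the nearest stored vectors."""
    return index.query(embed(question), top_k=top_k)


def build_prompt(question: str, contexts: Sequence[str]) -> str:
    """Step 7 (prompt only): put the retrieved raw text in front of the question."""
    context_block = "\n---\n".join(contexts)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    sample = io.StringIO(
        "id,text\n"
        "a,Pinecone stores vectors for similarity search\n"
        "b,The blog post explains sourdough baking\n"
    )
    index = InMemoryIndex()
    ingest(load_rows(sample), toy_embed, index)
    hits = retrieve("how does sourdough baking work", toy_embed, index, top_k=1)
    print(build_prompt("how does sourdough baking work", [text for _, _, text in hits]))
```

Passing the embedder in as a function keeps the pipeline shape independent of the provider, so swapping `toy_embed` for a real embeddings call and `InMemoryIndex` for a real vector database client should not change `ingest`, `retrieve`, or `build_prompt`.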

Solution Architecture

Application Architecture

graph LR;
    Local-->|API| Pinecone;
    Local-->|API| OpenAI;

Reference

OpenAI
OpenAI
OpenAI Tutorials
OpenAI Models
OpenAI API Reference
OpenAI tokenizer
OpenAI Cookbook Question_answering_using_embeddings.ipynb
OpenAI Cookbook vector_databases pinecone

Vector Database Pinecone
Pinecone
What is a Vector Database & How Does it Work? Use Cases + Examples
The Missing WHERE Clause in Vector Search
Pinecone Docs
