Extract text from each page in a set of PDF files and generate embeddings using Azure OpenAI

PabloZaiden/extract-text-with-embeddings-from-pdf

Extract text from PDFs and generate embeddings

This sample extracts the plain text of each page from a set of PDF files stored in an Azure Blob Storage container, generates embeddings for each page with the Azure OpenAI Service, and pushes all the data to an Azure Storage table.

Usage

Set the following environment variables, then run the sample:

export OPENAI_ENDPOINT="https://[service-name].openai.azure.com/"
export OPENAI_KEY="..."
export BLOB_CONNECTION_STRING="..."
export BLOB_CONTAINER_NAME="blob-container-with-pdfs"
export OPENAI_DEPLOYMENT_NAME="openai-deployment-name"

dotnet run
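
The sample reads all of its configuration from the five variables above, so it can help to confirm they are set before starting it. The following is a minimal sketch of such a check; `check_required_env` is a hypothetical helper and not part of the repository, and how the sample itself reacts to a missing variable is not documented here.

```shell
# Hypothetical helper (not part of the repository): report which of the
# variables listed under "Usage" are unset or empty.
# Note: it uses the plain shell variables missing, name and value as scratch space.
check_required_env() {
  missing=0
  for name in OPENAI_ENDPOINT OPENAI_KEY BLOB_CONNECTION_STRING \
              BLOB_CONTAINER_NAME OPENAI_DEPLOYMENT_NAME; do
    # Look up the value of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  # Return 0 when everything is set, 1 when at least one variable is missing.
  return "$missing"
}
```

With the function defined in the current shell, `check_required_env && dotnet run` starts the sample only when every variable has a value. The check only tests that the values are non-empty; it does not verify that the endpoint, key, or connection string are valid.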
