Qinnovation123/papers

PDF services infra

Services for scraping, downloading, extracting, and embedding PDFs.

  • Scraping: Attempt to find a PDF link for a given query.
  • Downloading: Download the PDF file from the scraped URL.
  • Extracting: Extract the PDF's text content as Markdown.
  • Embedding: Embed the extracted text and upsert the results into the vector database.
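The four stages above form a linear pipeline in which each stage consumes the previous stage's output. The following is a minimal sketch of how they might be chained, assuming each stage is a plain function. The stage signatures, `run_pipeline`, and `InMemoryVectorStore` are hypothetical: this README does not show the repository's actual interfaces or which vector database it uses, so an in-memory dict stands in for it here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical stage signatures: the README names the stages but not their interfaces.
Scraper = Callable[[str], Optional[str]]   # query -> PDF URL, or None if nothing was found
Downloader = Callable[[str], bytes]        # PDF URL -> raw PDF bytes
Extractor = Callable[[bytes], str]         # raw PDF bytes -> Markdown text
Embedder = Callable[[str], List[float]]    # Markdown text -> embedding vector


@dataclass
class InMemoryVectorStore:
    """Stand-in for the vector db (assumption): upsert inserts or overwrites by id."""
    records: Dict[str, dict] = field(default_factory=dict)

    def upsert(self, record_id: str, vector: List[float], metadata: dict) -> None:
        # Overwrite any existing record with the same id instead of duplicating it.
        self.records[record_id] = {"vector": vector, "metadata": metadata}


def run_pipeline(
    query: str,
    scrape: Scraper,
    download: Downloader,
    extract: Extractor,
    embed: Embedder,
    store: InMemoryVectorStore,
) -> bool:
    """Run scrape -> download -> extract -> embed/upsert for one query.

    Returns False when scraping finds no PDF link, True once the result is upserted.
    """
    url = scrape(query)
    if url is None:
        return False  # scraping is best-effort, so a miss is not treated as an error
    pdf_bytes = download(url)
    markdown = extract(pdf_bytes)
    vector = embed(markdown)
    # Keying on the URL (a hypothetical choice) makes re-runs overwrite, not duplicate.
    store.upsert(url, vector, {"query": query, "markdown": markdown})
    return True
```

In a real deployment each callable would wrap an actual component (an HTTP client, a PDF-to-Markdown converter, an embedding model). Passing the stages in as parameters keeps the orchestration independent of those choices and lets it be exercised with simple fakes such as `lambda q: None` for a scraping miss.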