This experimental project explores the storage and search of large-scale, multi-type files (e.g. PDF, PPT, Doc, Txt).

codemo1991/any-library

📚 Any Library 📚

Document Upload and Document Search

🎯 Overall Concept

With large collections of documents and materials, many scenarios require locating a document from a specific sentence or image and then interpreting its contents. Combining a traditional search engine with vector queries and an LLM extends search with capabilities that keyword matching alone does not offer.
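To make the difference between the two query styles concrete, here is a minimal, self-contained sketch. It is not the project's code: `exact_search`, `semantic_search`, and `embed` are hypothetical names, and a word-overlap cosine similarity stands in for a real embedding model and vector database.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words vector (assumption: a real system would use a learned embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def exact_search(docs: dict, phrase: str) -> list:
    """Return ids of documents that contain the phrase verbatim (case-insensitive)."""
    return [doc_id for doc_id, text in docs.items() if phrase.lower() in text.lower()]


def semantic_search(docs: dict, query: str, top_k: int = 3) -> list:
    """Return up to top_k document ids ranked by similarity to the query."""
    q = embed(query)
    scored = [(cosine(q, embed(text)), doc_id) for doc_id, text in docs.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best score first, ties by id
    return [doc_id for score, doc_id in scored[:top_k] if score > 0]


docs = {
    "a.pdf": "Quarterly revenue grew by ten percent",
    "b.txt": "The revenue report for the quarter",
    "c.doc": "Installation guide for the printer",
}
print(exact_search(docs, "revenue grew"))       # only a.pdf contains the phrase
print(semantic_search(docs, "revenue report"))  # b.txt ranks above a.pdf
```

Exact search answers "which file contains this sentence", while the ranked query still returns related documents when the wording differs.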

🏗️ Overall Architecture

  • Elasticsearch stores document text and serves exact-match queries
  • Apache Tika extracts text from various document formats
  • Milvus stores vector data and serves vector queries
  • Minio stores the original files and handles file upload and download
  • Streamlit provides the frontend
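The way these components could fit together during an upload can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: plain dictionaries stand in for Minio, Elasticsearch, and Milvus, `extract_text` stands in for Apache Tika, `embed` is a hypothetical hashed word-count embedding, and the order of the steps is one plausible choice.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class InMemoryBackends:
    """Dictionary stand-ins for the three storage services (assumption, for illustration)."""
    objects: dict = field(default_factory=dict)  # plays the role of Minio: name -> raw bytes
    texts: dict = field(default_factory=dict)    # plays the role of Elasticsearch: name -> text
    vectors: dict = field(default_factory=dict)  # plays the role of Milvus: name -> vector


def extract_text(raw: bytes) -> str:
    """Stand-in for Apache Tika; this sketch only decodes UTF-8 text."""
    return raw.decode("utf-8", errors="replace")


def embed(text: str, dim: int = 8) -> list:
    """Hypothetical toy embedding: word counts hashed into `dim` buckets."""
    vec = [0.0] * dim
    for word in text.lower().split():
        vec[zlib.crc32(word.encode("utf-8")) % dim] += 1.0
    return vec


def ingest(backends: InMemoryBackends, name: str, raw: bytes) -> str:
    """Store the file, extract its text, then index the text and its vector."""
    backends.objects[name] = raw          # keep the original file for download
    text = extract_text(raw)              # turn the file into plain text
    backends.texts[name] = text           # make the text available for exact queries
    backends.vectors[name] = embed(text)  # make the text available for vector queries
    return text


backends = InMemoryBackends()
ingest(backends, "note.txt", b"hello world")
print(backends.texts["note.txt"])
```

Keeping the original file, the extracted text, and the vector under the same name lets a hit from either kind of query be traced back to the stored file.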

📁 Directory Structure

  • /test Test directory
  • /upload.py Program entry point: document upload and semantic search
  • /requirements.txt Dependency definitions
  • /docker-compose.yml Docker Compose file that starts minio, milvus, elasticsearch, kibana

⚙️ Environment Setup

  • Start docker-desktop
    • docker-compose down -v [minio, milvus, elasticsearch] removes the containers and their volumes
    • docker-compose up -d [minio, milvus, elasticsearch] starts the containers in the background
  • Bring up minio, milvus, and elasticsearch from docker-compose.yml using the commands above
  • Run pip install -r requirements.txt to install dependencies
  • Run streamlit run ./upload.py to start the program
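Before launching the Streamlit app, it can help to confirm that the containers are accepting connections. The sketch below uses only the standard library. The port numbers are the usual defaults for each service and are an assumption here; the project's docker-compose.yml may map different ones. `port_open` and `check_services` are hypothetical helper names.

```python
import socket

# Assumed default ports; adjust to match the mappings in docker-compose.yml.
DEFAULT_PORTS = {"elasticsearch": 9200, "milvus": 19530, "minio": 9000}


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(host: str = "localhost", ports: dict = DEFAULT_PORTS) -> dict:
    """Map each service name to whether its port is reachable."""
    return {name: port_open(host, port) for name, port in ports.items()}
```

Calling `check_services()` after `docker-compose up -d` returns a dictionary such as `{"elasticsearch": True, ...}`. A `False` entry suggests that the container is still starting or that its port mapping differs from the assumed default.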

🎬 Demo Showcase

📤 Homepage

[Screenshot: Homepage]

📤 Document Upload

[Screenshot: Document Upload]

📄 Elasticsearch File Storage

[Screenshot: Elasticsearch File Storage]

📄 Milvus File Storage

[Screenshot: Milvus File Storage]

📄 Minio File Storage

[Screenshot: Minio File Storage]

📄 Semantic Search & Exact Search

[Screenshot: Exact Search]

🗺️ Roadmap

  • Support file upload
  • Support exact search and semantic search
  • Support locating specific document positions & online document viewing
  • Support LLM question answering
  • Support question answering over a specific PDF
  • Other features

💬 Contact & Contribution
