AI-based search engine done right.
- Compare Trafilatura, bs4, and newspaper3k
- Implement the bulk indexer
- Implement the batch system for spider
- Implement the spider with Trafilatura
- Parse title, body, and metadata from HTML
- Parse title, body, and metadata from PDF, etc.
- Implement the dispatcher
- Implement dispatcher for LinkedIn
- Implement dispatcher for GitHub
- Implement dispatcher for Medium
- Implement dispatcher for X (formerly Twitter)
- Implement the ParadeDB retriever with LlamaIndex
- Update the RAG retriever to use the SearXNG engine
- Implement the CRAG workflow for the RAG retriever
- Implement the reranker
- Add support for Cohere Reranker
- Add support for FlashRank Reranker
- Add support for late chunking for better IR
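
As a rough illustration of the dispatcher items above, here is a minimal sketch of routing a URL to a per-site handler by hostname. Every name in it (`dispatch`, `HANDLERS`, the `handle_*` functions) is hypothetical and not taken from this project, and the handlers only tag the URL instead of fetching or parsing anything. It assumes the dispatcher chooses a handler from the URL's domain alone.

```python
from typing import Callable, Dict
from urllib.parse import urlparse

# Hypothetical handler type: takes a URL, returns a tagged string.
# A real handler would fetch and parse the page instead.
Handler = Callable[[str], str]


def handle_linkedin(url: str) -> str:
    return f"linkedin:{url}"


def handle_github(url: str) -> str:
    return f"github:{url}"


def handle_medium(url: str) -> str:
    return f"medium:{url}"


def handle_x(url: str) -> str:
    return f"x:{url}"


def handle_default(url: str) -> str:
    # Fallback for sites without a dedicated handler.
    return f"generic:{url}"


# Assumed mapping from registered domain to handler.
HANDLERS: Dict[str, Handler] = {
    "linkedin.com": handle_linkedin,
    "github.com": handle_github,
    "medium.com": handle_medium,
    "x.com": handle_x,
    "twitter.com": handle_x,
}


def dispatch(url: str) -> str:
    """Route a URL to the handler registered for its domain."""
    host = (urlparse(url).hostname or "").lower()
    for domain, handler in HANDLERS.items():
        # Match the domain itself or any of its subdomains,
        # so "netflix.com" does not match "x.com".
        if host == domain or host.endswith("." + domain):
            return handler(url)
    return handle_default(url)
```

Calling `dispatch("https://www.linkedin.com/in/someone")` would go to the LinkedIn handler, while an unrecognized site falls through to `handle_default`.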