Skip to content

tpoisonooo/HuixiangDou2

Repository files navigation

English | Simplified Chinese

HuixiangDou2: A Robustly Optimized GraphRAG Approach

Introduction

GraphRAG has many tuning spots, making it hard to discern whether performance gains stem from parameter adjustments or pipeline optimizations. Moreover, RAG test data is embedded in LLM training sets. LLM input tokens impact generation probabilities (background: phi-4 technical report). It's unclear if precision improvements originate from key token searches or retrievals.

Thus, HuixiangDou2 didn't introduce new methods but integrated multiple open-source projects (HuixiangDou, KAG, LightRAG, and DB-GPT, totaling 18k lines of code) and conducted comparative experiments on a test set where Qwen2.5-7B-Instruct underperformed. The score rose from 60 to 74.5. Ultimately, a GraphRAG implementation with performance recognized by human domain experts was developed. Here is the report.

Note: The impact of open-source on different fields/industries varies. Since licensing restriction, we can only give the code and test conclusions, and the test data cannot be provided.

Version Description

Compared to HuixiangDou1, this repo improves accuracy and async refactor:

  1. Graph Schema. Dense retrieval is only for querying similar entities and relationships.

  2. Ported/merged multiple open-source implementations, with code differences of nearly 18k lines:

    • Data. Organized a set of real domain knowledge that LLM has not fully seen for testing (gpt accuracy < 0.6)
    • Ablation. Confirmed the impact of different stages and parameters on accuracy
    • Improvement. As shown below.
  3. API remains compatible

If it is useful to you, please star it ⭐

Documentation

Acknowledgements

  • SiliconCloud Abundant LLM API, some models are free
  • KAG Graph retrieval based on reasoning
  • DB-GPT LLM tool collection
  • LightRAG Simple and efficient graph retrieval solution

Citation

@misc{huixiangdou2,
  author = {Huanjun Kong},
  title = {HuixiangDou2: A Graph-based Augmented Generation Approach},
  howpublished = {\url{https://github.com/tpoisonooo/HuixiangDou2}},
  year = {2025}
}