Community report grouping. #841

Closed
2 tasks done
FatemaD1577 opened this issue Aug 6, 2024 · 2 comments

Comments

@FatemaD1577

Is there an existing issue for this?

  • I have searched the existing issues
  • I have checked #657 to validate if my issue is covered by community support

Describe the issue

I am trying to understand the implementation of GraphRAG. I have gone through the available documentation and the Git repository to understand the global query. My understanding of the implementation is as follows:

  1. We first create a graph based on the entities and relationships extracted.
  2. Communities are then created by grouping closely related entities.
  3. Next, a report is generated for each community.
  4. Whenever a user query comes in, the community reports are shuffled randomly and grouped.
  5. An intermediate response is generated for each group, and each response is assigned a score based on how relevant it is to the user query.
  6. Responses with 0 score are filtered out and the remaining are passed on to the LLM for final response generation
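The flow in steps 4 to 6, as understood above, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not GraphRAG's actual code: `Partial`, `make_groups`, `global_query`, `group_size`, and the two callbacks standing in for the per-group and final LLM calls are all hypothetical names, and grouping by a fixed number of reports is an assumption made only for this sketch.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Partial:
    """Intermediate response produced for one group of community reports."""
    text: str
    score: int  # relevance of this response to the user query (0 = not useful)


def make_groups(reports: List[str], group_size: int, seed: Optional[int] = None) -> List[List[str]]:
    """Shuffle the community reports randomly and split them into groups."""
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    rng = random.Random(seed)
    shuffled = list(reports)  # copy so the caller's list is not reordered
    rng.shuffle(shuffled)
    return [shuffled[i:i + group_size] for i in range(0, len(shuffled), group_size)]


def global_query(
    query: str,
    reports: List[str],
    answer_group: Callable[[str, List[str]], Partial],  # hypothetical stand-in for the per-group LLM call
    combine: Callable[[str, List[Partial]], str],  # hypothetical stand-in for the final LLM call
    group_size: int = 3,  # hypothetical: number of reports per group
    seed: Optional[int] = None,
) -> str:
    # Step 4: shuffle the community reports and group them.
    groups = make_groups(reports, group_size, seed)
    # Step 5: generate a scored intermediate response for each group.
    partials = [answer_group(query, group) for group in groups]
    # Step 6: drop responses scored 0 and pass the rest on for the final response.
    kept = [p for p in partials if p.score > 0]
    return combine(query, kept)
```

In this sketch, how the score is computed is left entirely to the `answer_group` callback, which mirrors the open question below about how GraphRAG actually assigns it.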

Below are some doubts that are still not cleared up:

  1. Are community reports created at different levels to incrementally cover larger amounts of information?
  2. On what basis are the community reports grouped? Is there a parameter to control the number of community reports included in one group?
  3. Are only responses with a score of 0 filtered out, or is there a threshold below which responses are not considered for final response generation?
  4. On what basis is the intermediate response scored for relevance to the user query? Is it done using similarity search or some other method?

Answers to these doubts would give me more clarity on how the global query works.

Thank you in advance.

Steps to reproduce

No response

GraphRAG Config Used

# Paste your config here

Logs and screenshots

No response

Additional Information

  • GraphRAG Version:
  • Operating System:
  • Python Version:
  • Related Issues:
@FatemaD1577 added the triage label on Aug 6, 2024
@Xiyuche

Xiyuche commented Aug 6, 2024

For the intermediate response score, this excerpt from the paper might be helpful:
[Image: excerpt from the GraphRAG paper]

@natoverse
Collaborator

Moving to Discussions: #849

@natoverse removed the triage label on Aug 6, 2024