
[Issue]: ❌ create_base_entity_graph solution | How to resolve the failure at GraphRAG's final step, create_base_entity_graph #951

Closed
Mxk-1 opened this issue Aug 16, 2024 · 6 comments
Labels
community_support Issue handled by community members

Comments

@Mxk-1

Mxk-1 commented Aug 16, 2024

Is there an existing issue for this?

  • I have searched the existing issues
  • I have checked #657 to validate if my issue is covered by community support

Describe the issue

Many people, including myself, have encountered the ❌ create_base_entity_graph issue. After trying various methods, I finally found a solution. It turns out that this isn't a bug, but rather a matter of balancing model capabilities with the supported max tokens.


Steps to reproduce

The chunk settings in the provided setting.yaml may not suit a model launched with Ollama: the chunks can be too large or too small, which leads to malformed responses from the model. The original paper used the GPT-4o model, while the model I deployed locally is Gemma2:9b via Ollama; the two differ in both size and capability.

Additionally, since the pipeline works by combining a prompt with each text chunk, the prompt itself consumes part of the model's context window. By adjusting chunk_size I was able to get the run to complete. If you encounter this issue, try increasing or decreasing chunk_size; if you have a better solution, feel free to discuss it here.
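
As a rough sanity check (not part of the original pipeline), you can estimate the token budget yourself. The sketch below assumes tiktoken purely as an approximate tokenizer (Gemma2 uses its own vocabulary), an 8192-token context limit, and the default prompts/entity_extraction.txt path; adjust all three for your setup.

    # Rough estimate of whether the extraction prompt plus one chunk fits the
    # model's context window. tiktoken's cl100k_base is an OpenAI tokenizer, so
    # for Gemma2 this is only an approximation; the limits below are illustrative.
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")

    def fits_in_context(prompt: str, chunk: str, context_limit: int = 8192,
                        reserved_for_output: int = 1500) -> bool:
        """Return True if prompt + chunk still leaves room for the model's answer."""
        used = len(enc.encode(prompt)) + len(enc.encode(chunk))
        return used + reserved_for_output <= context_limit

    prompt_text = open("prompts/entity_extraction.txt", encoding="utf-8").read()
    chunk_text = "..."  # one chunk produced with your chunks.size / chunks.overlap
    print(fits_in_context(prompt_text, chunk_text))

If this returns False for your chunk size, either shrink chunk_size or pick a model with a larger context window.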



GraphRAG Config Used

chunks:
  size: 600
  overlap: 150
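
For reference, here is a fuller sketch of the two settings that mattered for me. The key names follow the GraphRAG settings.yaml layout from mid-2024; the endpoint, model name, and token numbers are placeholders to adapt, not values taken from this issue:

llm:
  api_key: ${GRAPHRAG_API_KEY}         # ignored by most local servers, but the field must be present
  type: openai_chat
  model: gemma2:9b                     # whichever model your Ollama / xinference server exposes
  api_base: http://localhost:11434/v1  # OpenAI-compatible endpoint of the local server
  max_tokens: 2000                     # response budget; prompt + chunk + this must fit the context window

chunks:
  size: 600        # smaller chunks suit smaller local models
  overlap: 150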

Logs and screenshots

No response

Additional Information

No response

@Mxk-1 Mxk-1 added the triage label Aug 16, 2024
@natoverse
Collaborator

Thanks for your contribution! I'll make a note in #657 for folks using Ollama.

@natoverse natoverse closed this as not planned Aug 16, 2024
@natoverse natoverse added the community_support label and removed the triage label Aug 16, 2024
@night666e

Is trial and error with the size the only option? I'm using a model served through xinference.

@night666e

OP, did you run into this problem after modifying the prompts?

@wangweizheng22

> OP, did you run into this problem after modifying the prompts?

I ran into this problem after modifying the entity_extraction file under prompts. After regenerating the index the problem still persisted. Did you manage to solve it?

@Mxk-1
Author

Mxk-1 commented Sep 14, 2024

Sorry, I only just noticed this. I did not hit this problem from modifying the prompts. Because my data is in Chinese, I translated the original prompts and kept the extraction labels unchanged, and it ran successfully; the original English prompts also ran successfully. When I got it working I was using the DeepSeek API. For local deployment I'd recommend Qwen, since its output feels more controllable.

@Mxk-1
Author

Mxk-1 commented Sep 14, 2024

The code itself is fine; it mainly comes down to the capability of the backing model.
