Are there plans to train your own model to support nGQL syntax? #5811

papandadj · 2024-01-23T07:04:04Z

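I looked at your LangChain implementation, and it seems to be essentially a schema table fed through a Cypher prompt. In complex settings, for example with many node and edge types, I cannot get valid Nebula syntax out of the old GPT-4 (the one with the 2021 knowledge cutoff). Have you trained your own model to support new or complex syntax?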

wey-gu · 2024-01-23T08:28:36Z

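Besides naive zero-shot text2cypher, we have also introduced Graph RAG in LlamaIndex.

We also have our own Chain of Exploration (a more advanced approach). I gave a talk on it a while ago at PyCon China Beijing; it is not open source yet.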

papandadj · 2024-01-23T08:47:54Z

Are there plans to open-source Chain of Exploration?

wey-gu · 2024-01-23T08:48:49Z

Yes, there are!

papandadj · 2024-01-23T08:49:34Z

Is there a rough timeline? 🤣

ccp123456789 · 2024-01-26T06:10:17Z

Generating nGQL by calling GPT-4 is still unreliable, especially for Chinese. I hope you can run some tests in Chinese; you will find a lot of problems.

ccp123456789 · 2024-01-29T01:06:00Z

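> I looked at your LangChain implementation, and it seems to be essentially a schema table fed through a Cypher prompt. In complex settings, for example with many node and edge types, I cannot get valid Nebula syntax out of the old GPT-4 (the one with the 2021 knowledge cutoff). Have you trained your own model to support new or complex syntax?

I think the hard part right now is that there is almost no training corpus for text2ngql, so fine-tuning such a model is nearly impossible.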

papandadj · 2024-01-29T01:35:33Z

GPT seems OK, but self-hosted models are a disaster.

ccp123456789 · 2024-01-29T08:48:01Z

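> GPT seems OK, but self-hosted models are a disaster.

GPT does reasonably well on English with simple graph structures. On Chinese knowledge graphs, however, almost everything is wrong: hallucinated relationships, incorrect entity recognition, and the wrong number of relationship hops.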

papandadj · 2024-01-30T05:43:43Z

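On my side, I wrote a lot of comments in Nebula, covering nodes, edges, and their fields, and I sync them to GPT automatically. The results seem decent, so you could try the same. Self-hosted models can manage when asked to write Cypher, but when asked to convert it to Nebula syntax they seem to ignore the instruction entirely.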
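Editor's sketch of the comment-sync idea described above, in Python. It only shows how schema comments could be rendered into the prompt text; how the comments are read out of NebulaGraph is not covered in this thread, so the schema here is hand-written. The names `Prop`, `SchemaItem`, `render_schema`, and `build_prompt`, as well as the prompt wording, are hypothetical and not from any library.

```python
from dataclasses import dataclass, field


@dataclass
class Prop:
    name: str
    type: str
    comment: str = ""


@dataclass
class SchemaItem:
    kind: str  # "tag" (node type) or "edge"
    name: str
    comment: str = ""
    props: list = field(default_factory=list)


def render_schema(items):
    """Render tags/edges and their properties, one per line, with comments."""
    lines = []
    for item in items:
        header = f"{item.kind} {item.name}"
        if item.comment:
            header += f"  # {item.comment}"
        lines.append(header)
        for p in item.props:
            line = f"  - {p.name}: {p.type}"
            if p.comment:
                line += f"  # {p.comment}"
            lines.append(line)
    return "\n".join(lines)


def build_prompt(question, items):
    """Assemble a text2cypher prompt that carries the commented schema."""
    return (
        "Write a Cypher query for the question, using only this schema.\n\n"
        f"Schema:\n{render_schema(items)}\n\n"
        f"Question: {question}\nCypher:"
    )


# Hand-written example schema (assumed, for illustration only).
schema = [
    SchemaItem("tag", "movie", "a film", [Prop("name", "string", "official title")]),
    SchemaItem("edge", "directed", "person directed movie"),
]
prompt = build_prompt("Who directed The Godfather II?", schema)
```

Keeping the comments next to the names they describe gives the model the same kind of hint that a function description gives in function calling.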

ccp123456789 · 2024-01-31T01:30:50Z

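> On my side, I wrote a lot of comments in Nebula, covering nodes, edges, and their fields, and I sync them to GPT automatically. The results seem decent, so you could try the same. Self-hosted models can manage when asked to write Cypher, but when asked to convert it to Nebula syntax they seem to ignore the instruction entirely.

But none of that solves entity recognition and entity linking. For Chinese, you first have to recognize the entities and then, crucially, link them to concrete entities in the graph.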

wey-gu · 2024-01-31T01:39:39Z

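Sorry for the late reply, everyone.

We have done a lot of related optimization in NebulaLLM, in the enterprise edition, and the results are much better. Here are some takeaways for reference.

1. The text2cypher prompt should not burden the model with writing a Nebula-flavored query. Let it put all its effort into writing the query itself, then use code to rewrite the result into a Nebula-flavored query.
2. Use pre- and post-processing on the query to further reduce hallucinations (going beyond the schema, among other cases).
3. Use a more expensive model for the text2cypher stage than for the other stages (for example, Qwen 72B).
4. Use Chain of Exploration for complex tasks; text2cypher is only one of its submodules.
5. The wording of the schema itself has a large effect on quality (think of function descriptions in function calling). We introduced a mechanism that consults comments when referencing the schema, but it is best to rewrite the schema itself so that it is unambiguous and says what it means.

We will gradually open-source some of these results.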

cc @ccp123456789 @papandadj
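Editor's sketch of one possible post-processing check for the "beyond the schema" case mentioned in the list above. This is not the NebulaLLM implementation, which is not shown in this thread. It is a regex-based approximation in Python that extracts node labels and edge types from a generated query and reports those missing from the schema. The function name `out_of_schema` is hypothetical. Known limitations: only the first label or type in each pattern is captured, and text inside string literals is not skipped.

```python
import re

# (var:Label) or (:Label) node patterns
NODE_LABEL = re.compile(r"\(\s*\w*\s*:\s*(\w+)")
# [var:TYPE] or [:TYPE] edge patterns
EDGE_TYPE = re.compile(r"\[\s*\w*\s*:\s*(\w+)")


def out_of_schema(query, tags, edge_types):
    """Return (unknown_tags, unknown_edge_types) used by the query."""
    unknown_tags = sorted(set(NODE_LABEL.findall(query)) - set(tags))
    unknown_edges = sorted(set(EDGE_TYPE.findall(query)) - set(edge_types))
    return unknown_tags, unknown_edges


query = "MATCH (p:person)-[e:acted_in]->(m:movie) RETURN p"
unknown = out_of_schema(query, tags={"person", "movie"}, edge_types={"directed"})
```

A non-empty result could be used to reject the query or to ask the model to retry with the offending names listed in the prompt.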

papandadj · 2024-01-31T05:40:27Z

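Can the first point be understood like this: have the LLM (Qwen 72B) generate a Cypher statement, then analyze the Cypher yourself and change it into Nebula syntax?

Please ping me when the Chain of Exploration approach is open-sourced.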

wey-gu · 2024-01-31T08:42:14Z

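> Can the first point be understood like this: have the LLM (Qwen 72B) generate a Cypher statement, then analyze the Cypher yourself and change it into Nebula syntax?

Not exactly. We have the LLM write generic Cypher, and then we use code to rewrite it into nebulagraph-cypher.

My point about the other models is this: if other parts of the pipeline also use LLMs, you can pick cheaper ones there, but use the most expensive one for query generation. That said, in practice I have tested models as small as GLM3-6B, and they work too.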

papandadj · 2024-01-31T10:37:30Z

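OK, so is it like this?

Generate plain Cypher: MATCH (p:person)-[e:directed]->(m:movie) WHERE m.name = 'The Godfather II'
Then convert it through code analysis, not a natural-language model, into: MATCH (p:person)-[e:directed]->(m:movie) WHERE m.movie.name == 'The Godfather II'
In other words, convert it directly using an abstract syntax tree (AST)?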

wey-gu · 2024-02-01T00:59:32Z

👍 Yes, though it does not have to be done with an AST.
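Editor's sketch of a non-AST rewrite that reproduces the example above, in Python. It is one possible regex-based approach, not the implementation used in NebulaLLM, which is not shown in this thread. It assumes read-only `MATCH ... WHERE ... RETURN` queries where each node variable carries a single label; the function name `to_nebula_flavor` is hypothetical. It does two things: qualifies `var.prop` as `var.tag.prop` for node variables, and turns a bare `=` comparison into `==`. String literals are left untouched.

```python
import re

# (var:Label) node patterns; edge patterns use [] and are not matched
NODE = re.compile(r"\(\s*(\w+)\s*:\s*(\w+)")
# single- or double-quoted string literal, with backslash escapes
STRING = re.compile(r"('(?:[^'\\]|\\.)*'|\"(?:[^\"\\]|\\.)*\")")
# var.prop that is not followed by another "." or a "("
PROP = re.compile(r"\b(\w+)\.(\w+)\b(?!\s*[.(])")
# bare "=": not part of <=, >=, !=, ==, =~ and not a path assignment "p = (...)"
EQ = re.compile(r"(?<![<>!=])=(?![=~])(?!\s*\()")


def to_nebula_flavor(cypher):
    """Rewrite generic Cypher into a NebulaGraph-flavored query (sketch)."""
    var_to_tag = dict(NODE.findall(cypher))  # e.g. {"p": "person", "m": "movie"}

    def qualify(match):
        var, prop = match.group(1), match.group(2)
        tag = var_to_tag.get(var)
        return f"{var}.{tag}.{prop}" if tag else match.group(0)

    def rewrite(code):
        return EQ.sub("==", PROP.sub(qualify, code))

    # split() with one capturing group: odd indexes are the string literals
    parts = STRING.split(cypher)
    return "".join(p if i % 2 else rewrite(p) for i, p in enumerate(parts))


generic = "MATCH (p:person)-[e:directed]->(m:movie) WHERE m.name = 'The Godfather II'"
nebula = to_nebula_flavor(generic)
```

Edge variables such as `e` are deliberately left as `e.prop`, since only node variables bound to a label are collected. Queries with multiple labels per node, reused variable names, or write clauses such as `SET` would need a real parser.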

papandadj · 2024-02-01T01:29:52Z

OK, I roughly understand now.

ccp123456789 · 2024-02-06T02:07:09Z

> 👍 Yes, though it does not have to be done with an AST.

Have you considered fine-tuning a large model?

wey-gu · 2024-02-06T04:31:04Z

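> > 👍 Yes, though it does not have to be done with an AST.
>
> Have you considered fine-tuning a large model?

Yes, we will do that in the future.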

papandadj added the type/enhancement Type: make the code neat or more efficient label Jan 23, 2024

wey-gu mentioned this issue Jan 27, 2024

Weekly Report 2024-01-26 vesoft-inc/nebula-community#424

Closed
