fix: tweak jsonl and excel name format in dataAgent #125

Merged: 2 commits, Jul 12, 2024
6 changes: 3 additions & 3 deletions docs/guidebook/en/8_1_1_data_autonomous_agent.md
@@ -108,7 +108,7 @@ As shown in the figure below:

![data_agent_dataset](../_picture/data_agent_dataset_en.png)

-[dataAgent sample evaluation dataset](../../../sample_standard_app/app/examples/data/dataset_turn_1_2024-07-10-15:06:24.jsonl)
+[dataAgent sample evaluation dataset](../../../sample_standard_app/app/examples/data/dataset_turn_1_2024-07-10-15-06-24.jsonl)


### Complete Evaluation Results
@@ -124,7 +124,7 @@ As shown in the figure below:
- More dimensions Score/Suggestion: similar to the Relevance dimension.
![data_agent_eval_result](../_picture/data_agent_eval_result_en.png)

-[dataAgent sample eval result](../../../sample_standard_app/app/examples/data/eval_result_turn_1_2024-07-10-15:06:24.xlsx)
+[dataAgent sample eval result](../../../sample_standard_app/app/examples/data/eval_result_turn_1_2024-07-10-15-06-24.xlsx)



@@ -139,7 +139,7 @@ As shown in the figure below:

![data_agent_eval_report](../_picture/data_agent_eval_report_en.png)

-[dataAgent sample evaluation report](../../../sample_standard_app/app/examples/data/eval_report_2024-07-10-15:06:24.xlsx)
+[dataAgent sample evaluation report](../../../sample_standard_app/app/examples/data/eval_report_2024-07-10-15-06-24.xlsx)

### Comparative Experiment
Change the LLM in `demo_rag_agent` within aU from the previous `qwen1.5-72b-chat` to `qwen1.5-7b-chat`; after evaluation by dataAgent, the comprehensive evaluation reports are as follows:
6 changes: 3 additions & 3 deletions docs/guidebook/zh/8_1_1_数据自治智能体.md
@@ -110,7 +110,7 @@ tips: configure the question set and the number of rows to evaluate sensibly, to avoid large amounts of compute

![data_agent_dataset](../_picture/data_agent_dataset.png)

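-[Sample evaluation dataset produced by dataAgent](../../../sample_standard_app/app/examples/data/dataset_turn_1_2024-07-10-10:48:30.jsonl)
+[Sample evaluation dataset produced by dataAgent](../../../sample_standard_app/app/examples/data/dataset_turn_1_2024-07-10-10-48-30.jsonl)

### Complete Evaluation Results
After producing the evaluation dataset, dataAgent starts multi-dimensional evaluation and annotation of the data and outputs the complete evaluation results (if dataAgent runs multiple batch rounds, it produces multiple complete evaluation results).
@@ -129,7 +129,7 @@ tips: configure the question set and the number of rows to evaluate sensibly, to avoid large amounts of compute
- For example, the **suggestion on the relevance dimension** for record 1: although the answer addresses the question about the weather in Beijing, the temperature is given in Fahrenheit, which does not match the Celsius scale that users in China are used to. It is recommended to convert to Celsius and provide more complete weather information, such as humidity and wind force.
- For example, the **suggestion on the factuality dimension** for record 3: the answer contains factual errors, such as wrongly placing the Argentine star Lionel Messi on the England team. The suggested improvement is to make sure that all data and facts mentioned are accurate, especially where specific people and events are involved.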

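-[Sample complete evaluation result produced by dataAgent](../../../sample_standard_app/app/examples/data/eval_result_turn_1_2024-07-10-10:48:30.xlsx)
+[Sample complete evaluation result produced by dataAgent](../../../sample_standard_app/app/examples/data/eval_result_turn_1_2024-07-10-10-48-30.xlsx)

### Comprehensive Evaluation Report
A comprehensive evaluation report is generated from the complete evaluation results of multiple rounds.
@@ -141,7 +141,7 @@ tips: configure the question set and the number of rows to evaluate sensibly, to avoid large amounts of compute
- Avg Score for the other dimensions follows the same pattern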
![data_agent_eval_report](../_picture/data_agent_eval_report.png)

-[Sample comprehensive evaluation report produced by dataAgent](../../../sample_standard_app/app/examples/data/eval_report_2024-07-10-10:48:30.xlsx)
+[Sample comprehensive evaluation report produced by dataAgent](../../../sample_standard_app/app/examples/data/eval_report_2024-07-10-10-48-30.xlsx)

### Comparative Experiment
Change the model in `demo_rag_agent` in aU from **qwen1.5-7b-chat**, used when producing the evaluation report above, to **qwen1.5-72b-chat**; after evaluation by dataAgent, the generated comprehensive evaluation report is as follows:
@@ -53,7 +53,7 @@ def execute(self, input_object: InputObject, agent_input: dict):
input_object (InputObject): input parameters passed by the user.
agent_input (dict): agent input parsed from `input_object` by the user.
"""
-date = datetime.datetime.now().strftime("%Y-%m-%d-%H:%M:%S")
+date = datetime.datetime.now().strftime("%Y-%m-%d-%H-%M-%S")
input_object.add_data('date', date)

# step1: build q&a dataset from the candidate agent which needs to be evaluated.
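
Reviewer note: the sketch below shows the effect of the new format string on a generated file name. `timestamped_name` and its parameters are hypothetical names introduced only for illustration; they are not functions from this PR. The motivation given in the comments (that `:` is a reserved character in Windows file names) is an assumption, since the PR itself does not state one.

```python
import datetime
from typing import Optional


def timestamped_name(prefix: str, ext: str,
                     now: Optional[datetime.datetime] = None) -> str:
    """Hypothetical helper: build '<prefix>_<timestamp>.<ext>'.

    Uses the hyphen-only format from this PR. Assumption: the colon was
    dropped because ':' is not allowed in file names on Windows.
    """
    now = now or datetime.datetime.now()
    stamp = now.strftime("%Y-%m-%d-%H-%M-%S")
    return f"{prefix}_{stamp}.{ext}"


# A fixed timestamp keeps the example deterministic.
fixed = datetime.datetime(2024, 7, 10, 15, 6, 24)
name = timestamped_name("dataset_turn_1", "jsonl", fixed)
print(name)
```

With the fixed timestamp above, the result matches the renamed sample file linked in the English guidebook, `dataset_turn_1_2024-07-10-15-06-24.jsonl`.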