ESC-Eval

Paper link: https://arxiv.org/abs/2406.14952

This is the official repository of ESC-Eval, containing the datasets and models used in the ESC-Eval paper. The paper proposes a method for evaluating ESC models with a role-playing agent; the specific process is illustrated in the following figure. (Figure: Framework)

TODO

  • Upload middle-quality character cards

Overview

  • ./data: the role_cards data used in the paper.
  • ./ESC-Role: our trained role-playing agent, which performs better than GPT-4 at role-playing a troubled person.
  • ./ESC-RANK: our trained scorer, which rates dialogue data along 7 well-designed dimensions.
  • ./result: some examples of multi-turn conversations.
  • ./score: some examples of scoring results.
  • ./evaluate.py: script for collecting multi-turn dialogues from the ESC model.
  • ./score.py: script for scoring each dimension of the multi-turn dialogues.

Usage

  1. Download ESC-Role and replace the './ESC-Role' folder with it.
  2. Wrap your LLM-based ESC model in the format below (examples for Llama 3 and Qwen1.5 are also listed in evaluate.py):
    from transformers import AutoTokenizer, AutoModelForCausalLM

    class YourModel():
        def __init__(self):
            self.tokenizer = AutoTokenizer.from_pretrained("model_dir")
            self.model = AutoModelForCausalLM.from_pretrained("model_dir", torch_dtype="auto", device_map="auto").eval()

        def __call__(self, message) -> str:
            # Generate a reply for the given conversation; adapt this line to your
            # model's own inference API (see the Llama 3 and Qwen1.5 examples in evaluate.py).
            response = self.model.chat(message)
            return response
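As a rough illustration (this is not the repository's exact schema — check how evaluate.py builds the message for the Llama 3 and Qwen1.5 examples), the wrapper is simply called with the running conversation and must return a single reply string:

    model = YourModel()
    # Hypothetical conversation format for illustration only; the real structure
    # is whatever evaluate.py passes to your model.
    message = [{"role": "user", "content": "I just lost my job and I don't know what to do."}]
    reply = model(message)
    print(reply)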
  3. Run evaluate.py to obtain multi-turn dialogue data, for example:
python evaluate.py -ef ./data/test_zh.json -rf ./result/ -lang zh 
python evaluate.py -ef ./data/test_en.json -rf ./result/ -lang en
After this step, you should get JSON files in the same format as the examples in the ./result folder.
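As a minimal sketch (assuming only that the output is standard JSON; the exact schema matches the examples shipped in ./result), you can load and inspect one of the generated files like this:

    import json

    # Hypothetical file name for illustration; use whichever file evaluate.py wrote into ./result.
    with open("./result/test_en.json", encoding="utf-8") as f:
        dialogues = json.load(f)
    print(len(dialogues), "dialogues loaded")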
  4. Download ESC-RANK into the ESC-RANK folder, and set the path to your Internlm2-chat folder in score.py.
  5. Run score.py with ESC-RANK on your interactive data:
python score.py

User Cards

Statistics

ESC-Role

ESC-Role is a role-playing model specialized for ESC evaluation. It can be downloaded from: https://huggingface.co/haidequanbu/ESC-Role
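For example, assuming you use the huggingface_hub library (an assumption about your tooling, not a requirement stated by the repository), the model can be fetched into the expected folder like this:

    from huggingface_hub import snapshot_download

    # Download the ESC-Role weights into the folder expected by evaluate.py.
    snapshot_download(repo_id="haidequanbu/ESC-Role", local_dir="./ESC-Role")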

ESC-RANK

ESC-RANK is our trained scoring model for ESC evaluation. It can be downloaded from: https://huggingface.co/haidequanbu/ESC-RANK
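The same huggingface_hub approach works here (again an assumption about your tooling, not a repository requirement):

    from huggingface_hub import snapshot_download

    # Download the ESC-RANK weights into the folder expected by score.py.
    snapshot_download(repo_id="haidequanbu/ESC-RANK", local_dir="./ESC-RANK")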

Scoring performance

Leaderboard

Human Evaluation (Chinese)

Cite

The paper is available at https://arxiv.org/abs/2406.14952.
