Paper link: https://arxiv.org/abs/2406.14952
This is the official repository of ESC-Eval, which includes the datasets and models used in the ESC-Eval paper. The paper proposes a method for evaluating ESC models with a role-playing model; the overall process is illustrated in the figure below.
- Uploaded middle-quality character cards.
- ./data: role_cards data used in the paper.
- ./ESC-Role: our trained role-playing agent, which performs better than GPT-4 at role-playing a person in distress.
- ./ESC-RANK: our trained scorer, which scores dialogue data along 7 well-designed dimensions.
- ./result: some examples of multi-turn conversations.
- ./score: some examples of scoring results.
- ./evaluate.py: script to collect multi-turn dialogues from the ESC model.
- ./score.py: script to score each dimension of the multi-turn dialogues.
- Download ESC-Role and place it in the './ESC-Role' folder.
- Wrap your LLM-based ESC model in the format below (examples for Llama 3 and Qwen1.5 are also listed in evaluate.py):
from transformers import AutoTokenizer, AutoModelForCausalLM

class YourModel():
    def __init__(self):
        self.tokenizer = AutoTokenizer.from_pretrained("model_dir")
        self.model = AutoModelForCausalLM.from_pretrained("model_dir", torch_dtype="auto", device_map="auto").eval()

    def __call__(self, message) -> str:
        # replace .chat() with your model's own generation interface
        response = self.model.chat(message)
        return response
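The wrapper's contract is simply: take a chat-format message list in, return a reply string. A minimal sketch with a stub backend is shown below; `EchoModel` and the exact message fields are illustrative assumptions, not the repository's API, so no model download is needed to try it:

```python
# Sketch of the wrapper contract used by evaluate.py: __call__ receives
# a chat-format message list and must return a string reply.
# EchoModel is a hypothetical stand-in for a real HuggingFace model.

class EchoModel:
    """Stub backend that echoes the last user turn."""
    def chat(self, messages):
        return "echo: " + messages[-1]["content"]

class YourModel:
    def __init__(self):
        # in practice: AutoModelForCausalLM.from_pretrained(...)
        self.model = EchoModel()

    def __call__(self, messages) -> str:
        response = self.model.chat(messages)
        return response

model = YourModel()
reply = model([
    {"role": "system", "content": "You are an emotional support assistant."},
    {"role": "user", "content": "I feel anxious about my exams."},
])
print(reply)  # → echo: I feel anxious about my exams.
```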
- Run evaluate.py to collect multi-turn dialogue data, for example:
python evaluate.py -ef ./data/test_zh.json -rf ./result/ -lang zh
python evaluate.py -ef ./data/test_en.json -rf ./result/ -lang en
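After a run, the result folder holds the collected dialogues. A quick sanity check is sketched below; the JSON layout (a list of role/content turns) is an assumption for illustration — consult the examples in ./result/ for the actual fields:

```python
# Sanity-check a multi-turn dialogue file such as those written by
# evaluate.py. The file layout here is hypothetical: a JSON list of
# {"role", "content"} turns. Adjust field names to match ./result/.
import json, os, tempfile

sample = [
    {"role": "user", "content": "I just lost my job and feel hopeless."},
    {"role": "assistant", "content": "I'm sorry to hear that. Do you want to talk about it?"},
]

path = os.path.join(tempfile.mkdtemp(), "dialogue.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(sample, f, ensure_ascii=False)

with open(path, encoding="utf-8") as f:
    turns = json.load(f)

# Every turn should have a valid role and non-empty content.
assert all(t["role"] in {"user", "assistant"} and t["content"] for t in turns)
print(f"{len(turns)} turns loaded")
```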
- Download ESC-RANK into the ESC-RANK folder, and set the path to the InternLM2-Chat folder in score.py.
- Run score.py to score your interactive data with ESC-RANK:
python score.py
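Once score.py has produced per-dimension scores, they can be averaged across dialogues to compare models. The sketch below assumes a hypothetical output layout — the "dimension_scores" field and the dim_* keys are placeholders; see the examples in ./score/ for the real field names:

```python
# Aggregate per-dimension scores (as produced by score.py) into an
# average per dimension across all scored dialogues.
# Field names below are placeholders, not the repository's actual schema.
from collections import defaultdict
from statistics import mean

scored = [
    {"dimension_scores": {"dim_1": 3, "dim_2": 4}},
    {"dimension_scores": {"dim_1": 5, "dim_2": 2}},
]

totals = defaultdict(list)
for item in scored:
    for dim, score in item["dimension_scores"].items():
        totals[dim].append(score)

# Mean score per dimension over all dialogues.
averages = {dim: mean(values) for dim, values in totals.items()}
print(averages)
```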
Models
ESC-Role is a role-playing model specific to ESC evaluation, which can be downloaded from: https://huggingface.co/haidequanbu/ESC-Role
ESC-RANK is our trained scoring model for ESC evaluation, which can be downloaded from: https://huggingface.co/haidequanbu/ESC-RANK
Scoring performance
Our paper is coming soon.