Merge pull request #15 from miragecoa/main
Update README.md with contribution guidelines.
YangletLiu authored Nov 11, 2024
2 parents e35301b + 61d4fa3 commit ead04f7
Showing 2 changed files with 50 additions and 33 deletions.
69 changes: 44 additions & 25 deletions README.md
@@ -37,31 +37,50 @@ OFLL provides a specialized evaluation framework tailored specifically to the fi
The Open Financial LLM Leaderboard aims to set a new standard in evaluating the capabilities of language models in the financial domain, offering a specialized, real-world-focused benchmarking solution.


# Start the configuration

Most of the variables to change for a default leaderboard are in `src/env.py` (replace the path for your leaderboard) and `src/about.py` (for tasks).

Results files should have the following format and be stored as json files:
```json
{
"config": {
"model_dtype": "torch.float16", # or torch.bfloat16 or 8bit or 4bit
"model_name": "path of the model on the hub: org/model",
"model_sha": "revision on the hub",
},
"results": {
"task_name": {
"metric_name": score,
},
"task_name2": {
"metric_name": score,
}
}
}
```

Request files are created automatically by this tool.

# Contribute to OFLL

To make the leaderboard more accessible for external contributors, we offer clear guidelines for adding tasks, updating result files, and other maintenance activities.

1. **Primary Files**:
- `src/env.py`: Modify variables like repository paths for customization.
- `src/about.py`: Update task configurations here to add new datasets.

2. **Adding New Tasks**:
- Navigate to `src/about.py` and specify new tasks in the `Tasks` enum section.
- Each task requires details such as `benchmark`, `metric`, `col_name`, and `category`. For example:
```python
taskX = Task("DatasetName", "MetricType", "ColumnName", category="Category")
```

3. **Updating Results Files**:
- Results files should be in JSON format and structured as follows:
```json
{
"config": {
"model_dtype": "torch.float16",
"model_name": "path of the model on the hub: org/model",
"model_sha": "revision on the hub"
},
"results": {
"task_name": {
"metric_name": score
},
"task_name2": {
"metric_name": score
}
}
}
```

4. **Updating Leaderboard Data**:
- When a new task is added, ensure that the results JSON files reflect this update. This process will be automated in future releases.
- Access the current results at [Hugging Face Datasets](https://huggingface.co/datasets/TheFinAI/results/tree/main/demo-leaderboard).

5. **Useful Links**:
- [Hugging Face Leaderboard Documentation](https://huggingface.co/docs/leaderboards/en/leaderboards/building_page)
- [OFLL Demo on Hugging Face](https://huggingface.co/spaces/finosfoundation/Open-Financial-LLM-Leaderboard)
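
The task registration described in step 2 can be sketched as follows. This is a minimal illustration, not the actual contents of `src/about.py`: the `Task` dataclass is an assumed shape based on the four fields named above (`benchmark`, `metric`, `col_name`, `category`), and the two enum entries and the `columns_by_category` helper are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


@dataclass(frozen=True)
class Task:
    # Assumed field layout, taken from the guideline above;
    # the real definition in src/about.py may differ.
    benchmark: str      # dataset key used in the results JSON
    metric: str         # metric key looked up under that dataset
    col_name: str       # column header shown on the leaderboard
    category: str = ""  # grouping label for the task


class Tasks(Enum):
    # Hypothetical entries, for illustration only.
    task0 = Task("DatasetA", "F1", "Dataset A", category="Category 1")
    task1 = Task("DatasetB", "Accuracy", "Dataset B", category="Category 2")


def columns_by_category(tasks):
    """Group leaderboard column names by task category (hypothetical helper)."""
    grouped = {}
    for task in tasks:
        grouped.setdefault(task.value.category, []).append(task.value.col_name)
    return grouped


print(columns_by_category(Tasks))
```

Keeping each task as one enum member means a new dataset is a one-line change, and any code that iterates over `Tasks` picks it up without further edits.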


If you encounter a problem on the Space, don't hesitate to restart it; this removes the `eval-queue`, `eval-queue-bk`, `eval-results`, and `eval-results-bk` folders it created.
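
Since the guidelines ask contributors to keep results files in sync with newly added tasks, a small check along these lines may help before uploading. It is a sketch under assumptions: the `missing_scores` function is hypothetical and not part of the repository, and it only verifies the three `config` keys and the `results` nesting shown in the JSON format in this README.

```python
import json


def missing_scores(results_text, expected):
    """Return the (task, metric) pairs that a results file does not contain.

    `expected` is a list of (task_name, metric_name) pairs, for example
    built from the tasks registered in src/about.py.
    """
    data = json.loads(results_text)
    # The README lists these three keys under "config".
    for key in ("model_dtype", "model_name", "model_sha"):
        if key not in data.get("config", {}):
            raise ValueError(f"config is missing {key!r}")
    results = data.get("results", {})
    return [
        (task, metric)
        for task, metric in expected
        if metric not in results.get(task, {})
    ]


# Example file with one task scored; the values are placeholders.
example = """
{
  "config": {
    "model_dtype": "torch.float16",
    "model_name": "org/model",
    "model_sha": "main"
  },
  "results": {
    "task_name": {"metric_name": 0.5}
  }
}
"""

expected = [("task_name", "metric_name"), ("task_name2", "metric_name")]
print(missing_scores(example, expected))  # [('task_name2', 'metric_name')]
```

An empty list means every expected score is present; anything else names the entries still to be added to the file.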

# Code logic for more complex edits
14 changes: 6 additions & 8 deletions src/about.py
@@ -194,12 +194,10 @@ class Tasks(Enum):

CITATION_BUTTON_LABEL = "Copy the following snippet to cite these results"
CITATION_BUTTON_TEXT = r"""
@misc{xie2024finben,
title={The FinBen: An Holistic Financial Benchmark for Large Language Models},
author={Qianqian Xie and Weiguang Han and Zhengyu Chen and Ruoyu Xiang and Xiao Zhang and Yueru He and Mengxi Xiao and Dong Li and Yongfu Dai and Duanyu Feng and Yijing Xu and Haoqiang Kang and Ziyan Kuang and Chenhan Yuan and Kailai Yang and Zheheng Luo and Tianlin Zhang and Zhiwei Liu and Guojun Xiong and Zhiyang Deng and Yuechen Jiang and Zhiyuan Yao and Haohang Li and Yangyang Yu and Gang Hu and Jiajia Huang and Xiao-Yang Liu and Alejandro Lopez-Lira and Benyou Wang and Yanzhao Lai and Hao Wang and Min Peng and Sophia Ananiadou and Jimin Huang},
year={2024},
eprint={2402.12659},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
@article{Xie2024FinBen,
title={FinBen: A Holistic Financial Benchmark for Large Language Models},
author={Qianqian Xie and Weiguang Han and Zhengyu Chen and Ruoyu Xiang and Xiao Zhang and Yueru He and Mengxi Xiao and Dong Li and Yongfu Dai and Duanyu Feng and Yijing Xu and Haoqiang Kang and Ziyan Kuang and Chenhan Yuan and Kailai Yang and Zheheng Luo and Tianlin Zhang and Zhiwei Liu and Guojun Xiong and Zhiyang Deng and Yuechen Jiang and Zhiyuan Yao and Haohang Li and Yangyang Yu and Gang Hu and Jiajia Huang and Xiao-Yang Liu and Alejandro Lopez-Lira and Benyou Wang and Yanzhao Lai and Hao Wang and Min Peng and Sophia Ananiadou and Jimin Huang},
journal={NeurIPS, Special Track on Datasets and Benchmarks},
year={2024},
}
"""
