Qianqian Xie¹ Weiguang Han¹ Xiao Zhang² Ruoyu Xiang⁷ Gang Hu⁵ Ke Qin⁵ Duanyu Feng³ Yongfu Dai³ Hao Wang³ Yanzhao Lai⁴ Min Peng¹ Alejandro Lopez-Lira⁶ Jimin Huang*⁸

¹Wuhan University ²Sun Yat-Sen University ³Sichuan University ⁴Southwest Jiaotong University ⁵Yunnan University ⁶University of Florida ⁷New York University ⁸ChanceFocus AMC.

Pixiu Paper | FLARE Leaderboard

Disclaimer

This repository and its contents are provided for academic and educational purposes only. None of the material constitutes financial, legal, or investment advice. No warranties, express or implied, are offered regarding the accuracy, completeness, or utility of the content. The authors and contributors are not responsible for any errors, omissions, or any consequences arising from the use of the information herein. Users should exercise their own judgment and consult professionals before making any financial, legal, or investment decisions. The use of the software and information contained in this repository is entirely at the user's own risk.

By using or accessing the information in this repository, you agree to indemnify, defend, and hold harmless the authors, contributors, and any affiliated organizations or persons from any and all claims or damages.

Checkpoints:

FinMA v0.1 (Full 7B version)

Languages

English
Chinese

Evaluations (more details in the FLARE section):

FLARE (flare-zh-afqmc)
FLARE (flare-zh-stocka)
FLARE (flare-zh-corpus)
FLARE (flare-zh-fineval)
FLARE (flare-zh-fe)
FLARE (flare-zh-nl)
FLARE (flare-zh-nl2)
FLARE (flare-zh-nsp)
FLARE (flare-zh-re)
FLARE (flare-zh-stockb)
FLARE (flare-zh-qa)
FLARE (flare-zh-na)
FLARE (flare-zh-19ccks)
FLARE (flare-zh-20ccks)
FLARE (flare-zh-21ccks)
FLARE (flare-zh-22ccks)
FLARE (flare-zh-ner)
FLARE (flare-zh-fpb)
FLARE (flare-zh-fiqasa)
FLARE (flare-zh-headlines)
FLARE (flare-zh-bigdata)
FLARE (flare-zh-acl)
FLARE (flare-zh-cikm)
FLARE (flare-zh-finqa)
FLARE (flare-zh-convfinqa)

Overview

FLARE_ZH is a cornerstone initiative focused on the Chinese financial domain. It aims to support the development, refinement, and assessment of Large Language Models (LLMs) tailored specifically to Chinese financial contexts. As a vital part of the broader PIXIU effort, FLARE_ZH reflects a commitment to harnessing the capabilities of LLMs so that financial professionals and enthusiasts in the Chinese-speaking world have high-quality language tools at their disposal.

Key Features

Open resources: PIXIU openly provides the financial LLM, instruction tuning data, and datasets included in the evaluation benchmark to encourage open research and transparency.
Multi-task: The instruction tuning data and benchmark in PIXIU cover a diverse set of financial tasks.
Multi-modality: PIXIU's instruction tuning data and benchmark consist of multimodal financial data, including time series data from the stock movement prediction tasks. They also cover various types of financial text, including reports, news articles, tweets, and regulatory filings.
Diversity: Unlike previous benchmarks focusing mainly on financial NLP tasks, PIXIU's evaluation benchmark includes critical financial prediction tasks aligned with real-world scenarios, making it more challenging.

FLARE_ZH: Financial Language Understanding and Prediction Evaluation Benchmark

In this section, we provide a detailed performance analysis of FinMA compared to other leading models, including ChatGPT, GPT-4, and lince-zero, among others. For this analysis, we have chosen a range of tasks and metrics that span various aspects of financial natural language processing and financial prediction.

Tasks

Data	Task	Raw	Data Types	Modalities	License	Paper
AFQMC	semantic matching	38,650	question data, chat	text	Apache-2.0	[1]
corpus	semantic matching	120,000	question data, chat	text	Public	[2]
stockA	stock classification	14,769	news, historical prices	text, time series	Public	[3]
Fineval	multiple-choice	1,115	financial exam	text	Apache-2.0	[4]
NL	news classification	7,955	news articles	text	Public	[5]
NL2	news classification	7,955	news articles	text	Public	[5]
NSP	negative news judgment	4,499	news, social media text	text	Public	[5]
RE	relationship identification	14,973	news, entity pair	text	Public	[5]
FE	sentiment analysis	18,177	financial social media text	text	Public	[5]
stockB	sentiment analysis	9,812	financial social media text	text	Apache-2.0	[6]
QA	question answering	22,375	financial news announcements	text, table	Public	[5]
NA	text summarization	32,400	news articles, announcements	text	Public	[5]
19CCKS	event subject extraction	156,834	financial social media text	text	CC BY-SA 4.0	[7]
20CCKS	event subject extraction	372,810	news, reports	text	CC BY-SA 4.0	[8]
21CCKS	event causality extraction	8,000	news, reports	text	CC BY-SA 4.0	[9]
22CCKS	event subject extraction	109,555	news, reports	text	CC BY-SA 4.0	[10]
NER	named entity recognition	1,685	financial reports	text	Public	[11]
FPB	sentiment analysis	4,845	news	text	MIT license	[12]
FIQASA	sentiment analysis	1,173	news headlines, tweets	text	MIT license	[12]
Headlines	news headline classification	11,412	news headlines	text	MIT license	[12]
BigData	stock movement prediction	7,164	tweets, historical prices	text, time series	MIT license	[12]
ACL	stock movement prediction	27,053	tweets, historical prices	text, time series	MIT license	[12]
CIKM	stock movement prediction	4,967	tweets, historical prices	text, time series	MIT license	[12]
FinQA	question answering	14,900	earnings reports	text, table	MIT license	[12]
ConvFinQA	multi-turn question answering	48,364	earnings reports	text, table	MIT license	[12]

[1] Xu L, Hu H, Zhang X, et al. CLUE: A Chinese language understanding evaluation benchmark[J]. arXiv preprint arXiv:2004.05986, 2020.
[2] Jing Chen, Qingcai Chen, Xin Liu, Haijun Yang, Daohe Lu, and Buzhou Tang. 2018. The BQ Corpus: A Large-scale Domain-specific Chinese Corpus For Sentence Semantic Equivalence Identification. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 4946–4951, Brussels, Belgium. Association for Computational Linguistics.
[3] Jinan Zou, Haiyao Cao, Lingqiao Liu, Yuhao Lin, Ehsan Abbasnejad, and Javen Qinfeng Shi. 2022. Astock: A New Dataset and Automated Stock Trading based on Stock-specific News Analyzing Model. In Proceedings of the Fourth Workshop on Financial Technology and Natural Language Processing (FinNLP), pages 178–186, Abu Dhabi, United Arab Emirates (Hybrid). Association for Computational Linguistics.
[4] Zhang L, Cai W, Liu Z, et al. FinEval: A Chinese Financial Domain Knowledge Evaluation Benchmark for Large Language Models[J]. arXiv preprint arXiv:2308.09975, 2023.
[5] Lu D, Liang J, Xu Y, et al. BBT-Fin: Comprehensive Construction of Chinese Financial Domain Pre-trained Language Model, Corpus and Benchmark[J]. arXiv preprint arXiv:2302.09432, 2023.
[6] https://huggingface.co/datasets/kuroneko5943/stock11
[7] https://www.biendata.xyz/competition/ccks_2019_4/
[8] https://www.biendata.xyz/competition/ccks_2020_4_1/
[9] https://www.biendata.xyz/competition/ccks_2021_task6_2/
[10] https://www.biendata.xyz/competition/ccks2022_eventext/
[11] Jia C, Shi Y, Yang Q, et al. Entity enhanced BERT pre-training for Chinese NER[C]//Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2020: 6384-6396.
[12] Xie Q, Han W, Zhang X, et al. PIXIU: A Large Language Model, Instruction Data and Evaluation Benchmark for Finance[J]. arXiv preprint arXiv:2306.05443, 2023.

Evaluation

Preparation

Locally install

git clone https://github.com/chancefocus/PIXIU.git --recursive
cd PIXIU
pip install -r requirements.txt
cd src/financial-evaluation
pip install -e ".[multilingual]"

Docker image

sudo bash scripts/docker_run.sh

The above command starts a Docker container; you can modify docker_run.sh to fit your environment. A pre-built image is available by running sudo docker pull tothemoon/pixiu:latest.

docker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 \
    --network host \
    --env https_proxy=$https_proxy \
    --env http_proxy=$http_proxy \
    --env all_proxy=$all_proxy \
    --env HF_HOME=$hf_home \
    -it [--rm] \
    --name pixiu \
    -v $pixiu_path:$pixiu_path \
    -v $hf_home:$hf_home \
    -v $ssh_pub_key:/root/.ssh/authorized_keys \
    -w $workdir \
    $docker_user/pixiu:$tag \
    [--sshd_port 2201 --cmd "echo 'Hello, world!' && /bin/bash"]

Argument explanations:

[] marks optional arguments
HF_HOME: HuggingFace cache directory
sshd_port: sshd port of the container (defaults to 22001); you can run ssh -i private_key -p $sshd_port root@$ip to connect to the container
--rm: remove the container when you exit it (i.e., CTRL + D)
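
The optional arguments in the template above can be made explicit by assembling the command programmatically. The following is a minimal sketch that only reuses the flags shown in the template; the function `build_docker_cmd` and its parameter names are hypothetical and not part of PIXIU, and the proxy environment variables are omitted for brevity.

```python
import shlex


def build_docker_cmd(pixiu_path, hf_home, ssh_pub_key, workdir,
                     docker_user, tag, remove_on_exit=False, sshd_port=None):
    """Assemble the `docker run` argument list from the template above.

    Hypothetical helper for illustration; not part of the PIXIU repository.
    """
    cmd = [
        "docker", "run", "--gpus", "all", "--ipc=host",
        "--ulimit", "memlock=-1", "--ulimit", "stack=67108864",
        "--network", "host",
        "--env", f"HF_HOME={hf_home}",
        "-it",
    ]
    # --rm is optional: include it only when the container should be
    # removed on exit.
    if remove_on_exit:
        cmd.append("--rm")
    cmd += [
        "--name", "pixiu",
        "-v", f"{pixiu_path}:{pixiu_path}",
        "-v", f"{hf_home}:{hf_home}",
        "-v", f"{ssh_pub_key}:/root/.ssh/authorized_keys",
        "-w", workdir,
        f"{docker_user}/pixiu:{tag}",
    ]
    # Arguments after the image name are passed to the container itself.
    if sshd_port is not None:
        cmd += ["--sshd_port", str(sshd_port)]
    return cmd


# Example values are placeholders; adjust them to your environment.
example = build_docker_cmd(
    "/data/pixiu", "/data/hf", "/home/user/.ssh/id_rsa.pub", "/data/pixiu",
    "tothemoon", "latest", remove_on_exit=True, sshd_port=2201,
)
print(shlex.join(example))
```

Keeping the command as a list of arguments avoids shell-quoting problems if it is later passed to `subprocess.run`.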

Automated Task Assessment

Before evaluation, please download the BART checkpoint to src/metrics/BARTScore/bart_score.pth.
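
A quick pre-flight check can confirm that the checkpoint is in place before a long evaluation run starts. This is a minimal sketch, assuming the path above is relative to the repository root; the helper name `checkpoint_ready` is hypothetical.

```python
from pathlib import Path

# Path taken from the instruction above, assumed relative to the repo root.
BART_CHECKPOINT = Path("src/metrics/BARTScore/bart_score.pth")


def checkpoint_ready(repo_root="."):
    """Return True if the BART checkpoint file exists under repo_root."""
    return (Path(repo_root) / BART_CHECKPOINT).is_file()
```

Calling `checkpoint_ready()` from the repository root returns `False` until the file has been downloaded to the expected location.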

For automated evaluation, please follow these instructions:

HuggingFace Transformer

To evaluate a model hosted on the HuggingFace Hub (for instance, finma-7b-full), use this command:

python eval.py \
    --model "hf-causal-llama" \
    --model_args "use_accelerate=True,pretrained=chancefocus/finma-7b-full,tokenizer=chancefocus/finma-7b-full,use_fast=False" \
    --tasks "flare_ner,flare_sm_acl,flare_fpb"

More details can be found in the lm_eval documentation.
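
When evaluating several models or task sets, it can help to build the command from structured values rather than editing the strings by hand. The sketch below only reproduces the flag format shown in the command above; `build_eval_cmd` is a hypothetical helper, not part of PIXIU or lm_eval.

```python
import shlex


def build_eval_cmd(model, model_args, tasks):
    """Build the argv list for eval.py from structured inputs.

    Hypothetical helper; it only mirrors the flag format shown above.
    """
    # --model_args is a comma-separated list of key=value pairs.
    arg_string = ",".join(f"{key}={value}" for key, value in model_args.items())
    return [
        "python", "eval.py",
        "--model", model,
        "--model_args", arg_string,
        # --tasks is a comma-separated list of task names.
        "--tasks", ",".join(tasks),
    ]


cmd = build_eval_cmd(
    model="hf-causal-llama",
    model_args={
        "use_accelerate": True,
        "pretrained": "chancefocus/finma-7b-full",
        "tokenizer": "chancefocus/finma-7b-full",
        "use_fast": False,
    },
    tasks=["flare_ner", "flare_sm_acl", "flare_fpb"],
)
print(shlex.join(cmd))
```

The resulting list can be passed directly to `subprocess.run(cmd)` from the directory that contains eval.py.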

Commercial APIs

Please note, for tasks such as NER, the automated evaluation is based on a specific pattern. This might fail to extract relevant information in zero-shot settings, resulting in relatively lower performance compared to previous human-annotated results.

export OPENAI_API_SECRET_KEY=YOUR_KEY_HERE
python eval.py \
    --model gpt-4 \
    --tasks flare_ner,flare_sm_acl,flare_fpb
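
The caveat above about pattern-based evaluation can be illustrated with a small sketch. The output format and the regular expression here are assumptions chosen for illustration, not the pattern PIXIU actually uses: the point is only that a free-form zero-shot answer can name the right entities and still yield nothing for the scorer.

```python
import re

# Hypothetical expected format: one "entity, TYPE" pair per line.
# This pattern is an assumption for illustration, not the one PIXIU uses.
ENTITY_LINE = re.compile(r"^(?P<name>.+?),\s*(?P<type>PER|ORG|LOC)$")


def extract_entities(model_output):
    """Return (name, type) pairs for lines that match the expected format."""
    pairs = []
    for line in model_output.splitlines():
        match = ENTITY_LINE.match(line.strip())
        if match:
            pairs.append((match.group("name"), match.group("type")))
    return pairs


# Made-up model outputs for the same two entities.
formatted = "Acme Bank, ORG\nJane Doe, PER"
free_form = "The text mentions Acme Bank (an organization) and Jane Doe (a person)."

print(extract_entities(formatted))  # both entities are recovered
print(extract_entities(free_form))  # nothing matches the pattern
```

In the second case the model's answer is arguably correct, but the automated score would count it as a miss, which is why zero-shot results may look lower than human-annotated ones.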

License

PIXIU is licensed under [MIT]. For more details, please see the MIT file.


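Disclaimer

This repository and its contents are provided for academic and educational purposes only. None of the material constitutes financial, legal, or investment advice. No warranties, express or implied, are offered regarding the accuracy, completeness, or utility of the content. The authors and contributors are not responsible for any errors, omissions, or any consequences arising from the use of the information herein. Users should exercise their own judgment and consult professionals before making any financial, legal, or investment decisions. The use of the software and information contained in this repository is entirely at the user's own risk.

By using or accessing the information in this repository, you agree to indemnify, defend, and hold harmless the authors, contributors, and any affiliated organizations or persons from any and all claims or damages.

Checkpoints: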

FinMA v0.1 (Full 7B version)

Languages

English
Chinese

Evaluations (more details in the FLARE section):

FLARE (flare-zh-afqmc)
FLARE (flare-zh-stocka)
FLARE (flare-zh-corpus)
FLARE (flare-zh-fineval)
FLARE (flare-zh-fe)
FLARE (flare-zh-nl)
FLARE (flare-zh-nl2)
FLARE (flare-zh-nsp)
FLARE (flare-zh-re)
FLARE (flare-zh-stockb)
FLARE (flare-zh-qa)
FLARE (flare-zh-na)
FLARE (flare-zh-19ccks)
FLARE (flare-zh-20ccks)
FLARE (flare-zh-21ccks)
FLARE (flare-zh-22ccks)
FLARE (flare-zh-ner)
FLARE (flare-zh-fpb)
FLARE (flare-zh-fiqasa)
FLARE (flare-zh-headlines)
FLARE (flare-zh-bigdata)
FLARE (flare-zh-acl)
FLARE (flare-zh-cikm)
FLARE (flare-zh-finqa)
FLARE (flare-zh-convfinqa)

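Overview

FLARE_ZH is a cornerstone initiative focused on the Chinese financial domain. It aims to support the development, refinement, and assessment of Large Language Models (LLMs) tailored specifically to Chinese financial contexts. As an important part of the broader PIXIU effort, FLARE_ZH reflects our commitment to harnessing the capabilities of LLMs so that financial professionals and enthusiasts in the Chinese-speaking world have high-quality language tools.

Key Features

Open resources: PIXIU openly provides the financial LLM, instruction tuning data, and datasets included in the evaluation benchmark to encourage open research and transparency.
Multi-task: The instruction tuning data and benchmark in PIXIU cover a diverse set of financial tasks.
Multi-modality: PIXIU's instruction tuning data and benchmark consist of multimodal financial data, including time series data from the stock movement prediction tasks. They also cover various types of financial text, including reports, news articles, tweets, and regulatory filings.
Diversity: Unlike previous benchmarks focusing mainly on financial NLP tasks, PIXIU's evaluation benchmark includes critical financial prediction tasks aligned with real-world scenarios, making it more challenging.

FLARE_ZH: Financial Language Understanding and Prediction Evaluation Benchmark

In this section, we provide a detailed performance analysis of FinMA compared to other leading models, including ChatGPT, GPT-4, and lince-zero, among others. For this analysis, we have chosen a range of tasks and metrics that span various aspects of financial natural language processing and financial prediction.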

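Tasks

Data	Task	Raw	Data Types	Modalities	License	Paper
AFQMC	semantic matching	38,650	question data, chat	text	Apache-2.0	[1]
corpus	semantic matching	120,000	question data, chat	text	Public	[2]
stockA	stock classification	14,769	news, historical prices	text, time series	Public	[3]
Fineval	multiple-choice	1,115	financial exam	text	Apache-2.0	[4]
NL	news classification	7,955	news articles	text	Public	[5]
NL2	news classification	7,955	news articles	text	Public	[5]
NSP	negative news judgment	4,499	news, social media text	text	Public	[5]
RE	relationship identification	14,973	news, entity pair	text	Public	[5]
FE	sentiment analysis	18,177	financial social media text	text	Public	[5]
stockB	sentiment analysis	9,812	financial social media text	text	Apache-2.0	[6]
QA	question answering	22,375	financial news announcements	text, table	Public	[5]
NA	text summarization	32,400	news articles, announcements	text	Public	[5]
19CCKS	event subject extraction	156,834	news articles	text	CC BY-SA 4.0	[7]
20CCKS	event subject extraction	372,810	news articles	text	CC BY-SA 4.0	[8]
21CCKS	event causality extraction	8,000	news articles	text	CC BY-SA 4.0	[9]
22CCKS	event subject extraction	109,555	news articles	text	CC BY-SA 4.0	[10]
NER	named entity recognition	1,685	news articles	text	Public	[11]
FPB	sentiment analysis	4,845	news	text	MIT license	[12]
FIQASA	sentiment analysis	1,173	news headlines, tweets	text	MIT license	[12]
Headlines	news headline classification	11,412	news headlines	text	MIT license	[12]
BigData	stock movement prediction	7,164	tweets, historical prices	text, time series	MIT license	[12]
ACL	stock movement prediction	27,053	tweets, historical prices	text, time series	MIT license	[12]
CIKM	stock movement prediction	4,967	tweets, historical prices	text, time series	MIT license	[12]
FinQA	question answering	14,900	earnings reports	text, table	MIT license	[12]
ConvFinQA	multi-turn question answering	48,364	earnings reports	text, table	MIT license	[12]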

[1] Xu L, Hu H, Zhang X, et al. CLUE: A Chinese language understanding evaluation benchmark[J]. arXiv preprint arXiv:2004.05986, 2020.
[2] Jing Chen, Qingcai Chen, Xin Liu, Haijun Yang, Daohe Lu, and Buzhou Tang. 2018. The BQ Corpus: A Large-scale Domain-specific Chinese Corpus For Sentence Semantic Equivalence Identification. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 4946–4951, Brussels, Belgium. Association for Computational Linguistics.
[3] Jinan Zou, Haiyao Cao, Lingqiao Liu, Yuhao Lin, Ehsan Abbasnejad, and Javen Qinfeng Shi. 2022. Astock: A New Dataset and Automated Stock Trading based on Stock-specific News Analyzing Model. In Proceedings of the Fourth Workshop on Financial Technology and Natural Language Processing (FinNLP), pages 178–186, Abu Dhabi, United Arab Emirates (Hybrid). Association for Computational Linguistics.
[4] Zhang L, Cai W, Liu Z, et al. FinEval: A Chinese Financial Domain Knowledge Evaluation Benchmark for Large Language Models[J]. arXiv preprint arXiv:2308.09975, 2023.
[5] Lu D, Liang J, Xu Y, et al. BBT-Fin: Comprehensive Construction of Chinese Financial Domain Pre-trained Language Model, Corpus and Benchmark[J]. arXiv preprint arXiv:2302.09432, 2023.
[6] https://huggingface.co/datasets/kuroneko5943/stock11
[7] https://www.biendata.xyz/competition/ccks_2019_4/
[8] https://www.biendata.xyz/competition/ccks_2020_4_1/
[9] https://www.biendata.xyz/competition/ccks_2021_task6_2/
[10] https://www.biendata.xyz/competition/ccks2022_eventext/
[11] Jia C, Shi Y, Yang Q, et al. Entity enhanced BERT pre-training for Chinese NER[C]//Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2020: 6384-6396.
[12] Xie Q, Han W, Zhang X, et al. PIXIU: A Large Language Model, Instruction Data and Evaluation Benchmark for Finance[J]. arXiv preprint arXiv:2306.05443, 2023.

Evaluation

Preparation

Local installation

git clone https://github.com/chancefocus/PIXIU.git --recursive
cd PIXIU
pip install -r requirements.txt
cd src/financial-evaluation
pip install -e ".[multilingual]"

Docker image

sudo bash scripts/docker_run.sh

The above command starts a Docker container; you can modify docker_run.sh to fit your environment. A pre-built image is available by running sudo docker pull tothemoon/pixiu:latest.

docker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 \
    --network host \
    --env https_proxy=$https_proxy \
    --env http_proxy=$http_proxy \
    --env all_proxy=$all_proxy \
    --env HF_HOME=$hf_home \
    -it [--rm] \
    --name pixiu \
    -v $pixiu_path:$pixiu_path \
    -v $hf_home:$hf_home \
    -v $ssh_pub_key:/root/.ssh/authorized_keys \
    -w $workdir \
    $docker_user/pixiu:$tag \
    [--sshd_port 2201 --cmd "echo 'Hello, world!' && /bin/bash"]

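Argument explanations:

[] marks optional arguments
HF_HOME: HuggingFace cache directory
sshd_port: sshd port of the container (defaults to 22001); you can run ssh -i private_key -p $sshd_port root@$ip to connect to the container
--rm: remove the container when you exit it (i.e., CTRL + D)

Automated Task Assessment

Before evaluation, please download the BART checkpoint to src/metrics/BARTScore/bart_score.pth.

For automated evaluation, please follow these instructions:

HuggingFace Transformer

To evaluate a model hosted on the HuggingFace Hub (for instance, finma-7b-full), use this command: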

python eval.py \
    --model "hf-causal-llama" \
    --model_args "use_accelerate=True,pretrained=chancefocus/finma-7b-full,tokenizer=chancefocus/finma-7b-full,use_fast=False" \
    --tasks "flare_ner,flare_sm_acl,flare_fpb"

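More details can be found in the lm_eval documentation.

Commercial APIs

Please note that for tasks such as NER, the automated evaluation is based on a specific pattern. This might fail to extract relevant information in zero-shot settings, resulting in relatively lower performance compared to previous human-annotated results.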

export OPENAI_API_SECRET_KEY=YOUR_KEY_HERE
python eval.py \
    --model gpt-4 \
    --tasks flare_ner,flare_sm_acl,flare_fpb

License

PIXIU is licensed under [MIT]. For more details, please see the MIT file.
