FinRL-DeepSeek: LLM-Infused Risk-Sensitive Reinforcement Learning for Trading Agents

Blog: https://melwy.com/finrl_deepseek

Paper: https://arxiv.org/abs/2502.07393

Update 1: The project has been integrated into the original FinRL project by AI4Finance!

Update 2: The project is the basis of Task 1 in the FinRL Contest 2025!

Installation script: installation_script.sh

Data: https://huggingface.co/datasets/benstaf/nasdaq_2013_2023/tree/main

Trading agents: https://huggingface.co/benstaf/Trading_agents/tree/main

Backtesting Notebook: FinRL_DeepSeek_backtest.ipynb

Results

Preliminary conclusion

Bull market -> PPO

Bear market -> CPPO-DeepSeek

More details on installation of dependencies

Run installation_script.sh on an Ubuntu server (a CPU instance with 128 GB of RAM is recommended).

Datasets and data preprocessing

The base dataset is FNSPID: https://huggingface.co/datasets/Zihan1004/FNSPID (the relevant file is Stock_news/nasdaq_exteral_data.csv)

FNSPID code repository: https://github.com/Zdong104/FNSPID_Financial_News_Dataset

FNSPID paper: https://arxiv.org/abs/2402.06698

LLM signals are added by running sentiment_deepseek_deepinfra.py and risk_deepseek_deepinfra.py, to obtain:

Then this data is processed by train_trade_data_deepseek_sentiment.py and train_trade_data_deepseek_risk.py to generate agent-ready datasets.
For plain PPO and CPPO, train_trade_data.py is used.
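
As a rough illustration of what this preprocessing step has to achieve, the sketch below attaches a per-article LLM score to daily price rows by date and ticker. It is not taken from train_trade_data_deepseek_sentiment.py. The field names (date, tic, llm_sentiment), the daily averaging, and the neutral fill value of 3 are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean


def attach_llm_signal(price_rows, news_rows, neutral=3.0):
    """Attach a daily per-ticker LLM score to each price row (hypothetical schema)."""
    # Several articles may cover one ticker on one day, so collect them first.
    scores = defaultdict(list)
    for row in news_rows:
        scores[(row["date"], row["tic"])].append(row["llm_sentiment"])

    merged = []
    for row in price_rows:
        key = (row["date"], row["tic"])
        # Days without news fall back to the assumed neutral score.
        signal = mean(scores[key]) if key in scores else neutral
        merged.append({**row, "llm_sentiment": signal})
    return merged


price_rows = [
    {"date": "2020-01-02", "tic": "AAPL", "close": 75.0},
    {"date": "2020-01-02", "tic": "MSFT", "close": 160.0},
    {"date": "2020-01-03", "tic": "AAPL", "close": 74.5},
]
news_rows = [
    {"date": "2020-01-02", "tic": "AAPL", "llm_sentiment": 4},
    {"date": "2020-01-02", "tic": "AAPL", "llm_sentiment": 5},
]
merged = attach_llm_signal(price_rows, news_rows)
```

With real data the same join would more likely be a pandas merge on the date and ticker columns; plain dictionaries are used here only to keep the example dependency-free.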

Training and Environments

For training PPO, run:
nohup mpirun --allow-run-as-root -np 8 python train_ppo.py > output_ppo.log 2>&1 &
For CPPO: train_cppo.py
For PPO-DeepSeek: train_ppo_llm.py
For CPPO-DeepSeek: train_cppo_llm_risk.py
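
The launch line above is only spelled out for PPO. A small helper like the following can build the equivalent line for each agent, assuming the other scripts are launched through mpirun in the same way (the README does not state this explicitly). The script names come from the list above; the output_<agent>.log naming pattern and the agent keys are hypothetical.

```python
import shlex

# Script names are taken from the README; the dictionary keys are made up here.
TRAIN_SCRIPTS = {
    "ppo": "train_ppo.py",
    "cppo": "train_cppo.py",
    "ppo_deepseek": "train_ppo_llm.py",
    "cppo_deepseek": "train_cppo_llm_risk.py",
}


def build_train_command(agent, n_procs=8):
    """Return a background shell command that trains the given agent."""
    script = TRAIN_SCRIPTS[agent]
    log_file = f"output_{agent}.log"  # assumed naming pattern
    cmd = ["mpirun", "--allow-run-as-root", "-np", str(n_procs), "python", script]
    # Redirect stdout and stderr to the log file and detach with nohup.
    return f"nohup {shlex.join(cmd)} > {shlex.quote(log_file)} 2>&1 &"


for agent in TRAIN_SCRIPTS:
    print(build_train_command(agent))
```

For the ppo key this reproduces the command shown above; the other lines should be checked against each script before use.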

Environment files are:

env_stocktrading.py for PPO and CPPO, same as in the original FinRL
env_stocktrading_llm.py or env_stocktrading_llm_01.py for PPO-DeepSeek, depending on the desired strength of the LLM influence (further tuning of this strength is worth exploring)
env_stocktrading_llm_risk.py or env_stocktrading_llm_risk_01.py for CPPO-DeepSeek
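
To make the idea of "LLM influence" concrete, here is a toy sketch of one way an environment could let a sentiment score nudge the agent's trade size. It is not the code in env_stocktrading_llm.py. It assumes a sentiment score on a 1 to 5 scale with 3 as neutral, and the strength parameter is a hypothetical knob standing in for whatever distinguishes the environment variants.

```python
def modulate_action(action, sentiment, strength=0.01):
    """Scale a signed trade size by an LLM sentiment score (illustrative only).

    action:    positive to buy, negative to sell
    sentiment: assumed integer score from 1 (very negative) to 5 (very positive)
    strength:  hypothetical maximum relative change applied to the action
    """
    if not 1 <= sentiment <= 5:
        raise ValueError("sentiment must be between 1 and 5")
    direction = 1 if action > 0 else -1 if action < 0 else 0
    # alignment is +1 when sentiment fully agrees with the trade direction,
    # -1 when it fully disagrees, and 0 for neutral sentiment or no trade.
    alignment = direction * (sentiment - 3) / 2
    return action * (1 + strength * alignment)


amplified_buy = modulate_action(100, sentiment=5, strength=0.1)
dampened_sell = modulate_action(-100, sentiment=5, strength=0.1)
unchanged = modulate_action(100, sentiment=3, strength=0.1)
```

In this sketch a larger strength gives the LLM more say over the final trade, while a value of 0 recovers the plain agent.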

Training logs are written to output_ppo.log and similarly named files. Monitor them during training, especially the following values:

AverageEpRet
KL
ClipFrac
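
These values can be pulled out of a log file with a short script. The sketch below assumes the logs use a pipe-delimited key/value table per epoch, as printed by the OpenAI Spinning Up logger; if the actual log layout differs, the regular expression needs adjusting.

```python
import re

# Matches rows such as "|      AverageEpRet |            12.5 |" (assumed layout).
ROW = re.compile(r"^\|\s*(\w+)\s*\|\s*(\S+)\s*\|\s*$")
WATCHED = ("AverageEpRet", "KL", "ClipFrac")


def parse_training_log(lines):
    """Collect the per-epoch history of the watched metrics."""
    history = {key: [] for key in WATCHED}
    for line in lines:
        match = ROW.match(line.strip())
        if match and match.group(1) in history:
            try:
                history[match.group(1)].append(float(match.group(2)))
            except ValueError:
                pass  # skip rows whose value is not numeric
    return history


sample = """
---------------------------------------
|             Epoch |               0 |
|      AverageEpRet |            12.5 |
|                KL |          0.0125 |
|          ClipFrac |            0.25 |
---------------------------------------
""".splitlines()
history = parse_training_log(sample)
```

For a real run, pass the open file instead of the sample, for example parse_training_log(open("output_ppo.log")). As a rule of thumb for PPO-style training, a steadily growing KL or a ClipFrac close to 1 suggests that policy updates are too large.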

Evaluation

Evaluation on the trading period (2019-2023) is done in the FinRL_DeepSeek_backtest.ipynb Colab notebook.
The metrics used are Information Ratio, CVaR, and Rachev Ratio; others, such as outperformance frequency, would be worth adding.
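
For reference, the three metrics can be sketched from a series of periodic returns as follows. This is a minimal version using common textbook definitions, not the notebook's code: the tail levels alpha and beta, the empirical tail estimator, and the lack of annualization are assumptions.

```python
from statistics import mean, stdev


def information_ratio(returns, benchmark):
    """Mean active return divided by its standard deviation (not annualized)."""
    active = [r - b for r, b in zip(returns, benchmark)]
    return mean(active) / stdev(active)


def cvar(returns, alpha=0.05):
    """Average of the worst alpha fraction of returns (negative for losses)."""
    k = max(1, int(len(returns) * alpha))
    return mean(sorted(returns)[:k])


def rachev_ratio(returns, alpha=0.05, beta=0.05):
    """Expected gain in the best beta tail over expected loss in the worst alpha tail."""
    k_up = max(1, int(len(returns) * beta))
    tail_gain = mean(sorted(returns, reverse=True)[:k_up])
    tail_loss = -cvar(returns, alpha)
    return tail_gain / tail_loss


# Made-up returns, used only to exercise the functions.
returns = [-0.04, -0.02, 0.0, 0.01, 0.02, 0.03, -0.01, 0.05, 0.01, -0.03]
tail = cvar(returns, alpha=0.2)
rachev = rachev_ratio(returns, alpha=0.2, beta=0.2)
ir = information_ratio([0.02, 0.04], [0.01, 0.01])
```

A Rachev Ratio above 1 means the average gain in the upper tail exceeds the average loss in the lower tail at the chosen levels.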

