Blog: https://melwy.com/finrl_deepseek
Paper: https://arxiv.org/abs/2502.07393
Update 1: The project has been integrated into the original FinRL project by AI4Finance!
Update 2: The project is the basis of Task 1 in the FinRL Contest 2025!
Installation script: installation_script.sh
Data: https://huggingface.co/datasets/benstaf/nasdaq_2013_2023/tree/main
Trading agents: https://huggingface.co/benstaf/Trading_agents/tree/main
Bull market -> PPO
Bear market -> CPPO-DeepSeek
Run installation_script.sh on an Ubuntu server (a CPU instance with 128 GB of RAM is recommended).
The base dataset is FNSPID:
- https://huggingface.co/datasets/Zihan1004/FNSPID (the relevant file is Stock_news/nasdaq_exteral_data.csv)
- https://github.com/Zdong104/FNSPID_Financial_News_Dataset
- https://arxiv.org/abs/2402.06698
LLM signals are added by running sentiment_deepseek_deepinfra.py and risk_deepseek_deepinfra.py, to obtain:
- https://huggingface.co/datasets/benstaf/nasdaq_news_sentiment
- https://huggingface.co/datasets/benstaf/risk_nasdaq
This data is then processed by train_trade_data_deepseek_sentiment.py and train_trade_data_deepseek_risk.py to generate agent-ready datasets.
For plain PPO and CPPO, train_trade_data.py is used.
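The processing scripts themselves are not reproduced here. As a rough illustration of what "agent-ready" could mean, the sketch below joins per-stock, per-day LLM scores onto price rows with pandas. The column names (`date`, `tic`, `close`, `llm_sentiment`), the helper name `attach_llm_signal`, and the placeholder fill value for days without news are all assumptions made for this example, not values taken from the scripts.

```python
import pandas as pd

def attach_llm_signal(prices, signals, column="llm_sentiment", neutral_score=3):
    """Left-join an LLM score column onto price rows by (date, tic).

    Hypothetical helper: column names and neutral_score are illustrative only.
    Assumes `signals` has at most one row per (date, tic); duplicates would
    duplicate price rows.
    """
    merged = prices.merge(
        signals[["date", "tic", column]],
        on=["date", "tic"],
        how="left",  # keep every price row, even when no news was scored
    )
    # Rows without a score get the placeholder value.
    merged[column] = merged[column].fillna(neutral_score)
    return merged.sort_values(["date", "tic"]).reset_index(drop=True)
```

A left join is used so that trading days without scored news stay in the dataset; whether the actual scripts fill such days, and with what value, should be checked in train_trade_data_deepseek_sentiment.py and train_trade_data_deepseek_risk.py.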
- For training PPO, run:
  nohup mpirun --allow-run-as-root -np 8 python train_ppo.py > output_ppo.log 2>&1 &
- For CPPO: train_cppo.py
- For PPO-DeepSeek: train_ppo_llm.py
- For CPPO-DeepSeek: train_cppo_llm_risk.py
Environment files are:
- env_stocktrading.py for PPO and CPPO, same as in the original FinRL
- env_stocktrading_llm.py or env_stocktrading_llm_01.py for PPO-DeepSeek (depending on the desired LLM influence; more tweaking would be interesting)
- env_stocktrading_llm_risk.py or env_stocktrading_llm_risk_01.py for CPPO-DeepSeek
Log files are output_ppo.log, etc., and should be monitored during training, especially:
- AverageEpRet
- KL
- ClipFrac
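The three quantities listed above can be pulled out of a log with a short script. This is a minimal sketch that assumes each metric appears on its own line as a name followed by a number, optionally framed by `|` characters as in an epoch table. The exact layout of output_ppo.log is not specified here, so the pattern and the helper name `parse_metrics` are assumptions to adjust against a real log.

```python
import re
from collections import defaultdict

# Metric names taken from the list above.
METRICS = ("AverageEpRet", "KL", "ClipFrac")

# Assumed line shape: optional "|", metric name, optional "|" or ":", a number.
_LINE = re.compile(
    r"^\s*\|?\s*(?P<name>\w+)\s*[|:]?\s*"
    r"(?P<value>[-+]?\d*\.?\d+(?:[eE][-+]?\d+)?)\s*\|?\s*$"
)

def parse_metrics(lines, metrics=METRICS):
    """Return {metric: [values in order of appearance]} for the chosen metrics."""
    history = defaultdict(list)
    for line in lines:
        match = _LINE.match(line)
        if match and match.group("name") in metrics:
            history[match.group("name")].append(float(match.group("value")))
    return dict(history)
```

For example, `parse_metrics(open("output_ppo.log"))` yields one list per metric, which can then be plotted or checked for a collapsing AverageEpRet or a steadily growing KL.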
Evaluation in the trading phase (2019-2023) happens in the FinRL_DeepSeek_backtest.ipynb Colab notebook.
Metrics used are Information Ratio, CVaR, and Rachev Ratio, but adding others like outperformance frequency would be nice.
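For readers who want to experiment outside the notebook, the sketch below computes these metrics from arrays of per-period returns using common textbook definitions. The tail levels (5%), the lack of annualization, and the sign conventions are assumptions and may differ from what FinRL_DeepSeek_backtest.ipynb does; the function names are illustrative.

```python
import numpy as np

def information_ratio(returns, benchmark):
    """Mean active return divided by its sample standard deviation (not annualized)."""
    active = np.asarray(returns, dtype=float) - np.asarray(benchmark, dtype=float)
    return active.mean() / active.std(ddof=1)

def cvar(returns, alpha=0.05):
    """Mean return over the worst alpha fraction of periods (negative for losses)."""
    r = np.sort(np.asarray(returns, dtype=float))
    k = max(1, int(np.ceil(alpha * r.size)))  # number of periods in the left tail
    return r[:k].mean()

def rachev_ratio(returns, alpha=0.05, beta=0.05):
    """Expected gain in the best alpha tail over expected loss in the worst beta tail."""
    r = np.sort(np.asarray(returns, dtype=float))
    k_gain = max(1, int(np.ceil(alpha * r.size)))
    expected_tail_gain = r[-k_gain:].mean()
    expected_tail_loss = -cvar(r, beta)  # flip sign so a loss is positive
    return expected_tail_gain / expected_tail_loss

def outperformance_frequency(returns, benchmark):
    """Fraction of periods in which the strategy return exceeds the benchmark return."""
    return float(np.mean(np.asarray(returns) > np.asarray(benchmark)))
```

With daily returns, multiplying the Information Ratio by `np.sqrt(252)` gives the usual annualized figure.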