This repository contains a series of scripts for data preprocessing, feature engineering, and feature importance analysis.
- Python 3.7 or higher
- pip (Python package installer)
- Install the required packages:

      pip install -r requirements.txt
- Create a new directory named `raw_data` and add the following files to it:
  - factor_char_list.csv
  - hackathon_sample_v2.csv
  - mkt_ind.csv
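Before running anything, it can help to confirm that the three input files are in place. The following is a minimal sketch, not part of the repository: the helper name `missing_raw_files` is hypothetical, and it assumes `raw_data` sits in the directory you run it from.

```python
from pathlib import Path

# File names taken from the setup step above.
REQUIRED_RAW_FILES = (
    "factor_char_list.csv",
    "hackathon_sample_v2.csv",
    "mkt_ind.csv",
)


def missing_raw_files(raw_dir="raw_data"):
    """Return the required file names that are absent from raw_dir.

    Hypothetical helper; the raw_data location is an assumption.
    """
    raw_path = Path(raw_dir)
    return [name for name in REQUIRED_RAW_FILES if not (raw_path / name).is_file()]


if __name__ == "__main__":
    missing = missing_raw_files()
    if missing:
        print("Missing from raw_data:", ", ".join(missing))
    else:
        print("All required raw data files are present.")
```

An empty list means all three files were found.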
Follow these steps to run the complete analysis:
- Data Preprocessing:

      cd 01-Data_Preprocessing
      python preprocessing_code.py
      cd ..

- Feature Engineering:

      cd 02-Feature_Engineering
      python feature_engineering_code.py
      cd ..

- Feature Importance:

      cd 03-Feature_Importance
      python feature_importance_code.py
      python feature_selection_code.py
      cd ..

- Causal Discovery:

      cd 0X-Causal_discovery
      python discovery.py
      cd ..

- Predictor:

      cd 04-Predictor
      python train_AlphaSignals.py
      cd ..

- Backtesting:

      cd 06-Backtesting
      python backtest_parallel.py
      cd ..
- Chain of Thought Zero Shot Features: Download the dataset from here and put it in the `datasets` directory. Log in to Hugging Face and download the model here.

      cd 0L-CoTZeroShotFeatures
      python create_dataset.py
      python llama-3.2-3B-Instruct-Inference-READABILITY-SCORE.py
      python llama-3.2-3B-Instruct-Inference-RISK_FACTORS.py
      python llama-3.2-3B-Instruct-Inference-SENTIMENT-SCORES.py
      cd ..
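The first six steps can also be driven from a single script. The sketch below is an illustration, not something shipped with the repository: `PIPELINE_STEPS`, `build_commands`, and `run_pipeline` are hypothetical names, the step order simply follows the list above, and it assumes you call it from the repository root. The Chain of Thought step is left out because it depends on a manual dataset download and a Hugging Face login.

```python
import subprocess
import sys

# (directory, scripts) pairs in the order the steps are listed above.
PIPELINE_STEPS = [
    ("01-Data_Preprocessing", ["preprocessing_code.py"]),
    ("02-Feature_Engineering", ["feature_engineering_code.py"]),
    ("03-Feature_Importance", ["feature_importance_code.py", "feature_selection_code.py"]),
    ("0X-Causal_discovery", ["discovery.py"]),
    ("04-Predictor", ["train_AlphaSignals.py"]),
    ("06-Backtesting", ["backtest_parallel.py"]),
]


def build_commands(steps=PIPELINE_STEPS, python_exe=sys.executable):
    """Expand the steps into (working_directory, argv) pairs."""
    return [
        (directory, [python_exe, script])
        for directory, scripts in steps
        for script in scripts
    ]


def run_pipeline(steps=PIPELINE_STEPS, runner=subprocess.run):
    """Run each script from inside its own directory, stopping on the first failure."""
    for directory, argv in build_commands(steps):
        print(f"[{directory}] {' '.join(argv)}")
        # cwd stands in for the manual `cd <dir>` / `cd ..` pair; check=True
        # raises CalledProcessError if a script exits with a non-zero status.
        runner(argv, cwd=directory, check=True)
```

Calling `run_pipeline()` runs the scripts one after another. Passing a different `runner` makes it possible to preview the commands without executing them.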
After running all the scripts, you'll find the following output:
- objects/FULL_stacked_data.pkl
- objects/causal_dataset.pkl
- objects/WEIGHT_SAMPLING.pkl
- objects/mkt_ind.csv
- objects/X_DATASET.pkl
- objects/Y_DATASET.pkl
- objects/predictions_0.csv to predictions_13.csv
- objects/predictions.csv -- Final predictions on the test dataset
- objects/prices.pkl -- Dataframe of the prices of the assets
- objects/signals.pkl -- Timeseries of signals from each stock
- objects/market_caps.pkl -- Dataframe of market caps of the assets
- objects/stock_exret.pkl -- Excess returns of each stock
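To see which of these files a run actually produced, a small report like the one below may be useful. It is a sketch under stated assumptions: `output_report` is a hypothetical helper, the `objects` directory is assumed to be relative to where you run it, and "predictions_0.csv to predictions_13.csv" is read as an inclusive range of 14 files.

```python
from pathlib import Path

# Assumption: predictions_0.csv to predictions_13.csv is an inclusive range (14 files).
PREDICTION_PARTS = [f"predictions_{i}.csv" for i in range(14)]

EXPECTED_OUTPUTS = [
    "FULL_stacked_data.pkl",
    "causal_dataset.pkl",
    "WEIGHT_SAMPLING.pkl",
    "mkt_ind.csv",
    "X_DATASET.pkl",
    "Y_DATASET.pkl",
    *PREDICTION_PARTS,
    "predictions.csv",
    "prices.pkl",
    "signals.pkl",
    "market_caps.pkl",
    "stock_exret.pkl",
]


def output_report(objects_dir="objects"):
    """Map each expected output name to its size in bytes, or None if absent."""
    base = Path(objects_dir)
    return {
        name: (base / name).stat().st_size if (base / name).is_file() else None
        for name in EXPECTED_OUTPUTS
    }
```

Entries with a value of `None` point to a step that has not run yet or did not finish.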
[Figure: feature importance scores from MDI and MDA]
For more details on each step of the process, please refer to the comments and docstrings within each script.