First, install snakemake by following these instructions.
To download and regrid a CMIP6 dataset to a common resolution (e.g., 1.40625 degrees), go to the corresponding directory inside snakemake_configs and run
snakemake all --configfile config_2m_temperature.yml --cores 8
This command downloads and regrids the 2m_temperature data in parallel using 8 CPU cores. Modify the configfile for other variables. After downloading and regridding, run the following script to preprocess the .nc files into .npz format for pretraining ClimaX
python src/data_preprocessing/nc2np_equally_cmip6.py \
--dataset mpi \
--path /data/CMIP6/MPI-ESM/1.40625deg/ \
--num_shards 10 \
--save_dir /data/CMIP6/MPI-ESM/1.40625deg_np_10shards
where num_shards denotes the number of chunks that each .nc file is split into.
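To make the role of num_shards concrete, here is a minimal sketch of splitting one variable along its time axis into equal chunks. It is not the code in nc2np_equally_cmip6.py: the function name `split_into_shards`, the synthetic array, and the choice to drop leftover time steps are all assumptions made for illustration.

```python
import numpy as np


def split_into_shards(data, num_shards):
    """Split `data` along its first (time) axis into `num_shards` equal chunks.

    Assumption of this sketch: time steps that do not fill a whole shard
    are dropped. The real preprocessing script may handle them differently.
    """
    steps_per_shard = data.shape[0] // num_shards
    if steps_per_shard == 0:
        raise ValueError("num_shards exceeds the number of time steps")
    return [
        data[i * steps_per_shard:(i + 1) * steps_per_shard]
        for i in range(num_shards)
    ]


# Synthetic stand-in for one variable read from a .nc file: (time, lat, lon).
field = np.arange(100 * 4 * 8, dtype=np.float32).reshape(100, 4, 8)
shards = split_into_shards(field, num_shards=10)
shard_shapes = [s.shape for s in shards]
```

With 100 time steps and num_shards=10, each shard holds 10 consecutive time steps. Smaller shards mean smaller files to load at training time, at the cost of more files on disk.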
To pretrain ClimaX, use
python src/climax/pretrain/train.py --config <path/to/config>
For example, to pretrain ClimaX on the MPI-ESM dataset using 8 GPUs, use
python src/climax/pretrain/train.py --config configs/pretrain_climax.yaml \
--trainer.strategy=ddp --trainer.devices=8 \
--trainer.max_epochs=100 \
--data.batch_size=16 \
--model.lr=5e-4 --model.beta_1="0.9" --model.beta_2="0.95" \
--model.weight_decay=1e-5
!!! tip Make sure to update the paths of the data directories in the config files (or override them via the CLI).
We provide two pretrained checkpoints: one pretrained on 5.625deg data and the other on 1.40625deg data. Both checkpoints were pretrained using all 5 CMIP6 datasets.
Usage: Load a checkpoint by passing its URL to the training script. See below for examples.
First, download ERA5 data from WeatherBench. The data directory should look like the following
5.625deg
|-- 10m_u_component_of_wind
|-- 10m_v_component_of_wind
|-- 2m_temperature
|-- constants.nc
|-- geopotential
|-- relative_humidity
|-- specific_humidity
|-- temperature
|-- toa_incident_solar_radiation
|-- total_precipitation
|-- u_component_of_wind
|-- v_component_of_wind
Then, preprocess the NetCDF data into small NumPy files and compute the normalization statistics (mean and standard deviation)
python src/data_preprocessing/nc2np_equally_era5.py \
--root_dir /mnt/data/5.625deg \
--save_dir /mnt/data/5.625deg_npz \
--start_train_year 1979 --start_val_year 2016 \
--start_test_year 2017 --end_year 2019 --num_shards 8
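The year flags above define the train/val/test split. The sketch below shows one plausible reading, assuming each `start_*` year closes the previous split and `end_year` is exclusive. This half-open interpretation is an assumption and should be checked against nc2np_equally_era5.py; `split_years` is a hypothetical helper, not a function from the repository.

```python
def split_years(start_train_year, start_val_year, start_test_year, end_year):
    """Partition years into train/val/test using half-open ranges.

    Assumption: each split ends where the next one starts, and `end_year`
    itself is not included in the test split.
    """
    return {
        "train": list(range(start_train_year, start_val_year)),
        "val": list(range(start_val_year, start_test_year)),
        "test": list(range(start_test_year, end_year)),
    }


# Same values as in the command above.
splits = split_years(1979, 2016, 2017, 2019)
```

Under this reading, the command trains on 1979 through 2015, validates on 2016, and tests on 2017 and 2018.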
The preprocessed data directory will look like the following
5.625deg_npz
|-- train
|-- val
|-- test
|-- normalize_mean.npz
|-- normalize_std.npz
|-- lat.npy
|-- lon.npy
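As an illustration of how the statistics files might be used, the sketch below standardizes a field with a per-variable mean and standard deviation. It assumes that normalize_mean.npz and normalize_std.npz are keyed by variable name. The helper `load_stats`, the synthetic values, and the temporary directory are hypothetical stand-ins for the real preprocessed data.

```python
import os
import tempfile

import numpy as np


def load_stats(root_dir):
    """Load per-variable mean and std into plain dicts (assumed file layout)."""
    with np.load(os.path.join(root_dir, "normalize_mean.npz")) as f:
        mean = {name: f[name] for name in f.files}
    with np.load(os.path.join(root_dir, "normalize_std.npz")) as f:
        std = {name: f[name] for name in f.files}
    return mean, std


with tempfile.TemporaryDirectory() as root_dir:
    # Synthetic statistics standing in for the output of the preprocessing step.
    np.savez(os.path.join(root_dir, "normalize_mean.npz"),
             **{"2m_temperature": np.array([280.0])})
    np.savez(os.path.join(root_dir, "normalize_std.npz"),
             **{"2m_temperature": np.array([20.0])})

    mean, std = load_stats(root_dir)

# Synthetic (lat, lon) temperature field in Kelvin.
field = np.array([[260.0, 280.0], [300.0, 320.0]])
normalized = (field - mean["2m_temperature"]) / std["2m_temperature"]
```

Standardizing each variable this way puts fields with very different physical units on a comparable scale before they reach the model.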
To finetune ClimaX for global forecasting, use
python src/climax/global_forecast/train.py --config <path/to/config>
For example, to finetune ClimaX on 8 GPUs use
python src/climax/global_forecast/train.py --config configs/global_forecast_climax.yaml \
--trainer.strategy=ddp --trainer.devices=8 \
--trainer.max_epochs=50 \
--data.root_dir=/mnt/data/5.625deg_npz \
--data.predict_range=72 --data.out_variables=['z_500','t_850','t2m'] \
--data.batch_size=16 \
--model.pretrained_path='https://huggingface.co/tungnd/climax/resolve/main/5.625deg.ckpt' \
--model.lr=5e-7 --model.beta_1="0.9" --model.beta_2="0.99" \
--model.weight_decay=1e-5
To train ClimaX from scratch, set --model.pretrained_path="".
We use the same ERA5 data as in global forecasting and extract the regional data on the fly during training. If you have already downloaded and preprocessed the data, you do not have to do it again.
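Extracting a region on the fly amounts to selecting the latitude and longitude indices that fall inside a bounding box. The sketch below shows the idea on a 5.625-degree grid. The bounding box values and the `crop_region` helper are hypothetical and are not the region definitions used in the repository.

```python
import numpy as np


def crop_region(field, lat, lon, lat_range, lon_range):
    """Keep only grid points whose coordinates fall inside the bounding box.

    `field` has shape (..., n_lat, n_lon); both bounds are inclusive.
    """
    lat_idx = np.where((lat >= lat_range[0]) & (lat <= lat_range[1]))[0]
    lon_idx = np.where((lon >= lon_range[0]) & (lon <= lon_range[1]))[0]
    cropped = field[..., lat_idx, :][..., lon_idx]
    return cropped, lat[lat_idx], lon[lon_idx]


# Cell-centered 5.625-degree global grid: 32 latitudes, 64 longitudes.
lat = np.arange(-90 + 5.625 / 2, 90, 5.625)
lon = np.arange(0, 360, 5.625)

# Synthetic global data with 3 variables.
global_field = np.zeros((3, lat.size, lon.size), dtype=np.float32)

# Hypothetical bounding box, chosen only for illustration.
cropped, region_lat, region_lon = crop_region(
    global_field, lat, lon, lat_range=(15.0, 65.0), lon_range=(220.0, 300.0)
)
```

Because the crop is only an index selection, the same preprocessed global files can serve any region without a second preprocessing pass.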
To finetune ClimaX for regional forecasting, use
python src/climax/regional_forecast/train.py --config <path/to/config>
For example, to finetune ClimaX on North America using 8 GPUs, use
python src/climax/regional_forecast/train.py --config configs/regional_forecast_climax.yaml \
--trainer.strategy=ddp --trainer.devices=8 \
--trainer.max_epochs=50 \
--data.root_dir=/mnt/data/5.625deg_npz \
--data.region="NorthAmerica" \
--data.predict_range=72 --data.out_variables=['z_500','t_850','t2m'] \
--data.batch_size=16 \
--model.pretrained_path='https://huggingface.co/tungnd/climax/resolve/main/1.40625deg.ckpt' \
--model.lr=5e-7 --model.beta_1="0.9" --model.beta_2="0.99" \
--model.weight_decay=1e-5
To train ClimaX from scratch, set --model.pretrained_path="".
First, download ClimateBench data. ClimaX can work with either the original ClimateBench data or the regridded version. In the experiment in the paper, we regridded the ClimateBench data to 5.625 degrees. To do that, run
python src/data_preprocessing/regrid_climatebench.py /mnt/data/climatebench/train_val \
--save_path /mnt/data/climatebench/5.625deg/train_val --ddeg_out 5.625
and
python src/data_preprocessing/regrid_climatebench.py /mnt/data/climatebench/test \
--save_path /mnt/data/climatebench/5.625deg/test --ddeg_out 5.625
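For orientation, the target resolution fixes the size of the output grid. The sketch below computes the number of latitude and longitude cells for a given spacing, assuming a global, evenly spaced grid that covers 180 degrees of latitude and 360 degrees of longitude. `grid_shape` is a hypothetical helper and is not part of regrid_climatebench.py.

```python
def grid_shape(ddeg):
    """Return (n_lat, n_lon) for a global grid with `ddeg` degree spacing.

    Assumption: the grid spans 180 degrees of latitude and 360 degrees of
    longitude with evenly sized cells.
    """
    n_lat = round(180 / ddeg)
    n_lon = round(360 / ddeg)
    return n_lat, n_lon


coarse = grid_shape(5.625)
fine = grid_shape(1.40625)
```

Under these assumptions, 5.625 degrees gives a 32 x 64 grid and 1.40625 degrees gives a 128 x 256 grid, so the finer resolution carries 16 times as many grid points per field.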
To finetune ClimaX for climate projection, use
python src/climax/climate_projection/train.py --config <path/to/config>
For example, to finetune ClimaX on 8 GPUs use
python src/climax/climate_projection/train.py --config configs/climate_projection.yaml \
--trainer.strategy=ddp --trainer.devices=8 \
--trainer.max_epochs=50 \
--data.root_dir=/mnt/data/climatebench/5.625deg \
--data.out_variables="tas" \
--data.batch_size=16 \
--model.pretrained_path='https://huggingface.co/tungnd/climax/resolve/main/5.625deg.ckpt' \
--model.out_vars="tas" \
--model.lr=5e-4 --model.beta_1="0.9" --model.beta_2="0.99" \
--model.weight_decay=1e-5
To train ClimaX from scratch, set --model.pretrained_path="".
Coming soon