# Plotting scripts for "Evaluation of CMIP6 GCMs over the CONUS for downscaling studies"
We include the starting data in this repo in the `data` directory. If you wish to put the data in another directory, you may do so, as long as you put all the input data there and pass the path to our run script, `run.sh`, as described in the running section below.
The input data files are as follows:
- `unweighted_data_37gcms_66metrics_0708.nc` contains the relative errors for each GCM on each metric in each region. The order of the GCMs and metrics is described by the other two files, and the regions are in the order north, east, west, and south.
- `gcms_names.txt` contains the names of the GCMs in the file above, in the same order as they are presented there; the $i$-th line of this file identifies the $i$-th GCM in that file.
- `metrics_list.txt` contains the names of the metrics in the first file, in the same order as they are presented there; the $i$-th line of this file identifies the $i$-th metric in that file.
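For orientation, the layout described above can be sketched with `xarray` on synthetic data. The dimension names, their order, and the tiny sizes below are illustrative assumptions, not the actual structure of the file:

```python
import numpy as np
import xarray as xr

# Toy stand-in for unweighted_data_37gcms_66metrics_0708.nc: relative errors
# indexed by region, GCM, and metric. The real file has 37 GCMs and 66
# metrics; the names and dimension order here are assumptions for the sketch.
regions = ["north", "east", "west", "south"]
gcms = ["GCM-A", "GCM-B"]            # in practice, read from gcms_names.txt
metrics = ["metric-1", "metric-2"]   # in practice, read from metrics_list.txt

errors = xr.DataArray(
    np.random.default_rng(0).random((len(regions), len(gcms), len(metrics))),
    dims=("region", "gcm", "metric"),
    coords={"region": regions, "gcm": gcms, "metric": metrics},
    name="unweighted_data",
)

# Pull one GCM's relative errors over the "north" region:
north_errors = errors.sel(region="north", gcm="GCM-A")
print(north_errors.values)
```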
Our scripts are written in both the NCAR Command Language (NCL) and Python. Start by making sure you have a working install of both languages. Note that while we used NCL version 6.6.2 and Python version 3.10.4, other versions may still work.
For Python, we recommend using an Anaconda installation, as the included package management tools will make installing the various Python packages much easier. We use the following packages in our Python scripts:
- pandas
- numpy
- xarray
- networkx
- statsmodels
- matplotlib
- seaborn
- pygraphviz
For convenience, we provide a file, `environment.yml`, listing all the packages installed in our local Anaconda environment. This file can be used to create a new Anaconda environment containing the necessary dependencies by `cd`ing into the `src` directory and running:

```sh
conda env create -f environment.yml
```

Note that this may take a while. Once this is done, the environment can be activated like so:

```sh
conda activate gcms
```
To generate the plots and data files for the paper, just `cd` into `src` and run

```sh
./run.sh <data_dir>
```

if you decided to put the input data in a directory other than `data`. Otherwise, you can leave off the `<data_dir>` argument and run:

```sh
./run.sh
```

This will run all the necessary scripts. By default, all of the output data is placed in `data`, alongside the input data in this repo.
The individual scripts run in this way are:
- `generate_data.ncl` calculates the pairwise correlations, similarity scores, weights, and weighted relative error data given the unweighted relative error data.
- `radius_of_similarity.ncl` performs the calculations needed for deciding the $D_x$ value; $D_x = 0.3$ is used based on this analysis.
- `plot_rankings_diffs.py` calculates GCM rankings and visualizes how they change as additional metrics are considered (in order of their weight).
- `plot_cos_sim.py` computes the cosine similarity between different GCMs and visualizes it, including by showing the distribution of cosine similarity values.
- `plot_network.py` creates and plots a network with GCMs as nodes, adding edges between GCMs whose cosine similarity is above a certain threshold (in this case, $0.8$).
- `heatmap_sample.ncl` is a sample script demonstrating how to plot heatmaps. This example plots the pairwise metric correlations for each region, given the weighted data and the lists of GCM and metric names.
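As a rough sketch of the thresholding step in `plot_network.py`, the following builds such a network with `networkx`. The GCM names, similarity values, and variable names here are made up; the real script reads them from the generated data files:

```python
import networkx as nx
import numpy as np

# Hypothetical GCM names and (symmetric) cosine-similarity matrix.
gcms = ["GCM-A", "GCM-B", "GCM-C"]
sim = np.array([
    [1.00, 0.85, 0.40],
    [0.85, 1.00, 0.90],
    [0.40, 0.90, 1.00],
])
threshold = 0.8

# One node per GCM; an edge wherever the similarity clears the threshold.
G = nx.Graph()
G.add_nodes_from(gcms)
for i in range(len(gcms)):
    for j in range(i + 1, len(gcms)):
        if sim[i, j] >= threshold:
            G.add_edge(gcms[i], gcms[j])

print(sorted(tuple(sorted(e)) for e in G.edges()))
```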
The following files are generated by running the scripts as described above:
- `data_cor_ss_ww.nc` contains the weighted and unweighted relative errors (`weighted_data` and `unweighted_data`), weights (`ww`), similarity scores (`ss`), and pairwise correlations (`cor`) of the GCMs for each metric on each region.
- `ros_100.nc` contains the radius of similarity for each region and metric for various values of $D_x$.
- `weighted_data_rank_diff_by_weighted_std.nc` contains the incremental change in the rankings of each GCM as each metric is added in (`weighted_data_rank_diff_by_weighted_std_vs_num_metrics.pdf` contains a plot displaying this information).
- `weighted_data_relative_ranks_by_weighted_std.nc` contains the differences between each GCM's current and final rankings as each metric is added in (`weighted_data_relative_ranks_by_weighted_std_vs_num_metrics.pdf` contains a plot displaying this information).
- `weighted_data_rank_scores_by_weighted_std.nc` contains the weighted mean relative error for each GCM as each metric is added in (`weighted_data_rank_scores_by_weighted_std_vs_num_metrics.pdf` contains a plot of this information).
- `model_weighted_data_cos_sim_distrib.csv` contains a list of all the cosine similarity scores strictly above the diagonal; this avoids the trivial scores along the diagonal and the duplicate scores below it, since cosine similarity is symmetric (`model_weighted_data_cos_sim_distrib.pdf` contains a plot of the distribution).
- `model_weighted_data_cos_sim.csv` contains a matrix with the cosine similarity scores for each pair of GCMs (`model_weighted_data_cos_sim.pdf` contains a plot of this data).
- `dot_cos_sim_0.8_network.pdf` contains a depiction of the network formed by connecting GCMs with a cosine similarity score of at least $0.8$.
- `weighted_data_sorted_metrics_by_weighted_std.csv` and `weighted_data_sorted_models_by_weighted_std.csv` contain the metrics and GCMs in order of their weighted standard deviations and weighted relative errors, respectively, alongside the values used to rank them.
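The "strictly above the diagonal" convention can be illustrated with NumPy. The matrix values below are made up; only the extraction pattern is the point:

```python
import numpy as np

# Hypothetical cosine-similarity matrix for three GCMs. It is symmetric with
# ones on the diagonal, so the diagonal entries are trivial and the lower
# triangle duplicates the upper one.
sim = np.array([
    [1.0, 0.9, 0.4],
    [0.9, 1.0, 0.7],
    [0.4, 0.7, 1.0],
])

# Keep only the entries strictly above the diagonal (k=1 skips the diagonal),
# leaving one score per distinct GCM pair.
upper = sim[np.triu_indices_from(sim, k=1)]
print(upper)  # -> [0.9 0.4 0.7]
```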