Python package for implementing the UNSEEN (UNprecedented Simulated Extremes using ENsembles; Thompson et al. 2017) approach to assessing the likelihood of extreme events.
The UNSEEN package depends on the following libraries:
cftime cmdline_provenance dask dask-jobqueue geopandas gitpython matplotlib netCDF4 numpy pytest pyyaml regionmask scipy xarray xclim xskillscore zarr
... plus xks if you're doing similarity testing.
All these libraries (except xks) can be installed via conda.
To create a new environment with all the libraries installed,
use either of the following commands:
$ conda create -n unseen cftime cmdline_provenance dask dask-jobqueue ...
$ conda env create -f environment.yml
The environment.yml file includes other useful analysis libraries such as cartopy and jupyter.
The xks package is only needed for similarity testing (i.e. when using unseen/similarity.py).
It isn't available on PyPI, so in order to install it in your environment you'll need to clone the xks repository and pip install as follows:
$ git clone https://github.com/dougiesquire/xks
$ cd xks
$ pip install .
The UNSEEN package isn't currently available on PyPI, so in order to install it in your conda environment (along with all the dependencies) you'll need to clone this repository and pip install as follows:
$ git clone https://github.com/AusClimateService/unseen
$ cd unseen
$ pip install .
If you're thinking of modifying and possibly contributing changes to the package, follow the installation instructions in CONTRIBUTING.md instead.
Key functions for UNSEEN analysis are contained within the modules listed below.
A number of them can also be run as command line programs.
For example, once you've pip installed the unseen package, the bias_correction.py module can be run as a command line program by simply running bias_correction at the command line (use the -h option for details).
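For instance, to see the available options and arguments:

$ bias_correction -h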
Functions for array handling and manipulation:
- stack_by_init_date: Stack time series array in initial date / lead time format
- reindex_forecast: Switch out lead_time axis for time axis (or vice versa) in a forecast dataset
- time_to_lead: Convert from time to (a newly created) lead_time dimension
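As a rough illustration of the lead-time idea (a concept sketch in plain xarray, not the package's actual implementation), replacing a forecast's time dimension with an integer lead_time dimension might look something like this:

import numpy as np
import pandas as pd
import xarray as xr

# Toy forecast: daily values from a single initialisation date (illustrative data only)
times = pd.date_range("2000-11-01", periods=5, freq="D")
fcst = xr.DataArray(np.random.rand(5), coords={"time": times}, dims="time")

# Add an integer lead_time coordinate and make it the dimension,
# keeping the original times as a non-dimension coordinate
lead = xr.DataArray(np.arange(fcst.sizes["time"]), dims="time")
fcst_lead = fcst.assign_coords(lead_time=lead).swap_dims({"time": "lead_time"})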
Functions (and command line program) for bias correction:
- get_bias: Calculate forecast bias
- remove_bias: Remove model bias
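The general idea behind an additive bias correction can be sketched in plain xarray. This is only an illustration of the concept (the function name and inputs below are hypothetical), not the get_bias/remove_bias implementation:

import xarray as xr

def additive_bias_correction(fcst, obs):
    """Illustrative additive correction: fcst and obs are assumed to be
    xarray DataArrays of the same variable with a common time coordinate."""
    # Bias is the difference between the model and observed climatological means
    bias = fcst.mean("time") - obs.mean("time")
    # Removing the bias shifts the forecast distribution towards the observations
    return fcst - bias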
Utilities for repeated random sampling:
- n_random_resamples: Repeatedly randomly resample from provided xarray objects and return the results of passing the subsampled data through a provided function
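Conceptually, repeated random resampling of an xarray object looks something like the following sketch (illustrative only; n_random_resamples has its own interface, and the function and defaults below are assumptions):

import numpy as np
import xarray as xr

def random_resample_mean(ds, dim="sample", size=100, repeats=1000, seed=0):
    """Illustrative bootstrap: repeatedly subsample along a dimension and apply a statistic."""
    rng = np.random.default_rng(seed)
    results = []
    for _ in range(repeats):
        idx = rng.choice(ds.sizes[dim], size=size, replace=True)
        results.append(ds.isel({dim: idx}).mean(dim))
    # Stack the repeated estimates along a new "resample" dimension
    return xr.concat(results, dim="resample")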
Setup dask scheduling client:
- launch_client: Launch a dask client
Functions and command line program for file I/O:
- open_file: Create an xarray dataset from a single netCDF or zarr file, with many processing options (spatial/temporal selection and aggregation, unit conversion, etc)
- open_mfzarr: Open multiple zarr files
- open_mfforecast: Open a multi-file forecast
- get_new_log: Generate a command log for the output file
- to_zarr: Write to a zarr file
General utility functions:
- convert_units: Convert units
Functions and command line program for independence testing
Climate indices:
- calc_drought_factor: Calculate the Drought Factor index
- calc_FFDI: Calculate the McArthur Forest Fire Danger Index
- calc_wind_speed: Calculate wind speed
- fit_gev: Fit a generalised extreme value (GEV) distribution
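As a concept sketch (not the fit_gev implementation), a GEV distribution can be fitted to a sample of block maxima with scipy, which is one of the package dependencies. The sample values below are made up for illustration:

import numpy as np
from scipy.stats import genextreme

# Hypothetical sample of annual maxima (e.g. of FFDI or daily rainfall)
annual_max = np.array([42.1, 55.3, 48.7, 61.0, 39.5, 70.2, 52.8])

# Fit a GEV distribution and estimate, say, the 1-in-100-year level
shape, loc, scale = genextreme.fit(annual_max)
return_level_100 = genextreme.ppf(1 - 1 / 100, shape, loc, scale)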
Functions and command line program for similarity testing:
- univariate_ks_test: Univariate Kolmogorov-Smirnov (KS) test
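The underlying statistic can be computed with scipy for a single variable or grid point (a sketch of the concept, not the similarity module itself; the samples below are synthetic):

import numpy as np
from scipy.stats import ks_2samp

# Hypothetical samples: model-simulated and observed annual maxima
model_sample = np.random.default_rng(1).gumbel(30, 5, size=500)
obs_sample = np.random.default_rng(2).gumbel(31, 5, size=60)

# Two-sample Kolmogorov-Smirnov test: a large p-value means the model and
# observed distributions can't be distinguished (i.e. they are "similar")
statistic, pvalue = ks_2samp(model_sample, obs_sample)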
Functions for spatial selection (point, box and shapefile)
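The kind of selection involved can be sketched with xarray, geopandas and regionmask (all package dependencies). This is a concept illustration only, not the package's spatial selection functions, and the file names are hypothetical:

import geopandas as gpd
import regionmask
import xarray as xr

ds = xr.open_dataset("precip.nc")          # hypothetical input file
shapes = gpd.read_file("river_basin.shp")  # hypothetical shapefile

# Point selection: nearest grid cell to a given location
point = ds.sel(lat=-27.5, lon=153.0, method="nearest")

# Box selection (assumes an ascending lat coordinate)
box = ds.sel(lat=slice(-30, -20), lon=slice(145, 155))

# Shapefile selection: mask out everything outside the region
mask = regionmask.mask_geopandas(shapes, ds)
region = ds.where(mask.notnull())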
Utilities for working with time axes and values
When reading an input file using fileio.open_file you can use the metadata_file keyword argument to pass a YAML file specifying required file metadata changes.
The valid keys for the YAML file are rename, drop_coords, round_coords and units.
The rename key is typically used to enforce standard variable names (lat, lon and time are required by many functions).
For example, the following YAML file renames a bunch of variables (including lat, lon and time), adds a missing units attribute, deletes some unneeded coordinates, and rounds coordinate values (see fileio.fix_metadata for details).
rename:
initial_time0_hours: time
latitude: lat
longitude: lon
precip: pr
u_ref: uas
v_ref: vas
units:
pr: 'mm d-1'
drop_coords:
- average_DT
- average_T1
- average_T2
- zsurf
- area
round_coords:
- lat
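In Python this might then be used along the following lines (the import path and the positional file argument are assumptions here; only the metadata_file keyword is documented above, and the paths are hypothetical):

from unseen import fileio

ds = fileio.open_file(
    "/path/to/model_precip.nc",
    metadata_file="config/dataset_config.yml",
)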
The config/
directory contains a series of YAML files
that describe changes that need to be made to the metadata
associated with a number of different datasets.
When launching a dask client using dask_setup.launch_client, a configuration YAML file is required.
These files might look something like the following for a local cluster, PBS cluster (e.g. on NCI) and SLURM cluster respectively:
LocalCluster: {}
temporary_directory: /g/data/xv83/dbi599/
PBSCluster:
processes: 1
walltime: '01:00:00'
cores: 24
memory: 48GB
job_extra:
- '-l ncpus=24'
- '-l mem=48GB'
- '-P ux06'
- '-l jobfs=100GB'
- '-l storage=gdata/xv83+gdata/v14+scratch/v14'
local_directory: $PBS_JOBFS
header_skip:
- select
SLURMCluster:
cores: 12
memory: 72GB
walltime: '02:00:00'
In other words, the YAML files contain the keyword arguments for dask.distributed.LocalCluster, dask_jobqueue.PBSCluster or dask_jobqueue.SLURMCluster.
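For reference, constructing a cluster directly from such a file would look something like the sketch below (illustrative only; launch_client handles this for you, and the config file name is hypothetical):

import yaml
from dask.distributed import Client
from dask_jobqueue import PBSCluster

# Read the cluster keyword arguments from a config file like the PBS example above
with open("dask_pbs.yml") as f:
    config = yaml.safe_load(f)

cluster = PBSCluster(**config["PBSCluster"])
cluster.scale(jobs=1)  # request one PBS job's worth of workers
client = Client(cluster)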