Python library to train neural networks with a strong focus on hydrological applications.

This package has been used extensively in research over the last year and was used in various academic publications.
The core idea of this package is modularity in all places, to allow easy integration of new datasets, new model
architectures, or any training-related aspects (e.g., loss functions, optimizers, regularization).
One of the core concepts of this code base is the use of configuration files, which let anyone train neural networks without
touching the code itself. The `NeuralHydrology` package is built on top of the deep learning framework
[PyTorch](https://pytorch.org/), since it has proven to be the most flexible and useful for research purposes.

We (the AI for Earth Science group at the Institute for Machine Learning, Johannes Kepler University, Linz, Austria) use
this code in our day-to-day research and will continue to integrate our new research findings into this public repository.

**Note:** We will gradually add more examples/documentation over the next couple of days/weeks.

- Documentation: [neuralhydrology.readthedocs.io](https://neuralhydrology.readthedocs.io)
- Research Blog: [neuralhydrology.github.io](https://neuralhydrology.github.io)
- Bug reports/Feature requests: [https://github.com/neuralhydrology/neuralhydrology/issues](https://github.com/neuralhydrology/neuralhydrology/issues)

# Getting started

## Requirements

We recommend using Anaconda/Miniconda. With one of the two installed, a dedicated environment with all requirements
installed can be set up from the environment files provided in
[environments](https://github.com/neuralhydrology/neuralhydrology/environments).

If you have no CUDA-capable GPU available, run

```bash
conda env create -f environments/environment_cpu.yml
```

If you have a CUDA-capable GPU, check which CUDA version your GPU supports and then run, e.g. (for CUDA 10.2),

```bash
conda env create -f environments/environment_cuda10_2.yml
```

If neither Miniconda nor Anaconda is available, make sure to set up a Python environment with all packages installed
that are listed in one of the environment files.

## Installation

For now, download or clone the repository to your local machine and install a local, editable copy.
This is a good idea if you want to edit the `neuralhydrology` code (e.g., adding new models or datasets):

```bash
git clone https://github.com/neuralhydrology/neuralhydrology.git
cd neuralhydrology
pip install -e .
```

Besides adding the package to your Python environment, this will also add three bash scripts:
`nh-run`, `nh-run-scheduler` and `nh-results-ensemble`. For details, see below.


## Data

Training and evaluating models requires a dataset.
If you're unsure where to start, a common dataset is CAMELS US, available at
[CAMELS US (NCAR)](https://ral.ucar.edu/solutions/products/camels).
Download the "CAMELS time series meteorology, observed flow, meta data" and place the actual data folder
(`basin_dataset_public_v1p2`) in a directory.
This directory will be referred to as the "data directory", or `data_dir`.
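As a concrete sketch (the location `/tmp/nh_data` is only an example; any path works), the layout after extracting the download would be:

```shell
# Pick any location for the data directory; /tmp/nh_data is only an example.
mkdir -p /tmp/nh_data/basin_dataset_public_v1p2

# The parent folder is what is later referred to as the data directory:
#   data_dir: /tmp/nh_data
ls /tmp/nh_data
```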

## Configuration file

One of the core concepts of this package is the usage of configuration files (`.yml`). Basically, all configurations
required to train a neural network can be specified via these configuration files, and no code has to be touched.
Training a model does require a `.yml` file that specifies the run configuration. We will add a detailed explanation
of the config files and their arguments within the next weeks. For now, refer to the
[example config](https://github.com/neuralhydrology/neuralhydrology/blob/master/examples/config.yml.example) for a full
list of all available arguments (with inline documentation). For an example of a configuration file that can be used to
train a standard LSTM for a single CAMELS US basin, check
[1_basin_config.yml](https://github.com/neuralhydrology/neuralhydrology/blob/master/examples/1_basin_config.yml.example).
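As a rough sketch of what such a file looks like (the argument names below are illustrative assumptions, not an authoritative list; the example configs linked above are the reference):

```yaml
# Illustrative sketch only -- argument names here are assumptions;
# see examples/config.yml.example for the authoritative list.
experiment_name: single_basin_lstm   # name used to identify the run
data_dir: /path/to/data_dir          # directory containing basin_dataset_public_v1p2
model: cudalstm                      # model architecture to train
hidden_size: 64                      # number of LSTM hidden units
epochs: 30
learning_rate: 1e-3
```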

## Train a model

To train a model, prepare a configuration file, then run:

```bash
nh-run train --config-file /path/to/config.yml
```

If you want to train multiple models, you can make use of the `nh-run-scheduler` command.
Place all configs in a folder, then run:

```bash
nh-run-scheduler train --config-dir /path/to/config_dir/ --runs-per-gpu X --gpu-ids Y
```

With X, you can specify how many models should be trained in parallel on a single GPU.
With Y, you can specify which GPUs to use for training (use the ids as specified in `nvidia-smi`).

## Evaluate a model

To evaluate a trained model on the test set, run:

```bash
nh-run evaluate --run-dir /path/to/run_dir/
```

If the optional argument `--epoch N` (where N is the epoch to evaluate) is not specified,
the weights of the last epoch are used. You can also use `--period` if you want to evaluate the model on the
train period (`--period train`) or the validation period (`--period validation`).

To evaluate all runs in a specific directory, you can, similarly to training, run:

```bash
nh-run-scheduler evaluate --run-dir /path/to/config_dir/ --runs-per-gpu X --gpu-ids Y
```

To merge the predictions of a number of runs (stored in `$DIR1`, ...) into one averaged ensemble,
use the `nh-results-ensemble` script:

```bash
nh-results-ensemble --run-dirs $DIR1 $DIR2 ... --save-file /path/to/target/file.p --metrics NSE MSE ...
```

`--metrics` specifies which metrics will be calculated for the averaged predictions.
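The `.p` extension suggests the save file is a Python pickle; assuming that is the case (the structure of the stored object below is a made-up stand-in, not the actual format), it can be inspected like this:

```python
import pickle
from pathlib import Path

# Hypothetical stand-in for ensemble results -- the real object written by
# nh-results-ensemble may be structured differently.
results = {"basin_01013500": {"NSE": 0.72, "MSE": 0.31}}
path = Path("file.p")
path.write_bytes(pickle.dumps(results))

# Load the pickled results back into Python:
loaded = pickle.loads(path.read_bytes())
print(loaded["basin_01013500"]["NSE"])  # -> 0.72
```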

# Contact

If you have any questions regarding the usage of this repository, feature requests or comments, please open an issue.
You can also reach out to Frederik Kratzert (kratzert(at)ml.jku.at) by email.