Commit: adding explanations to the verify documentation
Showing 10 changed files with 280 additions and 18 deletions.
# Forecast Verification Workflow

This subfolder contains scripts to facilitate the verification of newly produced forecasts. This is an evolving section of the repository, and contributions are welcome to enhance the pipeline and expand verification capabilities.

Forecast verification is compute-intensive and often requires parallel processing. Our strategy leverages many small queued jobs to handle the workload efficiently. This document outlines the steps required to verify a forecast against the ERA5 dataset and compare it to your forecast system.
The process is driven by a **YAML configuration file** located in **`./verification/`** and named **`verif_config.yml`**. This file must be customized extensively before initiating any verification steps.

---
## Step 00 – Adjust the YAML Configuration

Most fields in the `ERA5` and `IFS` sections of the YAML file can remain as they are, but the following areas require your attention and adjustment:
1. **qsub Section**

   - **`qsub_loc`** – Path to the directory for qsub scripts (typically `./verification/qsub/`).
   - **`scripts_loc`** – Path to the directory containing the verification scripts.
   - **`project_code`** – Your project code (required for submitting jobs to the cluster).
   - **`conda_env`** – Name of the conda environment used for running the scripts.
2. **forecastmodel Section**

   - **`save_loc_rollout`** – Path to the directory where your generated forecasts are saved.
   - **`verif_variables`** – List of variables you wish to verify (ensure these match your forecast output).
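Putting the fields above together, a minimal `verif_config.yml` could look like the sketch below. Only the field names come from this document; every path, project code, and variable name is a placeholder you must replace with your own:

```yaml
qsub:
  qsub_loc: './verification/qsub/'             # where generated .sh scripts go
  scripts_loc: './verification/verification/'  # where the verification scripts live
  project_code: 'UABC0001'                     # placeholder cluster project code
  conda_env: 'credit'

forecastmodel:
  save_loc_rollout: '/path/to/your/forecasts/'  # placeholder path
  save_loc_verif: '/path/to/verif/output/'      # placeholder path
  verif_variables: ['Z500', 'T2m']              # illustrative; must match your output
```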
---
## Step 01 – Generate and Run QSUB Scripts

Navigate to the **`./verification/verification/`** directory, where you will find four Jupyter notebooks named **`qsub_STEP00_*.ipynb`**. These notebooks generate the qsub scripts found in the **`./verification/qsub/`** directory.
### Key Scripts to Run:

- **STEP_00** – Gathers forecast data (required before proceeding).
- **STEP_02** – Generates RMSE and ACC metrics.

These scripts must be executed sequentially.
### Running QSUB Scripts:

1. After generating the qsub scripts via the notebooks, navigate to the **`./verification/qsub/`** directory.
2. Execute the following scripts via bash:
```bash
bash step00_gather_ForecastModel_all.sh
bash step02_RMSE_MF_all.sh
bash step02_ACC_MF_all.sh
```
3. **`step00_gather_ForecastModel_all.sh`** must complete before running the other scripts.
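The wrapper scripts above assume you simply wait for the STEP 00 jobs to finish before launching STEP 02. If you would rather chain the jobs automatically, PBS supports job dependencies via `qsub -W depend=afterok:<ids>`. The helper below is a sketch (not part of this repository) that builds the dependency argument from the job IDs `qsub` prints:

```python
def afterok_dependency(job_ids):
    """Build a PBS '-W depend=afterok:...' argument from a list of job IDs.

    PBS holds the dependent job until every listed job finishes
    successfully, enforcing the STEP00-before-STEP02 ordering without
    watching the queue by hand.
    """
    if not job_ids:
        raise ValueError("need at least one job ID")
    return "depend=afterok:" + ":".join(job_ids)

# Job IDs shown here are illustrative values of what qsub prints:
dep = afterok_dependency(["1234567.casper-pbs", "1234568.casper-pbs"])
print(dep)  # depend=afterok:1234567.casper-pbs:1234568.casper-pbs
```

You could then submit a STEP 02 job as `qsub -W "$dep" <step02_script>.sh` (hypothetical filename), instead of waiting for the gather phase manually.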
---
## Expected Results

Upon completion of each stage:
1. **After Forecast Gathering:**
   - Forecasts will be gathered into individual NetCDF (`*.nc`) files in the location specified in the `qsub` section of the **YAML file**.

2. **After RMSE and ACC Computation:**
   - RMSE and ACC NetCDF files will be saved in the directory defined by the **`save_loc_verif`** field under the `forecastmodel` section of the YAML file.
## Troubleshooting

Forecast verification can take several days, especially for multi-year data. If errors occur, consider the following:
1. **Directory Permissions & Existence**
   Ensure all directories specified in the YAML file exist and have appropriate write permissions. Create them manually if necessary:

   ```python
   import os
   os.makedirs(path_verif, exist_ok=True)
   ```
2. **Post-Gather Checks**

   - After running the gather script, verify that all forecast files have been created and have the expected contents and size.
   - If you encounter files with abnormally small sizes, delete them and rerun the gather script. Files that already exist will **not** be overwritten, so the rerun is much faster than the first pass: existing files are skipped.
3. **Monitoring Job Progress**

   - Use cluster job monitoring tools (e.g., `qstat`) to track progress and troubleshoot errors.
   - For failed jobs, inspect the `.err` files in the qsub directory for detailed logs.
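The post-gather size check in item 2 can be scripted. The sketch below is a generic helper, not part of this repository; the directory path and size threshold are assumptions you should adapt to your data:

```python
import os
from glob import glob

def find_truncated_files(directory, min_bytes=1_000_000):
    """Return paths of *.nc files in `directory` smaller than min_bytes.

    Abnormally small files usually mean a gather job died mid-write;
    delete them and rerun the gather script (existing files are skipped).
    """
    small = []
    for path in sorted(glob(os.path.join(directory, '*.nc'))):
        if os.path.getsize(path) < min_bytes:
            small.append(path)
    return small

# Example usage (the path is a placeholder):
# for bad in find_truncated_files('/path/to/gathered/forecasts/'):
#     print('suspiciously small:', bad)
#     os.remove(bad)
```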
---
## Additional Notes

- This process relies heavily on **parallel computing environments** such as NCAR's Casper/Derecho clusters. Ensure you are familiar with the cluster's queuing and submission systems (PBS/SLURM).
- The workflow is designed to be flexible. Users are encouraged to adapt the scripts to suit their specific verification needs.
If additional clarification or sections are needed (e.g., explanation of the verification metrics or variable definitions), feel free to reach out or contribute directly to this repository.
---

# Location of qsub folder

## Step 01 – Generate and Run QSUB Scripts

Navigate to the **`./verification/verification/`** directory, where you will find four Jupyter notebooks named **`qsub_STEP00_*.ipynb`**. These notebooks generate the qsub scripts found in the **`./verification/qsub/`** directory.
### Key Scripts to Run:

- **STEP_00** – Gathers forecast data (required before proceeding).
- **STEP_02** – Generates RMSE and ACC metrics.

These scripts must be executed sequentially.
### Running QSUB Scripts:

1. After generating the qsub scripts via the notebooks, navigate to the **`./verification/qsub/`** directory (you are here now).
2. Execute the following scripts via bash:
```bash
bash step00_gather_ForecastModel_all.sh
bash step02_RMSE_MF_all.sh
bash step02_ACC_MF_all.sh
```
3. **`step00_gather_ForecastModel_all.sh`** must complete before running the other scripts.

---
---

## Hello!

Below we outline the notebooks used to generate the QSUB scripts. The main files that drive the calculations are in **`./scripts`**; adjustments can be made there.

---
## QSUB Jupyter Notebooks – Generating Job Scripts for Forecast Verification

This directory contains Jupyter notebooks designed to generate and submit qsub scripts for the various stages of the forecast verification process. These notebooks facilitate job scheduling and resource allocation on HPC systems, streamlining the process of gathering, processing, and verifying forecast data.

---
### Notebooks Overview

The primary function of the notebooks in this folder is to automate the creation of bash scripts (`.sh`) that submit jobs to the PBS queueing system. This approach allows for efficient parallelization, ensuring multiple forecasts are processed concurrently.

**Notebook Naming Convention:**

- **`qsub_STEP00_*.ipynb`** – Responsible for gathering forecast model data.
- **`qsub_STEP02_*.ipynb`** – Generates RMSE and ACC qsub scripts for model verification.

---
||
### How to Use These Notebooks | ||
|
||
1. **Setup & Prerequisites** | ||
Ensure the following prerequisites are met before running the notebooks: | ||
- **Configured YAML file** (`verif_config.yml`) with correct paths, project codes, and environment settings. | ||
- **Conda environment** activated (defined in the YAML under `conda_env`). | ||
- Appropriate access to the cluster and necessary permissions for submitting jobs. | ||
|
||
**Example Activation:** | ||
```bash | ||
conda activate credit | ||
``` | ||
|
||
2. **Navigating the Workflow**
   - Start by opening the `qsub_STEP00_jobs.ipynb` notebook to generate scripts for gathering forecast data.
   - Follow by executing the `qsub_STEP02_*` notebooks for computing RMSE and ACC after the gather phase completes.

3. **Running the Notebooks**
   - Execute cells sequentially within each notebook.
   - Each notebook will output `.sh` scripts into the `./verification/qsub/` directory.
4. **Submitting QSUB Jobs**
   Once scripts are generated, submit them to the cluster queue:
   ```bash
   bash step00_gather_ForecastModel_all.sh
   bash step02_RMSE_MF_all.sh
   bash step02_ACC_MF_all.sh
   ```

---
### Notebook Breakdown – `qsub_STEP00_jobs.ipynb`

**Purpose:**
Generates qsub scripts to gather forecast data from various sources and format it for further verification.

**Key Sections:**

- **Config Loading:** Loads the YAML configuration to set paths, environment, and project-specific parameters.
- **Script Generation Loop:** Iterates over forecast indices (`INDs`) to create individual qsub scripts for each chunk of data.
- **Output:**
  - Scripts are saved in the `qsub_loc` directory specified in the YAML.
  - Example script: `verif_ZES_WX_001.sh`
**Critical Code Example:**

```python
# open a qsub script file for this chunk of forecast indices
f = open('{}verif_ZES_WX_{:03d}.sh'.format(conf['qsub']['qsub_loc'], i), 'w')

heads = '''#!/bin/bash -l
#PBS -N ZES_MF
#PBS -A {}
#PBS -l walltime=23:59:59
#PBS -l select=1:ncpus=4:mem=32GB
#PBS -q casper
#PBS -o verif_ZES_MF.log
#PBS -e verif_ZES_MF.err

conda activate credit
cd {}
python STEP03_ZES_ModelForecast.py {} {}
'''.format(conf['qsub']['project_code'], conf['qsub']['scripts_loc'],
           ind_start, ind_end)

f.write(heads)
f.close()
```
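The `(ind_start, ind_end)` pairs fed into that template come from splitting the full set of forecast indices into chunks, one per qsub job. A hypothetical helper (the function name and chunk size below are illustrative, not taken from the notebook) could look like:

```python
def chunk_indices(n_total, chunk_size):
    """Split range(n_total) into (start, end) index pairs, one per qsub job.

    The final chunk is clipped so that end never exceeds n_total.
    """
    return [(start, min(start + chunk_size, n_total))
            for start in range(0, n_total, chunk_size)]

# e.g. 10 forecasts in chunks of 4 -> [(0, 4), (4, 8), (8, 10)]
print(chunk_indices(10, 4))
```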
--- | ||
|
||
### Keys to Running These Notebooks

- **Ensure Sequential Execution:**
  - `STEP00` scripts **must** be submitted and completed **before** proceeding to the `STEP02` scripts.
  - Failure to adhere to this order will result in missing forecast data during the RMSE/ACC calculations.
- **Directory Existence:**
  - The directories where NetCDF files are saved (`save_loc_verif`) must exist.
  - Use `os.makedirs(path, exist_ok=True)` to create directories if needed.
- **Cluster Specifics:**
  - These scripts are optimized for NCAR's Cheyenne/Derecho clusters. Adjust for other HPC environments if necessary.
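The directory-existence check above can be automated once the YAML is loaded. The sketch below is a generic helper, not from this repository; the key-name heuristic (fields ending in `_loc` or starting with `save_loc`) is an assumption based on the field names used in this document:

```python
import os

def ensure_output_dirs(conf):
    """Create every directory-like path found in a loaded YAML config dict.

    ASSUMPTION: directory fields end in '_loc' or start with 'save_loc',
    as the fields named in this documentation do; adjust the heuristic
    to match your own config.
    """
    for section in conf.values():
        if not isinstance(section, dict):
            continue
        for key, value in section.items():
            if key.endswith('_loc') or key.startswith('save_loc'):
                os.makedirs(value, exist_ok=True)
```

You would call this once after loading `verif_config.yml` (e.g. with `yaml.safe_load`), before generating or submitting any qsub scripts.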
---
### Troubleshooting

- **Job Failures:**
  - Review the `.err` files in the qsub directory. These contain logs of job failures and error messages.
- **Missing Files:**
  - If forecast files appear incomplete, re-run the gather phase (`STEP00`) without fear of overwriting existing valid files.
- **Memory/CPU Issues:**
  - Adjust resource allocation by modifying `ncpus` and `mem` in the qsub script templates within the notebooks.

---
### Final Notes

This workflow provides a scalable and efficient method for verifying forecast data on HPC clusters. While designed for internal projects, contributions are encouraged to improve performance, add metrics, or adapt the workflow to other clusters.

If you find gaps or areas that require clarification, feel free to submit issues or pull requests to enhance the repository.