merge diyepw-scripts into diyepw, using click to enable cli functionality; fixes #51
thurber committed Jul 13, 2021
1 parent 99c8459 commit ce48083
Showing 7 changed files with 488 additions and 17 deletions.
124 changes: 112 additions & 12 deletions README.md
@@ -1,18 +1,17 @@
# DIYEPW
DIYEPW is a tool developed by Pacific Northwest National Laboratory that allows the quick and easy
# `diyepw`
`diyepw` is a tool developed by Pacific Northwest National Laboratory that allows the quick and easy
generation of a set of EPW files for a given set of WMOs and years. It is provided as both a set
of scripts (https://github.com/IMMM-SFA/diyepw-scripts) and as a Python package (https://github.com/IMMM-SFA/diyepw).
This allows DIYEPW to be used as a command-line tool, or as a package to incorporate EPW file
generation into a custom script.
of scripts and as a Python package. This allows `diyepw` to be used as a command-line tool, or as a package to
incorporate EPW file generation into a custom script.

# Getting Started
The DIYEPW Python package can be easily installed using PIP:
The `diyepw` Python package can be easily installed with `pip`:

```
pip install diyepw
```

One you've installed the package, you can access any of the DIYEPW functions or classes by importing the package
Once you've installed the package, you can access any of the `diyepw` functions or classes by importing the package
into your own Python scripts:

```
@@ -27,7 +26,7 @@ diyepw.create_amy_epw_files_for_years_and_wmos(
)
```

# Using DIYEPW to generate AMY EPW files
# Using `diyepw` to generate AMY EPW files
This package is a tool for the generation of AMY (actual meteorological year) EPW files, which is done
by injecting AMY data into TMY (typical meteorological year) EPW files. The generated EPW files
have the following fields replaced with observed data:
@@ -38,14 +37,14 @@ have the following fields replaced with observed data:
1. wind direction
1. wind speed

Because observed weather data commonly contains gaps, DIYEPW will attempt to fill in any such gaps to ensure that in
Because observed weather data commonly contains gaps, `diyepw` will attempt to fill in any such gaps to ensure that in
every hourly record a value is present for each of the variables shown above. To do so, it will use one
of two strategies to interpolate or impute values for any missing fields in the data set:

#### Interpolation: Handling for small gaps
Small gaps (by default, up to 6 consecutive hours of missing data for a field) are handled by linear
interpolation so that, for example, if the dry bulb temperature has a gap with neighboring observed values like
(20, X, X, X, X, 25), DIYEPW will replace the missing values to give (20, 21, 22, 23, 24, 25).
(20, X, X, X, X, 25), `diyepw` will replace the missing values to give (20, 21, 22, 23, 24, 25).
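
The small-gap strategy can be sketched with pandas. This is only an illustration of linear interpolation as described above, not `diyepw`'s actual implementation:

```python
import numpy as np
import pandas as pd

# The dry bulb example from above: a small gap (20, X, X, X, X, 25)
temps = pd.Series([20.0, np.nan, np.nan, np.nan, np.nan, 25.0])

# Linear interpolation fills the gap with evenly spaced values
filled = temps.interpolate(method='linear')
print(filled.tolist())  # [20.0, 21.0, 22.0, 23.0, 24.0, 25.0]
```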

#### Imputation: Handling for large gaps
Large gaps (by default up to 48 consecutive hours of missing data for a field) are filled using an imputation strategy
@@ -61,9 +60,9 @@ missing values that can be imputed, can be changed from their defaults. The func
`max_records_to_interpolate` and `max_records_to_impute`, which likewise override the defaults of 6 and 48.

## Package Functions
All of the functionality of the DIYEPW project is available as a set of functions that underlie the scripts
All of the functionality of the `diyepw` project is available as a set of functions that underlie the scripts
described above. The functions offer much more granular access to the capabilities of the project, and allow
DIYEPW capabilites to be incorporated into other software projects.
`diyepw` capabilities to be incorporated into other software projects.

The functions provided by the package are as follows:

@@ -86,6 +85,107 @@ The classes provided by the package are as follows:
For more detailed documentation of all parameters, return types, and behaviors of the above functions and classes,
please refer to the in-code documentation that heads each function's definition in the package.

## Scripts
This section describes the scripts available as part of this project. The scripts will be available in the terminal or
virtual environment after running `pip install diyepw`. The scripts are located in the `diyepw/scripts/` directory.
Every script has a manual page that can be accessed by passing the `--help` option to the script. For example:

```
analyze_noaa_data --help
```

### Workflow 1: AMY EPW generation based on years and WMO indices
This workflow uses only a single script, `create_amy_epw_files_for_years_and_wmos.py`, and
generates AMY EPW files for a set of years and WMO indices. It accomplishes this by combining
TMY (typical meteorological year) EPW files with AMY (actual meteorological year) data. The
TMY EPW file for a given WMO is downloaded by the software as needed from energyplus.net. The
AMY data comes from NOAA ISD Lite files that are likewise downloaded as needed, from
ncdc.noaa.gov.

This script can be called like this:

```
create_amy_epw_files_for_years_and_wmos --years=2010-2015 --wmo-indices=723403,722780 --output-path .
```

The options `--years` and `--wmo-indices` are required; you will be prompted for them if they are not provided as
arguments. A number of other optional settings are available as well. All options, their effects, and the values
they accept can be seen by calling this script with the `--help` option:

```
create_amy_epw_files_for_years_and_wmos --help
```

### Workflow 2: AMY EPW generation based on existing ISD Lite files
This workflow is very similar to Workflow 1, but instead of downloading NOAA's ISD Lite files
as needed, it reads in a set of ISD Lite files provided by the user and generates one AMY EPW
file corresponding to each.

This workflow involves two steps:

#### 1. analyze_noaa_data

The script `analyze_noaa_data.py` will check a set of ISD Lite files against a set of requirements,
and generate a CSV file listing the ISD Lite files that are suitable for conversion to EPW. The
script is called like this:

```
analyze_noaa_data --inputs=/path/to/your/inputs/directory --output-path .
```

The script will look for any file within the directory passed to `--inputs`, searching recursively through all
subdirectories. The files must be named like
"999999-88888-2020.gz", where the first number is a WMO index and the final number is the
year; the middle number (a WBAN identifier) is ignored. The easiest way to get files that are suitable for use
with this script is to download them from NOAA's catalog at
https://www1.ncdc.noaa.gov/pub/data/noaa/isd-lite/.
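
As an illustration, the WMO index and year can be recovered from such a file name with a short regular expression. The helper below is hypothetical (not part of `diyepw`):

```python
import re

# ISD Lite file names look like "<WMO index>-<WBAN>-<year>[.extension]"
ISD_LITE_NAME = re.compile(r'^(\d+)-(\d+)-(\d{4})(?:\.\w+)*$')

def parse_isd_lite_name(filename: str):
    """Return (wmo_index, year) from an ISD Lite file name; the WBAN is ignored."""
    match = ISD_LITE_NAME.match(filename)
    if match is None:
        raise ValueError(f"Not an ISD Lite file name: {filename}")
    wmo_index, _wban, year = match.groups()
    return wmo_index, int(year)

print(parse_isd_lite_name("999999-88888-2020.gz"))  # ('999999', 2020)
```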

The ".gz" (gzip compressed) format of the ISD Lite files is the format provided by NOAA,
but is not required. You may also provide ISD Lite files in CSV (.csv) format, or in a
different compression format like ZIP (.zip). The file extension is used to determine the
file's format, and must match the file's actual format. Pass the `--help` option
(`analyze_noaa_data --help`) for more information on which compression formats are supported.
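
To illustrate why the extension must match the format, here is a sketch using pandas, which infers the compression from the file extension. The 12-column layout and the reader shown are assumptions for illustration, not `diyepw`'s actual code:

```python
import gzip
import os
import tempfile

import pandas as pd

# Write a one-row, gzip-compressed sample in the whitespace-delimited
# ISD Lite style (12 fields per row is assumed here)
sample = b"2020 01 01 00  -50  -83 10207 230  46 -9999 -9999 -9999\n"
path = os.path.join(tempfile.mkdtemp(), "722780-23183-2020.gz")
with gzip.open(path, "wb") as f:
    f.write(sample)

# pandas picks gzip decompression from the '.gz' extension; a mismatched
# extension would cause the read to fail or produce garbage
df = pd.read_csv(path, sep=r"\s+", header=None, compression="infer")
print(df.shape)  # (1, 12)
```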

The script primarily checks that the ISD Lite files conform to the following limits:

1. Total number of rows missing
1. Maximum number of consecutive rows missing

and will produce the following files (as applicable) under the specified `--output-path`:

1. `missing_total_entries_high.csv`: A list of files where the total number of rows missing exceeds a threshold.
By default, the threshold rules out files in which more than 700 (out of 8760 total) entries are missing
entirely, but a custom value can be set with the `--max-missing-rows` option:

```
analyze_noaa_data --max-missing-rows=700
```
1. `missing_consec_entries_high.csv`: A list of files where the maximum consecutive number of rows missing exceeds
a threshold. The threshold is currently set to a maximum of 48 consecutive empty rows, but a custom value can
be set with the `--max-consecutive-missing-rows` option:
```
analyze_noaa_data --max-consecutive-missing-rows=48
```
1. `files_to_convert.csv`: A list of the files that are deemed to be usable because they are neither missing too many
total nor too many consecutive rows. This file determines which EPWs will be generated by the next script, and
it can be freely edited before running that script.
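
The two checks described above can be sketched as follows. This is an illustration only, not `diyepw`'s actual implementation:

```python
import numpy as np
import pandas as pd

def missing_row_stats(series: pd.Series):
    """Return (total missing rows, longest run of consecutive missing rows)."""
    missing = series.isna()
    total_missing = int(missing.sum())
    # Label each run of identical values, then measure the longest missing run
    runs = (missing != missing.shift()).cumsum()
    longest_gap = int(missing.groupby(runs).sum().max()) if total_missing else 0
    return total_missing, longest_gap

# Three rows missing in total; the longest consecutive gap is two rows
print(missing_row_stats(pd.Series([1.0, np.nan, np.nan, 4.0, np.nan, 6.0])))  # (3, 2)
```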
#### 2. create_amy_epw_files

The script `create_amy_epw_files.py` reads the `files_to_convert.csv` file generated in the previous step and, for
each ISD Lite file listed, generates an AMY EPW file. It can be called like this:

```
create_amy_epw_files --max-records-to-interpolate=6 --max-records-to-impute=48
```

Both `--max-records-to-interpolate` and `--max-records-to-impute` are optional and can be used to override the
default size of the gaps that can be filled in observed data using the two strategies described in more
detail at the top of this document.

## Reading in TMY3 files and writing EPW files
Functions for reading TMY3 files and writing EPW files within this script were adapted from the
[LAF.py script](https://github.com/SSESLab/laf/blob/master/LAF.py) by Carlo Bianchi at the Site-Specific
2 changes: 1 addition & 1 deletion diyepw/__init__.py
@@ -1,4 +1,4 @@
__version__ = '1.0.5'
__version__ = '1.1.0'
from .meteorology import Meteorology
from .create_amy_epw_files_for_years_and_wmos import create_amy_epw_files_for_years_and_wmos
from .analyze_noaa_isd_lite_files import analyze_noaa_isd_lite_files
2 changes: 1 addition & 1 deletion diyepw/analyze_noaa_isd_lite_file.py
@@ -2,7 +2,7 @@

def analyze_noaa_isd_lite_file(
    file: str,
    compression:str='infer'
    compression: str='infer'
):
"""
Performs an analysis of a single NOAA ISD Lite file, determining whether it is suitable for conversion into an AMY
102 changes: 102 additions & 0 deletions diyepw/scripts/analyze_noaa_data.py
@@ -0,0 +1,102 @@
import click
import diyepw
from glob import iglob
import os
import pandas as pd


@click.command()
@click.option(
    '--max-missing-rows',
    default=700,
    show_default=True,
    type=int,
    help='ISD files with more than this number of missing rows will be excluded from the output'
)
@click.option(
    '--max-consecutive-missing-rows',
    default=48,
    show_default=True,
    type=int,
    help='ISD files with more than this number of consecutive missing rows will be excluded from the output'
)
@click.option(
    '-o', '--output-path',
    default='.',
    type=click.Path(
        file_okay=False,
        dir_okay=True,
        writable=True,
        resolve_path=True,
    ),
    help="""The path to which output and error files should be written."""
)
@click.argument(
    'input_path',
    default='.',
    type=click.Path(
        file_okay=False,
        dir_okay=True,
        readable=True,
        resolve_path=True,
    ),
)
def analyze_noaa_data(
    max_missing_rows,
    max_consecutive_missing_rows,
    output_path,
    input_path,
):
    """Perform an analysis of a set of NOAA ISD Lite files, determining which are suitable for conversion to
    AMY EPW files. Any ISD Lite files in INPUT_PATH or any of its subdirectories will be processed. The files
    must be named according to the format '<WMO Index>-<WBAN>-<Year>' and must end with '.gz', '.csv', or '.zip'."""

    # Make a directory to store results if it doesn't already exist.
    if not os.path.exists(output_path):
        os.makedirs(output_path)

    # Recursively search for all files under the passed path, excluding directories
    input_files = [file for file in iglob(input_path + '/**/*', recursive=True) if not os.path.isdir(file)]

    try:
        analysis_results = diyepw.analyze_noaa_isd_lite_files(
            input_files,
            max_missing_rows=max_missing_rows,
            max_consecutive_missing_rows=max_consecutive_missing_rows,
        )
    except Exception:
        click.echo("Unable to read input files, aborting...")
        raise click.Abort()

    # Write the dataframes to CSVs for the output files.
    num_files_with_too_many_rows_missing = len(analysis_results['too_many_total_rows_missing'])
    if num_files_with_too_many_rows_missing > 0:
        path = os.path.join(output_path, 'missing_total_entries_high.csv')
        path = os.path.abspath(path)  # Change to absolute path for readability
        click.echo(f"""{num_files_with_too_many_rows_missing}
records excluded because they were missing more than {max_missing_rows}
rows. Information about these files will be written to {path}.""")
        pd.DataFrame(analysis_results['too_many_total_rows_missing']).to_csv(path, index=False)

    num_files_with_too_many_consec_rows_missing = len(analysis_results['too_many_consecutive_rows_missing'])
    if num_files_with_too_many_consec_rows_missing > 0:
        path = os.path.join(output_path, 'missing_consec_entries_high.csv')
        path = os.path.abspath(path)  # Change to absolute path for readability
        click.echo(f"""{num_files_with_too_many_consec_rows_missing}
records excluded because they were missing more than {max_consecutive_missing_rows}
consecutive rows. Information about these files will be written to {path}.""")
        pd.DataFrame(analysis_results['too_many_consecutive_rows_missing']).to_csv(path, index=False)

    num_good_files = len(analysis_results['good'])
    if num_good_files > 0:
        path = os.path.join(output_path, 'files_to_convert.csv')
        path = os.path.abspath(path)  # Change to absolute path for readability
        click.echo(f"""{num_good_files} records are complete enough to be processed.
Information about these files will be written to {path}.""")
        pd.DataFrame(analysis_results['good']).to_csv(path, index=False)

    click.echo('Done! {count} files processed.'.format(count=sum([
        num_good_files,
        num_files_with_too_many_consec_rows_missing,
        num_files_with_too_many_rows_missing
    ])))
