Add map_variables and -99999 nan value to read_crn #1368
Merged
21 commits
9f8e256 Add map_variables argument to read_crn (AdamRJensen)
a10fe28 Add test coverage for map_variables (AdamRJensen)
456cb71 Update whatsnew (AdamRJensen)
6bc6950 Remove unnecessary tz_localize (AdamRJensen)
106030a Replace nans with .replace instead of .where (AdamRJensen)
1fcaa97 Extend documentation (AdamRJensen)
02fbfbb Simply test coverage of map_variables (AdamRJensen)
427fa38 Use assert_index_equal instead of assert (AdamRJensen)
ebe5510 Add import of assert_index_equal (AdamRJensen)
a409915 Add -99999 and -999999 to nan values (AdamRJensen)
81ead2d Add -99999 nan bug to whatsnew (AdamRJensen)
ee56796 Add -99999 to test file (AdamRJensen)
b041b8e Update doc and whatsnew (AdamRJensen)
a753377 Remove -999999 from list of nans (AdamRJensen)
485e2a9 Merge branch 'master' into patch-2 (AdamRJensen)
a66f71d Minor doc update (AdamRJensen)
457fd5b Add dictionary of nan values (AdamRJensen)
999738b Change CRN_VARIABLE_MAP back to VARIABLE_MAP (AdamRJensen)
667f243 Remove numpy import (AdamRJensen)
a07acbc Reformat setting of dtypes (AdamRJensen)
538fd75 Merge branch 'master' into patch-2 (AdamRJensen)
Binary file not shown.
@@ -2,15 +2,14 @@
 """

 import pandas as pd
-import numpy as np


-HEADERS = (
-    'WBANNO UTC_DATE UTC_TIME LST_DATE LST_TIME CRX_VN LONGITUDE LATITUDE '
-    'AIR_TEMPERATURE PRECIPITATION SOLAR_RADIATION SR_FLAG '
-    'SURFACE_TEMPERATURE ST_TYPE ST_FLAG RELATIVE_HUMIDITY RH_FLAG '
-    'SOIL_MOISTURE_5 SOIL_TEMPERATURE_5 WETNESS WET_FLAG WIND_1_5 WIND_FLAG'
-)
+HEADERS = [
+    'WBANNO', 'UTC_DATE', 'UTC_TIME', 'LST_DATE', 'LST_TIME', 'CRX_VN',
+    'LONGITUDE', 'LATITUDE', 'AIR_TEMPERATURE', 'PRECIPITATION',
+    'SOLAR_RADIATION', 'SR_FLAG', 'SURFACE_TEMPERATURE', 'ST_TYPE', 'ST_FLAG',
+    'RELATIVE_HUMIDITY', 'RH_FLAG', 'SOIL_MOISTURE_5', 'SOIL_TEMPERATURE_5',
+    'WETNESS', 'WET_FLAG', 'WIND_1_5', 'WIND_FLAG']

 VARIABLE_MAP = {
     'LONGITUDE': 'longitude',
@@ -24,6 +23,21 @@
     'WIND_FLAG': 'wind_speed_flag'
 }

+NAN_DICT = {
+    'CRX_VN': -99999,
+    'AIR_TEMPERATURE': -9999,
+    'PRECIPITATION': -9999,
+    'SOLAR_RADIATION': -99999,
+    'SURFACE_TEMPERATURE': -9999,
+    'RELATIVE_HUMIDITY': -9999,
+    'SOIL_MOISTURE_5': -99,
+    'SOIL_TEMPERATURE_5': -9999,
+    'WETNESS': -9999,
+    'WIND_1_5': -99}
+
+# Add NUL characters to possible NaN values for all columns
+NAN_DICT = {k: [v, '\x00\x00\x00\x00\x00\x00'] for k, v in NAN_DICT.items()}
+
 # as specified in CRN README.txt file. excludes 1 space between columns
 WIDTHS = [5, 8, 4, 8, 4, 6, 7, 7, 7, 7, 6, 1, 7, 1, 1, 5, 1, 7, 7, 5, 1, 6, 1]
 # add 1 to make fields contiguous (required by pandas.read_fwf)
@@ -40,15 +54,22 @@
 ]


-def read_crn(filename):
-    """
-    Read a NOAA USCRN fixed-width file into pandas dataframe. The CRN is
-    described in [1]_ and [2]_.
+def read_crn(filename, map_variables=True):
+    """Read a NOAA USCRN fixed-width file into a pandas dataframe.
+
+    The CRN network consists of over 100 meteorological stations covering the
+    U.S. and is described in [1]_ and [2]_. The primary goal of CRN is to
+    provide long-term measurements of temperature, precipitation, and soil
+    moisture and temperature. Additionally, global horizontal irradiance (GHI)
+    is measured at each site using a photodiode pyranometer.

     Parameters
     ----------
     filename: str, path object, or file-like
         filepath or url to read for the fixed-width file.
+    map_variables: boolean, default: True
+        When true, renames columns of the Dataframe to pvlib variable names
+        where applicable. See variable :const:`VARIABLE_MAP`.

     Returns
     -------
@@ -60,12 +81,12 @@ def read_crn(filename):
     -----
     CRN files contain 5 minute averages labeled by the interval ending
     time. Here, missing data is flagged as NaN, rather than the lowest
-    possible integer for a field (e.g. -999 or -99). Air temperature in
-    deg C. Wind speed in m/s at a height of 1.5 m above ground level.
+    possible integer for a field (e.g. -999 or -99). Air temperature is in
+    deg C and wind speed is in m/s at a height of 1.5 m above ground level.

-    Variables corresponding to standard pvlib variables are renamed,
+    Variables corresponding to standard pvlib variables are by default renamed,
     e.g. `SOLAR_RADIATION` becomes `ghi`. See the
-    `pvlib.iotools.crn.VARIABLE_MAP` dict for the complete mapping.
+    :const:`pvlib.iotools.crn.VARIABLE_MAP` dict for the complete mapping.

     CRN files occasionally have a set of null characters on a line
     instead of valid data. This function drops those lines. Sometimes
@@ -85,16 +106,13 @@ def read_crn(filename):
        Amer. Meteor. Soc., 94, 489-498. :doi:`10.1175/BAMS-D-12-00170.1`
     """

-    # read in data. set fields with NUL characters to NaN
-    data = pd.read_fwf(filename, header=None, names=HEADERS.split(' '),
-                       widths=WIDTHS, na_values=['\x00\x00\x00\x00\x00\x00'])
-    # at this point we only have NaNs from NUL characters, not -999 etc.
-    # these bad rows need to be removed so that dtypes can be set.
-    # NaNs require float dtype so we run into errors if we don't do this.
-    data = data.dropna(axis=0)
-    # loop here because dtype kwarg not supported in read_fwf until 0.20
-    for (col, _dtype) in zip(data.columns, DTYPES):
-        data[col] = data[col].astype(_dtype)
+    # read in data
+    data = pd.read_fwf(filename, header=None, names=HEADERS, widths=WIDTHS,
+                       na_values=NAN_DICT)
+    # Remove rows with all nans
+    data = data.dropna(axis=0, how='all')
+    # set dtypes here because dtype kwarg not supported in read_fwf until 0.20
+    data = data.astype(dict(zip(HEADERS, DTYPES)))

     # set index
     # UTC_TIME does not have leading 0s, so must zfill(4) to comply
@@ -103,19 +121,8 @@ def read_crn(filename):
     dtindex = pd.to_datetime(dts['UTC_DATE'] + dts['UTC_TIME'].str.zfill(4),
                              format='%Y%m%d%H%M', utc=True)
     data = data.set_index(dtindex)
-    try:
-        # to_datetime(utc=True) does not work in older versions of pandas
-        data = data.tz_localize('UTC')
-    except TypeError:
-        pass
Comment on lines -106 to -110:
Too bad I neglected to indicate the versions that needed this. I think the tests are comprehensive enough that we can safely remove this if they still pass.
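As a rough illustration of why the `tz_localize` fallback can be dropped, the sketch below builds an index the same way the function does and checks that it is already timezone-aware. The date and time strings and the `ghi` values are made up for the example, and it assumes a reasonably recent pandas in which `to_datetime(..., utc=True)` behaves as documented.

```python
import pandas as pd

# Synthetic UTC_DATE / UTC_TIME strings shaped like the CRN columns
# (illustrative values, not taken from a real station file).
dts = pd.DataFrame({"UTC_DATE": ["20190101", "20190101"],
                    "UTC_TIME": ["5", "1610"]})

# UTC_TIME lacks leading zeros, so pad to four digits before parsing.
dtindex = pd.to_datetime(dts["UTC_DATE"] + dts["UTC_TIME"].str.zfill(4),
                         format="%Y%m%d%H%M", utc=True)

data = pd.DataFrame({"ghi": [0.0, 120.0]}).set_index(dtindex)

# The index already carries a UTC timezone, so no tz_localize call is needed.
index_tz = str(data.index.tz)
print(index_tz)
```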

-    # Now we can set nans. This could be done a per column basis to be
-    # safer, since in principle a real -99 value could occur in a -9999
-    # column. Very unlikely to see that in the real world.
-    for val in [-99, -999, -9999]:
-        # consider replacing with .replace([-99, -999, -9999])
-        data = data.where(data != val, np.nan)
-
-    data = data.rename(columns=VARIABLE_MAP)
-
+    if map_variables:
+        data = data.rename(columns=VARIABLE_MAP)
+
     return data
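The per-column `na_values` dictionary is the core of this change, so a small sketch may help. It is not the pvlib code: the column names `TEMP` and `WIND`, the widths, and the data are hypothetical, chosen only to show that a real `-99.0` reading survives in a column whose sentinel is `-9999`, whereas a single global list of sentinels would discard it.

```python
import io
import pandas as pd

# Two hypothetical fixed-width columns: TEMP uses -9999 as its missing-data
# sentinel, WIND uses -99. The TEMP reading of -99.0 is meant to be real data.
raw = (
    "  -99.0   2.5\n"
    "-9999.0 -99.0\n"
    "   12.3   1.0\n"
)

# Per-column sentinels, in the spirit of NAN_DICT in the diff above.
nan_by_column = {"TEMP": [-9999], "WIND": [-99]}
df = pd.read_fwf(io.StringIO(raw), header=None, names=["TEMP", "WIND"],
                 widths=[7, 6], na_values=nan_by_column)

# For comparison: one global list of sentinels applied to every column.
df_global = pd.read_fwf(io.StringIO(raw), header=None,
                        names=["TEMP", "WIND"], widths=[7, 6],
                        na_values=[-99, -9999])

print(df)
print(df_global)
```

With the dictionary, only the second row is flagged as missing in each column. With the global list, the first `TEMP` value is also lost.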
pvlib now requires pandas > 0.22 so perhaps we can remove this line in favor of the dtype kwarg
I do not believe this is possible, as there are rows with all NaNs, and you cannot set the column dtype to `int` if the column contains NaNs. In the current version, the dtypes are set after the all-NaN rows are removed. I get the following error when I set `dtype=dict(zip(HEADERS, DTYPES))` with `read_fwf`:
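The error output itself is not included above. Separately from that traceback, the constraint described in this reply can be sketched with plain pandas: casting a column that contains NaN to `int` fails, while dropping the all-NaN rows first lets the cast succeed. The two-column frame below is hypothetical and only stands in for a CRN file with one line of NUL characters.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one all-NaN row, standing in for a CRN line that
# contained only NUL characters.
df = pd.DataFrame({"WBANNO": [3739.0, np.nan, 3739.0],
                   "AIR_TEMPERATURE": [1.5, np.nan, 2.0]})

try:
    df.astype({"WBANNO": int})
    cast_failed = False
except (ValueError, TypeError):
    # pandas refuses to convert NaN to an integer dtype
    cast_failed = True

# Dropping the all-NaN rows first lets the integer cast succeed.
cleaned = df.dropna(axis=0, how="all").astype(
    {"WBANNO": int, "AIR_TEMPERATURE": float})

print(cast_failed)
print(cleaned.dtypes)
```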