Skip to content

read_crn returns -99999 instead of NaN #1372

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wholmgren opened this issue Jan 7, 2022 · 5 comments · Fixed by #1368
Closed

read_crn returns -99999 instead of NaN #1372

wholmgren opened this issue Jan 7, 2022 · 5 comments · Fixed by #1368
Labels
bug io solarfx2 DOE SETO Solar Forecasting 2 / Solar Forecast Arbiter

Comments

@wholmgren
Copy link
Member

Describe the bug
read_crn fails to map -99999 to NaN

To Reproduce

from pvlib.iotools import read_crn
crn = read_crn('https://www.ncei.noaa.gov/pub/data/uscrn/products/subhourly01/2021/CRNS0101-05-2021-NY_Millbrook_3_W.txt')
crn.loc['2021-12-14 0930':'2021-12-14 1130', 'ghi']
2021-12-14 09:30:00+00:00        0.0
2021-12-14 09:35:00+00:00        0.0
2021-12-14 09:40:00+00:00        0.0
2021-12-14 09:45:00+00:00        0.0
2021-12-14 09:50:00+00:00        0.0
2021-12-14 09:55:00+00:00        0.0
2021-12-14 10:00:00+00:00        0.0
2021-12-14 10:05:00+00:00   -99999.0
2021-12-14 10:10:00+00:00   -99999.0
2021-12-14 10:15:00+00:00   -99999.0
2021-12-14 10:20:00+00:00   -99999.0
2021-12-14 10:25:00+00:00   -99999.0
2021-12-14 10:30:00+00:00   -99999.0
2021-12-14 10:35:00+00:00   -99999.0
2021-12-14 10:40:00+00:00   -99999.0
2021-12-14 10:45:00+00:00   -99999.0
2021-12-14 10:50:00+00:00   -99999.0
2021-12-14 10:55:00+00:00   -99999.0
2021-12-14 11:00:00+00:00   -99999.0
2021-12-14 11:05:00+00:00        0.0
2021-12-14 11:10:00+00:00        0.0
2021-12-14 11:15:00+00:00        0.0
2021-12-14 11:20:00+00:00        0.0
2021-12-14 11:25:00+00:00        0.0
2021-12-14 11:30:00+00:00        0.0
Name: ghi, dtype: float64

Expected behavior
Should return NaN instead of -99999

Versions:

  • pvlib.__version__: 0.9.0
  • pandas.__version__: 1.0.3 (doesn't matter)
  • python: 3.7

Additional context

Documentation here says

     C.  Missing data are indicated by the lowest possible integer for a 
        given column format, such as -9999.0 for 7-character fields with 
        one decimal place or -99.000 for 7-character fields with three
        decimal places.

So we should change

# Now we can set nans. This could be done a per column basis to be
# safer, since in principle a real -99 value could occur in a -9999
# column. Very unlikely to see that in the real world.
for val in [-99, -999, -9999]:
# consider replacing with .replace([-99, -999, -9999])
data = data.where(data != val, np.nan)

to include -99999 and perhaps -999999. Or do the smarter thing as discussed in the comment.

also SolarArbiter/solarforecastarbiter-core#773

@wholmgren wholmgren added bug solarfx2 DOE SETO Solar Forecasting 2 / Solar Forecast Arbiter io labels Jan 7, 2022
@AdamRJensen
Copy link
Member

@wholmgren I have added -99999 and -999999 to the list of nan values in #1368.

However, I'd also be happy to implement column-specific replacement of nans. Any suggestion on how to do this?

@AdamRJensen
Copy link
Member

I see that the CRN files contain the latitude and longitude of the stations. I suppose it would make sense to parse this and output it in a metadata dictionary?

I realize that outputing a tuple of (data, meta) instead of just data would be a breaking change, but it would be in line with the iotools pattern.

@wholmgren
Copy link
Member Author

However, I'd also be happy to implement column-specific replacement of nans. Any suggestion on how to do this?

crn.WIDTHS gets you half way there, but you'd need a corresponding list for the number of decimals allowed in each column. From there you can determine, for each column, how many 9s will fit between a - and the space required for the decimals.

I see that the CRN files contain the latitude and longitude of the stations. I suppose it would make sense to parse this and output it in a metadata dictionary? I realize that outputing a tuple of (data, meta) instead of just data would be a breaking change, but it would be in line with the iotools pattern.

Could be nice, but not sure how we'd handle the breaking change in a nice way. Let's address in a separate issue.

@AdamRJensen
Copy link
Member

AdamRJensen commented Jan 10, 2022

It is possible to pass a dictionary to the na_values argument in pd.read_fwf:

NAN_DICT = {
    'CRX_VN': -99999,
    'AIR_TEMPERATURE': -9999,
    'PRECIPITATION': -9999,
    'SOLAR_RADIATION': -99999,
    'SURFACE_TEMPERATURE': -9999,
    'RELATIVE_HUMIDITY': -9999,
    'SOIL_MOISTURE_5': -99,
    'SOIL_TEMPERATURE_5': -9999,
    'WETNESS': -9999,
    'WIND_1_5': -99}

Although, currently na_values is set equal to ['\x00\x00\x00\x00\x00\x00'].

However, they can be combined:

NAN_DICT = {k: [v, '\x00\x00\x00\x00\x00\x00'] for k, v in NAN_DICT.items()}

and then:

# read in data. set fields with NUL characters to NaN
data = pd.read_fwf(filename, header=None, names=HEADERS.split(' '),
                   widths=WIDTHS, na_values=NAN_DICT)

I tested this method and it passes all of the tests.

@AdamRJensen
Copy link
Member

AdamRJensen commented Jan 11, 2022

Fun fact: there are a few CRN stations that have longitudes in the -99 range, but none that matches exactly:
image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug io solarfx2 DOE SETO Solar Forecasting 2 / Solar Forecast Arbiter
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants