Add BSRN format reader to iotools #1015

wholmgren · 2020-08-03T15:35:02Z

I need a parser for the NASA Langley CAPABLE BSRN site. Given the importance and the quality of the BSRN sites, I expect that others would benefit from this parser as well.

Does anyone have experience with this site or have a parser for the format?

https://capable.larc.nasa.gov/data/

https://cove.larc.nasa.gov/BSRN/LRC49/

Data from December 2014 to present.

1 month of data per file. Appears to be uploaded in the first few days of the following month. 1 minute intervals.

Another fun fixed width file. Entries like:

  1  987   1003 -99.9 -999 -999    975 -99.9 -999 -999
             72 -99.9 -999 -999    287 -99.9 -999 -999     19.9  37.2 1026

  1  988   1006 -99.9 -999 -999    977 -99.9 -999 -999
             72 -99.9 -999 -999    290 -99.9 -999 -999     19.8  36.9 1026

  1 1438     22 -99.9 -999 -999      0 -99.9 -999 -999
             21 -99.9 -999 -999    307 -99.9 -999 -999     17.7  57.9 1023

  1 1439     21 -99.9 -999 -999      0 -99.9 -999 -999
             20 -99.9 -999 -999    307 -99.9 -999 -999     17.6  56.8 1023

The first number is the day of the month. The second is the minute of the day. Times appear to be in UTC.

I believe the ordering is:

CM22 pyranometer GHI (upper left)
CM31 pyranometer DHI (lower left)
CH1 pyrheliometer DNI (upper right)
PIR infrared (lower right)

I'd probably read the file into a DataFrame without meaningful columns, split it into two DataFrames using .iloc[::2] and .iloc[1::2], parse the date time information into an index, then stitch the data back together.
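A minimal sketch of that approach, using two of the sample records above. Several things here are assumptions rather than facts from the files: the column names, the reading of each group of four values as mean/std/min/max, the meaning of the last three values (temperature, humidity, pressure), and the treatment of -999 and -99.9 as missing-data codes. The year and month passed in the usage line are placeholders, since the sample does not state which month it comes from.

```python
import io

import pandas as pd

# Two records copied from the sample above (blank separator lines omitted).
SAMPLE = """\
  1  987   1003 -99.9 -999 -999    975 -99.9 -999 -999
             72 -99.9 -999 -999    287 -99.9 -999 -999     19.9  37.2 1026
  1  988   1006 -99.9 -999 -999    977 -99.9 -999 -999
             72 -99.9 -999 -999    290 -99.9 -999 -999     19.8  36.9 1026
"""

# Assumed layout: four statistics per instrument, then three met values.
STATS = ["", "_std", "_min", "_max"]
COLUMNS = (
    [f"{name}{stat}" for name in ("ghi", "dni", "dhi", "lwd") for stat in STATS]
    + ["temp_air", "relative_humidity", "pressure"]
)


def parse_two_line_records(text, year, month):
    """Parse records that are split over two physical lines each."""
    # names=list(range(11)) pads the shorter first line of each record with NaN.
    raw = pd.read_csv(
        io.StringIO(text), sep=r"\s+", header=None,
        names=list(range(11)), na_values=["-999", "-99.9"],
    )
    first = raw.iloc[::2, :10].reset_index(drop=True)   # day, minute, GHI, DNI
    second = raw.iloc[1::2].reset_index(drop=True)      # DHI, infrared, met data

    # Day of month and minute of day become a UTC timestamp index.
    start = pd.Timestamp(year=year, month=month, day=1, tz="UTC")
    offset = (pd.to_timedelta(first[0] - 1, unit="D")
              + pd.to_timedelta(first[1], unit="min"))

    # Stitch the two halves back together, dropping the day/minute columns.
    data = pd.concat([first.iloc[:, 2:], second], axis=1)
    data.columns = COLUMNS
    data.index = start + pd.TimedeltaIndex(offset)
    return data


df = parse_two_line_records(SAMPLE, year=2020, month=7)  # placeholder month
print(df[["ghi", "dni", "dhi", "lwd", "temp_air"]])
```

Matching the missing-data codes as strings in `na_values` avoids comparing floats after parsing, at the cost of also discarding any genuine reading that happens to equal one of the codes.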

CAPABLE Site Coordinates:
Latitude: 37.1038
Longitude: -76.3872
Elevation: 3 m ASL

cross post from SolarArbiter/solarforecastarbiter-core#541

kandersolar · 2020-08-03T15:54:40Z

Is that dataset different from the BSRN-formatted LRC dataset provided through pangaea? For example: https://doi.pangaea.de/10.1594/PANGAEA.913689

If it's the same dataset, I'd vote for adding a parser for the standard BSRN format instead because the BSRN format is used for many other stations as well. I actually thought pvlib already had a read_bsrn function but I guess not. The BSRN format is pretty straightforward -- a metadata header followed by nice TSV data.
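For illustration, a rough sketch of reading that kind of file with pandas. It assumes the metadata header is a comment block closed by a line containing only `*/`, and that the first data column holds ISO timestamps; neither detail is confirmed in this thread. The sample text and its column names are made up for the example, and `read_pangaea_tab` is a hypothetical helper, not an existing pvlib function.

```python
import io

import pandas as pd

# Made-up sample in the assumed layout: comment-style header, then TSV data.
SAMPLE = (
    "/* DATA DESCRIPTION:\n"
    "Citation:\t(placeholder)\n"
    "*/\n"
    "Date/Time\tSWD [W/m**2]\tDIR [W/m**2]\n"
    "2020-01-01T00:00\t0\t0\n"
    "2020-01-01T00:01\t1\t0\n"
)


def read_pangaea_tab(text):
    """Split the metadata header from the TSV body and parse the body."""
    lines = text.splitlines()
    # Assumption: the header ends at the first line that is exactly "*/".
    end = next(i for i, line in enumerate(lines) if line.strip() == "*/")
    metadata = lines[:end + 1]
    body = "\n".join(lines[end + 1:])
    data = pd.read_csv(io.StringIO(body), sep="\t", index_col=0, parse_dates=True)
    return data, metadata


data, metadata = read_pangaea_tab(SAMPLE)
print(data)
```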

wholmgren · 2020-08-03T16:20:06Z

Thanks @kanderso-nrel that sounds like a better idea. I created an account but I still don't have permission to download the tsv file. I also tried to download the file via ftp with wget but got a "login incorrect" message. Do you know anything about getting access to more data?

Do you know if the data available over ftp is in tsv format too? It appears that only the ftp files have a regular naming scheme. I'd like to automate the fetch and parsing so I'd prefer a regular naming scheme to random DOIs.

kandersolar · 2020-08-03T17:31:04Z

> I created an account but I still don't have permission to download the tsv file. I also tried to download the file via ftp with wget but got a "login incorrect" message.

Ah, there is a username and password (and for whatever reason, the account you can create yourself doesn't work). I think there is just a global login that everyone uses -- I don't feel comfortable sharing it publicly, but you can email Amelie Driemel for it: https://bsrn.awi.de/?id=393.

> Do you know if the data available over ftp is in tsv format too?

Looks like the FTP files are in the same format as your example, which I think is called "station-to-archive" format: https://bsrn.awi.de/data/station-to-archive-file-format/

> I'd like to automate the fetch and parsing so I'd prefer a regular naming scheme to random DOIs.

I wrote a scraper a while back that uses the pangaea search function to list the datasets I wanted: https://www.pangaea.de/?q=project%3Alabel%3ABSRN+%2Bevent%3Alabel%3ALRC+%2Bcitation%3ABasic+-guidelines

Fetching data from the FTP archive would have been cleaner. I don't think I knew about the FTP archive back then. So maybe for your use case, implementing the more complex "station-to-archive" format would be better. Seems like the choice is a trade-off between nicer data format and easier file retrieval. Side note: I assume you'll want to be fetching the data automatically in the future, but if you just want historical BSRN data, I have local copies of all the US station data and can share if you want.

Possibly helpful links:

PANGAEA data warehouse API? https://www.pangaea.de/about/services.php
Python3 library to fetch PANGAEA data by DOI: https://github.com/pangaea-data-publisher/pangaeapy
- Don't think it knows how to list/search for datasets
R library for fetching PANGAEA data: https://github.com/ropensci/pangaear
- Can list/search for datasets

AdamRJensen · 2020-10-17T20:43:25Z

Hi @wholmgren

The file (.dat) in the second link you refer to is indeed in the "Station-to-archive" file format used by the BSRN. It is described in detail in the BSRN Technical Plan and briefly on their website: https://bsrn.awi.de/data/station-to-archive-file-format/

As you noted, the file format is not very user friendly, as the data for each timestamp is split over two lines (probably due to archaic restrictions).

The easiest way I have found to access them is through BSRN's FTP server: https://bsrn.awi.de/data/data-retrieval-via-ftp/

I have written a function to parse 'station-to-archive' files (read_bsrn) and a function to get bsrn files from the ftp-server (get_bsrn): https://github.com/AdamRJensen/BSRN/blob/main/bsrn_v3.ipynb
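To give a sense of what fetching from the FTP server involves, here is a rough sketch using Python's standard `ftplib`. It is not the code from the notebook. The host name, the per-station directory, and the `<station><MM><YY>.dat.gz` file naming are assumptions about the server layout and should be checked against the BSRN retrieval instructions. `bsrn_filename` and `fetch_bsrn_file` are hypothetical helper names, and the credentials must be requested from BSRN.

```python
import ftplib
import io


def bsrn_filename(station, year, month):
    """Build the assumed monthly file name, e.g. LRC, 2020, 1 -> lrc0120.dat.gz."""
    return f"{station.lower()}{month:02d}{year % 100:02d}.dat.gz"


def fetch_bsrn_file(station, year, month, username, password,
                    host="ftp.bsrn.awi.de"):  # assumed host name
    """Download one monthly file into memory and return it as a BytesIO."""
    name = bsrn_filename(station, year, month)
    buffer = io.BytesIO()
    with ftplib.FTP(host, username, password) as ftp:
        ftp.cwd(station.lower())  # assumed: one directory per station
        ftp.retrbinary(f"RETR {name}", buffer.write)
    buffer.seek(0)
    return buffer


print(bsrn_filename("LRC", 2020, 1))
```

Keeping the file-name logic in its own function makes the regular naming scheme testable without a network connection or credentials.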

I would be happy to get some feedback on the functions and contribute them to pvlib.

wholmgren · 2020-10-19T16:23:11Z

@AdamRJensen thanks, the functions in the notebook look like a great start and we'd welcome the pull request!

AdamRJensen · 2021-01-21T12:39:39Z

@wholmgren I have rewritten the function a bit to make it simpler and tested it on a few thousand BSRN files. It's my first pull request, so perhaps you could review it and tell me if I am missing something?

* Add bsrn file to read bsrn files
  Related to issue #1015.
* simplified read_bsrn function
  Simplified how the start and end line of the data is determined. Improved documentation, e.g. moved constants outside of function.
* Simplified selection of rows in read_bsrn
* Added read_bsrn to api.rst
* Delete 2021_01_16_read_bsrn_pull_request_v2.py
* Improved format, e.g removed trailing white spaces
* Fixed spacing issues
* Update v0.9.0.rst
* Add iotools.bsrn and import read_bsrn
* Split multiple lines to obey 75 character limit
* Corrected indentation
* Fixed indentation again
* Remove bsrn email in description
  Co-authored-by: Cliff Hansen <cwhanse@sandia.gov>
* Correct COL_SPEC variable
  The previous values in the COL_SPEC variables were not all correct, leading to incorrect parsing of the data.
* Changed air_temperature to temp_air
* Add test_bsrn file
  File is not complete, as I'm awaiting permission from BSRN to upload test file
* Reference to FTP updated
* Add zipped bsrn test file
* Update test filename
* Get file month/year from file instead of filename
  Previously the month and year of the file were determined from the filename. This has now been changed such that the month/year is found from within the file's metadata section (second line).
* Fixed formatting/stickler issues
* Fixed formatting/stickler issues
* Fixed formatting/stickler issues
* Fix to test_format_index
* Refactored file opening and utc localization
* Fixed indentation issue
* Fixed hyperlink
* Fixed doc error
  Air temperature was listed as air_temperature in the docstring instead of temp_air.
* Handle file start date explicitly
  Co-authored-by: Will Holmgren <william.holmgren@gmail.com>
* Correct pytest fixture magic
  Co-authored-by: Will Holmgren <william.holmgren@gmail.com>
* Fix indentation broken by previous commit
* Correct Dataframe to DataFrame in doc string
* Add offset to line num after explicitly handling start date
* Update test_bsrn.py
* Added compression='infer', fixed end line number issue
* Fixed test issue
* Changed timedelta unit from min to minute
* Add files via upload
  All logical records after LR0100 have been removed to reduce space (be below 25 MB), but also to test the functionality of files with few logical records.
* Changed to_timedelta unit from minute' to 'T'
* Updated test to cover unzipped and zipped files
* Removed error causing blank line in test file
* Change to Unix end of line character from file by wholmgren
* Remove extra line at end of file
* Fix typo in bsrn.py doc string
  Co-authored-by: Kevin Anderson <57452607+kanderso-nrel@users.noreply.github.com>

Co-authored-by: Cliff Hansen <cwhanse@sandia.gov>
Co-authored-by: Will Holmgren <william.holmgren@gmail.com>
Co-authored-by: Kevin Anderson <57452607+kanderso-nrel@users.noreply.github.com>

wholmgren added the solarfx2 (DOE SETO Solar Forecasting 2 / Solar Forecast Arbiter) and io labels Aug 3, 2020

AdamRJensen mentioned this issue Jan 24, 2021

Add read_bsrn function #1145

Merged

8 tasks

wholmgren changed the title ~~Add NASA Langley CAPABLE BSRN site to iotools~~ Add BSRN format reader to iotools Jan 26, 2021

wholmgren added this to the 0.9.0 milestone Jan 26, 2021

wholmgren closed this as completed in #1145 Feb 11, 2021
