Add BSRN format reader to iotools #1015
Comments
Is that dataset different from the BSRN-formatted LRC dataset provided through pangaea? For example: https://doi.pangaea.de/10.1594/PANGAEA.913689 If it's the same dataset, I'd vote for adding a parser for the standard BSRN format instead because the BSRN format is used for many other stations as well. I actually thought pvlib already had a |
Thanks @kanderso-nrel that sounds like a better idea. I created an account but I still don't have permission to download the tsv file. I also tried to download the file via ftp with wget but got a "login incorrect" message. Do you know anything about getting access to more data? Do you know if the data available over ftp is in tsv format too? It appears that only the ftp files have a regular naming scheme. I'd like to automate the fetch and parsing so I'd prefer a regular naming scheme to random DOIs.
Ah, there is a username and password (and for whatever reason, the account you can create yourself doesn't work). I think there is just a global login that everyone uses -- I don't feel comfortable sharing it publicly, but you can email Amelie Driemel for it: https://bsrn.awi.de/?id=393.
Looks like the FTP files are in the same format as your example, which I think is called "station-to-archive" format: https://bsrn.awi.de/data/station-to-archive-file-format/
I wrote a scraper a while back that uses the pangaea search function to list the datasets I wanted: https://www.pangaea.de/?q=project%3Alabel%3ABSRN+%2Bevent%3Alabel%3ALRC+%2Bcitation%3ABasic+-guidelines Fetching data from the FTP archive would have been cleaner. I don't think I knew about the FTP archive back then. So maybe for your use case, implementing the more complex "station-to-archive" format would be better. Seems like the choice is a trade-off between nicer data format and easier file retrieval. Side note: I assume you'll want to be fetching the data automatically in the future, but if you just want historical BSRN data, I have local copies of all the US station data and can share if you want. Possibly helpful links:
Hi @wholmgren The file (.dat) in the second link you refer to is indeed in the "station-to-archive" file format used by the BSRN. It is described in detail in the BSRN Technical Plan and briefly on their website: https://bsrn.awi.de/data/station-to-archive-file-format/ As you noted, the file format is not very user friendly, as the data for each timestamp is split over two lines (probably due to archaic restrictions). The easiest way I have found to access the files is through BSRN's FTP server: https://bsrn.awi.de/data/data-retrieval-via-ftp/ I have written a function to parse "station-to-archive" files (read_bsrn) and a function to get BSRN files from the FTP server (get_bsrn): https://github.com/AdamRJensen/BSRN/blob/main/bsrn_v3.ipynb I would be happy to get some feedback on the functions and contribute them to pvlib.
@AdamRJensen thanks, the functions in the notebook look like a great start and we'd welcome the pull request!
@wholmgren I have rewritten the function a bit to make it simpler and tested it on a few thousand BSRN files. It's my first pull request, so perhaps you could review it and tell me if I am missing something?
* Add bsrn file to read bsrn files
  Related to issue #1015.
* simplified read_bsrn function
  Simplified how the start and end line of the data is determined. Improved documentation, e.g. moved constants outside of function.
* Simplified selection of rows in read_bsrn
* Added read_bsrn to api.rst
* Delete 2021_01_16_read_bsrn_pull_request_v2.py
* Improved format, e.g. removed trailing white spaces
* Fixed spacing issues
* Update v0.9.0.rst
* Add iotools.bsrn and import read_bsrn
* Split multiple lines to obey 75 character limit
* Corrected indentation
* Fixed indentation again
* Remove bsrn email in description
  Co-authored-by: Cliff Hansen <cwhanse@sandia.gov>
* Correct COL_SPEC variable
  The previous values in the COL_SPEC variables were not all correct, leading to incorrect parsing of the data.
* Changed air_temperature to temp_air
* Add test_bsrn file
  File is not complete, as I'm awaiting permission from BSRN to upload test file
* Reference to FTP updated
* Add zipped bsrn test file
* Update test filename
* Get file month/year from file instead of filename
  Previously the month and year of the file were determined from the filename. This has now been changed such that the month/year is found from within the file's metadata section (second line).
* Fixed formatting/stickler issues
* Fixed formatting/stickler issues
* Fixed formatting/stickler issues
* Fix to test_format_index
* Refactored file opening and utc localization
* Fixed indentation issue
* Fixed hyperlink
* Fixed doc error
  Air temperature was listed as air_temperature in the docstring instead of temp_air.
* Handle file start date explicitly
  Co-authored-by: Will Holmgren <william.holmgren@gmail.com>
* Correct pytest fixture magic
  Co-authored-by: Will Holmgren <william.holmgren@gmail.com>
* Fix indentation broken by previous commit
* Correct Dataframe to DataFrame in doc string
* Add offset to line num after explicitly handling start date
* Update test_bsrn.py
* Added compression='infer', fixed end line number issue
* Fixed test issue
* Changed timedelta unit from min to minute
* Add files via upload
  All logical records after LR0100 have been removed to reduce space (be below 25 MB), but also to test the functionality of files with few logical records.
* Changed to_timedelta unit from 'minute' to 'T'
* Updated test to cover unzipped and zipped files
* Removed error causing blank line in test file
* Change to Unix end of line character from file by wholmgren
* Remove extra line at end of file
* Fix typo in bsrn.py doc string
  Co-authored-by: Kevin Anderson <57452607+kanderso-nrel@users.noreply.github.com>

Co-authored-by: Cliff Hansen <cwhanse@sandia.gov>
Co-authored-by: Will Holmgren <william.holmgren@gmail.com>
Co-authored-by: Kevin Anderson <57452607+kanderso-nrel@users.noreply.github.com>
I need a parser for the NASA Langley CAPABLE BSRN site. Given the importance and the quality of the BSRN sites, I expect that others would benefit from this parser as well.
Does anyone have experience with this site or have a parser for the format?
https://capable.larc.nasa.gov/data/
https://cove.larc.nasa.gov/BSRN/LRC49/
Data from December 2014 to present.
1 month of data per file. Appears to be uploaded in the first few days of the following month. 1 minute intervals.
Another fun fixed width file. Entries like:
The first number is the day of the month. The second is the minute of the day. Times appear to be in UTC.
I believe the ordering is:
I'd probably read the file into a DataFrame without meaningful columns, split it into two DataFrames using .iloc[::2] and .iloc[1::2], parse the date-time information into an index, then stitch the data back together.
CAPABLE Site Coordinates:
Latitude: 37.1038
Longitude: -76.3872
Elevation: 3 m ASL
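The two-lines-per-timestamp idea described above (read everything, split with .iloc[::2] and .iloc[1::2], build a time index from the day of month and minute of day, then stitch) could look roughly like the sketch below. This is only an illustration, not the parser that ended up in pvlib: the sample text, the column names (value_a ... value_f), the number of fields per line, and the helper name parse_two_line_records are all made up, and the year and month are passed in by the caller rather than read from the file.

```python
import pandas as pd

# Made-up sample (NOT real BSRN data): two lines per timestamp.
# Line 1: day of month, minute of day, two hypothetical values.
# Line 2: four more hypothetical values for the same timestamp.
SAMPLE = """\
 1    0   -2.0   0.1
      -1.0  10.5   5.0  80.0
 1    1   -2.1   0.2
      -1.1  10.4   5.1  81.0
 2 1439  500.0   3.0
     120.0   9.8   7.5  60.0
"""


def parse_two_line_records(text, year, month):
    """Hypothetical helper: stitch two-line records into one row per minute."""
    # Read every non-empty line into a DataFrame without meaningful columns.
    rows = [[float(x) for x in line.split()]
            for line in text.splitlines() if line.strip()]
    raw = pd.DataFrame(rows)

    # Even rows hold the first line of each record, odd rows the second.
    first = raw.iloc[::2].reset_index(drop=True)
    second = raw.iloc[1::2].reset_index(drop=True)
    first.columns = ["day", "minute", "value_a", "value_b"]
    second.columns = ["value_c", "value_d", "value_e", "value_f"]

    # Build a UTC index from day of month (1-based) and minute of day.
    start = pd.Timestamp(year=year, month=month, day=1, tz="UTC")
    times = (start
             + pd.to_timedelta(first["day"] - 1, unit="D")
             + pd.to_timedelta(first["minute"], unit="min"))

    # Stitch the two halves back together side by side.
    data = pd.concat([first.drop(columns=["day", "minute"]), second], axis=1)
    data.index = pd.DatetimeIndex(times, name="time")
    return data


data = parse_two_line_records(SAMPLE, year=2021, month=1)
print(data)
```

A real file would also need its header and metadata lines skipped before this step, and missing-value codes replaced, neither of which is handled here.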
cross post from SolarArbiter/solarforecastarbiter-core#541