Fetch GEFS from AWS #712
base: master
Conversation
Documentation at the top of the module will need to be updated. I've not yet tested it locally since I don't have wgrib2 on my mac and don't want to fight the compiler. I'll try on golem later today.
@@ -57,6 +57,12 @@
CHECK_URL = 'https://nomads.ncep.noaa.gov/pub/data/nccf/com/{}/prod'
BASE_URL = 'https://nomads.ncep.noaa.gov/cgi-bin/'

GEFS_BASE_URL = 'https://noaa-gefs-pds.s3.amazonaws.com'

# When querying aws for directories, start-after is used to paginate.
this would be helpful in the function documentation too
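For the function documentation, a sketch of docstring wording that captures the pagination behavior; the actual signature and phrasing in the PR may differ:

```python
async def get_available_gefs_dirs(session, start_after=''):
    """Get the list of gefs.YYYYMMDD directories in the GEFS AWS bucket.

    The S3 listing API returns at most 1000 entries per request, so
    ``start_after`` is used to paginate: when a page comes back full,
    the function is called again with ``start_after`` set to the last
    directory seen (e.g. ``'gefs.20210101'``) to fetch the next page.
    """
    ...
```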
        ) as r:
            return await r.text()
    listing = await _get(session)
    all_dirs = re.findall("gefs\\.([0-9]{8})", listing)
same as r"gefs\.([0-9]{8})"? I think r-strings are much easier to read for regex.
A comment like:
# xml contains many entries formatted similar to
# <CommonPrefixes><Prefix>gefs.20210101/</Prefix>
# regex creates a list like ['20210101', '20210102'...]
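To illustrate the suggestion, a small self-contained example of what the regex pulls out of the listing; the sample XML is abbreviated for illustration only:

```python
import re

# Abbreviated sample of the S3 listing XML response.
listing = (
    "<CommonPrefixes><Prefix>gefs.20210101/</Prefix></CommonPrefixes>"
    "<CommonPrefixes><Prefix>gefs.20210102/</Prefix></CommonPrefixes>"
)

# The raw string r"gefs\.([0-9]{8})" matches "gefs." followed by an
# 8-digit date and captures only the date portion.
all_dirs = re.findall(r"gefs\.([0-9]{8})", listing)
print(all_dirs)  # ['20210101', '20210102']
```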
            return await r.text()
    listing = await _get(session)
    all_dirs = re.findall("gefs\\.([0-9]{8})", listing)
    if len(all_dirs) < 1000:
aws starts paginating at 1k items?
    else:
        return all_dirs + await get_available_gefs_dirs(
            session,
            'gefs.'+all_dirs[-1]
I'm a proponent of calling kwargs as kwargs, so start_after='gefs.'+all_dirs[-1]
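Pulling these threads together, a minimal sketch of the pagination pattern, assuming the S3 listing returns at most 1000 prefixes per request (the documented ListObjectsV2 page size). The helper name `_get_listing_text` and the request details are illustrative, not the PR's actual code:

```python
import re

import aiohttp

GEFS_BASE_URL = 'https://noaa-gefs-pds.s3.amazonaws.com'


async def _get_listing_text(session, start_after):
    # Hypothetical helper: fetch one page of the bucket listing using
    # ListObjectsV2-style query parameters.
    params = {'list-type': '2', 'delimiter': '/', 'prefix': 'gefs.',
              'start-after': start_after}
    async with session.get(GEFS_BASE_URL, params=params) as r:
        return await r.text()


async def get_available_gefs_dirs(session, start_after=''):
    listing = await _get_listing_text(session, start_after)
    all_dirs = re.findall(r"gefs\.([0-9]{8})", listing)
    if len(all_dirs) < 1000:
        # A short page means S3 had nothing more to return.
        return all_dirs
    # A full page may be truncated, so fetch the next page, passing
    # the keyword argument by name as suggested above.
    return all_dirs + await get_available_gefs_dirs(
        session, start_after='gefs.' + all_dirs[-1])
```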
solarforecastarbiter/io/fetch/nwp.py
Outdated
@@ -730,3 +790,11 @@ def check_wgrib2():
    if shutil.which('wgrib2') is None:
        logger.error('wgrib2 was not found in PATH and is required')
        sys.exit(1)


def domain_args():
private function
I'm not sure if this is helpful, but I tried running it. I can provide full verbose output, if that helps.
- docs/source/api.rst for API changes.
- docs/source/whatsnew for all changes. Includes link to the GitHub Issue with :issue:`num` or this Pull Request with :pull:`num`. Includes contributor name and/or GitHub username (link with :ghuser:`user`).

This moves fetching for GEFS to AWS. It could probably use some reorganization, but it works as is. It is quite a bit slower and moves ~10x the data over the network, because we need to fetch the full GEFS files and then slice within the lat/lon domain of interest ourselves instead of having nomads do that slicing for us.
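For reference, the local slicing step can be done with wgrib2's -small_grib option. The file names, bounds (given here in 0-360 longitude), and exact flags are assumptions for illustration; the fetch code's actual invocation may differ:

```python
import subprocess

# Hypothetical domain bounds and paths.
lon_w, lon_e = 228, 300
lat_s, lat_n = 20, 52

# Cut the full GEFS file down to the lat/lon box of interest.
subprocess.run(
    ['wgrib2', 'gefs_full.grib2',
     '-small_grib', f'{lon_w}:{lon_e}', f'{lat_s}:{lat_n}',
     'gefs_subset.grib2'],
    check=True)
```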
I took a look at #696 and https://www.weather.gov/media/notification/pdf2/pns20-85ncep_web_access.pdf, and it appears that any of the CLI commands that hit nomads will contribute to hitting this rate limit. Moving at least one of the models off of nomads will help alleviate some of the strain, at the cost of bandwidth/speed. We may need to address #696 separately to handle size-0 responses, but I'm not sure of a good way to throttle the async requests down to a maximum of 60 requests per minute across processes.
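On the throttling question, a minimal single-process sketch that spaces out request starts so at most 60 begin per minute. Coordinating the budget across worker processes would additionally need shared state (a file, redis, etc.), which this does not attempt:

```python
import asyncio
import time


class RateLimiter:
    """Space out acquisitions so at most max_calls start per period seconds.

    Single-process only; all coroutines must share one instance.
    """

    def __init__(self, max_calls=60, period=60.0):
        self._interval = period / max_calls
        self._lock = asyncio.Lock()
        self._last_start = 0.0

    async def acquire(self):
        async with self._lock:
            now = time.monotonic()
            wait = self._last_start + self._interval - now
            if wait > 0:
                # Enforce a minimum gap between consecutive starts.
                await asyncio.sleep(wait)
            self._last_start = time.monotonic()
```

Usage would be to call `await limiter.acquire()` immediately before each `session.get(...)`, so request starts are at least one second apart regardless of how many fetch coroutines are in flight.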