nwis get_site_data with service='iv' and period='all' sets an inappropriately recent start date #175

Closed
emiliom opened this issue Dec 5, 2019 · 12 comments · Fixed by #181 or #182

Comments

@emiliom
Contributor

emiliom commented Dec 5, 2019

@erekalper pointed out this inappropriate (I think) hard-wired behavior for nwis.get_site_data.

The period argument accepts a value of 'all'. When that's used, hard-wired start dates are used, as defined here. For service='iv', the hard-wired start date is set to datetime.datetime(2007, 10, 1). I don't know where this start date came from, but it's misleading and can lead to wrong results: data prior to 2007-10-01 are silently left out, and the user is never made aware of the imposed cutoff. If a time series ends before 2007-10-01, no data at all are returned. For example:

data = ulmo.usgs.nwis.get_site_data('09111500', service='iv', period='all')

returns no data because this time series runs from 1993 to 2006-09-30 23:45. The data can still be obtained by passing an appropriate start date instead of using period='all', e.g.:

data = ulmo.usgs.nwis.get_site_data('09111500', service='iv', start='1993-01-01')

It looks like the start date for 'iv' service should be changed to a much older date, probably the same as the one used for 'dv', to be safe (1851-1-1). Or we should follow up with someone from USGS to learn more about this.
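To make the failure mode concrete, here is a minimal sketch of why a fixed cutoff yields an empty result for this site. It does not reproduce ulmo's code: the `HARD_WIRED_START` table and the `series_is_reachable` helper are hypothetical names, and the dates are the ones quoted in this comment.

```python
import datetime

# Start dates as described in this issue (assumed values, not copied from ulmo).
HARD_WIRED_START = {
    "dv": datetime.datetime(1851, 1, 1),
    "iv": datetime.datetime(2007, 10, 1),
}


def series_is_reachable(service, series_end):
    """Return True if any part of the series falls on or after the cutoff."""
    return series_end >= HARD_WIRED_START[service]


# Site 09111500: the 'iv' series ends at 2006-09-30 23:45.
series_end = datetime.datetime(2006, 9, 30, 23, 45)

iv_reachable = series_is_reachable("iv", series_end)   # cutoff is after the series
old_reachable = series_is_reachable("dv", series_end)  # a 1851 cutoff would include it
print(iv_reachable, old_reachable)
```

With the 2007-10-01 cutoff the whole series lies before the request window, so nothing comes back; with a cutoff as old as the one used for 'dv' the series would be covered.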

@erekalper

erekalper commented Dec 6, 2019

I also just found out via trial and error that the earliest date allowable for ivs is 1900-01-01, and for dvs is 1600-01-01. NWIS yells at you if it's earlier than either of those. Just so you know!

@erekalper

Final note: NWIS itself has a bad equality somewhere. The start date actually needs to be 1900-01-02 at the earliest for iv. Checked, and 1600-01-01 is fine for dv.

@dharhas
Contributor

dharhas commented Dec 6, 2019

If I recall correctly, back when ulmo was started the IV data only went back to 2007, and requests failed if you asked for earlier dates.

@emiliom
Contributor Author

emiliom commented Dec 6, 2019

> If I recall correctly, back when ulmo was started the IV data only went back to 2007, and requests failed if you asked for earlier dates.

Thanks. I too have a fuzzy recollection that IV data availability was more limited back then.

> I also just found out via trial and error that the earliest date allowable for iv is 1900-01-01, and for dv is 1600-01-01. NWIS yells at you if it's earlier than either of those. Just so you know!

> Final note: NWIS itself has a bad equality somewhere. The start date actually needs to be 1900-01-02 at the earliest for iv. Checked, and 1600-01-01 is fine for dv.

Thanks, @erekalper. Great to know.

I think we have our answers. The easiest approach for a fix will be to change the start date for IV to 1900-01-01. I seriously doubt there's any data prior to 1850 in NWIS, but we can change the DV start date to 1600-01-01. I can also ping an NWIS USGS contact about this; I was just on the phone with one an hour ago.
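A rough sketch of what that fix could look like, under stated assumptions: `EARLIEST_START` and `resolve_start` are hypothetical names rather than ulmo's API, and the dates are the trial-and-error limits reported in this thread, not documented NWIS values.

```python
import datetime

# Earliest accepted start dates as reported in this thread (assumptions).
EARLIEST_START = {
    "iv": datetime.date(1900, 1, 1),
    "dv": datetime.date(1600, 1, 1),
}


def resolve_start(service, start=None, period=None):
    """Pick the start date to send for a request.

    period='all' maps to the earliest date the service is believed to accept;
    an explicit start is clamped so it never precedes that date.
    """
    earliest = EARLIEST_START[service]
    if period == "all":
        return earliest
    if start is None:
        return None  # let the service apply its own default
    return max(datetime.date.fromisoformat(start), earliest)


print(resolve_start("iv", period="all"))
print(resolve_start("iv", start="1993-01-01"))
print(resolve_start("dv", start="1500-06-01"))
```

Keeping the limits in one table means a later correction (for example, if 1900-01-01 turns out not to work for iv) is a one-line change.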

@erekalper

Just to be clear, the iv service will error out with 1900-01-01; it oddly needs to be 1900-01-02.

@solomon-negusse
Member

> Just to be clear, the iv service will error out with 1900-01-01; it oddly needs to be 1900-01-02.

Hi @erekalper, I tested this out and I'm getting valid responses with a start date of 1900-01-01. I tried a handful of gauges. Here's an example:

In [30]: data = ulmo.usgs.nwis.get_site_data('08031290', service='iv', start='1900-01-01')              
processing data from request: https://nwis.waterservices.usgs.gov/nwis/iv/?format=waterml&site=08031290&startDT=1900-01-01T00%3A00%3A00

In [31]: data[list(data.keys())[0]]['values'][:2]                                                       
Out[31]: 
[{'value': '439.96',
  'qualifiers': 'A',
  'datetime': '2007-10-01T01:00:00-05:00'},
 {'value': '439.96',
  'qualifiers': 'A',
  'datetime': '2007-10-01T01:15:00-05:00'}]

Would be interesting to know if you hit a corner case with the service.

@erekalper

It looks like I did! I wasn't testing a case without an end date or without a full timestamp. In light of that, I tried all the edge cases I could think of, listed below:

✔️ No end date, start has no timestamp
https://nwis.waterservices.usgs.gov/nwis/iv/?format=waterml&site=08031290&startDT=1900-01-01
✔️ No end date, start has timestamp
https://nwis.waterservices.usgs.gov/nwis/iv/?format=waterml&site=08031290&startDT=1900-01-01T00%3A00%3A00
✔️ End date, start has no timestamp, end has no timestamp
https://nwis.waterservices.usgs.gov/nwis/iv/?format=waterml&site=08031290&startDT=1900-01-01&endDT=2019-01-01
✔️ End date, start has no timestamp, end has timestamp
https://nwis.waterservices.usgs.gov/nwis/iv/?format=waterml&site=08031290&startDT=1900-01-01&endDT=2019-01-01T00%3A00%3A00
❌ End date, start has timestamp, end has no timestamp
https://nwis.waterservices.usgs.gov/nwis/iv/?format=waterml&site=08031290&startDT=1900-01-01T00%3A00%3A00&endDT=2019-01-01
❌ End date, start has timestamp, end has timestamp
https://nwis.waterservices.usgs.gov/nwis/iv/?format=waterml&site=08031290&startDT=1900-01-01T00%3A00%3A00&endDT=2019-01-01T00%3A00%3A00

It looks like the call fails whenever there's an end date and the start date has a timestamp. I tested this in a Jupyter Notebook as well, and those two calls returned an empty dictionary.
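For anyone who wants to re-run this matrix, the six URLs above can be generated rather than typed by hand. This is only a sketch: `build_iv_url` and `reported_to_fail` are hypothetical helpers, the code builds URLs without sending any request, and the failure flag just encodes the pattern observed in this comment, which may not hold on the live service.

```python
from itertools import product
from urllib.parse import urlencode

BASE_URL = "https://nwis.waterservices.usgs.gov/nwis/iv/"


def build_iv_url(site, start, end=None):
    """Build an iv request URL; urlencode percent-encodes the colons in timestamps."""
    params = {"format": "waterml", "site": site, "startDT": start}
    if end is not None:
        params["endDT"] = end
    return BASE_URL + "?" + urlencode(params)


def reported_to_fail(start, end):
    """Pattern observed in this thread only: an end date plus a timestamped start."""
    return end is not None and "T" in start


starts = ["1900-01-01", "1900-01-01T00:00:00"]
ends = [None, "2019-01-01", "2019-01-01T00:00:00"]

# Every combination of start format and end format (2 x 3 = 6 cases).
cases = [
    (build_iv_url("08031290", start, end), reported_to_fail(start, end))
    for start, end in product(starts, ends)
]

for url, fails in cases:
    print("FAIL" if fails else "ok  ", url)
```

Each printed URL can then be fetched manually or with an HTTP client to check whether the pattern still holds.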

@erekalper

From those last two, the website returns the following:
[image: screenshot of the error response]

I get the same error through a starting timestamp of 1900-01-01T04:59:59: https://nwis.waterservices.usgs.gov/nwis/iv/?format=waterml&site=08031290&startDT=1900-01-01T04%3A59%3A59&endDT=2019-01-01T00%3A00%3A00
Any time after that, starting at 1900-01-01T05:00:00, I get an actual return. I wonder if it's somehow assuming a local time vs. UTC? I'm on EST, which is five hours behind UTC, and that's really the only thing I could think of for why we'd see that difference. Is it the same for others here in different local time zones?

@emiliom
Contributor Author

emiliom commented Dec 10, 2019

Thanks, @erekalper and @solomon-negusse ! Nice sleuthing.

I don't have anything to add, except to bring in @jkreft-usgs to see if we can interest him in chiming in about these NWIS web service issues and start-datetime limits. Jim or someone on his NWIS team can provide definitive answers.

@solomon-negusse
Member

> Any time after that, starting at 1900-01-01T05:00:00, I get an actual return. I wonder if it's somehow assuming a local time vs. UTC? I'm on EST, which is five hours behind UTC, and that's really the only thing I could think of for why we'd see that difference. Is it the same for others here in different local time zones?

I'm in the CST time zone (UTC - 6 hrs) and I'm getting a valid response with 1900-01-01T05:00:00. I'd have expected it to fail up to 1900-01-01T05:59:00 if it was localizing.
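The arithmetic behind that expectation can be checked with a small sketch. It assumes, purely as a hypothesis from this thread, that the start is read as UTC, shifted by a fixed offset, and then compared against 1900-01-01; `EARLIEST` and `shifted_start_ok` are hypothetical names, and nothing here is confirmed service behavior.

```python
from datetime import datetime, timedelta, timezone

EARLIEST = datetime(1900, 1, 1)  # earliest start reported in this thread


def shifted_start_ok(start_text, utc_offset_hours):
    """Read start_text as UTC, shift it by a fixed offset, compare with EARLIEST."""
    start_utc = datetime.fromisoformat(start_text).replace(tzinfo=timezone.utc)
    shifted = start_utc.astimezone(timezone(timedelta(hours=utc_offset_hours)))
    # Compare wall-clock values only, ignoring the attached offset.
    return shifted.replace(tzinfo=None) >= EARLIEST


for offset in (-5, -6):
    for start in ("1900-01-01T04:59:59", "1900-01-01T05:00:00", "1900-01-01T06:00:00"):
        print(offset, start, shifted_start_ok(start, offset))
```

Under a fixed five-hour shift the boundary lands at 05:00:00, matching what was observed from both time zones. Under a six-hour shift the 05:00:00 request would be rejected, so a valid response from CST suggests that any shift is not tied to the client's local zone.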

@jkreft-usgs

Time is hard! There are differences between the different services, as well as extremely confusing time zone rules. I think that the earliest instantaneous data we have goes back to the 30s or so, so choosing a date somewhere in the 1910s will be fine. 2007 used to be a hard cut-off, but there was an effort some number of years ago to back-load data from an offline archive so that it could be available via public web services. Worth noting that we are also planning on building and rolling out new services over the course of the coming months and years that should be much more reasonable, use UTC by default, etc.

@emiliom
Contributor Author

emiliom commented Dec 14, 2019

Thanks @jkreft-usgs ! I think we have everything we need to update the ulmo nwis reader.

5 participants