This repository has been archived by the owner on May 28, 2024. It is now read-only.

Why is 01466500 so different? #55

Open
jsadler2 opened this issue Mar 3, 2022 · 7 comments

Comments

@jsadler2
Collaborator

jsadler2 commented Mar 3, 2022

Site 01466500 is different. Not bad. Just different.

https://waterdata.usgs.gov/monitoring-location/01466500

This came to our attention when we looked at the hidden states (#52 (comment)).

@jsadler2
Collaborator Author

jsadler2 commented Mar 3, 2022

As @amcarter showed in #52 (comment), it's not the LAI.

@jsadler2
Collaborator Author

jsadler2 commented Mar 3, 2022

It is off on its own on the coastal plain, whereas all of the other sites are in piedmont basins.
image

@jsadler2
Collaborator Author

jsadler2 commented Mar 3, 2022

When we look at the DO data, DO at 01466500 is a lot lower than at the other sites (comparing to just two others here):
image

@jsadler2
Collaborator Author

jsadler2 commented Mar 3, 2022

This comes through in the predictions as well:
image

@jsadler2
Collaborator Author

jsadler2 commented Mar 3, 2022

So the model is actually tailoring to that site specifically to get lower DO values. Maybe that's part of what we are seeing in the differences in hidden states at 01466500. But when I saw those differences in hidden states, for some reason I was expecting more than a change in scale, which is what this looks like to me.
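
A rough sketch of one way to check the "change in scale" impression: fit a single scale factor between a hidden-state series at a reference site and the same hidden state at 01466500, then look at how much is left unexplained. The names `h_ref` and `h_site` are hypothetical and the numbers are made up for illustration; they are not model output. This also ignores any constant offset between the two series.

```python
def fit_scale(reference, target):
    """Least-squares scale factor a that minimizes sum((target - a * reference)^2)."""
    num = sum(r * t for r, t in zip(reference, target))
    den = sum(r * r for r in reference)
    return num / den


def relative_residual(reference, target):
    """Return (scale, fraction of target's sum of squares not explained by scaling)."""
    a = fit_scale(reference, target)
    resid = sum((t - a * r) ** 2 for r, t in zip(reference, target))
    total = sum(t * t for t in target)
    return a, resid / total


# Made-up values, NOT real hidden states: h_site is exactly half of h_ref.
h_ref = [0.2, 0.5, 0.9, 0.5, 0.2]
h_site = [0.1, 0.25, 0.45, 0.25, 0.1]

scale, rel_resid = relative_residual(h_ref, h_site)
```

A relative residual near zero would be consistent with a pure change in scale; a large one would suggest the shape of the series differs too.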

@lekoenig
Collaborator

lekoenig commented Mar 9, 2022

I grabbed our PRMS-scale attributes to see what catchment/stream characteristics might set 01466500 apart. Compared to our population of sites, 01466500 (colored below in orange) is located in a catchment that is relatively low-gradient (coastal plain, makes sense) with relatively high canopy cover.
prms_attr
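
To put a number on "relatively low-gradient", one option is a percentile rank of the site's attribute value within the population of sites. This is only a sketch: the site IDs and slope values below are placeholders invented for illustration, not our PRMS attributes.

```python
def percentile_rank(values, x):
    """Percent of values in the population that are less than or equal to x."""
    return 100.0 * sum(v <= x for v in values) / len(values)


# Placeholder slopes for hypothetical sites; "TARGET" stands in for the site of interest.
slopes = {
    "SITE_A": 0.012,
    "SITE_B": 0.008,
    "SITE_C": 0.020,
    "SITE_D": 0.015,
    "TARGET": 0.002,
}

rank = percentile_rank(list(slopes.values()), slopes["TARGET"])
```

A low rank for slope together with a high rank for canopy cover would match what the plot shows qualitatively.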

I also pulled some extra catchment characteristics from StreamCat as well as the NHDPlus value-added attributes. Many of these are included in our wish list of input variables (issue #51) and we'll eventually grab these data from analogous datasets on ScienceBase once we have a shared static attributes repo set up with inland salinity.

Based on the plots below, 01466500 is a relatively small stream and its catchment has relatively high canopy cover, sandy/permeable soils, and high wetland cover. It makes sense to me that DO concentrations would be lower at that site if riparian wetlands influence the stream DO signal, either because the stream signal reflects the lower DO concentrations of an upstream wetland, or because organic matter delivered from proximal wetlands fuels DO consumption within the stream.

Soil characteristics:
STATSGO

Geomorphic/stream size characteristics:
nhd_geomorph

Land cover characteristics:
NLCD

So there do seem to be some catchment characteristics that differentiate 01466500 and help explain the lower DO concentrations. I'm not sure how those might be impacting the hidden states that the model is using to predict DO, though. If H3 contained information about light reaching the stream, for example, perhaps that seasonal pattern is disrupted at 01466500 if the water is darker (i.e. has a relatively high organic carbon content) and thus less incoming light reaches biofilms on the streambed. But I may just be 'reading the tea leaves' here...

@lekoenig
Collaborator

I've wondered whether the model is essentially learning too much from site 01466500, which does have some catchment characteristics that set it apart from the others. @jsadler2 pointed out that we could test this hypothesis by adding 01466500 to the list of validation sites in the experiment config yml and asking what the impact is of leaving this site out of model training.
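
A minimal sketch of that hold-out step, assuming the parsed config exposes site lists under keys like `train_sites` and `validation_sites`. Those key names and the placeholder site IDs are assumptions for illustration; the actual experiment config may be organized differently.

```python
# Hypothetical stand-in for the parsed experiment config (key names are assumed).
config = {
    "train_sites": ["01466500", "SITE_A", "SITE_B"],  # SITE_* are placeholder IDs
    "validation_sites": ["SITE_C"],
}


def hold_out_site(cfg, site_id):
    """Return a copy of cfg with site_id removed from training and added to validation."""
    train = [s for s in cfg["train_sites"] if s != site_id]
    validation = list(cfg["validation_sites"])
    if site_id not in validation:
        validation.append(site_id)
    return {**cfg, "train_sites": train, "validation_sites": validation}


held_out = hold_out_site(config, "01466500")
```

Training once with `config` and once with `held_out`, then comparing metrics at the remaining sites, would show how much 01466500 is influencing the fit.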

@lekoenig lekoenig added the EDA label Aug 11, 2022
@lekoenig lekoenig self-assigned this Aug 16, 2022