
add config model and corresponding tests #45

Merged (25 commits into main from add-config, Oct 4, 2024)
Conversation

@AUdaltsova (Contributor)

Pull Request

Description

Copied over the relevant parts of the config model from ocf_datapipes (pv, nwp, gsp, satellite) and corresponding tests + data for them.

  • There is quite a lot I had to strip out that is not relevant to what the code uses right now. Is there anything we want to add back before merging?
  • Left several TODOs for places I wasn't sure were still relevant but left untouched.
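
For orientation, a heavily simplified sketch of the kind of structure being ported is below. The class and field names are illustrative assumptions (pydantic v2 style), not the actual model added in this PR:

```python
# Illustrative sketch only: class/field names and defaults are assumptions,
# not the actual ocf_data_sampler config model added in this PR.
from typing import Optional

from pydantic import BaseModel, Field


class Satellite(BaseModel):
    """Satellite input section (hypothetical fields)."""
    satellite_zarr_path: str = Field(..., description="Path to the satellite zarr store")
    satellite_channels: list[str] = Field(..., description="Channels to load")


class NWP(BaseModel):
    """NWP input section (hypothetical fields)."""
    nwp_zarr_path: str = Field(..., description="Path to the NWP zarr store")
    nwp_provider: str = Field(..., description="e.g. 'ukv' or 'ecmwf'")


class InputData(BaseModel):
    """Groups the per-source sections (pv, gsp, etc. would sit alongside these)."""
    satellite: Optional[Satellite] = None
    nwp: Optional[NWP] = None


class Configuration(BaseModel):
    """Top-level object loaded from a YAML config file."""
    input_data: InputData = Field(default_factory=InputData)
```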

How Has This Been Tested?

Ran the test pack

  • Yes

Checklist:

  • My code follows OCF's coding style guidelines
  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • I have checked my code and corrected any misspellings

@dfulu (Member) commented Sep 2, 2024

I think there is still a lot of stuff we could strip out of the config model; we should take it down to the absolute simplest it can be. I think it's better to add things in as we need them rather than port everything across just because we have it.

@peterdudfield (Contributor)

I wonder whether we should try to train a model first before bringing in stuff from ocf_datapipes.
This is what I thought we agreed on here - #7

@AUdaltsova (Contributor, Author) commented Sep 2, 2024

@dfulu

I think there is still a lot of stuff we could strip out of the config model; we should take it down to the absolute simplest it can be. I think it's better to add things in as we need them rather than port everything across just because we have it.

Yeah, I agree. I think I stripped out all the mixins and properties we don't use, except for the sequence length properties (I think it's safe to strip those out), system dropout (also not used in here as far as I can tell, but correct me if I'm wrong), and this change-of-filename thing, which I'm not sure is still in use. Unless you mean removing arguments inside the mixins, which is probably good to do, but I wouldn't be sure what to remove (I guess I can just strip everything that tests/test_data/pvnet_test_config.yaml doesn't use?).

@AUdaltsova (Contributor, Author)

@peterdudfield

I wonder whether we should try to train a model first before bringing in stuff from ocf_datapipes. This is what I thought we agreed on here - #7

Yes, I agree that we want to avoid training a model on a different config to limit the search space for bugs, but if it's an exact copy I don't think there is any harm in bringing it over (though I guess there's always a risk of human error, especially since we're stripping stuff out... up for debate). I was thinking we could merge a copy and agree not to merge any changes to it before the model is trained, because depending on how long that takes, not having a baseline config here might block development of other stuff?

@AUdaltsova (Contributor, Author)

Also, should we make dropout_timedeltas_minutes a required field? I think it is useful to require it so it gets thought about, and putting [0] as a way of "turning it off" should provide the same behaviour as passing null currently does. It means that this check making sure the dropout timedeltas are less than 0 will need to be changed, but something has to be done about that anyway, because the config currently allows dropout timedeltas to be 0, so one of them needs to change. Or we could just delete the check inside the dropout function, since the config already checks for that. Do you guys know why it was introduced in the first place, maybe?
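
As a rough illustration of the config-side check being discussed (the mixin and field names are assumptions, not this PR's actual code), a pydantic validator along these lines could enforce non-positive timedeltas on a required field:

```python
# Sketch of the config-side check discussed above; the mixin and field names
# are illustrative assumptions, not the code in this PR.
from pydantic import BaseModel, field_validator


class DropoutMixin(BaseModel):
    # Required field: [0] would mean "data available instantly", i.e. dropout effectively off.
    dropout_timedeltas_minutes: list[int]

    @field_validator("dropout_timedeltas_minutes")
    @classmethod
    def check_timedeltas_non_positive(cls, v: list[int]) -> list[int]:
        # Dropout timedeltas are delays into the past, so each must be <= 0.
        for minutes in v:
            if minutes > 0:
                raise ValueError("dropout timedeltas must be <= 0 (minutes into the past)")
        return v
```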

@dfulu (Member) commented Oct 1, 2024

Tbh now I'm questioning whether test_config.yaml is needed at all; if anything it has fewer parameters than pvnet_test_config.yaml. But I dislike the idea of substituting it with the PVNet config (even if they are equivalent right now), because moving forward it will have to be different, and I think it's better to amend the specific test_config.yaml than swap them in and then out.

Alternatively, we could make the config tests cycle through example configs for all pipelines? Right now it would just be PVNet, but once the others are added there will be more. I think it offers higher fidelity to actual use cases than just meshing every parameter into one file; but it also might be too much of a faff for nothing.

I'd leave it for now. At the moment the Configuration class is supposed to work for all pipelines, so I think just testing it once with an overcomplete file is good enough. Once the config has been refactored into lots of composable mixins, we might think about having a separate pydantic Configuration for each dataset if they really do need to diverge. Trying to force one config to work for multiple datapipes is what made the config so messy in ocf_datapipes in the first place.
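
A sketch of what cycling the tests over per-pipeline example configs might look like is below; the glob pattern and the loader name are assumptions for illustration, not a reference to existing test code:

```python
# Sketch only: the glob pattern and loader name are assumptions for illustration.
import glob

import pytest

from ocf_data_sampler.config import load_yaml_configuration  # assumed helper name

# One example config per pipeline, e.g. tests/test_data/pvnet_test_config.yaml.
EXAMPLE_CONFIGS = sorted(glob.glob("tests/test_data/*_test_config.yaml"))


@pytest.mark.parametrize("config_path", EXAMPLE_CONFIGS)
def test_example_config_validates(config_path):
    """Every example config shipped with the repo should pass pydantic validation."""
    config = load_yaml_configuration(config_path)
    assert config is not None
```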

@dfulu (Member) commented Oct 1, 2024

Also, should we make dropout_timedeltas_minutes a required field? I think it is useful to require it so it gets thought about, and putting [0] as a way of "turning it off" should provide the same behaviour as passing null currently does. It means that this check making sure the dropout timedeltas are less than 0 will need to be changed, but something has to be done about that anyway, because the config currently allows dropout timedeltas to be 0, so one of them needs to change. Or we could just delete the check inside the dropout function, since the config already checks for that. Do you guys know why it was introduced in the first place, maybe?

I'm not sure about this, but I also don't feel strongly about it. The current UK PVNet on prod doesn't make use of dropout (except for NWP delay, which we should distinguish), though some of the models on dev do. So in that sense I think it could be optional. But feel free to change it if you think it makes more sense. However, my preference for an "off" setting would be an empty list rather than [0] if we don't use None.

We should definitely at least align those checks! We could delete the check from the function itself if we think it won't be used independently from the config, which has always been true for us. It is a very computationally cheap check though so I'd choose whichever we think is better code quality. @peterdudfield might have some thoughts on this?

There is currently an additional way of turning off dropout, which is to set dropout_fraction=0. It might actually be good for the config to check that dropout_fraction and dropout_timedeltas_minutes are either both set to "on" or both set to "off", and not allow one to be "on" while the other is "off", since that could be ambiguous?
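
A minimal sketch of the suggested consistency check, assuming pydantic v2 and illustrative field names (not necessarily the check that ended up in the PR):

```python
# Sketch of the suggested "both on or both off" check; names are illustrative.
from pydantic import BaseModel, Field, model_validator


class DropoutMixin(BaseModel):
    dropout_timedeltas_minutes: list[int] = Field(default_factory=list)
    dropout_fraction: float = 0.0

    @model_validator(mode="after")
    def check_dropout_consistency(self):
        timedeltas_on = len(self.dropout_timedeltas_minutes) > 0
        fraction_on = self.dropout_fraction > 0
        if timedeltas_on != fraction_on:
            raise ValueError(
                "dropout_timedeltas_minutes and dropout_fraction must either both "
                "be set (dropout on) or both be unset (dropout off)"
            )
        return self
```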

@dfulu (Member) left a review comment

This is looking really good now @AUdaltsova. My last comments are only nice-to-haves; you can decide what to do with them. Great work!

P.S. when you merge, can you make it a squash merge? I think that's good practice so we don't make the git repo really big by storing unneeded history.

@AUdaltsova (Contributor, Author) commented Oct 1, 2024

However, my preference for an "off" setting would be an empty list rather than [0] if we don't use None.

I see what you mean, and I would usually agree, but I think the whole concept of dropout being "off" is a bit of an illusion: there isn't really such a thing as an ambiguous delay, you either set it to something or it defaults to instant availability. That is "off" in the sense that the data isn't delayed, but it's still a very specific scenario that you are committing to, so I think it would be nice to think about that and consciously put 0 instead of thinking "oh well, just do whatever it is you do", if that makes sense?

EDIT: it occurred to me just now that, since there is separate handling of None, it could be beneficial to replace it with an empty list to avoid going through some unnecessary motions. A 0 will be treated as an actual dropout value, so there will be some extra computation that just results in no changes (though probably not expensive at any rate).

It is a very computationally cheap check though so I'd choose whichever we think is better code quality.

Personally, I would go for a config check. This is a restriction on an input parameter and should be checked on input (same as the NWP provider check, which it turns out has not been working all this time :) Obviously incorrect providers were being caught down the line when the consts suddenly aren't there, so it is not even strictly necessary, but I think it should stay and flag the problem right away instead of halfway through selection).
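
For illustration, a config-side provider check along these lines could fail fast at load time; the allowed provider list and the class name here are assumptions rather than the repo's actual values:

```python
# Sketch of a config-side provider check; the allowed list is an assumption.
from pydantic import BaseModel, field_validator

ALLOWED_NWP_PROVIDERS = ["ukv", "ecmwf"]  # example values only


class NWP(BaseModel):
    nwp_provider: str

    @field_validator("nwp_provider")
    @classmethod
    def check_provider(cls, v: str) -> str:
        # Fail at config load time rather than halfway through data selection.
        if v.lower() not in ALLOWED_NWP_PROVIDERS:
            raise ValueError(f"NWP provider '{v}' is not one of {ALLOWED_NWP_PROVIDERS}")
        return v
```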

It might actually be good for the config to check that dropout_fraction and dropout_timedeltas_minutes are either both set to "on" or both set to "off", and not allow one to be "on" while the other is "off", since that could be ambiguous?

That is a very good point! Especially since dropout_fraction defaults to "off", so it can accidentally negate the dropout timedeltas and not fail anywhere (I think).

codecov bot commented Oct 4, 2024

Codecov Report

Attention: Patch coverage is 92.59259% with 10 lines in your changes missing coverage. Please review.

Project coverage is 93.12%. Comparing base (ef7c83e) to head (aff1414).
Report is 7 commits behind head on main.

Files with missing lines           Patch %   Missing
ocf_data_sampler/config/model.py   91.89%    9 ⚠️
ocf_data_sampler/config/save.py    90.90%    1 ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #45      +/-   ##
==========================================
+ Coverage   93.05%   93.12%   +0.07%     
==========================================
  Files          22       26       +4     
  Lines         691      815     +124     
==========================================
+ Hits          643      759     +116     
- Misses         48       56       +8     

@AUdaltsova (Contributor, Author)

We should definitely at least align those checks! We could delete the check from the function itself if we think it won't be used independently from the config, which has always been true for us. It is a very computationally cheap check though so I'd choose whichever we think is better code quality. @peterdudfield might have some thoughts on this?

@peterdudfield thought it's good redundancy in case we change the config or use the function without it, so I put it back in, but made it a less-than-or-equal check to be consistent with the config check.
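
Roughly, the redundant in-function check described here might look like the sketch below; the function name, signature, and use of pandas Timedeltas are assumptions, not the actual helper in this repo:

```python
# Illustrative only: the function name and signature are assumptions,
# not the real dropout helper in this repo.
from typing import Optional

import pandas as pd


def validate_dropout_settings(
    dropout_timedeltas: Optional[list[pd.Timedelta]], dropout_frac: float
) -> None:
    """Redundant sanity check mirroring the config validator, using <= for consistency."""
    if dropout_timedeltas is not None:
        assert all(
            t <= pd.Timedelta(0) for t in dropout_timedeltas
        ), "dropout timedeltas must be <= 0, i.e. in the past"
    assert 0 <= dropout_frac <= 1, "dropout fraction must be between 0 and 1"
```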

It might actually be good for the config to check that dropout_fraction and dropout_timedeltas_minutes are either both set to "on" or both set to "off", and not allow one to be "on" while the other is "off", since that could be ambiguous?

Added a check and test for that!

@AUdaltsova merged commit a6b0831 into main on Oct 4, 2024
3 checks passed
@AUdaltsova deleted the add-config branch on October 4, 2024 at 13:58