Add Metrics and Simple Baselines #1

Closed · 11 of 14 tasks
jacobbieker opened this issue Jan 12, 2023 · 12 comments · Fixed by #2
Labels: enhancement (New feature or request)
@jacobbieker (Member) commented Jan 12, 2023

We should have standard metrics and very simple baselines for use in training our models.

Baseline Models:

  • All 0's
  • All max capacity
  • Last-value persistence
  • Last-day persistence (copy the last day as the next day's forecast)
  • PV Live intraday
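
A minimal sketch of what these baselines could look like, assuming half-hourly pandas Series of outturns; every name, signature, and default here is an illustrative assumption, not a final API:

```python
# A sketch of the proposed baselines, assuming `outturn` is a half-hourly
# pandas Series indexed by timestamp. All names and defaults are assumptions.
import pandas as pd


def baseline_zeros(index: pd.DatetimeIndex) -> pd.Series:
    """Forecast zero generation at every timestamp."""
    return pd.Series(0.0, index=index)


def baseline_max_capacity(index: pd.DatetimeIndex, capacity: float) -> pd.Series:
    """Forecast full installed capacity at every timestamp."""
    return pd.Series(capacity, index=index)


def baseline_last_value(outturn: pd.Series, horizon_steps: int) -> pd.Series:
    """Persist the last observed value `horizon_steps` steps ahead."""
    return outturn.shift(horizon_steps)


def baseline_last_day(outturn: pd.Series, steps_per_day: int = 48) -> pd.Series:
    """Copy the previous day's profile as the next day's forecast."""
    return outturn.shift(steps_per_day)
```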

Metrics:

  • NMAE
  • Error during morning, afternoon, and evening
  • Error across different time horizons
  • Error across the seasons (Winter, Spring, Summer, Fall)
  • RMSE
  • MAE
  • Option to count errors larger than 1 and 2 gigawatts
  • Option to keep night time data, or remove it (defined by sun angle)
  • % of errors above 1.65 sigma, with sigma being the standard deviation of the outturn
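
For concreteness, a minimal sketch of the core metrics above; normalising NMAE by installed capacity is an assumption (mean outturn is another common choice), as are all names:

```python
# A sketch of the core metrics, assuming `forecast` and `outturn` are aligned
# pandas Series in the same units.
import numpy as np
import pandas as pd


def mae(forecast: pd.Series, outturn: pd.Series) -> float:
    return float((forecast - outturn).abs().mean())


def rmse(forecast: pd.Series, outturn: pd.Series) -> float:
    return float(np.sqrt(((forecast - outturn) ** 2).mean()))


def nmae(forecast: pd.Series, outturn: pd.Series, capacity: float) -> float:
    # Normalising by installed capacity is an assumption, not settled here.
    return mae(forecast, outturn) / capacity


def large_error_count(forecast: pd.Series, outturn: pd.Series,
                      threshold: float) -> int:
    """Count absolute errors above a threshold (e.g. 1 or 2 GW, national only)."""
    return int(((forecast - outturn).abs() > threshold).sum())
```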
@jacobbieker added the enhancement label Jan 12, 2023
@jacobbieker self-assigned this Jan 12, 2023
@jacobbieker
Member Author

@peterdudfield @dantravers Any other metrics or simple baselines I should include?

@peterdudfield

some discussion in here - https://docs.google.com/document/d/1E9pccSVVIfn8m14fUqBCVLWKNiU1dUe_zgTcurqLWww/edit

Probably good to include

  • RMSE, not just MAE.
  • MAE, as well as NMAE
  • Count of errors greater than 1 (and 2) GW (this only makes sense for national forecast)
  • option to keep night time data, or not (this can be defined by sun angle)
There are lots of others, but making a v1 first would be good; then we can slowly add to them. Good to make it modular because of this.
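
For the night-time option in the list above, one way to implement it is via sun angle with pvlib's solar position calculation; using pvlib at all, and the 5-degree elevation cutoff, are assumptions:

```python
# Drop night-time rows by solar elevation. `df` is assumed to have a
# timezone-aware DatetimeIndex; pvlib and the 5-degree cutoff are assumptions.
import pandas as pd
import pvlib


def drop_night_time(df: pd.DataFrame, latitude: float, longitude: float,
                    min_elevation_deg: float = 5.0) -> pd.DataFrame:
    """Keep only rows where solar elevation exceeds `min_elevation_deg`."""
    solpos = pvlib.solarposition.get_solarposition(df.index, latitude, longitude)
    return df[solpos["apparent_elevation"] > min_elevation_deg]
```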

Also: I'm not sure if nowcasting_utils is the right place; I think all that code is quite out of date. I'd be tempted to make a new repo, ocf-ml-metrics.

@peterdudfield

btw: thanks for making the issue @jacobbieker

@jacobbieker
Member Author

Yeah, sounds good! Just made the repo, will move this over

@jacobbieker transferred this issue from openclimatefix/nowcasting_utils Jan 12, 2023
@peterdudfield

Good use of 'transfer' feature

@dantravers

Sounds good! Another "model" I would compare against is PV_Live intraday versus PV_Live updated. This gives an estimate of accuracy for national and GSP that we know we want to beat.

To generalise the "large errors": for national I think it is good to look at errors > 1 GW (or maybe 2 GW). For site level or GSP level, I would apply a statistical measure. I propose to count the % of errors which are greater than 1.65 sigma, where sigma is the standard deviation of the time series of outturns; 1.65 sigma equates to the 5/95% range of outturns for the site. It could be any threshold, but this is probably as good as any.
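
A minimal sketch of this 1.65 sigma measure, assuming aligned pandas Series; the function name is made up:

```python
# Percentage of absolute errors above n_sigma times the standard deviation
# of the outturn series. The 1.65 multiplier comes from this thread.
import pandas as pd


def pct_large_errors_sigma(forecast: pd.Series, outturn: pd.Series,
                           n_sigma: float = 1.65) -> float:
    threshold = n_sigma * outturn.std()
    return float(100.0 * ((forecast - outturn).abs() > threshold).mean())
```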

@jacobbieker
Member Author

Okay, sounds good, I've added those. How much intraday PV Live do we have, @peterdudfield? I don't remember when we started collecting it.

@peterdudfield

We actually have a few years' worth; James re-ran some things.

I would suggest we use PV Live intraday as another baseline model, rather than entangle each model with PV Live intraday. Does that make sense?

@peterdudfield

> Sounds good! Another "model" I would compare against is PV_Live intraday versus PV_Live updated. This gives an estimate of accuracy for national and GSP that we know we want to beat.
>
> To generalise the "large errors": for national I think it is good to look at errors > 1 GW (or maybe 2 GW). For site level or GSP level, I would apply a statistical measure. I propose to count the % of errors which are greater than 1.65 sigma, where sigma is the standard deviation of the time series of outturns; 1.65 sigma equates to the 5/95% range of outturns for the site. It could be any threshold, but this is probably as good as any.

Perhaps a straightforward threshold could be v0; then for v1 we could look at something a bit more elaborate.

@jacobbieker
Member Author

> I would suggest we use PV Live intraday as another baseline model, rather than entangle each model with PV Live intraday. Does that make sense?

Yeah, that's what I was thinking for the model. We can compute the errors and save those too if we want to, but I would do that later. Comparing to the day-after PV Live is already what the other error metrics do, so I don't think we need to include that separately. Is the intraday saved somewhere in a file?

@dantravers commented Feb 13, 2023

> • % of errors above 1.65 sigma, with sigma being the standard deviation of the outturn

Thinking about this further - I would simplify and make the large errors anything above a % of the capacity for that region / site. E.g. 10% of installed capacity.
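
The simplified, capacity-based variant could look like this (the 10% default and all names are illustrative assumptions):

```python
# Percentage of absolute errors above a fixed fraction of installed capacity.
import pandas as pd


def pct_large_errors_capacity(forecast: pd.Series, outturn: pd.Series,
                              capacity: float, fraction: float = 0.10) -> float:
    return float(100.0 * ((forecast - outturn).abs() > fraction * capacity).mean())
```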

@dantravers commented Feb 13, 2023

I would suggest the following metrics are the "headline" metrics that we use to compare models at the first pass, and then look at others for more detail:

  • nMAE
  • split by forecast horizon

For national forecasts:

  • % errors > 1GW
  • MAE (although it doesn't provide more information than nMAE, it is easily understood by humans)
  • split by forecast horizon.

For site-level forecasts:

  • % errors > 10% of installed capacity (large errors - an equivalent of the 1GW for national)

The errors by time of day, season, etc. are useful and should be used for more detailed comparisons of models. It would be good to standardise on a way to present these metrics, i.e. a particular grid format.
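
One possible grid: NMAE split by forecast horizon (rows) and season (columns) as a pandas pivot table. The long-format results frame with forecast, outturn, horizon_hours, and timestamp columns is an assumed layout, not an agreed schema:

```python
# A sketch of one standardised grid: NMAE by forecast horizon and season.
import pandas as pd

_SEASONS = {12: "Winter", 1: "Winter", 2: "Winter",
            3: "Spring", 4: "Spring", 5: "Spring",
            6: "Summer", 7: "Summer", 8: "Summer",
            9: "Fall", 10: "Fall", 11: "Fall"}


def nmae_grid(results: pd.DataFrame, capacity: float) -> pd.DataFrame:
    """One row per forecast horizon, one column per season, values are NMAE."""
    df = results.copy()
    df["abs_error"] = (df["forecast"] - df["outturn"]).abs()
    df["season"] = df["timestamp"].dt.month.map(_SEASONS)
    return df.pivot_table(values="abs_error", index="horizon_hours",
                          columns="season", aggfunc="mean") / capacity
```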
