
Enhancement/issue 170 (#180)

Merged
ejsimley merged 10 commits into NatLabRockies:develop from ejsimley:enhancement/issue_170
Jan 14, 2022

Conversation

@ejsimley
Collaborator

This fixes issue #170 by replacing the hard-coded start and end dates for the long-term reanalysis data used in the MonteCarloAEP class with dates determined automatically based on the raw data. Additionally, as was suggested in issue #170, an optional argument is added to MonteCarloAEP allowing the user to specify a particular end date. Most of the changes are in the process_reanalysis_data method that prepares the aggregate reanalysis data frame when a MonteCarloAEP object is created.

The changes are summarized as follows:

  • In the MonteCarloAEP.process_reanalysis_data method, instead of using hard-coded start and end dates for the aggregate reanalysis data frame, the end date is now chosen as the end of the last full month in which all reanalysis products contain data. The start date is the beginning of the first full month, day, or hour (depending on the time resolution) in which all reanalysis products contain data.
  • The optional argument end_date_lt is added to MonteCarloAEP. When this argument is defined, it is used as the end date for the aggregate reanalysis data frame instead of the default last full month. However, an error will be raised if the specified end date is past the last full time period when all reanalysis products contain data or if it does not leave enough data to cover the maximum number of years used in the long-term correction. Note that if the time resolution is monthly, then only the month field is considered (e.g., if end_date_lt is "2020-05-15", the last time period will be the full month of May, not April 15 - May 15). Similar logic applies to daily resolution.
  • A bug was fixed in undoing the normalization to 30-day months in the MonteCarloAEP.run_AEP_monte_carlo method. Previously, the list of the number of days per month was applied to the gross_lt time series, assuming the time series started in January and ended in December. But this wasn't always the case for gross_lt. Now the list of number of days per month is shifted to match the specific start and end dates of gross_lt.
  • By forcing the aggregate reanalysis data frame to end at the last full month when both reanalysis products contain data, the AEP uncertainty computed in the example 2 notebook increases from ~9% to ~10%. I updated this value in the notebook discussion. Previously, when ERA5 was selected in a particular iteration the reanalysis time series ended in December 2019, but when MERRA2 was selected it ended in April 2019. Given the new changes, now they both end in April 2019. This change seems to explain the increase in uncertainty by increasing the IAV. In fact, when I specified December 2018 as the end date for both products, IAV decreased significantly, so the IAV appears to be pretty sensitive to when the annual averaging period starts and ends. Not suggesting we change anything now, but thought this was interesting.
  • Added several tests for the new method choosing the start and end dates.
  • Updated the old plant analysis test results since they changed with the new reanalysis end dates.
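As a hedged sketch of the date-selection and days-per-month logic described above (the actual OpenOA implementation differs; `last_full_month_end` and `days_per_month` are hypothetical names, and pandas is assumed):

```python
import pandas as pd

def last_full_month_end(reanalysis_series):
    """End of the last full month for which every reanalysis product
    (given as a list of datetime-indexed Series) contains data."""
    # Last timestamp available in every product
    common_end = min(s.index.max() for s in reanalysis_series)
    # Start of the month containing that timestamp
    month_start = common_end.to_period("M").to_timestamp()
    # If the final month is incomplete, step back one month
    if common_end < month_start + pd.offsets.MonthEnd(0):
        month_start -= pd.offsets.MonthBegin(1)
    return month_start + pd.offsets.MonthEnd(0)

def days_per_month(monthly_index):
    """Days in each calendar month of a monthly DatetimeIndex, aligned to
    its actual start month rather than assuming a January-December year."""
    return list(monthly_index.days_in_month)
```

The same month-truncation idea applies to a user-supplied end date: only the month field would be considered at monthly resolution, so a mid-month date resolves to the full containing month.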

@ejsimley ejsimley linked an issue Dec 16, 2021 that may be closed by this pull request
@codecov-commenter

codecov-commenter commented Dec 17, 2021

Codecov Report

Merging #180 (1f97d74) into develop (f2211db) will increase coverage by 0.43%.
The diff coverage is 96.42%.


@@             Coverage Diff             @@
##           develop     #180      +/-   ##
===========================================
+ Coverage    66.50%   66.93%   +0.43%     
===========================================
  Files           24       24              
  Lines         1803     1830      +27     
===========================================
+ Hits          1199     1225      +26     
- Misses         604      605       +1     
Impacted Files                                   Coverage Δ
operational_analysis/methods/plant_analysis.py   94.76% <96.42%> (+0.13%) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@RHammond2 (Collaborator) left a comment

Overall, this looks great, and I have three things that need to be addressed below, all of which should be fairly easy and quick to address.

  • I left a comment in the tests to add in tests for working/failing start dates, and to split up the function into a monthly, daily, and hourly test for readability.
  • In Notebook 2, cell 18, there is a comment about adding a line of best fit that doesn't exist; you might just want to remove that.
  • In Notebook 2b, cell 8, the text description of what the plots show is quite brief, and it might be nice to add another sentence or two to fill in some more details on the (in)significance of the differences/similarities between the models and their resolutions.

…a doesn't include enough time for LT correction; improving readability of plant analysis reanalysis test functions
@ejsimley
Collaborator Author

@RHammond2 thanks for the quick review! I addressed most of your comments as follows:

  • Since there is no start date argument (the long-term correction always works backward from the end date), I didn't add any tests for valid start dates. However, I did add some error checking if the provided reanalysis data in the PlantData object doesn't contain enough time history, even if the default end date is used.
  • Split the new plant analysis reanalysis test functions into separate monthly, daily, and hourly functions to improve readability, as you suggested.
  • Removed the text in cell 18 of example 2 that described the best fit line that didn't exist. Also revised the text elsewhere to reflect how the method now uses bootstrapping to sample different regression slopes and intercepts (by "now" I mean for the past year or so)
  • In notebook 2b, cell 8, I think that's fine to add some more description, but I might ask Nicola what he thinks we should add since he added the ML options and made this notebook. @nbodini would you be able to suggest some more details we could add to describe the results for the four different AEP analysis configurations in cell 8 of notebook 2b? Currently the text says: "For this specific case, we see a decrease in AEP uncertainty when the calculation is performed with a machine learning regression model at daily resolution, which becomes even more significant when performing the calculation at hourly resolution." Thanks!
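The error check mentioned in the first bullet could look something like the following minimal sketch, assuming pandas and hypothetical names (`check_lt_coverage` is not the actual function in the PR):

```python
import pandas as pd

def check_lt_coverage(start_date, end_date, num_years_lt):
    """Raise if the reanalysis data between start_date and end_date cannot
    cover the maximum number of years used in the long-term correction."""
    # The long-term correction works backward from the end date
    required_start = end_date - pd.DateOffset(years=num_years_lt)
    if start_date > required_start:
        raise ValueError(
            f"Reanalysis data must span at least {num_years_lt} years "
            f"before the end date {end_date.date()}."
        )
```

This check would fire even when the default end date is used, since the available reanalysis history may still be too short.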

@nbodini
Collaborator

nbodini commented Jan 4, 2022

This looks great, thanks Eric!

I have added the following text to example 2b (I added it in PR #173 just because it was easier for me...): "Our analysis on a larger set of wind plants (Bodini et al. 2021, Wind Energy) showed how the uncertainty component connected to the regression decreases by up to 60% (relative change) when moving from monthly to daily resolution, and by up to 80% (relative change) when moving from monthly to hourly resolution." Let me know if you'd like to edit this or add more info.

A couple of comments:

  • in PR Update filters in AEP class #173, I have added the test check_simulation_results_gam_daily_outliers to test the bin filter at daily resolution. I imagine the results of that test will change following the changes being implemented here. Something we should keep in mind when we merge the two PRs.
  • did we figure out how things work in the TIE class re: start-end periods? One thing I noticed is that the TIE test is very generous (it checks only 1 decimal, as opposed to 6 for the AEP test), so we might want to change that to be consistent and to catch any potential issues.

@RHammond2
Collaborator

I think Nicola's suggested edit for Notebook 2b, cell 8 will clear things up quite a bit; adding some of that commentary between the plots, even if it's largely the same text, will keep readers from getting lost along the way. Also, I think this looks ready to go from my perspective at this point, so I'm updating my review to approved.

@RHammond2 (Collaborator) left a comment

This is ready to go from my perspective now, nice work!

@ejsimley
Collaborator Author

ejsimley commented Jan 5, 2022

Thanks for taking a look @nbodini. Regarding your two comments:

  • Yes, the test results will definitely have to be updated in this or PR Update filters in AEP class #173 depending on which is merged first. Let me know if you have a preference for which gets merged first. Otherwise, we could merge Update filters in AEP class #173 first, since it might be easier to verify the new test results without changing the date range too.
  • The TIE class appears to use the full amount of data in whichever reanalysis product is used in a given MC iteration. So there aren't any hard-coded dates we need to change. Do you think we should apply a similar start/end date selection to TIE as is done here for AEP analysis? I think probably the right thing to do would be to at least make sure the different reanalysis products use the same date range for consistency, and then consider adding arguments to specify the number of years used in the LT-correction and an optional end date. But I'm hesitant to start adding that right now. We could add that as an issue and then try to address it in the next month or so. Let me know what you think! Also, I'd support changing the TIE test results to 6 decimal places to be more precise. Feel free to do that in the filtering PR.

@nbodini
Collaborator

nbodini commented Jan 5, 2022

For your first comment, it sounds good to me to merge #173 first.

For the TIE, I agree we don't need to add anything now, but yes, in the future it would be nice to add a couple of extra options/checks as you suggested.

Finally, I'm working to fix the TIE test in #173.

merging develop and resolving some expected conflicts in the example notebooks and tests
…ed AEP analysis test results to reflect new reanalysis end dates
@ejsimley
Collaborator Author

@nbodini and @RHammond2, I just merged in Nicola's filtering changes and updated some of the notebooks and test results to account for both filtering and new reanalysis end dates. Specifically, I made the following updates:

  • int_pruf_plant_analysis.py: updated AEP analysis test results that were affected by the new method for determining the reanalysis end date and the filtering approach. All the changes seemed reasonable.
  • example 02 notebook: reran the notebook to generate results with the new end dates. Also updated some of the descriptions of the Monte Carlo AEP method to reflect how the new-ish method works (in addition to the revisions Nicola made).
  • example 02b notebook: just reran the notebook to update the results based on the new reanalysis end dates. None of the conclusions changed.
  • example 05 notebook: I noticed none of the results were shown (most of the cells weren't run), so I just ran the whole notebook so the output plot is visible in GitHub and the documentation (unless there was a reason for not running the notebook originally before committing it?).

Please feel free to take a look, otherwise I can merge this tomorrow (Friday).


Development

Successfully merging this pull request may close these issues.

Hard-coded re-analysis end date

4 participants