Issue 229 new ecnt #231
Put common code that uses the weighted average approach into a single method #229
warnings.filterwarnings('error')
try:
n_ge_obs = sum_column_data_by_name(input_data, columns_names, 'n_ge_obs')
me_ge_obs = sum_column_data_by_name(input_data, columns_names, 'me_ge_obs')/n_ge_obs
I don't think this logic is correct. We need to compute an aggregated ME_GE_OBS as a weighted average where N_GE_OBS defines the weight. That's different from what's being computed here.
In the common case where total is used as the weight, we take the sum of the total values and the sum of the column values, then divide the column sum by the total sum (the weighted_average method).
Here I did the same but replaced total with n_ge_obs. Why is this incorrect? How should it be computed?
warnings.filterwarnings('error')
try:
n_lt_obs = sum_column_data_by_name(input_data, columns_names, 'n_lt_obs')
me_lt_obs = sum_column_data_by_name(input_data, columns_names, 'me_lt_obs')/n_lt_obs
I don't think this logic is correct. We need to compute an aggregated ME_LT_OBS as a weighted average where N_LT_OBS defines the weight. That's different from what's being computed here.
@TatianaBurek thanks for working on these updates. I made a handful of comments that require attention. I'll mark this as "Request Changes". Please just re-request my review once you're finished with the next round of updates.
"""
warnings.filterwarnings('error')
try:
total = get_total_values(input_data, columns_names, aggregation)
crps_emp_fair = sum_column_data_by_name(input_data, columns_names, 'crps_emp_fair') / total
result = round_half_up(crps_emp_fair, PRECISION)
statistic = sum_column_data_by_name(input_data, columns_names, column_name) / total
@TatianaBurek, yes, good point. I suspect there's an issue here as well. It's possible I just don't understand how this code is working. But it looks to me like you're summing up the STATISTIC values and TOTAL counts and dividing the first by the second.
Using R to provide an example of aggregating MAE values, where the weight is defined by the total column:
R
MAE = c(5, 10, 8)
TOTAL = c(1000, 1500, 1250)
c("Bad aggregated value using this logic = ", sum(MAE) / sum(TOTAL))
# Prints incorrect value of 0.00613
c("Correct weighted aggregation = ", sum(MAE*TOTAL)/sum(TOTAL))
# Prints correct value of 8
Those values are so different that I'm assuming this IS working, but I'm just not grasping how.
The preprocessing of the data (calculating the additional statistics, multiplying by the weight, renaming columns) happens in the agg_stat.py script. For example, here I prepare the ECNT data:
METcalcpy/metcalcpy/agg_stat.py
Line 448 in deedf36
def _prepare_ecnt_data(self, data_for_prepare):
The calculate_ methods use data that has already been multiplied by the weight.
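A minimal Python sketch of this two-step flow, using the MAE and TOTAL values from the R example in this thread. The function names `prepare_weighted` and `calculate_weighted_average` are hypothetical stand-ins for illustration, not the actual METcalcpy functions.

```python
def prepare_weighted(values, weights):
    """Preparation step: multiply each statistic value by its weight."""
    return [v * w for v, w in zip(values, weights)]


def calculate_weighted_average(prepared_values, weights):
    """Calculation step: sum of pre-multiplied values over sum of weights."""
    return sum(prepared_values) / sum(weights)


mae = [5, 10, 8]
total = [1000, 1500, 1250]

# Done once, before any calculate_ call
prepared = prepare_weighted(mae, total)  # [5000, 15000, 10000]

# Looks like sum(column) / sum(total), but the column is already weighted
aggregated = calculate_weighted_average(prepared, total)
print(aggregated)  # 8.0
```

Because the column passed to the calculation step already holds value times weight, dividing its sum by the sum of the weights gives the weighted average rather than the incorrect 0.00613.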
The decision to separate preparing and calculating the data was based on the need to speed up the bootstrapping process. It makes more sense to precalculate the base values once and reuse them across the n replications during bootstrapping.
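The sketch below illustrates why precomputing helps, assuming a simple row-resampling bootstrap. The function `bootstrap_weighted_average` and its parameters are hypothetical and use only the standard library; the real METcalcpy bootstrap may be organized differently.

```python
import random


def bootstrap_weighted_average(prepared_values, weights, n_replications, seed=0):
    """Resample rows with replacement and aggregate each replicate.

    prepared_values must already hold value * weight for every row, so each
    replicate needs only two sums and one division.
    """
    rng = random.Random(seed)
    n_rows = len(prepared_values)
    estimates = []
    for _ in range(n_replications):
        # Draw row indices with replacement
        idx = [rng.randrange(n_rows) for _ in range(n_rows)]
        weight_sum = sum(weights[i] for i in idx)
        estimates.append(sum(prepared_values[i] for i in idx) / weight_sum)
    return estimates


mae = [5, 10, 8]
total = [1000, 1500, 1250]

# The multiplication by the weight happens once, outside the replication loop
prepared = [m * t for m, t in zip(mae, total)]

estimates = bootstrap_weighted_average(prepared, total, n_replications=1000)
print(min(estimates), max(estimates))
```

Each replicate is a weighted average of the original MAE values, so every estimate stays between the smallest and largest input value.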
OK, as long as you're confident that the weighted averages are being computed properly, I'll go ahead and approve. Those details just aren't immediately obvious when reviewing the code.
I approve of these changes after Tatiana double-checked and confirmed that the weighted averages are being computed properly.
I checked that me_ge_obs and me_lt_obs are calculated the same way as other weighted average stats, but instead of total as the weight they use n_ge_obs and n_lt_obs.
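As a small illustration of why the choice of weight column matters, the sketch below aggregates ME_GE_OBS with N_GE_OBS as the weight and compares it with the result of weighting by TOTAL. All numeric values are invented for the example and do not come from MET output.

```python
# Hypothetical per-row values, for illustration only
me_ge_obs = [1.5, -0.5, 2.0]
n_ge_obs = [40, 10, 50]
total = [100, 100, 100]


def weighted_average(values, weights):
    """Weighted average: sum(value * weight) / sum(weight)."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Weight defined by N_GE_OBS, as requested in the review
agg_by_n_ge_obs = weighted_average(me_ge_obs, n_ge_obs)

# Weight defined by TOTAL, shown only for comparison
agg_by_total = weighted_average(me_ge_obs, total)

print(agg_by_n_ge_obs)  # 1.55
print(agg_by_total)     # 1.0
```

The same pattern would apply to ME_LT_OBS with N_LT_OBS as the weight.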
Pull Request Testing
Describe testing already performed for these changes:
compared results on the added methods with the statistics calculated by MET
Recommend testing for the reviewer(s) to perform, including the location of input datasets, and any additional instructions:
Do these changes include sufficient documentation updates, ensuring that no errors or warnings exist in the build of the documentation? [Yes or No]
Do these changes include sufficient testing updates? [Yes or No]
Will this PR result in changes to the test suite? [Yes or No]
If yes, describe the new output and/or changes to the existing output:
Please complete this pull request review by [Fill in date].
Pull Request Checklist
See the METplus Workflow for details.
Select: Reviewer(s)
Select: Organization level software support Project or Repository level development cycle Project
Select: Milestone as the version that will include these changes