Issue 229 new ecnt #231

TatianaBurek · 2022-11-07T16:39:16Z

Pull Request Testing

Describe testing already performed for these changes:
Compared the results of the added methods with the statistics calculated by MET.
Recommend testing for the reviewer(s) to perform, including the location of input datasets, and any additional instructions:
Do these changes include sufficient documentation updates, ensuring that no errors or warnings exist in the build of the documentation? [Yes or No]
Do these changes include sufficient testing updates? [Yes or No]
Will this PR result in changes to the test suite? [Yes or No]

If yes, describe the new output and/or changes to the existing output:
Please complete this pull request review by [Fill in date].

Pull Request Checklist

See the METplus Workflow for details.

Review the source issue metadata (required labels, projects, and milestone).
Complete the PR definition above.
Ensure the PR title matches the feature or bugfix branch name.
Define the PR metadata, as permissions allow.
Select: Reviewer(s)
Select: Organization level software support Project or Repository level development cycle Project
Select: Milestone as the version that will include these changes
After submitting the PR, select Development with the original issue number.
After the PR is approved, merge your changes. If permissions do not allow this, request that the reviewer do the merge.
Close the linked issue and delete your feature or bugfix branch from GitHub.

JohnHalleyGotway · 2022-11-07T19:11:29Z

metcalcpy/util/ecnt_statistics.py

+    warnings.filterwarnings('error')
+    try:
+        n_ge_obs = sum_column_data_by_name(input_data, columns_names, 'n_ge_obs')
+        me_ge_obs = sum_column_data_by_name(input_data, columns_names, 'me_ge_obs')/n_ge_obs


I don't think this logic is correct. We need to compute an aggregated ME_GE_OBS as a weighted average where N_GE_OBS defines the weight. That's different from what's being computed here.

TatianaBurek · 2022-11-08

In the common case, when total is used as the weight, we sum the total values, sum the column values, and divide the column sum by the total sum (the weighted_average method).
Here I did the same but replaced total with n_ge_obs. Why is this incorrect? How should it be computed?

JohnHalleyGotway · 2022-11-07T19:11:49Z

metcalcpy/util/ecnt_statistics.py

+    warnings.filterwarnings('error')
+    try:
+        n_lt_obs = sum_column_data_by_name(input_data, columns_names, 'n_lt_obs')
+        me_lt_obs = sum_column_data_by_name(input_data, columns_names, 'me_lt_obs')/n_lt_obs


I don't think this logic is correct. We need to compute an aggregated ME_LT_OBS as a weighted average where N_LT_OBS defines the weight. That's different from what's being computed here.

JohnHalleyGotway

@TatianaBurek thanks for working on these updates. I made a handful of comments that require attention. I'll mark this as "Request Changes". Please just re-request my review once you're finished with the next round of updates.

JohnHalleyGotway · 2022-11-08T22:04:58Z

metcalcpy/util/ecnt_statistics.py

    """
    warnings.filterwarnings('error')
    try:
        total = get_total_values(input_data, columns_names, aggregation)
-        crps_emp_fair = sum_column_data_by_name(input_data, columns_names, 'crps_emp_fair') / total
-        result = round_half_up(crps_emp_fair, PRECISION)
+        statistic = sum_column_data_by_name(input_data, columns_names, column_name) / total


@TatianaBurek, yes, good point. I suspect there's an issue here as well. It's possible I just don't understand how this code is working. But it looks to me like you're summing up the STATISTIC values and the TOTAL counts and dividing the first by the second.

Using R to provide an example of aggregating MAE values, where the weight is defined by the total column:

    MAE = c(5, 10, 8)
    TOTAL = c(1000, 1500, 1250)
    c("Bad aggregated value using this logic = ", sum(MAE) / sum(TOTAL))
    # Prints incorrect value of 0.00613
    c("Correct weighted aggregation = ", sum(MAE*TOTAL)/sum(TOTAL))
    # Prints correct value of 8

Those values are so different, so I'm assuming this IS working, but I'm just not grasping how.
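For anyone following along in Python rather than R, the same comparison can be sketched with the standard library only. The variable names are copied from the R snippet and are purely illustrative; this is not METcalcpy code.

```python
# Values copied from the R example: one MAE per case and the TOTAL
# (number of pairs) that each MAE was computed from.
mae = [5.0, 10.0, 8.0]
total = [1000.0, 1500.0, 1250.0]

# Summing the raw statistic and dividing by the summed weights
# is NOT a weighted average.
naive = sum(mae) / sum(total)

# A weighted average multiplies each value by its weight before summing.
weighted = sum(m * t for m, t in zip(mae, total)) / sum(total)

print(f"naive    = {naive:.5f}")
print(f"weighted = {weighted}")
```

The two results match the R output above (about 0.00613 versus 8), so the only open question is where the multiplication by the weight takes place.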

TatianaBurek · 2022-11-09

The preprocessing of the data (calculating the additional statistics, multiplying by the weight, and renaming columns) happens in the agg_stat.py script. For example, this is where I prepare the ECNT data:

METcalcpy/metcalcpy/agg_stat.py

Line 448 in deedf36

def _prepare_ecnt_data(self, data_for_prepare):

The calculate_ methods use data that has already been multiplied by the weight.

The decision to separate preparing the data from calculating the statistics was made to speed up the bootstrapping process. It makes more sense to precalculate the base values once and then reuse them for each of the n replications during bootstrapping.
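A minimal sketch of that prepare-once, calculate-many pattern, under stated assumptions: `prepare` and `calculate_weighted_average` are hypothetical helpers written for this illustration, not the functions in agg_stat.py or ecnt_statistics.py, and the resampling is a plain bootstrap with replacement.

```python
import random


def prepare(stats, weights):
    """Multiply each statistic by its weight once, up front (hypothetical helper)."""
    return [s * w for s, w in zip(stats, weights)]


def calculate_weighted_average(weighted_stats, weights):
    """Divide the sum of pre-weighted values by the sum of the weights."""
    return sum(weighted_stats) / sum(weights)


mae = [5.0, 10.0, 8.0]
total = [1000.0, 1500.0, 1250.0]

# Preparation happens a single time, before any resampling.
weighted_mae = prepare(mae, total)
point_estimate = calculate_weighted_average(weighted_mae, total)

# Each replication only resamples rows and re-sums the prepared values.
rng = random.Random(0)  # fixed seed so the sketch is repeatable
n_replications = 1000
replicates = []
for _ in range(n_replications):
    idx = [rng.randrange(len(total)) for _ in total]
    replicates.append(
        calculate_weighted_average(
            [weighted_mae[i] for i in idx],
            [total[i] for i in idx],
        )
    )

print(point_estimate, min(replicates), max(replicates))
```

Because the values are already weighted, the per-replication work reduces to two sums and a division, which is why a calculate-style function can look like "sum of the statistic over sum of the total" and still produce a weighted average.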

JohnHalleyGotway · 2022-11-09

OK, as long as you're confident that the weighted averages are being computed properly, I'll go ahead and approve. Those details just aren't immediately obvious when reviewing the code.

JohnHalleyGotway

I approve of these changes after Tatiana double-checked and confirmed that the weighted averages are being computed properly.

TatianaBurek · 2022-11-09T17:22:58Z

I checked that me_ge_obs and me_lt_obs are calculated the same way as the other weighted average stats, except that they use n_ge_obs and n_lt_obs as the weight instead of total.
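To make the choice of weight column concrete, here is a small sketch with made-up rows. The `weighted_average` function below is a hypothetical stand-in that does the multiplication itself (in the PR that step happens during data preparation), and returning `None` for a zero weight sum is an assumption made for this illustration only.

```python
def weighted_average(rows, column, weight_column):
    """Weighted average of `column`, weighted by `weight_column` (illustrative)."""
    weight_sum = sum(row[weight_column] for row in rows)
    if weight_sum == 0:
        # Assumption for this sketch: no valid result without any weight.
        return None
    weighted_sum = sum(row[column] * row[weight_column] for row in rows)
    return weighted_sum / weight_sum


# Made-up ECNT-like rows; the numbers are not taken from MET output.
rows = [
    {"total": 10, "n_ge_obs": 4, "me_ge_obs": 1.5, "n_lt_obs": 6, "me_lt_obs": -0.5},
    {"total": 20, "n_ge_obs": 16, "me_ge_obs": 0.5, "n_lt_obs": 4, "me_lt_obs": -2.0},
]

# Each statistic is weighted by its own count column rather than by total.
me_ge_obs = weighted_average(rows, "me_ge_obs", "n_ge_obs")
me_lt_obs = weighted_average(rows, "me_lt_obs", "n_lt_obs")

print(me_ge_obs, me_lt_obs)
```

Weighting me_ge_obs by total instead of n_ge_obs would give a different answer for these rows, which is why the weight column matters.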

TatianaBurek added 6 commits November 4, 2022 11:53

ECNT_SPREAD_MD stat is added

5b24467

put common code that use weighted average approach into the single method #229

ECNT_SPREAD_MD stat is added #229

48db47d

mae and mae_oerr stat are added #229

c5854e8

mae and mae_oerr as zero_perf_score_stats #229

f8f268f

added ECNT bias ratio stats #229

c0e48cc

print a message if a statistic doesn't belong to any of the perfect s…

5b8cd36

…core groups #229

TatianaBurek added this to the METcalcpy-2.0 milestone Nov 7, 2022

TatianaBurek requested a review from JohnHalleyGotway November 7, 2022 16:39

JohnHalleyGotway reviewed Nov 7, 2022

JohnHalleyGotway reviewed Nov 7, 2022

JohnHalleyGotway reviewed Nov 7, 2022

JohnHalleyGotway requested changes Nov 7, 2022

replace common code with weighted_average method #229

deedf36

JohnHalleyGotway reviewed Nov 8, 2022

JohnHalleyGotway approved these changes Nov 9, 2022

TatianaBurek merged commit 1210d10 into develop Nov 9, 2022

TatianaBurek deleted the issue_229_new_ecnt branch November 9, 2022 21:21

TatianaBurek linked an issue Nov 9, 2022 that may be closed by this pull request

Implement aggregation of the new ECNT statistics: SPREAD_MD, MAE, MAE_OERR, BIAS_RATIO, N_GE_OBS, ME_GE_OBS, N_LT_OBS, and ME_LT_OBS #229

Closed

21 tasks
