Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature 1259 es_prob_stats #2067

Merged
merged 20 commits into from
Mar 2, 2022
Merged

Conversation

JohnHalleyGotway
Copy link
Collaborator

@JohnHalleyGotway JohnHalleyGotway commented Feb 24, 2022

Expected Differences

This PR is now ready for review.

  • Do these changes introduce new tools, command line arguments, or configuration file options? [Yes]

    If yes, please describe:

    Listed below are the additions to the Ensemble-Stat config file:
   prob_pct_thresh = [ ==0.25 ]; // to define Nx2 PCT table (totally new entry)
   eclv_points = 0.05; // to define ECLV points (already exists in other tools)

And adds entries to output_flag for newly supported prob line types (already exists in other tools):

   pct   = NONE;
   pstd  = NONE;
   pjc   = NONE;
   prc   = NONE;
   eclv  = NONE;
  • Do these changes modify the structure of existing or add new output data types (e.g. statistic line types or NetCDF variables)? [No]

    If yes, please describe:

Pull Request Testing

  • Describe testing already performed for these changes:

    Tested manually with 3 specific uses in mind:
  1. When prob_cat_thresh is defined to list explicit thresholds, they are applied and corresponding prob stats are reported.
  2. When prob_cat_thresh is defined, along with climo bins, each prob_cat_thresh is used to derive and ensemble relative frequency (shown as PROB(NAME>THRESH) in the output). Those pairs (ensemble probabilities / observations) are SUBSET into climo bins. Output is generated for each bin and the summary BIN_MEAN is reported as the average across those climo bins. In this case, the BIN_MEAN TOTAL column contains the sum of the bin totals.
  3. When prob_cat_thresh is empty, climo bins are requested, and climo data is supplied, the probability thresholds are defined relative to the climo distribution. Instead of using climo bins to SUBSET, we use them to define the probability thresholds. The probability forecast are evaluated for each bin using ALL available obs, and the prob statistics are written to the output. The BIN_MEAN TOTAL column now contains the mean of the bin totals. Since all obs are used for each climo bins, those bin totals are a constant value.

Differentiating between these 3 uses is confusing.

  • Recommend testing for the reviewer(s) to perform, including the location of input datasets, and any additional instructions:

    Review the code changes, documentation updates (please check carefully for typos and readability), and unit test output differences.

  • Do these changes include sufficient documentation updates, ensuring that no errors or warnings exist in the build of the documentation? [Yes]

  • Do these changes include sufficient testing updates? [Yes]
    Modified the configuration for 2 existing tests.

  • Will this PR result in changes to the test suite? [Yes]

    If yes, describe the new output and/or changes to the existing output:

    Generates 5 new output files for 2 existing tests. Also modified the contents of the .stat files for those tests. The new files are for the PCT, PSTD, PJC, PRC, and ECLV line types.

  • Please complete this pull request review by [Wed 3/2].

Pull Request Checklist

See the METplus Workflow for details.

  • Review the source issue metadata (required labels, projects, and milestone).
  • Complete the PR definition above.
  • Ensure the PR title matches the feature or bugfix branch name.
  • Define the PR metadata, as permissions allow.
    Select: Reviewer(s)
    Select: Organization level software support Project or Repository level development cycle Project
    Select: Milestone as the version that will include these changes
  • After submitting the PR, select Linked issues with the original issue number.
  • After the PR is approved, merge your changes. If permissions do not allow this, request that the reviewer do the merge.
  • Close the linked issue and delete your feature or bugfix branch from GitHub.

…, PRJ, and ECLV outputs from ensemble-stat. Also added fcat_ta and ocat_ta ThreshArray objects to store the needed fcst and obs thresholds. Still need to actually update ensemble_stat.cc to use them and also need to make config file, unit test, and documentation updates.
…he writing of STAT output lines for both point and gridded verification into a common write_txt_files() function. In the process, I discovered that we were checking the top-level conf_info.output_flag option instead of the vx_opt specific ones. These changes switch to using the vx_opt ones so that we can control the output written task by task.
…stat to define how PCT thresholds. Still need to update documentation, tests, and config files.
@JohnHalleyGotway JohnHalleyGotway added this to the MET 10.1.0 milestone Feb 24, 2022
@JohnHalleyGotway JohnHalleyGotway marked this pull request as draft February 24, 2022 15:46
pd_pnt.extend(pd_ens.n_obs);

// Determine the number of climo CDF bins
n_bin = (pd_pnt.cmn_na.n_valid() > 0 && pd_pnt.csd_na.n_valid() > 0 ?
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@j-opatz this is the spot in the code whose logic I'd like to discuss. I think if the user hasn't defined prob_cat_thresh but has defined climo_mean and climo_stdev along with climo bins, we want to use those thresholds to define/evaluate probabilities. But this is the logic that always gets so confusing!

…as consistent as possible with the default configuration.
…ether the total column should be summed or averaged. Previously, they were always summed since the climo bins were used to SUBSET the matched pairs. In Ensemble-Stat, the full set of pairs can now thresholded multiple times based on the climo bins. As such, the TOTAL value for each input should remain constant. Rather then summing those totals, they should now be averaged (but this is the average of a constant value).
Copy link
Contributor

@j-opatz j-opatz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The write-up is done well, given that the necessary topic of discussion is confusing. Unit tests seem to have the corresponding output, and while I think this feature correctly adds the requested ability in EnsembleStat, it will be best proven when provided back to the EMC requestor and it's used there.

@JohnHalleyGotway JohnHalleyGotway merged commit 514c0c3 into develop Mar 2, 2022
@JohnHalleyGotway JohnHalleyGotway deleted the feature_1259_es_prob_stats branch March 2, 2022 18:21
JohnHalleyGotway added a commit that referenced this pull request Mar 2, 2022
Co-authored-by: Julie Prestopnik <jpresto@seneca.rap.ucar.edu>
Co-authored-by: johnhg <johnhg@ucar.edu>
Co-authored-by: Seth Linden <linden@kiowa.rap.ucar.edu>
Co-authored-by: John Halley Gotway <johnhg@ucar.edu>
Co-authored-by: j-opatz <59586397+j-opatz@users.noreply.github.com>
Co-authored-by: Howard Soh <hsoh@kiowa.rap.ucar.edu>
Co-authored-by: John Halley Gotway <johnhg@kiowa.rap.ucar.edu>
Co-authored-by: jprestop <jpresto@ucar.edu>
Co-authored-by: Howard Soh <hsoh@seneca.rap.ucar.edu>
Co-authored-by: Seth Linden <linden@ucar.edu>
Co-authored-by: hsoh-u <hsoh@ucar.edu>
Co-authored-by: George McCabe <23407799+georgemccabe@users.noreply.github.com>
Co-authored-by: John Halley Gotway <johnhg@seneca.rap.ucar.edu>
Co-authored-by: MET Tools Test Account <met_test@seneca.rap.ucar.edu>
Co-authored-by: mo-mglover <78152252+mo-mglover@users.noreply.github.com>
Co-authored-by: davidalbo <dave@ucar.edu>
Co-authored-by: lisagoodrich <33230218+lisagoodrich@users.noreply.github.com>
@JohnHalleyGotway JohnHalleyGotway restored the feature_1259_es_prob_stats branch March 3, 2022 00:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants