Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add CLI for ingesting tabular forecasts and observations into EMOS #16

Open
wants to merge 17 commits into
base: improver1538_tabular_ingestion_functions
Choose a base branch
from

Conversation

gavinevans
Copy link
Owner

@gavinevans gavinevans commented Oct 19, 2021

Closes: metoppv#1538

Dependent upon metoppv#1572, metoppv#1582.

Description
This PR builds upon work done in metoppv#1582 to add an estimate_emos_coefficients_from_table CLI that converts the pandas dataframe into an iris cube (metoppv#1582) and passes that iris cube to the existing EMOS functionality.

Note that this PR does contain an update to acceptance.py to support providing a directory as a known good output in the acceptance tests. This is required for compatibility with a partitioned parquet file.

Please note that the decision about whether to use fastparquet 0.5.0 or 0.7.1 is subject to change due to issues with the Conda environments (https://github.com/MetOffice/improver_suite/pull/1026). The first three commits in this PR assumed the use of 0.7.1. This commit reverts the version of fastparquet to 0.5.0.

Further information is available in this comment.

Testing:

  • Ran tests and they passed OK
  • Added new tests for the new feature(s)

@fionaRust
Copy link

I'm reviewing this

@gavinevans gavinevans force-pushed the improver1538_tabular_ingestion_functions branch from 8b134fa to ed31c06 Compare October 21, 2021 17:35
@codecov-commenter
Copy link

codecov-commenter commented Oct 22, 2021

Codecov Report

Merging #16 (9c98027) into improver1538_tabular_ingestion_functions (8b134fa) will increase coverage by 0.03%.
The diff coverage is 100.00%.

Impacted file tree graph

@@                             Coverage Diff                              @@
##           improver1538_tabular_ingestion_functions      #16      +/-   ##
============================================================================
+ Coverage                                     98.04%   98.08%   +0.03%     
============================================================================
  Files                                           109      110       +1     
  Lines                                          9872    10069     +197     
============================================================================
+ Hits                                           9679     9876     +197     
  Misses                                          193      193              
Impacted Files Coverage Δ
improver/ensemble_copula_coupling/constants.py 100.00% <ø> (ø)
improver/calibration/__init__.py 100.00% <100.00%> (ø)
improver/calibration/dataframe_utilities.py 100.00% <100.00%> (ø)
improver/calibration/ensemble_calibration.py 100.00% <100.00%> (ø)
improver/wxcode/weather_symbols.py 96.88% <100.00%> (+0.08%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f63415a...9c98027. Read the comment docs.

@gavinevans gavinevans force-pushed the improver1538_tabular_ingestion_cli4 branch from 5288575 to 23cb8c1 Compare October 22, 2021 07:47
Copy link

@bayliffe bayliffe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Goodness you've written a lot of code for all of this calibration work. Two cli exceptions need tests, otherwise this looks fine to me. I've not run the tests as I am waiting on very slow conda dependency solving.

f"The requested filepath {forecast} does not contain the "
f"requested contents: {filters}"
)
raise IOError(msg)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not tested.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test added,

f"The requested filepath {truth} does not contain the "
f"requested contents: {filters}"
)
raise IOError(msg)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not tested.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test added.

…s cube (metoppv#1582)

* Modifications to functions required to support converting tabular data into iris cubes.

* Modifications to the columns expected within the truth table.

* Modify environments in preparation for changes required for ingestion of forecast and observation tables.

* Edits to expect the period column to be a timedelta64 dtype.

* Corrections to __init__.py

* Modifications to use assertCubeEqual to ensure the full cubes are compared.

* Modifications following review comments.

* Minor updates to docstrings.

* Add missing unit tests.

* Correction to csv files.

* Sort lists.

* Minor docstring amendment.

* NOT WORKING: Working commit to try to avoid using the truth altitude, latitude and longitude at all and replace with those from the forecast.

* Extend docstrings.

* Extended documentation updates.

* Refinement and addition of tests for column name checking.

* Modifications to tidy up dataframe preparation.

* Minor extended documentation edits.

* Further minor docstring edit.

* Correct test class naming.

* Fix isort.
iterations may require increasing, as there will be more
coefficients to solve.
percentiles (List[float]):
The set of percentiles to be used for estimating EMOS coefficients.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The set of percentiles to be used for estimating EMOS coefficients.
The set of percentiles to be used for estimating EMOS coefficients.
These should be a set of equally spaced quantiles.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated this docstring.

@@ -340,6 +365,8 @@ def _prepare_dataframes(
DataFrame expected to contain the following columns: ob_value,
time, wmo_id, diagnostic, latitude, longitude and altitude.
Any other columns are ignored.
percentiles:
The set of percentiles to be used for estimating EMOS coefficients.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The set of percentiles to be used for estimating EMOS coefficients.
The set of percentiles to be used for estimating EMOS coefficients.
These should be a set of equally spaced quantiles.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated this docstring.

forecast_period,
training_length,
percentiles=percentiles,
)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think as we discussed last week, some leadtime-cycle combinations won't be possible. This will result in no cubes being returned from this function. In this case we should just return and not produce output

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're correct that this is handling within the plugins by None being returned, although this is a bit trickier within the CLIs, as the CLIs are expecting to save a netCDF file, if a --output is provided. I've made a minor edit to skip the saving if the result is None.

@gavinevans gavinevans force-pushed the improver1538_tabular_ingestion_cli4 branch from 5ddf334 to a390238 Compare October 22, 2021 15:34
gavinevans and others added 5 commits October 22, 2021 18:42
… dependencies between the calibration and ECC code.
…rover1538_tabular_ingestion_cli4

* improver1538_move_tabular_ingestion_functions:
  Move dataframe to cube utilities to a separate file to avoid circular dependencies between the calibration and ECC code.
* Move dataframe to cube utilities to a separate file to avoid circular dependencies between the calibration and ECC code.

* Minor docstring updates.
…lar_ingestion_cli4

* upstream/master:
  Move dataframe to cube utilities (metoppv#1593)
  Added constant for ultraviolet_index_daytime_max. (metoppv#1590)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants