Allow reading Hazard events that are not dates from xarray #837

Merged (11 commits) on Jan 16, 2024

Conversation

peanutfun
Member

@peanutfun peanutfun commented Jan 15, 2024

Changes proposed in this PR:

  • Try interpreting values of the event coordinate as dates or ordinals for default values of Hazard.date. If that fails, issue a warning and set default values to zeros.
  • Try interpreting values of the event coordinate as dates for default values of Hazard.event_name. If that fails, issue a warning and set default values to empty strings.
  • Update tests.

The method still tries to read the values as dates but issues a warning if that does not work. The fallback is zeros in case of Hazard.date and empty strings for Hazard.event_name.

This extends #795, which introduced tighter restrictions for the 'event' coordinate. Those restrictions cause several issues in my new flood module, see CLIMADA-project/climada_petals#64.

This fixes #829.

PR Author Checklist

PR Reviewer Checklist

* Set default value of `Hazard.event_name` to empty string.
* Try interpreting values of the event coordinate as dates or ordinals
  for default values of `Hazard.date`. If that fails, issue a warning
  and set default values to zeros.
* Update tests.
climada/hazard/base.py (outdated, resolved)
@peanutfun
Member Author

Failing tests have nothing to do with this PR

Comment on lines 516 to 517
* ``date``: The ``event`` coordinate interpreted as date or ordinal, or
zeros if that fails (which will issue a warning).
Member

@chahank chahank Jan 15, 2024

I am not sure about the value 0, since it is not a valid ordinal (ordinals must be greater than or equal to 1). One would get an invalid event_date, which can easily cause problems down the line when a method expects an ordinal. This might then lead to error messages that are hard for users to debug or understand. Any other idea?

Member Author

You are correct. I actually messed up the test. It is fixed now. I also did not realize that 0 does not work as an ordinal. I have now chosen 1 as the default value, the ordinal of the date 01.01.0001.
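
The ordinal constraint discussed in this thread can be checked with Python's standard `datetime` module. The sketch below is not the CLIMADA implementation: `to_ordinal` and `dates_or_fallback` are hypothetical helpers that only illustrate the "try, warn, fall back to 1" idea, assuming event values are ISO date strings, `date` objects, or integers.

```python
import datetime as dt
import warnings

# Ordinal of 0001-01-01, the smallest valid proleptic Gregorian ordinal
FALLBACK_ORDINAL = 1


def to_ordinal(value):
    """Interpret a single event value as a proleptic Gregorian ordinal."""
    if isinstance(value, dt.date):  # also covers datetime.datetime
        return value.toordinal()
    if isinstance(value, str):
        return dt.date.fromisoformat(value).toordinal()
    if isinstance(value, int) and not isinstance(value, bool):
        # date.fromordinal raises ValueError for values below 1
        return dt.date.fromordinal(value).toordinal()
    raise TypeError(f"Cannot interpret {value!r} as a date")


def dates_or_fallback(event_values):
    """Return ordinals for all events, or the fallback if any value fails."""
    try:
        return [to_ordinal(v) for v in event_values]
    except (TypeError, ValueError, OverflowError):
        warnings.warn("Failed to read values of 'event' as dates or ordinals.")
        return [FALLBACK_ORDINAL] * len(event_values)
```

With these assumptions, `dates_or_fallback([0, 5])` warns and returns the fallback for every event, because 0 is rejected by `date.fromordinal`.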

@@ -553,13 +555,13 @@ def from_xarray_raster(
and Examples) before loading the Dataset as Hazard.
* Single-valued data for variables ``frequency``. ``event_name``, and
``event_date`` will be broadcast to every event.
* The ``event`` dimension need not be a time or date.
Member

I do not understand what this means here.

Member Author

Yes, could be clearer. This way, maybe?

Suggested change
- * The ``event`` dimension need not be a time or date.
+ * The values of the ``event`` coordinate need not be times or dates.

* To avoid confusion in the call signature, several parameters are keyword-only
arguments.
* The attributes ``Hazard.haz_type`` and ``Hazard.unit`` currently cannot be
read from the Dataset. Use the method parameters to set these attributes.
* This method does not read coordinate system metadata. Use the ``crs`` parameter
to set a custom coordinate system identifier.
* This method **does not** read lazily. Single data arrays must fit into memory.
Member

Does it load lazily now? That would be great.

Member Author

@peanutfun peanutfun Jan 15, 2024

It has been doing so for quite a while, I just missed deleting this line 🙃

Comment on lines +522 to +523
* ``event_name``: String representation of the event date or empty strings
if that fails (which will issue a warning).
Member

Just to be sure I understand: if the data contains a field 'name', it will be ignored by default?

Member Author

This list only specifies the default values. These are used if the dataset does not contain a field event_name, or if the user chose to ignore it via from_xarray_raster(..., data_vars=dict(event_name="")).
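
The precedence described in this reply can be sketched in plain Python. Nothing here is CLIMADA code: `resolve_event_name` and `default_event_names` are hypothetical helpers, the dataset is modelled as a plain dict, and treating an empty string in `data_vars` as "ignore this variable" is taken from the comment above.

```python
import datetime as dt
import warnings


def default_event_names(event_values):
    """Default names: string form of the event dates, or empty strings on failure."""
    try:
        return [dt.date.fromisoformat(str(v)).isoformat() for v in event_values]
    except ValueError:
        warnings.warn("Failed to read values of 'event' as dates.")
        return [""] * len(event_values)


def resolve_event_name(dataset, event_values, data_vars=None):
    """Use the dataset variable if present and not ignored, else the defaults."""
    # An empty string for 'event_name' means: ignore the variable in the dataset
    key = (data_vars or {}).get("event_name", "event_name")
    if key != "" and key in dataset:
        return list(dataset[key])
    return default_event_names(event_values)
```

In this sketch a name variable in the data is not ignored by default; the defaults only apply when the variable is missing or explicitly switched off.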

# Integers
time = np.arange(size)
dataset["time"] = time
self.time = ["1970-01-01", "1970-01-02"] # These will be 0, 1 as ordinals
Member

I think ordinals must be greater than or equal to 1. Is the comment thus incorrect?

Member Author

@peanutfun peanutfun Jan 15, 2024

Sorry, the test was wrong. You are correct.
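
For reference, the ordinals of the two dates in the quoted test line can be computed with the standard library. This is only a quick check, independent of the CLIMADA test suite, and it shows why `0, 1` cannot be the ordinals of these dates.

```python
import datetime as dt

dates = ["1970-01-01", "1970-01-02"]

# Proleptic Gregorian ordinals count days starting from 0001-01-01 == 1
ordinals = [dt.date.fromisoformat(d).toordinal() for d in dates]
print(ordinals)  # [719163, 719164]
```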

@chahank
Member

chahank commented Jan 15, 2024

Great, a much improved fix. A few comments, mostly for clarity, and one main question: is it smart to use an invalid ordinal as the default date?

@@ -553,13 +555,13 @@ def from_xarray_raster(
and Examples) before loading the Dataset as Hazard.
* Single-valued data for variables ``frequency``. ``event_name``, and
``event_date`` will be broadcast to every event.
* The ``event`` dimension need not be a time or date.
Member

Suggested change
- * The ``event`` dimension need not be a time or date.
+ * The values of the ``event`` coordinates can be of any type (times, dates, strings, ...)

@chahank
Member

chahank commented Jan 16, 2024

Suggestion for the intro docstring wording:

        This method reads data that can be interpreted using three coordinates: event,
        latitude, and longitude. The three coordinates to be read can be specified via
        the ``coordinate_vars`` parameter. The data and the coordinates themselves may
        be organized in arbitrary dimensions in the Dataset (e.g. four dimensions
        'year', 'month', 'day' and 'altitude' for the coordinate 'event'). See Notes
        and Examples if you want to load single-event data that does not contain an
        event dimension.

@chahank
Member

chahank commented Jan 16, 2024

Excellent work! Ready to merge upon consideration of the suggested docstring changes.

@peanutfun
Member Author

@chahank Please have another look at eeda6b3 and merge if you concur.

climada/hazard/base.py (outdated, resolved)
@chahank chahank merged commit 7e9b4f8 into develop Jan 16, 2024
0 of 6 checks passed
@chahank chahank deleted the feature/hazard-xarray-read-no-time-event branch January 16, 2024 11:03