Skip to content

Conversation

@fluidnumerics-joe
Copy link
Contributor

  • Chose the correct base branch (main for v3 changes, v4-dev for v4 changes)
  • Added tests

This pull request provides a minimal implementation of the ParticleSet for execution as discussed in #2034 .

Highlight of changes

  • repeatdt logic is removed
  • Dynamic instantiation of the _pclass type is removed for now; this currently means that particle.ei (encoded indices) are not stored and available for reuse. There's some discussion to be had around how this is incorporated cleanly and this will ultimately be brought in through a future PR.
  • Particle-Particle interaction is removed in this version. My understanding that that interacting particles will be brought in once the issues around non-interacting particles settle down.
  • Time handling is done through datetime and timedelta objects
  • A new test is added for pset.execute with the unstructured stommel gyre generic dataset.
  • Notebook tutorial for using Parcels with unstructured Stommel gyre is updated

Copy link
Contributor

@VeckoTheGecko VeckoTheGecko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did a quick first pass - can do a more in depth review tomorrow

Comment on lines 769 to 772
if isinstance(endtime, datetime):
raise NotImplementedError(
"If fieldset.time_interval is None, endtime must be a timedelta not a datetime"
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this block needs to be removed (overloading of runtime param - which we discussed wouldn't work for analytical model output).

For context, a line of thought that was being explored was that endtime could either be the datetime type in the model output or a timedelta (to signify a runtime).

This wouldn't work, however, for analytical model output which would have model output in timedelta. Hence using endtime would be ambigious. Hence we opted for going with runtime still to delineate these two modes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the field_timeinterval is None and the endtime is a datetime (line 768 immediately above this block in which this check lies), I don't see how we could envision calculating the duration of the simulation as there is no starting datetime from the fieldset to reference.

In any case, I'm reading that the suggested change here is to ensure that endtime has the same type as the fieldset. I'll go ahead an put in the bit for the runtime being a timedelta, otherwise, can't support the existing tests that are passing (stommel gyre has no time interval)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In any case, I'm reading that the suggested change here is to ensure that endtime has the same type as the fieldset

This ends up being quite straightforward - all we need to do here is:

runtime = endtime - time_interval.left

That way:

  • if time_interval is None, this would error out as expected
  • as long as endtime and the time_interval are compatible types (arithmetic is defined on it) then it will produce a timedelta object. This futureproofs us as well, and means that we just trust the packages we rely on.

No further checks need to be done - except some attention to explainable error messages. The following should be sufficient.

# I'll need to add the TimeLike alias
def get_runtime(endtime: TimeLike, time_interval: None | TimeInterval) -> timedelta:
    if time_interval is None:
        raise ValueError(f"FieldSet does not have a time interval. Can't calculate runtime from provided {endtime=!r}")
    return endtime - time_interval.left

# in pset.execute...
runtime = get_runtime(endtime, fieldset.time_interval)

Let me know if you want to wrap this into this PR - or if it should be done in a different one. I have no preference.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if it's easier to use runtime or endtime as the single source of truth in the simulation. Whatever is chosen, endtime = get_endtime(runtime, fieldset.time_interval) would be pretty similar logic

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now, what I've done is to allow for either runtime or endtime to be sent in. The runtime is assumed to be a timedelta. endtime data type is either timedelta or datetime but must match the time datatype from the fieldset (as deduced from the time_interval.left data type. There's a few cases that are handled.

  • If runtime and endtime are both provided an error is thrown
  • If runtime is provided and endtime is not provided, the start_time (local variable used for controlling the forward stepping loop) is set to timedelta(seconds=0) and the end_time (local variable used for determining the last time value in the forward stepping loop) is set to runtime
  • If runtime is not provided and endtime is provided, the endtime is checked to match the type of the time_interval.left. The start_time is set to time_interval.left and the end_time is set to the minimum of endtime or time_interval.right
  • If runtime is not provided, endtime is provided, and time_interval is None then an error is thrown.

The way I've implemented it surely is not the cleanest and tidiest, but it works. I'd say further cleanup should come in a following PR.

time = np.array([np.datetime64(t) for t in time])
if time.size > 0 and isinstance(time[0], np.timedelta64) and not self.time_origin:
raise NotImplementedError("If fieldset.time_origin is not a date, time of a particle must be a double")

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error message in the line above is not true anymore, right? Particle time cannot be a double?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. I'll get this on my next round of edits

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved in a55c852

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, now it doesn't support numpy.datetime64 or numpy.timedelta64 anymore. I fixed it in 68293ab

Copy link
Contributor

@VeckoTheGecko VeckoTheGecko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I have looked through in full now.

@fluidnumerics-joe
Copy link
Contributor Author

Here's what I've gathered from the comments above

  • The endtime input must match the same type as the time dimension in the fields of the underlying fieldset. From what I can tell, the most straightforward way to get the type of the time dimension is to get the type of one of the time_interval endpoints.
  • We need to bring back runtime, which is required to be a timedelta object.
  • We need to enforce either endtime or runtime being set. I would argue that if there is a valid time_interval, we could default to using the valid time_timerval
  • All time handling should remain as datetime or timedelta objects; in either case, it should be consistent with the time dimension type coming from the underlying dataset.

Some open questions

  • What are the data types for the time dimension for idealized simulations that don't leverage a calendar system ? Do they actually come in as timedelta or something else? I guess I'll find out later today with this MITgcm example I'll send across in Adding real-world circulation models datasets #2053

This test is marked with xfail since the changes required to support
particlefile will require a significant overhaul of the particlefile
module which is out of scope for this PR
@fluidnumerics-joe
Copy link
Contributor Author

I've added a test for using the outputfile, but this is currently failing with


tests/v4/test_particleset.py:87: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
parcels/particleset.py:648: in ParticleFile
    return ParticleFile(*args, particleset=self, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
parcels/particlefile.py:57: in __init__
    self._parcels_mesh = self.particleset.fieldset.gridset.grids[0].mesh
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <parcels.fieldset.FieldSet object at 0x7f4de06ffda0>, name = 'gridset'

    def __getattr__(self, name):
        """Get the field by name. If the field is not found, check if it's a constant."""
        if name in self.fields:
            return self.fields[name]
        elif name in self.constants:
            return self.constants[name]
        else:
>           raise AttributeError(f"FieldSet has no attribute '{name}'")
E           AttributeError: FieldSet has no attribute 'gridset'

parcels/fieldset.py:68: AttributeError
=========================================================================================================================== short test summary info ============================================================================================================================
FAILED tests/v4/test_particleset.py::test_uxstommelgyre_pset_execute_output - AttributeError: FieldSet has no attribute 'gridset'

There are some larger changes required here that I feel are getting out of scope for this PR.

@fluidnumerics-joe
Copy link
Contributor Author

I believe I've addressed a good deal of the issues that were raised. To help keep this PR moving, if it looks like I've resolved your issue, please resolve the comment. Otherwise, I will assume it is not resolved and work on it in my next round of edits.

@fluidnumerics-joe
Copy link
Contributor Author

fluidnumerics-joe commented Jul 1, 2025

@VeckoTheGecko - you may find this interesting... when MITgcm runs without a calendar, the time index refers to the iteration count in the model, which is stored as an int32 in the xarray object. See #2053 (comment)

Note that the file this comes from is produced using MITgcm's built in NetCDF IO package; this is different than what is described in XGCM documention (e.g. here ), where the NetCDF file referenced is created by converting MITgcm's native MDS format to NetCDF using xmitgcm . It appears we may need to coerce time to be a dimension and coordinate with knowledge of the parent simulation timestep, if reading NetCDF output straight from MITgcm.

@erikvansebille
Copy link
Member

This PR is good to be merged, as far as I'm concerned

@VeckoTheGecko
Copy link
Contributor

It appears we may need to coerce time to be a dimension and coordinate with knowledge of the parent simulation timestep, if reading NetCDF output straight from MITgcm.

I think we should just not support int time dimensions. If a user wants to use MITgcm runs without a calendar, they have to update the time dimension to timedelta or datetime/cftime on the xarray dataset level before passing to Parcels. We can have an explainable error message denoting the problem.

@VeckoTheGecko
Copy link
Contributor

I've added a test for using the outputfile, but this is currently failing with

...

There are some larger changes required here that I feel are getting out of scope for this PR.

Agreed - let's xfail it and it can be handled later. Good to introduce the test here though.

@VeckoTheGecko
Copy link
Contributor

To summarise:

After that I think we're good to merge.

@fluidnumerics-joe
Copy link
Contributor Author

I have a couple issues in the v4 tests that are failing and I'll take care of resolving these in the morning.

Copy link
Member

@erikvansebille erikvansebille left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While testing my own run on this branch, I found some small bugs that I've either fixed or(see comments) or propose simple changes to below

time = np.array([np.datetime64(t) for t in time])
if time.size > 0 and isinstance(time[0], np.timedelta64) and not self.time_origin:
raise NotImplementedError("If fieldset.time_origin is not a date, time of a particle must be a double")

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, now it doesn't support numpy.datetime64 or numpy.timedelta64 anymore. I fixed it in 68293ab

@fluidnumerics-joe
Copy link
Contributor Author

@VeckoTheGecko
Copy link
Contributor

Well, this is an odd thing to error out on : https://github.com/OceanParcels/Parcels/actions/runs/16056110569/job/45310801757?pr=2060#step:4:779

data looks to be a numpy array and not an xarray dataarray - hence doesn't have a .values attribute - could that be it?

@fluidnumerics-joe
Copy link
Contributor Author

The field that is passed into the interpolator data attribute is a uxarray.dataarray. This hasn't changed at all in the generic dataset. an it's suddenly failing?

@fluidnumerics-joe
Copy link
Contributor Author

Also, removing the .values fails other tests, since (again). field.data is a DataArray

@VeckoTheGecko
Copy link
Contributor

VeckoTheGecko commented Jul 4, 2025

@fluidnumerics-joe the problem is here:

https://github.com/OceanParcels/Parcels/blob/d544d4c66d63287644efa83fe9fd6a16d63022d6/parcels/kernel.py#L320

This looks to have been introduced in 7e7391d to help with dask arrays (with fe26424 being introduced so that fieldset could be None to help with debugging and to fix #867 ). I think this block of code can be safely removed.

The kernel side of the code is something that we haven't touched yet.... perhaps its time we start looking at it somewhat

EDIT: Correction

@fluidnumerics-joe
Copy link
Contributor Author

Nice find. What's weird is this test was working just fine in 2de98a5

@fluidnumerics-joe
Copy link
Contributor Author

@VeckoTheGecko - good find. That resolved the issues on the v4 tests. Updating this branch and will re-request review

Copy link
Member

@erikvansebille erikvansebille left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to be merged so we can move forward!

@fluidnumerics-joe fluidnumerics-joe merged commit 7d2157c into v4-dev Jul 7, 2025
8 checks passed
@fluidnumerics-joe fluidnumerics-joe deleted the minimal-v4-particleset branch July 7, 2025 14:28
@github-project-automation github-project-automation bot moved this from Backlog to Done in Parcels development Jul 7, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

4 participants