New ModelComponent
class
#636
Replies: 9 comments 16 replies
-
Suggested implementationI think that the Making the model components class attributes is something I’m really not a fan of. I’ve had frustrating and confusing experiences with them in the past. In my experience is makes it hard to track which components a model has. I also don’t quite see yet what the downside is on making them instance attributes. Regardless of whether we go with class attributes, I think we should still come up with a good way of keeping track of all the different components. That way we can implement something like I am in favour of moving away from the mixin architecture in favour of this. This is more in line with “Composition over inheritance”, which is a well known best practice in the python community. As a small benefit, I also suspect LSPs will have an easier time dealing with the proposed pattern than with the mixins. (and I’m a big fan of LSPs). I’m not clear yet on how this proposal solves the mentioned issue, but if that is something we can pick up along the way, all the better, of course. I’m not super familiar with the Model._run_log_method yet, but I think it’s better to go with the following syntax than the one suggested above: grid:
create:
arg: value Although more verbose, this allows for easier input checking. Introducing extra syntax like ChallengesI am really not a fan of making the model components relocatable outside of a model. Conceptually, this makes little sense to me. I think that it’s going to create a lot of extra maintenance burden, especially if we have to keep things up to date as you mention. HydroMT is already a very complicated beast to work on and I would rather not add to that. In fact, to make sure that we make HydroMT as stable as possible for V1, I think we ought to move towards providing less external surface area, not more. For the potential maintenance burden this would add, I’d really like to see a more concrete benefit to the user to convince me it is worth it. As an example of the maintenance burden, it would add: you already mention two, the root and keeping the model updated. If the user can simply change things without going through the parent class, then that adds a lot of difficulty in making sure that things stay in sync. Likewise, with the root, if we know we are making the model components in our factories, we can provide the coupling there. However, if the components get instantiated, passed around and used independently, we lose this control. They also now need to have all this information stored locally and we create a distributed consensus problem for ourselves. I’m trying to be as constructive as possible here, but I really don’t like this part of the idea. I don't think we should prioritise backwards compatibility here. I really think of this feature as a V1 feature and so since we’ll be breaking backwards compatibility intentionally. Therefore, I don’t think backwards compatibility has to be the major consideration here. I don’t think putting the pressure on ourselves to maintain a working version on main while implementing big changes like these is a good idea. Therefore, I propose an alternate workflow: making a release candidate (RC for short). We make a v1-rc branch, and start developing the incompatible changes there. We can incorporate developments in main at our own pace to make sure we don’t run into trouble there. Plugins can then test and develop against this release candidate and give us feedback. When everything is ready, we press the release button, and promote the release candidate to a full stable release, and the plugins can do the same. This way, we maintain actual functionality while we’re making our big breaking changes. I don’t mind the idea of a 2 day sprint as mentioned a priori. However, if that is going to be our modus operandi, then I see a lot of these sprints in the semi-near future. Given our schedules, I’m not sure that’s a very sustainable solution. |
Beta Was this translation helpful? Give feedback.
-
Thanks @savente93 for your response!
I agree, the model components should be instance attributes. I suggested to have a list with all component names as a class attribute to keep track of the different components. This could also be an instance attribute and should ideally not be editable.
Regarding the
I agree with this approach. The challenge to me how keep a shared root, and DataCatalog instance synced between the
I like this approach. We should however continue testing and co-developing with the plugin teams to make sure our v1 solutions work for them. So we need them to somehow be able to keep up with us to avoid the implementation gap to v1 becomes a huge burden. Something to discuss with @alimeshgi for the next update with the plugins. |
Beta Was this translation helpful? Give feedback.
-
Took me a while to fully understand but in principle I like the idea :)
I just browsed through So even if when initializing, what we get is a view and not a copy (so if indeed they are synced), it may mean that the more functions we add in The other way is now the So if we don't find a way to share the |
Beta Was this translation helpful? Give feedback.
-
Thanks for your reply @hboisgon!
I agree with logger, but region and crs could also just be
A method where you need to update different components should and can still be defined at the Model level. Within that method you can than do exampleBased on the quick test below I think the syncing itself is actually no issue. #%% test ModelComponent class
from hydromt import DataCatalog
from pathlib import Path
class ModelComponent:
def __init__(self, data_catalog, model_root):
self.data_catalog = data_catalog
self.model_root = model_root
self._data = None
def read():
pass
def write():
pass
class ModelRoot:
def __init__(self, path, mode='r'):
self.set(path, mode='r')
def set(self, path, mode='r'):
self.path = Path(path)
self.mode = mode
class Model:
components = {'grid': ModelComponent, 'forcing': ModelComponent}
def __init__(self, root, mode='r', data_libs=[]):
self.model_root = ModelRoot(root, mode)
self.data_catalog = DataCatalog(data_libs)
for name, component in self.components.items():
setattr(self, name, component(self.data_catalog, self.model_root))
def read(self):
for name in self.components:
getattr(self, name).read()
def write(self):
for name in self.components:
getattr(self, name).write() #%% test
mod = Model('test', mode='r')
print(len(mod.data_catalog._sources))
#>>> 0
print(mod.model_root.path, mod.model_root.mode)
#>>> test r
# update mod.model_root, check if ModelComponent is updated
mod.model_root.set('test2', mode='w')
assert mod.grid.model_root == mod.model_root == mod.forcing.model_root
print(mod.grid.model_root.path, mod.grid.model_root.mode)
#>>> test2 w
# update mod.data_catalog, check if mod.grid.data_catalog is updated
mod.data_catalog.from_predefined_catalogs('deltares_data')
assert mod.grid.data_catalog == mod.data_catalog == mod.forcing.data_catalog
print(len(mod.grid.data_catalog._sources))
#>>> 136
# update mod.grid.model_root
mod.grid.model_root.set('test3', mode='w+')
assert mod.grid.model_root == mod.model_root == mod.forcing.model_root
print(mod.model_root.path, mod.model_root.mode)
#>>> test3 w+ |
Beta Was this translation helpful? Give feedback.
-
Nice that you found a way to sync properties @DirkEilander ! I think you are right that crs can easily be a But indeed maybe after data_catalog, root, logger and region, we may not need so much anymore. |
Beta Was this translation helpful? Give feedback.
-
I'm still thinking this through, but just a question. How much do we expect people to subclass these models? Because if we go with this design, the regular inheritance will not function as expected, and we'll need to make extra considerations for that. Mind you, I'm talking about sub-classing the |
Beta Was this translation helpful? Give feedback.
-
The implementation example below creates a With this structure we avoid that we would break a This would solve potential issues raised by @hboisgon. #%% test ModelComponent class
import weakref
from pathlib import Path
class ModelComponent:
def __init__(self, model):
self._model_ref = weakref.ref(model)
self._data = None
@property
def model(self):
# Access the Model instance through the weak reference
return self._model_ref()
@property
def data(self):
return self._data
def set(self, value):
self._data = value
class ModelRoot:
def __init__(self, path, mode='r'):
self.set(path, mode)
def set(self, path, mode='r'):
self.path = Path(path)
self.mode = mode
def __repr__(self):
return f"ModelRoot(path={self.path}, mode={self.mode})"
class Model:
components = {'grid': ModelComponent, 'forcing': ModelComponent}
def __init__(self, root, mode='r', data_libs=[]):
self.root = ModelRoot(root, mode)
# initialize components with linked model instance
for name, component in self.components.items():
setattr(self, name, component(model=self))
def __repr__(self):
return f"Model(root={self.root})"
# access model property from component
mod = Model('test', mode='r')
mod.root.set('test2', mode='w')
assert mod.root == mod.grid.model.root
print(mod.grid.model.root)
# >>> ModelRoot(path=test2, mode=w)
## acces one component from another
mod.grid.model.forcing.set('test')
print('forcing data:', mod.forcing.data)
# >>> forcing data: test
# infinite instance.attribute.instance.attribute stack .. ?
print(mod.grid.model.forcing.model.grid.model.forcing.model)
# >>> Model(root=ModelRoot(path=test2, mode=w)) |
Beta Was this translation helpful? Give feedback.
-
We have come to more consensus on the topic during a meeting with the developers: @Jaapel @DirkEilander @Tjalling-dejong @savente93 . We want to continue with this refactor, because we see the following advantages:
Although the problem is actually a dependency graph and could benefit in performance by parallel processes, we decide for this first iteration to not look into that resolve step yet. We propose the structure below where we made the following considerations:
from abc import ABC, abstractmethod
from typing import Type, TypeVar, cast
import weakref
T = TypeVar("T", bound=ModelComponent)
class Model():
def __init__(self):
self._components: dict[str, ModelComponent] = {}
# We could add the ModelRegionComponent by default to every model
self.add_component('region', ModelRegionComponent())
def add_component(self, name, component) -> None:
self._components[name] = component
def create(self) -> None:
# We will need to ensure that the order of the components is important for execution
for c in self._components.values():
c.create()
# Python 3.10 would support def get_component[T](self, name: str) -> T.
def get_component(self, name: str, _: Type[T]) -> T:
return cast(T, self._components[name])
@property
def region(self) -> ModelRegionComponent:
return self.get_component('region', ModelRegionComponent)
# Automatically try to resolve the component by name.
# Making it possible to use any component as a property.
# Type hinting will most likely not work when you use it this way.
# Fine for beginner users, not recommended within our own code base.
# @property tells mypy that any accessors are allowed to return a ModelComponent if it exists
@property
def __getattr__(self, name) -> ModelComponent:
return self._components[name]
class ModelComponent(ABC):
def __init__(self, model: Model):
self._model_ref = weakref.ref(model)
@abstractmethod
def create(self) -> None:
...
class ModelRegionComponent(ModelComponent):
_region: hydromt.GeoDataFrame
def __init__(self, model: Model):
super().__init__(model)
def create(self) -> None:
# Something to create the region.
self._region = hydromt.read_file("path/to/region.shp")
@property
def region(self) -> hydromt.GeoDataFrame:
return self._region
class GridComponent(ModelComponent):
_grid: hydromt.RasterDataset
def __init__(self, model: Model):
super().__init__(model)
def create(self):
self._grid = hydromt.RasterDataset()
def add_data(self, data):
# Make a nice implementation to add data to the grid.
self._grid.set(data)
@property
def grid(self) -> hydromt.RasterDataset:
return self._grid
# HydroMT-core users could make an empty model and add components to it
mod = Model()
grid = GridComponent(mod)
mod.add_component('grid', grid)
grid.create()
# Because of the __get_attr()__ implementation, users can also dynamically get the grid:
mod = Model()
mod.add_component('grid', GridComponent(mod))
mod.grid.create() # this will work, but mypy doesn't help the user
# HydroMT plugin developers could inherit from the Model class and add components to it.
# Then also add extra properties for known components.
# This is useful when performing static type checking by users of the plugin.
class WflowModel(Model):
def __init__(self, ):
self.add_component("static_maps", GridComponent(self))
@property
def static_maps(self) -> GridComponent:
return self.get_component('static_maps', GridComponent) |
Beta Was this translation helpful? Give feedback.
-
The model components create, write, and set have component specific arguments. How will these arguments be passed in this implementation? :` def create(self) -> None:
# We will need to ensure that the order of the components is important for execution
for c in self._components.values():
c.create() Are the arguments coming from the config component? |
Beta Was this translation helpful? Give feedback.
-
Current implementation
Currently model components, such as
maps
,geoms
,grid
,mesh
, etc. are defined as properties of theModel
class or extensions such as theGridMixin
orMeshMixin
. All these properties have their own associatedread_
,write_
, andset_
methods and in some case some additional genericsetup_
methods. This is perhaps too restrictive and requires quite some duplication of code.It is restrictive in the sense that all models must use this terminology and squeeze the model data into these objects. In practice however, models may have multiple of the same components (e.g. SFINCS has a computational grid and subgrid) or model specific components are more meaningful to users (e.g. exposure and vulnerability model component classes in FIAT rather than geoms and tables).
The current approach also requires duplication of code as some components, e.g.
maps
,forcing
, andstates
are basically the same component (all dict of xarray objects) and share a lot of the code which is now partly circumvented by calling methods defined in the workflows.Suggested implementation
ModelComponent
template class with abstractread
,write
, andset
methods (and perhaps more) and a dynamicdata
property (not abstract) which reads the data when called for the first time (similar to e.g. currentGridModel.grid
property). Additonally, we should make subclasses forGrid
,Mesh
,Geoms
,Maps
,Table
, etc. The genericsetup_*
methods should also be implemented here.model_components
that can be used in theModel.read
andModel.write
classes to loop over and read/write all components. This would defy the need for the GridMixin etc.ModelComponent
(sub)class to replace methods with their own methods and add more.GridModel
etc classes, but this might be less needed if we write a "model factory" method to create Model classes on the fly with custom components.GridModel.setup_grid_data_from_raster
it wouldGridModel.grid.add_data_from_raster
, etc.Model._run_log_method
Challenges
_read
,_write
), theroot
and thedata_catalog
are defined at the Model but should be available to theModelComponent
classes. Not sure if these can be accessed from theModelComponent
class which is basically an attribute ofModel
. Either way it would be useful if it can function also outside of theModel
class. Are there ways to do this except for passing these properties to all components on initialization? If this creates a copy rather than a view these must also be updated when changed.Model.grid
(this now returns axarray.Dataset
) but inModel.grid.data
. However with some tricks this could be "hidden" by forwarding methods and attributes to the data class to make theModel.grid
object appear like and have the same methods as theModel.grid.data
object. Furthermore methods likeModel.write_grid
will becomeModel.grid.write
. This could be covered by temporarily keep theModel.write_grid
method. However, I don't think this can be implemented with full backwards compatibility in a cheap and maintainable way. Rather I think we should implement this in consultation with all plugins and perhaps plan a 2 day sprint once ready to get all plugins to use the new implementation.Beta Was this translation helpful? Give feedback.
All reactions