New `ModelComponent` class #636

DirkEilander · 2023-11-06T17:53:49Z

Maintainer

Current implementation

Currently, model components such as maps, geoms, grid, mesh, etc. are defined as properties of the Model class or of extensions such as GridMixin or MeshMixin. Each of these properties has its own read_, write_, and set_ methods, and in some cases additional generic setup_ methods. This is perhaps too restrictive and requires considerable code duplication.

It is restrictive in the sense that all models must use this terminology and squeeze the model data into these objects. In practice, however, models may have multiple components of the same type (e.g. SFINCS has a computational grid and a subgrid), or model-specific components may be more meaningful to users (e.g. exposure and vulnerability component classes in FIAT rather than geoms and tables).

The current approach also requires code duplication, as some components, e.g. maps, forcing, and states, are basically the same kind of component (all are dicts of xarray objects) and share a lot of code. This is currently partly circumvented by calling methods defined in the workflows.

Suggested implementation

Create a common ModelComponent template class with abstract read, write, and set methods (and perhaps more) and a dynamic, non-abstract data property that reads the data when it is accessed for the first time (similar to e.g. the current GridModel.grid property). Additionally, we should make subclasses for Grid, Mesh, Geoms, Maps, Table, etc. The generic setup_* methods should also be implemented in these classes.
These model components will be implemented as instance attributes of subclasses of Model (e.g. in plugins) and should be registered in a separate class attribute model_components, which the Model.read and Model.write methods can loop over to read/write all components. This would remove the need for GridMixin etc.
Where needed, plugins can subclass a ModelComponent (sub)class to replace methods with their own and add more.
We could still provide generic GridModel etc. classes, but these might be less needed if we write a "model factory" method that creates Model classes on the fly with custom components.
This also potentially solves the naming discussion in "Rename the generic setup method to distinguish better between the setup_schematisation and add data methods" #475: instead of GridModel.setup_grid_data_from_raster it would become GridModel.grid.add_data_from_raster, etc.
We should be able to call these model component methods from the HydroMT yaml configuration files, see e.g. below. This requires a small change in Model._run_log_method.

grid.create:
  arg: value
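The proposal above could look roughly like the minimal sketch below. It assumes a plain dict as the data container; `DictComponent` and the tuple-valued `model_components` are illustrative names, not existing HydroMT API. It shows a concrete lazy `data` property next to abstract `read`/`write`/`set` methods, one class reused for both maps and forcing, and `Model.read`/`Model.write` looping over the registered names.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Optional


class ModelComponent(ABC):
    """Template class: abstract I/O methods, concrete lazy `data` property."""

    def __init__(self) -> None:
        self._data: Optional[Dict[str, Any]] = None

    @property
    def data(self) -> Dict[str, Any]:
        # Read the data the first time it is requested (lazy loading).
        if self._data is None:
            self.read()
        assert self._data is not None  # read() must populate self._data
        return self._data

    @abstractmethod
    def read(self) -> None:
        """Populate self._data, e.g. from files in the model root."""

    @abstractmethod
    def write(self) -> None: ...

    @abstractmethod
    def set(self, name: str, value: Any) -> None: ...


class DictComponent(ModelComponent):
    """One implementation shared by dict-like components (maps, forcing, states)."""

    def read(self) -> None:
        # Placeholder: a real component would read files from the model root.
        self._data = {}

    def write(self) -> None:
        # Placeholder: a real component would write files to the model root.
        for name, value in self.data.items():
            print(f"writing {name}: {value!r}")

    def set(self, name: str, value: Any) -> None:
        self.data[name] = value


class Model:
    # Names of the component attributes; a tuple, so it cannot be edited in place.
    model_components = ("maps", "forcing")

    def __init__(self) -> None:
        self.maps = DictComponent()
        self.forcing = DictComponent()

    def read(self) -> None:
        for name in self.model_components:
            getattr(self, name).read()

    def write(self) -> None:
        for name in self.model_components:
            getattr(self, name).write()


mod = Model()
mod.maps.set("dem", [[1.0, 2.0]])  # first access of .data triggers the lazy read
mod.write()
```

Python's `abc` module allows a class to mix abstract methods with concrete members such as `data`, so the template class can carry shared behaviour while still forcing subclasses to implement I/O.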

Challenges

Some model properties such as the mode (i.e. _read, _write), the root, and the data_catalog are defined in the Model but should be available to the ModelComponent classes. I'm not sure whether these can be accessed from a ModelComponent, which is basically an attribute of Model. Either way, it would be useful if a component could also function outside of the Model class. Are there ways to do this other than passing these properties to all components on initialization? If that creates a copy rather than a view, the copies must also be updated whenever the originals change.
Backwards compatibility. For a plugin user, the main change will be that the data is not stored directly in e.g. Model.grid (which now returns an xarray.Dataset) but in Model.grid.data. However, with some tricks this could be "hidden" by forwarding methods and attributes to the data object, so that the Model.grid object looks like and has the same methods as the Model.grid.data object. Furthermore, methods like Model.write_grid will become Model.grid.write. This could be covered by temporarily keeping the Model.write_grid method. However, I don't think this can be implemented with full backwards compatibility in a cheap and maintainable way. Rather, I think we should implement this in consultation with all plugins and perhaps plan a 2-day sprint once ready to get all plugins to use the new implementation.
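On the copy-versus-view question: Python passes object references, so an object handed to each component at initialization is shared, not copied, and in-place changes are visible everywhere. Synchronisation only breaks when the Model rebinds its attribute to a new object. A minimal sketch, using a hypothetical `Catalog` stand-in rather than the real `DataCatalog`:

```python
class Catalog:
    """Hypothetical stand-in for a shared object such as a DataCatalog."""

    def __init__(self) -> None:
        self.sources: dict = {}


class Component:
    def __init__(self, catalog: Catalog) -> None:
        self.catalog = catalog  # stores a reference, not a copy


class Model:
    def __init__(self) -> None:
        self.catalog = Catalog()
        self.grid = Component(self.catalog)


mod = Model()

# In-place mutation is seen by the component: both names point to one object.
mod.catalog.sources["dem"] = "path/to/dem.tif"
print(mod.grid.catalog is mod.catalog)  # True
print(mod.grid.catalog.sources)  # {'dem': 'path/to/dem.tif'}

# Rebinding the model attribute breaks the link: the component keeps the old object.
mod.catalog = Catalog()
print(mod.grid.catalog is mod.catalog)  # False
```

So shared objects should be updated in place (e.g. through a `set` method) rather than reassigned, or components should look them up through the model.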

savente93 · 2023-11-20T10:24:56Z

Maintainer

Suggested implementation

I think the ModelComponent idea is a good one; that way plugins can easily subclass it and add custom or multiple components as necessary. I’m not sure I understand why it should be abstract. It seems like there is generic functionality we could implement there that subclasses could use. As a general principle, I’m not a big fan of mixing abstract and non-abstract implementations; some object-oriented languages keep the two more strictly separate (e.g. Java interfaces could not contain method implementations before Java 8). Though I suppose that’s not a sticking point right now; I’m okay with deferring it until implementation time and seeing what works best. Implementing the subclasses you mention also makes sense.

Making the model components class attributes is something I’m really not a fan of. I’ve had frustrating and confusing experiences with them in the past. In my experience it makes it hard to track which components a model has. I also don’t quite see yet what the downside is of making them instance attributes.

Regardless of whether we go with class attributes, I think we should still come up with a good way of keeping track of all the different components. That way we can implement something like Model.write as mentioned in the proposal in a generic way. But I’m open to being corrected/convinced otherwise.

I am in favour of moving away from the mixin architecture towards this design. It is more in line with “composition over inheritance”, a well-known best practice in the Python community. As a small benefit, I also suspect language servers (LSPs) will have an easier time with the proposed pattern than with the mixins (and I’m a big fan of LSPs).

I’m not clear yet on how this proposal solves the naming issue mentioned (#475), but if that is something we can pick up along the way, all the better.

I’m not super familiar with the Model._run_log_method yet, but I think it’s better to go with the following syntax than the one suggested above:

grid:
  create:
    arg: value

Although more verbose, this allows for easier input checking. Introducing extra syntax like grid.create makes this much harder for us, because we would have to define and parse another special syntax. I think it’s better to stick with vanilla yaml.
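To make the input-checking argument concrete, a minimal sketch that validates the nested form before anything is executed. It works on the dict a YAML loader would return; `validate_config`, the toy `Grid` class and the `components` dict are hypothetical.

```python
from typing import Any, Dict, List


class Grid:
    def create(self, res: float) -> None: ...


class Model:
    def __init__(self) -> None:
        self.components = {"grid": Grid()}


def validate_config(model: Model, config: Dict[str, Dict[str, Any]]) -> List[str]:
    """Return error messages for unknown components or methods."""
    errors = []
    for comp_name, methods in config.items():
        component = model.components.get(comp_name)
        if component is None:
            errors.append(f"unknown component: {comp_name}")
            continue
        for method_name in methods:
            if not callable(getattr(component, method_name, None)):
                errors.append(f"{comp_name} has no method {method_name}")
    return errors


config = {"grid": {"create": {"res": 0.01}, "craete": {}}, "mesh": {"create": {}}}
print(validate_config(Model(), config))
# ['grid has no method craete', 'unknown component: mesh']
```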

Challenges

I am really not a fan of making the model components usable outside of a model. Conceptually, this makes little sense to me, and I think it is going to create a lot of extra maintenance burden, especially if we have to keep things in sync as you mention. HydroMT is already a very complicated beast to work on and I would rather not add to that. In fact, to make HydroMT as stable as possible for v1, I think we ought to move towards providing less external surface area, not more. Given the maintenance burden this would add, I’d really like to see a more concrete benefit to the user to convince me it is worth it.

As for examples of the maintenance burden it would add, you already mention two: the root and keeping the model updated. If the user can simply change things without going through the parent class, it becomes much harder to make sure that things stay in sync. Likewise, with the root, if we know we are making the model components in our factories, we can provide the coupling there. However, if the components get instantiated, passed around and used independently, we lose this control. They would also need to store all this information locally, and we create a distributed consensus problem for ourselves. I’m trying to be as constructive as possible here, but I really don’t like this part of the idea.

I don't think we should prioritise backwards compatibility here. I really think of this as a v1 feature, and in v1 we’ll be breaking backwards compatibility intentionally anyway. Therefore, I don’t think backwards compatibility has to be the major consideration here.

I don’t think putting pressure on ourselves to maintain a working version on main while implementing big changes like these is a good idea. Therefore, I propose an alternate workflow: making a release candidate (RC for short). We make a v1-rc branch, and start developing the incompatible changes there. We can incorporate developments in main at our own pace to make sure we don’t run into trouble there. Plugins can then test and develop against this release candidate and give us feedback. When everything is ready, we press the release button and promote the release candidate to a full stable release, and the plugins can do the same. This way, we maintain actual functionality while we’re making our big breaking changes. I don’t mind the idea of a 2-day sprint as mentioned above. However, if that is going to be our modus operandi, then I see a lot of these sprints in the semi-near future, and given our schedules I’m not sure that’s a very sustainable solution.


DirkEilander · 2023-11-20T17:26:24Z

Maintainer Author

Thanks @savente93 for your response!

Making the model components class attributes is something I’m really not a fan of. I’ve had frustrating and confusing experiences with them in the past. In my experience it makes it hard to track which components a model has. I also don’t quite see yet what the downside is of making them instance attributes.
Regardless of whether we go with class attributes, I think we should still come up with a good way of keeping track of all the different components. That way we can implement something like Model.write as mentioned in the proposal in a generic way. But I’m open to being corrected/convinced otherwise.

I agree, the model components should be instance attributes. I suggested having a list with all component names as a class attribute to keep track of the different components. This could also be an instance attribute and should ideally not be editable.

I’m not super familiar with the Model._run_log_method yet, but I think it’s better to go with the following syntax than the one suggested above:

grid:
  create:
    arg: value

Regarding the Model._run_log_method method and the HydroMT config file: I suggested the syntax below because in the current format each first-level key indicates a method and the second-level keys are the method's arguments. I think for users it will be much better to keep this consistent between methods of Model and ModelComponent. Also, the order of the methods is important, as the methods are executed in that specific order. With the nested config, all grid methods need to be executed in sequence, because there can only be one grid key at the first level. However, in practice a user may want to do a grid.create, then vector.setup_x, followed by grid.setup_y and grid.write.

grid.create:
    arg: value
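A minimal sketch of how flat `component.method` keys could be dispatched in file order, assuming the parsed config is a Python dict (which preserves insertion order). `run_config` and the toy classes are hypothetical; this is not the current `Model._run_log_method` implementation. Keys without a dot still map to Model methods.

```python
from typing import Any, Dict, List, Optional


class Component:
    def __init__(self, name: str, log: List[str]) -> None:
        self.name = name
        self.log = log

    def _record(self, method: str) -> None:
        self.log.append(f"{self.name}.{method}")

    def create(self, **kwargs: Any) -> None:
        self._record("create")

    def setup_x(self, **kwargs: Any) -> None:
        self._record("setup_x")

    def setup_y(self, **kwargs: Any) -> None:
        self._record("setup_y")

    def write(self, **kwargs: Any) -> None:
        self._record("write")


class Model:
    def __init__(self) -> None:
        self.log: List[str] = []
        self.grid = Component("grid", self.log)
        self.vector = Component("vector", self.log)

    def write(self) -> None:
        self.log.append("write")


def run_config(model: Model, config: Dict[str, Optional[Dict[str, Any]]]) -> None:
    """Call each 'component.method' (or plain 'method') key in file order."""
    for key, kwargs in config.items():
        *path, method_name = key.split(".")
        target: Any = model
        for attr in path:  # "grid.create" -> model.grid
            target = getattr(target, attr)
        getattr(target, method_name)(**(kwargs or {}))


config: Dict[str, Optional[Dict[str, Any]]] = {
    "grid.create": {"res": 0.01},
    "vector.setup_x": {"fn": "rivers"},
    "grid.setup_y": None,
    "grid.write": None,
    "write": None,
}
mod = Model()
run_config(mod, config)
print(mod.log)  # ['grid.create', 'vector.setup_x', 'grid.setup_y', 'grid.write', 'write']
```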

Likewise, with the root, if we know we are making the model components in our factories, we can provide the coupling there. However, if the components get instantiated, passed around and used independently, we lose this control.

I agree with this approach. The challenge for me is how to keep a shared root and DataCatalog instance in sync between the Model class and the ModelComponent classes. If the DataCatalog is instantiated first and passed to each ModelComponent when initialized, will these all point to the same instance and therefore be in sync?

Therefore, I propose an alternate workflow: making a release candidate (RC for short). We make a v1-rc branch, and start developing the incompatible changes there.

I like this approach. We should, however, continue testing and co-developing with the plugin teams to make sure our v1 solutions work for them. So we need them to somehow be able to keep up with us, to avoid the gap to v1 becoming a huge burden. Something to discuss with @alimeshgi at the next update with the plugins.


hboisgon · 2023-11-21T02:15:44Z

Maintainer

Took me a while to fully understand but in principle I like the idea :)
Looking at the biggest challenge we have, I am not fully sure if it is feasible though...

Some model properties such as the mode (i.e. _read, _write), the root, and the data_catalog are defined in the Model but should be available to the ModelComponent classes. I'm not sure whether these can be accessed from a ModelComponent, which is basically an attribute of Model. Either way, it would be useful if a component could also function outside of the Model class. Are there ways to do this other than passing these properties to all components on initialization? If that creates a copy rather than a view, the copies must also be updated whenever the originals change.

I just browsed through the GridModel implementation in grid.py, and the number of properties shared between GridModel and GridComponent would really be large if all grid methods move to the component level. You already mention data_catalog and root, but region, crs, logger, bounds, etc. would also have to be provided, and we don't even have that many functions for grid yet. Also, create_grid is sometimes used to set the region geoms as well; if that is taken out of create_grid, the region geoms risk never being instantiated and written out.

So even if what we get on initialization is a view and not a copy (i.e. they are indeed synced), the more functions we add to GridComponent, the more properties from GridModel we may need to pass to initialize it.

The other way is that add_data_from_raster, which is now in workflows.grid, could be moved to GridComponent, but then we still need a GridModel.add_grid_data_from_raster method to pass the necessary arguments. The same holds for read_grid and write_grid, as the model root and read/write mode need to be passed.

So if we don't find a nice way to share the Model properties that the components need, I am not sure this change is worth it compared to what we have now (I think it would not even get rid of the mixins), apart maybe from making the creation of new components a little clearer than it is now.
I'll keep thinking because I really like the idea though!


DirkEilander · 2023-11-21T14:34:58Z

Maintainer Author

Thanks for your reply @hboisgon!

I just browsed through the GridModel implementation in grid.py, and the number of properties shared between GridModel and GridComponent would really be large if all grid methods move to the component level. You already mention data_catalog and root, but region, crs, logger, bounds, etc. would also have to be provided, and we don't even have that many functions for grid yet.

I agree about the logger, but region and crs could also just be ModelComponent attributes, right? If you add a layer to your grid, you probably want to retrieve data just for the extent of the grid, and for some models the crs of different components may vary. Either way, we would need to define a list of attributes shared between the Model and ModelComponent classes.

Also, create_grid is sometimes used to set the region geoms as well; if that is taken out of create_grid, the region geoms risk never being instantiated and written out.

A method that needs to update different components can and should still be defined at the Model level. Within that method you can then do self.grid.set(da) and self.geoms.set(gdf).
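A minimal sketch of such a Model-level method. `setup_grid`, its arguments and the dict-backed `DictComponent` are hypothetical stand-ins; in HydroMT the values passed to `set` would be an xarray `DataArray` and a `GeoDataFrame` rather than lists and dicts.

```python
from typing import Any, Dict, Tuple


class DictComponent:
    def __init__(self) -> None:
        self.data: Dict[str, Any] = {}

    def set(self, name: str, value: Any) -> None:
        self.data[name] = value


class Model:
    def __init__(self) -> None:
        self.grid = DictComponent()
        self.geoms = DictComponent()

    def setup_grid(self, bbox: Tuple[float, float, float, float], res: float) -> None:
        # One Model-level method updates several components consistently.
        xmin, ymin, xmax, ymax = bbox
        ncols = round((xmax - xmin) / res)
        nrows = round((ymax - ymin) / res)
        self.grid.set("mask", [[1] * ncols for _ in range(nrows)])
        self.geoms.set("region", {"bbox": bbox})


mod = Model()
mod.setup_grid(bbox=(0.0, 0.0, 1.0, 0.5), res=0.25)
```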

example

Based on the quick test below I think the syncing itself is actually no issue.
This example also shows how we could keep track of model components @savente93.

#%% test ModelComponent class
from pathlib import Path

from hydromt import DataCatalog


class ModelComponent:
    def __init__(self, data_catalog, model_root):
        # keep references (not copies) to the objects owned by the Model
        self.data_catalog = data_catalog
        self.model_root = model_root
        self._data = None

    def read(self):
        pass

    def write(self):
        pass


class ModelRoot:
    def __init__(self, path, mode='r'):
        self.set(path, mode=mode)

    def set(self, path, mode='r'):
        # update in place, so every object holding this ModelRoot sees the change
        self.path = Path(path)
        self.mode = mode


class Model:
    components = {'grid': ModelComponent, 'forcing': ModelComponent}

    def __init__(self, root, mode='r', data_libs=None):
        self.model_root = ModelRoot(root, mode)
        self.data_catalog = DataCatalog(data_libs)

        # every component receives the same ModelRoot and DataCatalog instances
        for name, component in self.components.items():
            setattr(self, name, component(self.data_catalog, self.model_root))

    def read(self):
        for name in self.components:
            getattr(self, name).read()

    def write(self):
        for name in self.components:
            getattr(self, name).write()


#%% test
mod = Model('test', mode='r')
print(len(mod.data_catalog._sources))
#>>> 0
print(mod.model_root.path, mod.model_root.mode)
#>>> test r

# update mod.model_root, check if ModelComponent is updated (same object, not a copy)
mod.model_root.set('test2', mode='w')
assert mod.grid.model_root is mod.model_root is mod.forcing.model_root
print(mod.grid.model_root.path, mod.grid.model_root.mode)
#>>> test2 w

# update mod.data_catalog, check if mod.grid.data_catalog is updated
mod.data_catalog.from_predefined_catalogs('deltares_data')
assert mod.grid.data_catalog is mod.data_catalog is mod.forcing.data_catalog
print(len(mod.grid.data_catalog._sources))
#>>> 136

# update mod.grid.model_root
mod.grid.model_root.set('test3', mode='w+')
assert mod.grid.model_root is mod.model_root is mod.forcing.model_root
print(mod.model_root.path, mod.model_root.mode)
#>>> test3 w+


hboisgon · 2023-11-22T07:09:12Z

Maintainer

Nice that you found a way to sync properties @DirkEilander !

I think you are right that crs can easily be a component property (it is still an argument of the create function, like grid.create); however, I think region is still a property of Model. The example I have in mind is geoms, where you can have lakes and reservoirs with different boundaries than your model. Also for forcing, where you can store a GeoDataset of points as well as gridded data, it would be useful to save the model region.
Actually, thinking about it, the first time you prepare forcing it will be empty, so you will still need to know something about the destination CRS... but if we save the model region, then we have the CRS too.

But indeed maybe after data_catalog, root, logger and region, we may not need so much anymore.


DirkEilander Nov 22, 2023
Maintainer Author

I see the point that a region is needed when initializing a new component. In that case the region should not only be a dynamic property of Model which is retrieved on the fly from geoms (which is the case now), but a Model instance attribute. Perhaps even defined within a ModelRegion class?

DirkEilander Nov 22, 2023
Maintainer Author

Note that I don't necessarily intend all setup methods to be defined at the component level. Some can also be defined at the Model level if the method needs to get or set data from multiple components.

savente93 · 2023-11-22T10:41:09Z

Maintainer

I'm still thinking this through, but just a question: how much do we expect people to subclass these models? If we go with this design, regular inheritance will not function as expected, and we'll need to make extra considerations for that. Mind you, I'm talking about subclassing the Model class, not the ModelComponent; subclassing the ModelComponent is no problem. However, with this design, it seems like we won't need much inheritance, and people can just add more components to get the model they want. If that is the case, then this is not such an issue, but I wanted to mention it. What do you two think?


DirkEilander Nov 22, 2023
Maintainer Author

I think most plugins will still need to subclass Model to add their own methods and components. As mentioned in the reply above, not all methods will be defined in the component class, for instance if a method needs to get or set data from multiple components. But indeed, compared to the current design, more users will be able to use the Model class directly.

the regular inheritance will not function as expected

Just to understand, why is this the case? Of course a plugin would also have to initialize the parent Model class when initializing the plugin (see below), but that's also the case in the current design. Is there something else to it that I'm missing? (Could well be the case.)

class SfincsModel(Model):
    components = {'grid': ModelComponent, 'subgrid': ModelComponent}

    def __init__(self, root, mode='r', data_libs=[]):
        super().__init__(
            root=root,
            mode=mode,
            data_libs=data_libs,
        )

savente93 Nov 27, 2023
Maintainer

Well, it's a bit of a gotcha. I'm not saying that it's necessarily wrong, but it becomes very confusing very quickly (to me at least). By necessity the example will be a little confusing, but consider the following script:

class A:
    components = {"grid": "grid_a", "subgrid": "subgrid_a"}

    def __init__(self) -> None:
        self.components["mesh"] = "mesh_a"


class B(A):
    components = {"grid": "grid_b"}

    def __init__(self) -> None:
        super().__init__()
        self.components["network"] = "network_b"


a = A()
print(f"instance a.components: {a.components}")
print("adding region component to a")
a.components["region"] = "region_a"

print(f"instance a.components: {a.components}")
print(f"class A.components:    {A.components}")
print()

b1 = B()
print(f"instance b1.components: {b1.components}")
print(f"class B.components:     {B.components}")

b2 = B()
b2.components["region"] = "region_b2"
print("adding region component to b2")
print(f"instance b2.components: {b2.components}")
print(f"instance b1.components: {b1.components}")
print(f"class B.components:     {B.components}")
print()


class C(B):
    components = {"grid": "grid_c"}

    def __init__(self) -> None:
        super().__init__()
        self.components["mesh"] = "mesh_c"


c = C()
print(f"instance c.components: {c.components}")
print(f"class C.components:    {C.components}")
print()

This should give the following output:

instance a.components: {'grid': 'grid_a', 'subgrid': 'subgrid_a', 'mesh': 'mesh_a'}
adding region component to a
instance a.components: {'grid': 'grid_a', 'subgrid': 'subgrid_a', 'mesh': 'mesh_a', 'region': 'region_a'}
class A.components:    {'grid': 'grid_a', 'subgrid': 'subgrid_a', 'mesh': 'mesh_a', 'region': 'region_a'}

instance b1.components: {'grid': 'grid_b', 'mesh': 'mesh_a', 'network': 'network_b'}
class B.components:     {'grid': 'grid_b', 'mesh': 'mesh_a', 'network': 'network_b'}
adding region component to b2
instance b2.components: {'grid': 'grid_b', 'mesh': 'mesh_a', 'network': 'network_b', 'region': 'region_b2'}
instance b1.components: {'grid': 'grid_b', 'mesh': 'mesh_a', 'network': 'network_b', 'region': 'region_b2'}
class B.components:     {'grid': 'grid_b', 'mesh': 'mesh_a', 'network': 'network_b', 'region': 'region_b2'}

instance c.components: {'grid': 'grid_c', 'mesh': 'mesh_c', 'network': 'network_b'}
class C.components:    {'grid': 'grid_c', 'mesh': 'mesh_c', 'network': 'network_b'}

A couple of things to note. First, because a subclass simply overwrites the class variable, as is how inheritance works, neither B nor C ends up with a subgrid component, which I personally would expect them to have. Second, we added a region component only to b2, yet it also shows up in b1 (and in class B), because all instances mutate the same class-level dict; so if one is going to have multiple instances of the same class, this is going to get confusing quickly. Last, because the code in the __init__ functions still runs, we end up in a situation where some parts of our parents end up in our classes but not all of them (e.g. mesh but not subgrid). And this is without the dynamic/lazy loading we do. With this approach, it will be very important to keep track of the order and manner in which things get initialised, something that has already caused us trouble in the past, e.g. in #256. I'm not saying it can't be done properly, but I'm already struggling with just this toy example, so I foresee trouble when we scale this up to actual components.
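One way around these pitfalls, sketched under the assumption that components stay declared at class level: never mutate the class-level dict, and give every instance a fresh dict built by merging the declarations of all classes in the MRO, base classes first. Subclasses then override entries (`grid`) without dropping inherited ones (`subgrid`), and instances no longer share state. The `_collect_components` helper is hypothetical and reuses the toy string values from the script above.

```python
from typing import Dict


class Model:
    components: Dict[str, str] = {"grid": "grid_a", "subgrid": "subgrid_a"}

    def __init__(self) -> None:
        # Fresh dict per instance; the class-level declarations are never mutated.
        self.components = self._collect_components()

    @classmethod
    def _collect_components(cls) -> Dict[str, str]:
        merged: Dict[str, str] = {}
        # Walk from the most basic class to the most derived one.
        for klass in reversed(cls.__mro__):
            merged.update(vars(klass).get("components", {}))
        return merged


class B(Model):
    components = {"grid": "grid_b", "network": "network_b"}


class C(B):
    components = {"grid": "grid_c", "mesh": "mesh_c"}


b1, b2 = B(), B()
b2.components["region"] = "region_b2"
c = C()
print(b1.components)  # {'grid': 'grid_b', 'subgrid': 'subgrid_a', 'network': 'network_b'}
print(b2.components)  # {'grid': 'grid_b', 'subgrid': 'subgrid_a', 'network': 'network_b', 'region': 'region_b2'}
print(c.components)   # {'grid': 'grid_c', 'subgrid': 'subgrid_a', 'network': 'network_b', 'mesh': 'mesh_c'}
print(B.components)   # {'grid': 'grid_b', 'network': 'network_b'}
```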

DirkEilander · 2023-11-23T16:51:56Z

Maintainer Author

The implementation example below creates a Model class with multiple components (such as 'grid' and 'forcing'), each being an instance of ModelComponent and linked to the parent Model instance through its model attribute.

With this structure we avoid breaking ModelComponent if we find at a later stage that we need to keep more objects in sync between Model and ModelComponent. A weakref provides an indirect reference to the Model instance, avoiding a strong circular reference between the model and its components, which would otherwise delay garbage collection until Python's cycle collector runs.

This would solve potential issues raised by @hboisgon.
Any thought on this implementation @savente93?
If you both agree I will make an issue suggesting an implementation along this line.

#%% test ModelComponent class 
import weakref
from pathlib import Path

class ModelComponent:
    def __init__(self, model):
        self._model_ref = weakref.ref(model)
        self._data = None

    @property
    def model(self):
        # Access the Model instance through the weak reference
        return self._model_ref()  

    @property
    def data(self):
        return self._data
    
    def set(self, value):
        self._data = value

class ModelRoot:
    def __init__(self, path, mode='r'):
        self.set(path, mode)

    def set(self, path, mode='r'):
        self.path = Path(path)
        self.mode = mode

    def __repr__(self):
        return f"ModelRoot(path={self.path}, mode={self.mode})"

class Model:
    components = {'grid': ModelComponent, 'forcing': ModelComponent}

    def __init__(self, root, mode='r', data_libs=[]):
        self.root = ModelRoot(root, mode)

        # initialize components with linked model instance
        for name, component in self.components.items():
            setattr(self, name, component(model=self))

    def __repr__(self):
        return f"Model(root={self.root})"

# access model property from component
mod = Model('test', mode='r')
mod.root.set('test2', mode='w')
assert mod.root == mod.grid.model.root
print(mod.grid.model.root)
# >>> ModelRoot(path=test2, mode=w)

## access one component from another
mod.grid.model.forcing.set('test')
print('forcing data:', mod.forcing.data)
# >>> forcing data: test

# infinite instance.attribute.instance.attribute stack .. ?
print(mod.grid.model.forcing.model.grid.model.forcing.model)
# >>> Model(root=ModelRoot(path=test2, mode=w))
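A caveat of the weak reference: the component does not keep the model alive, so the `weakref.ref` call returns `None` once the model has been garbage collected, for example when a user keeps a handle to `mod.grid` but drops `mod`. A minimal sketch that turns this into an explicit error instead of a later `AttributeError` on `None`; raising `RuntimeError` and its message are illustrative choices.

```python
import gc
import weakref


class ModelComponent:
    def __init__(self, model: "Model") -> None:
        self._model_ref = weakref.ref(model)

    @property
    def model(self) -> "Model":
        model = self._model_ref()
        if model is None:
            raise RuntimeError("the parent Model of this component no longer exists")
        return model


class Model:
    def __init__(self) -> None:
        self.grid = ModelComponent(self)


mod = Model()
grid = mod.grid
assert grid.model is mod
del mod  # the component only holds a weak reference, so nothing keeps the model alive
gc.collect()  # CPython frees it on `del` already; this covers other interpreters
try:
    grid.model
except RuntimeError as err:
    print(err)
```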


savente93 Nov 27, 2023
Maintainer

I have seen this reply, but have not had the time to look at it in detail, I will try to do so today.

savente93 Nov 27, 2023
Maintainer

sorry but I'm toast for today, I'll have to come back to this tomorrow

deltamarnix · 2024-02-29T15:47:24Z

Collaborator

We have come to more consensus on the topic during a meeting with the developers: @Jaapel @DirkEilander @Tjalling-dejong @savente93 .

We want to continue with this refactor, because we see the following advantages:

Flexibility of adding components, like multiple grids
Maintainability
Clearer terminology
Robustness and reproducibility, which are the most important aspects for users

Although the problem is actually a dependency graph and could benefit in performance from parallel processing, we decided not to look into that resolution step in this first iteration.
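For later reference, resolving such a dependency graph can be sketched with the standard library's `graphlib.TopologicalSorter` (Python 3.9+). The component names and dependencies are made-up examples. `static_order()` yields one valid sequential order, and the `get_ready()`/`done()` loop groups components whose dependencies are satisfied, i.e. those that could run in parallel.

```python
from graphlib import TopologicalSorter

# Hypothetical dependencies: each component maps to the components it needs first.
dependencies = {
    "grid": {"region"},
    "geoms": {"region"},
    "forcing": {"grid"},
    "states": {"grid", "forcing"},
}

# One valid sequential creation order.
order = list(TopologicalSorter(dependencies).static_order())
print(order)

# Batches of components whose dependencies are all done (candidates for parallel runs).
sorter = TopologicalSorter(dependencies)
sorter.prepare()
batches = []
while sorter.is_active():
    ready = sorted(sorter.get_ready())
    batches.append(ready)
    sorter.done(*ready)
print(batches)  # [['region'], ['geoms', 'grid'], ['forcing'], ['states']]
```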

We propose the structure below where we made the following considerations:

There are both beginner and advanced users, so a simple getter like model.grid or model.region is desired. Static typing will not work in those cases, but beginner users might not be looking for that.
Static typing is a useful tool, and we need to make sure that our implementation is not too dynamic with duck typing. For example, we could have considered a data: Any attribute in ModelComponent.
Models will follow a CRUD structure.
Models have a read and write function.
Plugins need an entry point to register themselves with HydroMT. We do this by inheriting from Model.
Components have a name to simplify accessors, and users can then use those names as accessors. This makes static typing more difficult.
We still need to make a plan for how to validate inputs before create is called, because we have encountered cases where the user receives an error at a very late stage of their execution.

from __future__ import annotations  # allows forward references and the placeholder types below

import weakref
from abc import ABC, abstractmethod
from typing import Type, TypeVar, cast

# hydromt.GeoDataFrame, hydromt.RasterDataset and hydromt.read_file are placeholder names in this proposal
import hydromt

T = TypeVar("T", bound="ModelComponent")


class Model:
    def __init__(self):
        self._components: dict[str, ModelComponent] = {}
        # We could add the ModelRegionComponent by default to every model
        self.add_component('region', ModelRegionComponent(self))

    def add_component(self, name: str, component: ModelComponent) -> None:
        self._components[name] = component

    def create(self) -> None:
        # We will need to ensure that the order of the components is important for execution
        for c in self._components.values():
            c.create()

    # Python 3.12 (PEP 695) supports def get_component[T](self, name: str) -> T.
    def get_component(self, name: str, _: Type[T]) -> T:
        return cast(T, self._components[name])

    @property
    def region(self) -> ModelRegionComponent:
        return self.get_component('region', ModelRegionComponent)

    # Automatically try to resolve the component by name,
    # making it possible to use any component as an attribute.
    # __getattr__ is only called after normal attribute lookup fails.
    # Type hinting will most likely not work when you use it this way.
    # Fine for beginner users, not recommended within our own code base.
    # The return annotation tells mypy that unknown attributes resolve to a ModelComponent.
    def __getattr__(self, name: str) -> ModelComponent:
        # read _components via __dict__ to avoid infinite recursion before __init__ has set it
        components = self.__dict__.get('_components', {})
        if name in components:
            return components[name]
        raise AttributeError(f"{type(self).__name__!r} object has no attribute {name!r}")


class ModelComponent(ABC):
    def __init__(self, model: Model):
        self._model_ref = weakref.ref(model)

    @abstractmethod
    def create(self) -> None:
        ...


class ModelRegionComponent(ModelComponent):
    _region: hydromt.GeoDataFrame

    def __init__(self, model: Model):
        super().__init__(model)

    def create(self) -> None:
        # Something to create the region.
        self._region = hydromt.read_file("path/to/region.shp")

    @property
    def region(self) -> hydromt.GeoDataFrame:
        return self._region


class GridComponent(ModelComponent):
    _grid: hydromt.RasterDataset

    def __init__(self, model: Model):
        super().__init__(model)

    def create(self) -> None:
        self._grid = hydromt.RasterDataset()

    def add_data(self, data) -> None:
        # Make a nice implementation to add data to the grid.
        self._grid.set(data)

    @property
    def grid(self) -> hydromt.RasterDataset:
        return self._grid


# HydroMT-core users could make an empty model and add components to it
mod = Model()
grid = GridComponent(mod)
mod.add_component('grid', grid)
grid.create()

# Because of the __getattr__ implementation, users can also dynamically get the grid:
mod = Model()
mod.add_component('grid', GridComponent(mod))
mod.grid.create()  # this works, but mypy only knows mod.grid is a ModelComponent, not a GridComponent

# HydroMT plugin developers could inherit from the Model class and add components to it.
# Then also add extra properties for known components.
# This is useful when performing static type checking by users of the plugin.
class WflowModel(Model):
    def __init__(self):
        super().__init__()  # initialises _components and the default region component
        self.add_component("static_maps", GridComponent(self))

    @property
    def static_maps(self) -> GridComponent:
        return self.get_component('static_maps', GridComponent)
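On the open point about validating before `create` is called: a minimal sketch, assuming each component gets a hypothetical `validate()` method that returns a list of problems (here only a missing input file), and `Model.create` runs every validation before creating anything, so the user gets all errors up front. The method name and file path are made-up examples, not part of the proposal above.

```python
from pathlib import Path
from typing import List


class GridComponent:
    def __init__(self, region_file: str) -> None:
        self.region_file = region_file
        self.created = False

    def validate(self) -> List[str]:
        # Cheap checks only: no data is read or processed here.
        if not Path(self.region_file).exists():
            return [f"grid: region file not found: {self.region_file}"]
        return []

    def create(self) -> None:
        self.created = True


class Model:
    def __init__(self) -> None:
        self.components = {"grid": GridComponent("does/not/exist.geojson")}

    def create(self) -> None:
        # Fail fast: collect all validation errors before creating anything.
        errors = [e for c in self.components.values() for e in c.validate()]
        if errors:
            raise ValueError("\n".join(errors))
        for component in self.components.values():
            component.create()


mod = Model()
try:
    mod.create()
except ValueError as err:
    print(err)
```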


DirkEilander Mar 1, 2024
Maintainer Author

The point about more than one type of data is a good one. I haven't seen this case yet, but I can imagine it will come up.

We can add static typing for the data property of each ModelComponent subclass, right? In that case the user gets type hints regardless of whether the property is called data or e.g. grid. We also don't need to define a data property in the ModelComponent base class, which avoids overriding its type, if that's a concern. So for me the argument against using data feels more theoretical and based on statically typed languages than on user friendliness.

For no we can choose one option, and I'm fine with following your suggestion. I would however like to get feedback from users after v1-alpha to understand if our choice also makes sense to them. Would that be a good way forward for everyone?

deltamarnix Mar 4, 2024
Collaborator

I can live with the data naming convention. But data will have to be implemented per component; there will be no base implementation.

class ModelRegionComponent(ModelComponent):

    def __init__(self, model: Model):
        super().__init__(model)
        self._data = GeoDataFrame()

    @property
    def data(self) -> GeoDataFrame:
        return self._data
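The original proposal also asked for a data property that reads the data the first time it is accessed. A per-component version of that idea could look like the sketch below. It assumes a hypothetical `MapsComponent` whose data is a dict of plain lists standing in for xarray objects; the `read_count` counter exists only to show that reading happens once. A real component would probably also check whether the model was opened in read mode before reading.

```python
from typing import Dict, List, Optional


class Model:
    """Stand-in for the HydroMT Model; only a root folder is assumed."""

    def __init__(self, root: str = ".") -> None:
        self.root = root


class MapsComponent:
    """Hypothetical component holding a dict of named maps."""

    def __init__(self, model: Model) -> None:
        self.model = model
        self._data: Optional[Dict[str, List[float]]] = None
        self.read_count = 0  # only here to demonstrate lazy loading

    def read(self) -> None:
        # A real component would read files below self.model.root here
        self.read_count += 1
        self._data = {"elevation": [1.0, 2.0, 3.0]}

    @property
    def data(self) -> Dict[str, List[float]]:
        # Read lazily on first access; later accesses reuse the cached data
        if self._data is None:
            self.read()
        assert self._data is not None  # narrows Optional for type checkers
        return self._data
```

Because the property is defined on the subclass, its return type can be as specific as the component needs, which matches the point that there is no shared base implementation of data.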

deltamarnix Mar 4, 2024
Collaborator

@Jaapel If we do it like this as you propose:

def __getattr__(self, name: str) -> ModelComponent:
    try:
        return super().__getattr__(name)
    except AttributeError:
        return self._components[name]

We don't know what super().__getattr__(name) will return; it is of type Any. So the return type would still have to be Any.

deltamarnix Mar 4, 2024
Collaborator

Actually, it seems that the try/except is not required in this case, because there is a difference between __getattr__ and __getattribute__: __getattr__ is only called after normal attribute lookup (__getattribute__) has failed.

https://medium.com/@satishgoda/python-attribute-access-using-getattr-and-getattribute-6401f7425ce6
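The lookup order can be checked with a tiny class. `Demo` and its `fallback_calls` list are purely illustrative: the existing attribute is found by normal lookup, and `__getattr__` runs only for the missing one, so there is no earlier lookup left to retry via `super()`.

```python
from typing import List


class Demo:
    def __init__(self) -> None:
        self.existing = "found by normal lookup"
        self.fallback_calls: List[str] = []

    def __getattr__(self, name: str) -> str:
        # Python calls this only after __getattribute__ raised AttributeError
        self.fallback_calls.append(name)
        return f"fallback for {name}"
```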

Jaapel Mar 4, 2024

I did not know this! Then it is not needed. And it is indeed not possible to type .data access, as the return type depends on the subclass, unless we start using Generics for the "hydromt types" (e.g. for GeoDataFrame, .data would return a geopandas.GeoDataFrame). Then we could type the return type based on that. Another option is typing.overload, but that does not support plugin components.
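The Generics option could be sketched with `typing.Generic`, where each subclass fixes the type returned by `.data`. This is a simplified sketch: the constructor takes the data directly, and the built-in dict/list types stand in for what would be pandas or geopandas objects in HydroMT. Unlike `typing.overload`, a plugin can define its own `ModelComponent[SomeType]` subclass and still get typed `.data` access.

```python
from typing import Dict, Generic, List, TypeVar

T = TypeVar("T")


class ModelComponent(Generic[T]):
    """Hypothetical generic base: T is the type returned by `.data`."""

    def __init__(self, data: T) -> None:
        self._data = data

    @property
    def data(self) -> T:
        return self._data


class TableComponent(ModelComponent[Dict[str, List[float]]]):
    """Stand-in for a component whose data would be e.g. a pandas DataFrame."""


class RegionComponent(ModelComponent[List[str]]):
    """Stand-in for a component whose data would be e.g. a GeoDataFrame."""


# A type checker infers table.data as Dict[str, List[float]]
# and region.data as List[str].
table = TableComponent({"discharge": [1.0, 2.5]})
region = RegionComponent(["basin_1"])
```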

Tjalling-dejong · 2024-03-01T09:33:51Z

Tjalling-dejong
Mar 1, 2024
Collaborator

The model component methods create, write, and set have component-specific arguments. How will these arguments be passed in this implementation?

def create(self) -> None:
    # We will need to ensure that the components are created in the correct order
    for c in self._components.values():
        c.create()

Are the arguments coming from the config component?


DirkEilander Mar 1, 2024
Maintainer Author

In the current implementation, Model.read and Model.write call all the read and write methods of the components. The ModelComponent.read and ModelComponent.write methods should not have any required arguments (optional ones are OK), so that a user can read all components by calling Model.read without passing any arguments. Required arguments are also typically unnecessary, as the file formats and paths are either fixed or prescribed in the model simulation config. Note that the order of reading and writing does matter: config should always be read first and written last.
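A minimal sketch of argument-free read/write loops that respect this ordering is shown below. It assumes the config component is always registered under the name "config"; the logging `Component` class is hypothetical and only records the order in which methods are called.

```python
from typing import Dict, List


class Component:
    """Hypothetical component that only logs when it is read or written."""

    def __init__(self, name: str, log: List[str]) -> None:
        self.name = name
        self.log = log

    def read(self) -> None:
        self.log.append(f"read {self.name}")

    def write(self) -> None:
        self.log.append(f"write {self.name}")


class Model:
    def __init__(self) -> None:
        self.log: List[str] = []
        self._components: Dict[str, Component] = {}
        # Assumption: the config component is always registered as "config"
        self.add_component(Component("config", self.log))

    def add_component(self, component: Component) -> None:
        self._components[component.name] = component

    def _others(self) -> List[Component]:
        # All non-config components, in registration order
        return [c for name, c in self._components.items() if name != "config"]

    def read(self) -> None:
        # Config first: other components may take their file paths from it
        for component in [self._components["config"]] + self._others():
            component.read()

    def write(self) -> None:
        # Config last: it can then reflect what the other components wrote
        for component in self._others() + [self._components["config"]]:
            component.write()
```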

Indeed, we cannot make a generic Model.create method, as every ModelComponent.create (and add_data*) method has its own specific required arguments. Instead, the current setup has the Model.build and Model.update methods, which loop over a dict of methods with user-defined arguments (typically parsed from the HydroMT YAML config file) to create the required model components and add data. These methods need to be adjusted to work with components (see also my initial message), but can largely stay as is.
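One way build could dispatch component-specific arguments is sketched below. The "component.method" key format is a hypothetical convention, not a confirmed HydroMT API; the steps dict would typically come from parsing the YAML config (e.g. with `yaml.safe_load`). Steps run in dict order, so `create` can precede `add_data`. A plain dict cannot repeat the same method, which a list of steps would allow.

```python
from typing import Any, Dict, List, Optional


class GridComponent:
    """Hypothetical component whose methods take component-specific arguments."""

    def __init__(self) -> None:
        self.res: Optional[float] = None
        self.layers: List[str] = []

    def create(self, res: float) -> None:
        self.res = res

    def add_data(self, name: str) -> None:
        if self.res is None:
            raise RuntimeError("grid.create must run before grid.add_data")
        self.layers.append(name)


class Model:
    def __init__(self) -> None:
        self._components: Dict[str, Any] = {"grid": GridComponent()}

    def build(self, steps: Dict[str, Optional[Dict[str, Any]]]) -> None:
        # Each key is "<component>.<method>", each value holds that method's kwargs
        for key, kwargs in steps.items():
            component_name, method_name = key.split(".", 1)
            method = getattr(self._components[component_name], method_name)
            # A YAML step without arguments parses to None, hence the fallback
            method(**(kwargs or {}))
```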
