Replies: 2 comments
-
In my experience each software package uses one file format for configuration; for instance, Wflow and Ribasim use TOML, while SFINCS uses its own input file format. For HydroMT we have decided to use yaml for a few reasons: 1) it is an easily readable format where the key of each section corresponds to a method of a Model class, 2) it is widely used for workflows and catalog configurations, 3) it provides a more condensed format than toml for nested arguments. In my opinion, supporting different formats would only add confusion for users, and to my knowledge there is no demand for TOML from users at this point. Furthermore, pro users can bypass the CLI and use toml in their scripts in combination with the DataCatalog.from_dict method and the Model.build method if they want to. I therefore suggest only implementing toml support in the Model.read_config/write_config methods, to support models like Wflow and Ribasim that use TOML, and not as a possible configuration file format for the HydroMT configuration and data catalog files.
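A minimal sketch of the script-level route described above, assuming Python 3.11+ (standard-library tomllib, with the third-party tomli package as a fallback on older versions) and a made-up catalog entry. Only the parsing step is shown; passing the result to DataCatalog.from_dict and calling Model.build is left out, because their exact signatures depend on the HydroMT version in use.

```python
try:
    import tomllib  # standard library since Python 3.11
except ModuleNotFoundError:  # older Pythons: pip install tomli (same API)
    import tomli as tomllib

# Illustrative data catalog entry; the entry name and keys are hypothetical.
CATALOG_TOML = """
[dem]
path = "/path/to/dem.tif"
data_type = "RasterDataset"
driver = "raster"

[dem.meta]
category = "topography"
"""

# tomllib.loads returns plain nested dicts, the same kind of object that
# yaml.safe_load produces for an equivalent yaml catalog.
catalog_dict = tomllib.loads(CATALOG_TOML)
print(catalog_dict["dem"]["meta"])
```

Because the result is an ordinary dict, the rest of a script would not need to know whether the catalog was originally written in yaml or toml.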
-
Given that no new perspectives have been added beyond the one Dirk added, I think we can safely assume that it reflects the consensus (i.e. only support toml for model configuration), so I'm going to close this RFC. The PR for the toml support is already there, so I'll open that up for review. The ADR PR will come later.
-
toml-config-support
Summary
Currently we only support yaml for configuring either the data catalogs or the models. Users have expressed a desire for the option to configure HydroMT models using the toml format. This, however, raises the question: should we only support it for the model configuration, or everywhere? This RFC argues in favor of supporting toml throughout all of HydroMT.
Motivation
Users have expressed the desire to be able to use toml for their model configuration, because it is more compatible with other systems. Therefore it is desirable to at least allow models to be configured in toml. For teams that use toml for their other configurations, this avoids having to mix and match formats that are not fully compatible (yaml supports null, whereas toml has no such functionality).
Toml is also gaining popularity as a configuration format (e.g. pyproject.toml). This means that supporting TOML would also make it possible to use tooling made for other purposes, such as linters and schema checkers, with HydroMT as well.
Finally, DSC has recommended toml as the first option for new projects within Deltares. To strive for uniformity with future projects, a strong case can be made to at least support toml for those projects, if not to move towards toml outright (though the latter is outside the scope of this RFC). Even if this causes our users some discomfort, it might make the Deltares software suite more consistent for external users, who could then work with toml alone.
Guide-level explanation
For example, instead of typing
hydromt build sfincs /path/to/model_root -r "{'bbox': [4.6891,52.9750,4.9576,53.1994]}" -i /path/to/sfincs_config.yaml -d /path/to/data_catalog.yml -v
the user should be able to type
hydromt build sfincs /path/to/model_root -r "{'bbox': [4.6891,52.9750,4.9576,53.1994]}" -i /path/to/sfincs_config.toml -d /path/to/data_catalog.toml -v
(only the extensions of the config files have changed)
Alternatively, the format could be specified explicitly, either with a single flag for both files:
hydromt build sfincs /path/to/model_root -r "{'bbox': [4.6891,52.9750,4.9576,53.1994]}" -f toml -i /path/to/sfincs_config.toml -d /path/to/data_catalog.toml -v
or with separate flags for the model configuration and the data catalog:
hydromt build sfincs /path/to/model_root -r "{'bbox': [4.6891,52.9750,4.9576,53.1994]}" --config-format toml -i /path/to/sfincs_config.toml --catalog-format toml -d /path/to/data_catalog.toml -v
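The CLI variants differ in how the parser is chosen: from the file extension, or from an explicit flag. A minimal sketch of how both could be combined, assuming an explicit format wins over the extension; the function name resolve_config_format and the suffix table are hypothetical and not existing HydroMT API.

```python
from pathlib import Path
from typing import Optional

# Hypothetical mapping from file suffix to config format (not HydroMT API).
_SUFFIX_TO_FORMAT = {".yml": "yaml", ".yaml": "yaml", ".toml": "toml"}


def resolve_config_format(path: str, explicit: Optional[str] = None) -> str:
    """Return "yaml" or "toml" for a config or data catalog file.

    An explicitly requested format (e.g. from a hypothetical -f or
    --config-format option) takes precedence; otherwise the extension decides.
    """
    if explicit is not None:
        fmt = explicit.lower()
        if fmt not in {"yaml", "toml"}:
            raise ValueError(f"unsupported config format: {explicit!r}")
        return fmt
    suffix = Path(path).suffix.lower()
    try:
        return _SUFFIX_TO_FORMAT[suffix]
    except KeyError:
        raise ValueError(
            f"cannot infer config format from extension {suffix!r}; "
            "pass the format explicitly"
        ) from None


if __name__ == "__main__":
    print(resolve_config_format("/path/to/sfincs_config.toml"))
    print(resolve_config_format("/path/to/data_catalog.yml"))
    print(resolve_config_format("/path/to/config.txt", explicit="toml"))
```

With extension-based detection the first command needs no new options at all; an explicit format would only be needed for files with non-standard extensions.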
pyyaml, tomli-w
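The conversion script itself is not preserved in this thread. As a rough, hypothetical stand-in for its cleaning step: yaml.safe_load turns yaml nulls into None, and tomli-w cannot serialize None because toml has no null, so the None values have to be handled before writing. The sketch below implements the "simply discard None values" variant mentioned in the text; the helper name drop_none and the config keys are made up for illustration.

```python
from typing import Any


def drop_none(obj: Any) -> Any:
    """Recursively remove None values so the result can be written as toml.

    Dict entries whose value is None are dropped, as are None items in lists.
    Nested dicts and lists are cleaned recursively; other values are returned
    unchanged.
    """
    if isinstance(obj, dict):
        return {key: drop_none(value) for key, value in obj.items() if value is not None}
    if isinstance(obj, list):
        return [drop_none(item) for item in obj if item is not None]
    return obj


if __name__ == "__main__":
    # Hypothetical result of yaml.safe_load on a config containing nulls.
    config = {"setup_grid": {"res": 50, "crs": None}, "nodata": [0, None, -9999]}
    print(drop_none(config))
```

In a full script the cleaned dict would then be passed to tomli_w.dumps. Note that dropping None items from lists shifts the positions of the remaining items, so a real implementation may prefer a different strategy for lists.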
Since TOML does not support null values like YAML does, the extra cleaning function is needed. This function can be trivially adjusted to simply discard None values if these are not necessary.
Reference-level explanation
For more detail on this, see #444.
Drawbacks
One potential drawback, as some core team members have expressed, is that mixing and matching languages could be confusing for users. In the author's view, this is mitigated by the fact that we would merely accept toml rather than promote it, meaning it would only be used by those who are already familiar with it. Everything else, such as the HydroMT configs, the main catalog, and the examples, should remain in yaml and should not be duplicated. For those who are interested, we can add a short section on how to move from yaml to toml, akin to the section above; otherwise our documentation would only use yaml.
Another drawback is that, due to its flexible nature, toml offers multiple ways of representing the same data structure. This could cause some consistency issues, although these should be fairly easy to remedy by introducing a linter.
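To make this flexibility concrete, here is a small sketch (keys made up for illustration, Python 3.11+ tomllib or the tomli fallback assumed) showing three toml spellings that parse to exactly the same nested dict. A linter or formatter could enforce one of these spellings project-wide.

```python
try:
    import tomllib  # standard library since Python 3.11
except ModuleNotFoundError:  # older Pythons: pip install tomli (same API)
    import tomli as tomllib

# Three spellings of the same (illustrative) model configuration section.
INLINE_TABLE = "setup_grid = { res = 50, crs = 32633 }\n"

DOTTED_KEYS = """
setup_grid.res = 50
setup_grid.crs = 32633
"""

TABLE_HEADER = """
[setup_grid]
res = 50
crs = 32633
"""

parsed = [tomllib.loads(doc) for doc in (INLINE_TABLE, DOTTED_KEYS, TABLE_HEADER)]

# All three produce {'setup_grid': {'res': 50, 'crs': 32633}}.
print(all(p == parsed[0] for p in parsed))
print(parsed[0])
```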
Rationale and alternatives
Previously, HydroMT used ini configuration, but this has since been deprecated. It is technically still supported, but its use is discouraged. This means that even though our aim was to move to a single supported format, there is precedent for supporting multiple formats concurrently.
… ini to toml could cause, but given that the transition to yaml was deemed sufficient, and the translation from yaml to toml was described above, the author does not expect this to create a significant barrier.
In toml there are two options for this
While this is not much better in toml, it can also be presented as
This somewhat hides the nested structure, which can be either an upside or a downside depending on the reader's preferences. It should also be noted that this can impose quite significant constraints on the yaml, given that for readability purposes line length usually has an upper limit imposed on it (as was the case until recently in HydroMT). In many cases this can be alleviated by splitting values across multiple lines, but this is not always possible (as is the case with paths like the one above). Because the indents count towards the character limit, and spaces are usually preferred over tabs, this limitation can be quite significant. In toml there is at least the option of mitigating this by using the syntax used in the second snippet.
Finally, this last syntax could compose quite well with changes currently under review in #438. Here we introduce the option of versions, which apply diffs to their base catalog entry. Consider the possible alternatives in both yaml and toml:
While the yaml maintains more of a visual hierarchy, toml provides more flexibility in how to define these mappings. Note that even in the above example, this is not the only way to represent this structure within the toml format, which can be either an advantage or a disadvantage depending on the reader's viewpoint.
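Since #438 is still under review, its exact semantics are not settled here. As a hypothetical sketch only: if a version entry is a partial mapping that overrides keys of its base entry, applying it amounts to a recursive dict merge. Whether the entries were written in yaml or toml, loading them yields the same nested dicts, so this step would be format-independent. The function name apply_version_diff and the entry contents are made up.

```python
import copy
from typing import Any, Dict


def apply_version_diff(base: Dict[str, Any], diff: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict: base with diff applied recursively.

    Nested dicts are merged key by key; any other value in diff replaces the
    corresponding value in base. Neither input is modified.
    """
    merged = copy.deepcopy(base)
    for key, value in diff.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = apply_version_diff(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


if __name__ == "__main__":
    # Hypothetical base catalog entry and a version diff on top of it.
    base = {
        "data_type": "RasterDataset",
        "path": "/path/to/v1/dem.tif",
        "meta": {"version": "1.0", "category": "topography"},
    }
    v2_diff = {"path": "/path/to/v2/dem.tif", "meta": {"version": "2.0"}}
    print(apply_version_diff(base, v2_diff))
```

Only keys present in the diff change; everything else (here data_type and meta.category) is inherited from the base entry.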
Prior art
Yaml is a format widely supported by big industry tools, such as (but not limited to):
However, it has also received some criticism:
Yaml aims to be mostly human-readable. Some criticism has focused on the complexity that this can introduce. In addition, its significant whitespace can hurt its compressibility, and makes it more error-prone and less portable between editors. Yet it still remains a very widely accepted format.
There are two main competitors to Yaml:
… pyproject.toml) and with it tools such as Poetry.
One well-known risk here is that once a format has been adopted, it can be very hard to convince users to leave it behind, and thus to deprecate it down the line for any reason. Some examples that come to mind are:
… ini file format.
… get-poetry.py
Therefore, the decision to support another language should not be taken lightly. However, given the adoption of toml in the above-mentioned tools and communities, it is very unlikely that support for toml will wane in the coming years. It is equally unlikely that an objective winner will emerge from the yaml vs. toml debate, so there is a case to be made for supporting both.
Unresolved questions
Future possibilities
While outside the scope of this RFC, one could imagine a future where, due to the aforementioned push for uniformity across the Deltares software suite, HydroMT is encouraged to move towards toml entirely. Although this is not imminent to anyone's knowledge, a transitional period in which we support both yaml and toml would make such a transition easier, to say the least.