-
Notifications
You must be signed in to change notification settings - Fork 110
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reason about a configured experiment #2170
Comments
I agree that this is bad, but not that carrying a god configuration object around is an improvement. All of these configs should result in an experiment object that can be run, and this should happen as early as possible in the code. This object is not configuration, it is state and behavior resulting from interpreting configuration. The driving principle is separation of configuration and the internals of ERT. They must be allowed to vary independently. |
I'm not arguing that we should carry around a god configuration object. The milestone I'm currently failing to create due to ZenHub issues states that we should reevaluate how far into the code the configuration object is passed. However, I do believe that it makes sense that the workspace (see #2168) serves a god configuration object which by the requester is then translated into an experiment object instead of passed around... 🤷🏻 |
After evaluating back and forth this might actually be the best place to make this note: I think there should be created a single experiment configuration object that is returned by the workspace module. However, that configuration object should afterwards quickly be turned into an experiment object as @jondequinor points out. |
We've also discussed, though I can take the blame should this be labelled a bad idea: A configuration object (CO) should be separate from the user configuration (UC) such that the CO can be created from an ERT2 configuration file (E2C). This introduces the configuration translation layer (CTL) which sets in stone the decoupling of UC and CO. It's also strangulation of the legacy configuration where the CTL is the proxy. There will be pieces of data missing from the E2C that can't be guessed in the CTL, and that data delta needs to be provided by the user. But the migration of an ERT2 asset could be minor (barring unforeseen complexity, which there is plenty of). Of course, this does not make sense for a pilot project, where the configuration and model is greenfield development, but designing for both E2C and a new configuration format provided by ert 3 will have benefits beyond what's described here. |
Good point @jondequinor! I agree that this approach would have many benefits for which no other approach can provide in a reasonable manner... I still think gathering the current complexity in ERT3 configuration would be an improvement over the current situation and a better starting point for creating a decoupled user configuration and configuration object in general though🤔 |
Seeing as there is some discussion about the configuration here I will add my thoughts, though it does possibly span multiple milestones. First of I agree that we should limit how far the configuration should pass into the code, however it might be time we take a step back and evaluate the lessons learned from ert3 so far and capitalizer on them before we move on: Records We are in the process of formalizing them, and with the dark storage API we should be able to use them in ert2 to test out future data flow concepts. Stages This concept would most likely help our users tremendously, for example the Eclipse stage would alleviate a lot of pain currently in ert2, not within the scope of the current config, but more along the lines of what @jondequinor proposes. Also they have a lot of the domain knowledge that will go into these stages. This will be a core concept that a lot of our current users could provide feedback on and help us iterate. Perhaps we should try to use these new concepts in ert2 to achieve a few goals we have talked about previously: Analysis module We should be able to split these out and make them stand-alone, we would need this for ert3 anyway, and with the storage api and records it would be feasible. Configuration Create a minimum configuration for ert2.5 (figuratively), we are half way there as it is a milestone we have worked towards previously, it would incorporate records and stages. We would get a lot of testing and could iterate faster with more users while making their configuration more explicit. tl:dr My main concern is that we are bypassing the partially untapped resource that is many of our users, and not taking input from them. There should be opportunity to try out the core concepts on our current users and iterate as we go. With the api-strategy we currently have that should not really slow down becoming cloud native, and we would get a lot more testing by current users. |
How do you suggest such input is gathered? |
Our strategy is indeed to expose as much as reasonably possible of any new implementation and functionality at the place where most of our users are. And today all of our users are on-premise, which sets a clear context to the first sentence. Hopefully we'll have test users soon in Azure as well, but most of our users will remain on-premise using ERT2 for quite some time. Due to this I'm happy to see that this challenge is concerning you @oyvindeide! One thing we do have to keep in mind though is that the threshold for making breaking changes to the ERT2 configuration is currently viewed to be very high. Let me try to comment on your concrete suggestions... When it comes to records I agree that we should be able to to utilise the record concept further in ERT2 when it has matured during the ongoing milestone. However, there is a journey from being able to represent readable data for visualisation via the API backed by block_fs and use that as the main source for all data communication within the application. It is a good idea, but it will have to be done over multiple steps. Some of which I think we already have on the current list of future milestones 👍🏻 Like separating the analysis modules, storage and evaluation which you mention above and is intended to be initiated here: https://github.com/equinor/ert/blob/main/docs/rst/manual/architecture/dev-strategy.rst#temporary-storage-for-evaluator-and-analysis. When it comes to rolling out stages in ERT2; I think that is a really good idea! We are currently rolling out the legacy ensemble evaluator on-premise and that is a necessary first step. After that, to get further on this track, I guess we would have to start running the ERT2 forward model as a unix step with shared disk transmitters. However, this already is quite an effort as we would have to:
The above steps would be valuable to get testing of the prefect evaluator and to start removing responsibilities from enkf main. But I will not be a walk in the part :) And notice that it would not bring any new functionality to our user yet. Afterwards, we could consider introducing the stage concept itself in ERT2. From a configuration perspective I think the best option would be to allow for a new keyword then in the ERT2 configuration where you with a keyword point to the stage configuration file that would then be equal to the stage configuration file of ERT3 and explicitly name the stage for which you want to define as your forward model. I don't think this would easily combine with usage of
|
Not that it really matters for the arguments here, but this is a non-issue as long as we don't create a central scheduler, which we currently don't do. |
Really? But don't we create a temporary scheduler during the experiment? And can one not communicate with that scheduler during its lifetime? I agree that it is not the main part of the discussion anyway, but now I got a bit curious :) |
You are of course correct, I forget that it cannot be local only 🤦 |
This was resolved in #2433 |
Introduction
Currently we pass around a collection of configuration objects in ERT3. Example:
I think it makes sense to hold the configuration of an experiment in an object that can represent the entire configuration instead of passing around a collection of individually incomplete objects.
The text was updated successfully, but these errors were encountered: