Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reason about a configured experiment #2170

Closed
markusdregi opened this issue Oct 8, 2021 · 12 comments
Closed

Reason about a configured experiment #2170

markusdregi opened this issue Oct 8, 2021 · 12 comments

Comments

@markusdregi
Copy link
Contributor

Introduction
Currently we pass around a collection of configuration objects in ERT3. Example:

def run(
    ensemble: ert3.config.EnsembleConfig,
    stages_config: ert3.config.StagesConfig,
    experiment_config: ert3.config.ExperimentConfig,
    parameters_config: ert3.config.ParametersConfig,
    workspace_root: pathlib.Path,
    experiment_name: str,
) -> None:

I think it makes sense to hold the configuration of an experiment in an object that can represent the entire configuration instead of passing around a collection of individually incomplete objects.

@jondequinor
Copy link
Contributor

jondequinor commented Oct 8, 2021

I agree that this is bad, but not that carrying a god configuration object around is an improvement.

All of these configs should result in an experiment object that can be run, and this should happen as early as possible in the code. This object is not configuration, it is state and behavior resulting from interpreting configuration.

The driving principle is separation of configuration and the internals of ERT. They must be allowed to vary independently.

@markusdregi
Copy link
Contributor Author

I'm not arguing that we should carry around a god configuration object. The milestone I'm currently failing to create due to ZenHub issues states that we should reevaluate how far into the code the configuration object is passed. However, I do believe that it makes sense that the workspace (see #2168) serves a god configuration object which by the requester is then translated into an experiment object instead of passed around... 🤷🏻

@markusdregi
Copy link
Contributor Author

After evaluating back and forth this might actually be the best place to make this note: I think there should be created a single experiment configuration object that is returned by the workspace module. However, that configuration object should afterwards quickly be turned into an experiment object as @jondequinor points out.

@jondequinor
Copy link
Contributor

jondequinor commented Oct 8, 2021

We've also discussed, though I can take the blame should this be labelled a bad idea: A configuration object (CO) should be separate from the user configuration (UC) such that the CO can be created from an ERT2 configuration file (E2C). This introduces the configuration translation layer (CTL) which sets in stone the decoupling of UC and CO. It's also strangulation of the legacy configuration where the CTL is the proxy.

There will be pieces of data missing from the E2C that can't be guessed in the CTL, and that data delta needs to be provided by the user. But the migration of an ERT2 asset could be minor (barring unforeseen complexity, which there is plenty of).

Of course, this does not make sense for a pilot project, where the configuration and model is greenfield development, but designing for both E2C and a new configuration format provided by ert 3 will have benefits beyond what's described here.

@sondreso sondreso added this to SCOUT Oct 11, 2021
@markusdregi
Copy link
Contributor Author

Good point @jondequinor! I agree that this approach would have many benefits for which no other approach can provide in a reasonable manner...

I still think gathering the current complexity in ERT3 configuration would be an improvement over the current situation and a better starting point for creating a decoupled user configuration and configuration object in general though🤔

@oyvindeide
Copy link
Collaborator

oyvindeide commented Oct 11, 2021

Seeing as there is some discussion about the configuration here I will add my thoughts, though it does possibly span multiple milestones.

First of I agree that we should limit how far the configuration should pass into the code, however it might be time we take a step back and evaluate the lessons learned from ert3 so far and capitalizer on them before we move on:

Records

We are in the process of formalizing them, and with the dark storage API we should be able to use them in ert2 to test out future data flow concepts.

Stages

This concept would most likely help our users tremendously, for example the Eclipse stage would alleviate a lot of pain currently in ert2, not within the scope of the current config, but more along the lines of what @jondequinor proposes. Also they have a lot of the domain knowledge that will go into these stages. This will be a core concept that a lot of our current users could provide feedback on and help us iterate.

Perhaps we should try to use these new concepts in ert2 to achieve a few goals we have talked about previously:

Analysis module

We should be able to split these out and make them stand-alone, we would need this for ert3 anyway, and with the storage api and records it would be feasible.

Configuration

Create a minimum configuration for ert2.5 (figuratively), we are half way there as it is a milestone we have worked towards previously, it would incorporate records and stages. We would get a lot of testing and could iterate faster with more users while making their configuration more explicit.

tl:dr

My main concern is that we are bypassing the partially untapped resource that is many of our users, and not taking input from them. There should be opportunity to try out the core concepts on our current users and iterate as we go. With the api-strategy we currently have that should not really slow down becoming cloud native, and we would get a lot more testing by current users.

@jondequinor
Copy link
Contributor

My main concern is that we are bypassing the partially untapped resource that is many of our users, and not taking input from them.

How do you suggest such input is gathered?

@markusdregi
Copy link
Contributor Author

Our strategy is indeed to expose as much as reasonably possible of any new implementation and functionality at the place where most of our users are. And today all of our users are on-premise, which sets a clear context to the first sentence. Hopefully we'll have test users soon in Azure as well, but most of our users will remain on-premise using ERT2 for quite some time. Due to this I'm happy to see that this challenge is concerning you @oyvindeide!

One thing we do have to keep in mind though is that the threshold for making breaking changes to the ERT2 configuration is currently viewed to be very high.

Let me try to comment on your concrete suggestions...

When it comes to records I agree that we should be able to to utilise the record concept further in ERT2 when it has matured during the ongoing milestone. However, there is a journey from being able to represent readable data for visualisation via the API backed by block_fs and use that as the main source for all data communication within the application. It is a good idea, but it will have to be done over multiple steps. Some of which I think we already have on the current list of future milestones 👍🏻

Like separating the analysis modules, storage and evaluation which you mention above and is intended to be initiated here: https://github.com/equinor/ert/blob/main/docs/rst/manual/architecture/dev-strategy.rst#temporary-storage-for-evaluator-and-analysis.

When it comes to rolling out stages in ERT2; I think that is a really good idea! We are currently rolling out the legacy ensemble evaluator on-premise and that is a necessary first step. After that, to get further on this track, I guess we would have to start running the ERT2 forward model as a unix step with shared disk transmitters. However, this already is quite an effort as we would have to:

  • resolve the current security issues that running prefect on premise would impose (access to the disk area can no longer be used for authentication as the prefect server can be connected to and used to spawn tasks running as the user launching the server, which might not be you) and
  • extract the logic of enkf main that currently persists data to the run path and implement it in transmitters and transformations instead while being backwards compatible on the format that it is persisted.

The above steps would be valuable to get testing of the prefect evaluator and to start removing responsibilities from enkf main. But I will not be a walk in the part :) And notice that it would not bring any new functionality to our user yet.

Afterwards, we could consider introducing the stage concept itself in ERT2. From a configuration perspective I think the best option would be to allow for a new keyword then in the ERT2 configuration where you with a keyword point to the stage configuration file that would then be equal to the stage configuration file of ERT3 and explicitly name the stage for which you want to define as your forward model. I don't think this would easily combine with usage of FORWARD_MODEL nor SIMULATION_STEP, which implies that we probably need to say that if you use stages, the entire forward model must be defined in stages. The remaining challenges then would be:

  • the stage only describes the data within the stages. We would need to still describe the data that corresponds to the records within the stage (mostly the parameters in ERT2, as the other data can still be copied over disk 🤔). We could either set up a linking scheme as in ERT3's ensemble.yml by pointing at a data element and state it to be this specific input record, or we could make a hard assumption on that the parameter names corresponds directly with the input names of the stages. The later looses some neat flexibility of reuse of stages, but avoids introducing even more configuration into ERT2.
  • the assumption of not having shared disk gives some flexibility within ERT3 when it comes to reason about the independence of different stages, passing data from one stage to another etc. This assumption can probably not be made in ERT2 due to how it is currently utilised. Is it possible to allow for (some) multi stage usage in ERT2 without introducing unreasonable inconsistencies?

@sondreso
Copy link
Collaborator

sondreso commented Oct 15, 2021

resolve the current security issues that running prefect on premise would impose (access to the disk area can no longer be used for authentication as the prefect server can be connected to and used to spawn tasks running as the user launching the server, which might not be you)

Not that it really matters for the arguments here, but this is a non-issue as long as we don't create a central scheduler, which we currently don't do.

@markusdregi
Copy link
Contributor Author

Not that it really matters for the arguments here, but this is a non-issue as long as we don't create a central scheduler, which we currently don't do.

Really? But don't we create a temporary scheduler during the experiment? And can one not communicate with that scheduler during its lifetime?

I agree that it is not the main part of the discussion anyway, but now I got a bit curious :)

@sondreso
Copy link
Collaborator

Really? But don't we create a temporary scheduler during the experiment? And can one not communicate with that scheduler during its lifetime?

You are of course correct, I forget that it cannot be local only 🤦

@sondreso sondreso closed this as completed Nov 9, 2021
@sondreso sondreso moved this to Done in SCOUT Nov 9, 2021
@sondreso sondreso reopened this Nov 9, 2021
@sondreso
Copy link
Collaborator

This was resolved in #2433

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Archived in project
Development

No branches or pull requests

4 participants