-
Notifications
You must be signed in to change notification settings - Fork 122
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix step number used by parallel io and OutputWriter::writeTimeStep #769
Conversation
I am really uneasy about ca5164e. it seems weired that this has worked before given that the documentation says the parameter has to be a report step. Please double check that. |
@blattms: I think reportStep is wrong, but I need to double check. It may be that the documentation was not updated or the output code has changed in the meantime. I have idea about the current state of the output. |
The number is used to get wells from the schedule in EclipseWriter::writeTimeStep. I assume that this results in undefined behaviour if it is not the report_step. In my debugging run in the second adaptive time step for the first report step (=0) |
@@ -344,7 +344,7 @@ namespace Opm | |||
if (initConfig->restartRequested() && ((initConfig->getRestartStep()) == (timer.currentStepNum()))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consistently this might need to be timer.reportStepNum()
, too.
Maybe @joakim-hove or @bska could shed some light on the right step number to use for output? My current impression is that at least one of the following holds when writing output
|
At least that is what is documented.
When running in parallel a well state object with the well information of the whole grid needs to constructed to gather the information from all processes. Previously, this was done with the report step exported by the timer. This was wrong for the following reason: The output occurs after solving the time step and the timer is already incremented. This means that we constructed the well state for gathering the data for the next report step, already. Unfortunately, at that step some wells that we have computed results for might have been shut. In that case an exception with message "global state does not contain well ..." was thrown. This problem occured for Model number 2 and might have been due to shut wells because of banned cross flow. With this commit we use the last report step if this is not an initial write and not a substep.
6511222
to
4a6be3d
Compare
rebased to resolve conficts with current master |
Sorry - saw the comment just now. Will try to see if I can give any input here. |
I am on this, reviewing. |
Great! |
@@ -370,7 +377,7 @@ namespace Opm | |||
|
|||
*/ | |||
|
|||
eclWriter_->writeTimeStep(timer.currentStepNum(), | |||
eclWriter_->writeTimeStep(timer.reportStepNum(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am on board with this change. I agree, it seems weird that it worked! Reading the code in opm-output indicates to me this must be correct.
This might not answer anything; but anyway:
]<-------]<--------]<--------}<--------]<--------]
0 1 2 3 4 5
The simulation illustrated consist of five report steps, assuming they are all 1 day long we will get: CASE.X0000 : Initial state
CASE.X0001 : State after 1 day of simulation
CASE.X0002 : State after 2 days of simulation
...
CASE.X0005 : Final state - after 5 days of simulation. [*] Non unified that is - for increased understanding of these matters it might be beneficial to remove the |
What is the most clean way to approach this is not clear to me; but I guess one could think of step zero as the step from (-∞START] - i..e the equilibriation and then step 1 goes from <STARTE,DATES(1)].
This might be less than perfect in opm-output at present, but my goal is that the output layer just takes the report step as an id uses it unmodified as such. |
Thanks @atgeirr and @joakim-hove. That clarifies things. So to sum this up for @joakim-hove's example: Assume that we are simulating from day 1 0:00 until day 1 24:00 (the second interval in the sketch). For substeps we will be using report step 1 and the current time. At the end we will use report step 2 and the current time. In both cases the well information (shut/open) is defined by report step 1. To preserve this information in the parallel case I needed to introduce the That means my patch is meaningful and does the right thing. Now the point 3.ii by @joakim-hove seems to lead to vanishing information of the wells. Data of wells that are open in this and shut in the next report step might not get written (as the step passed is 2 and the state is read from the schedule by opm-output). Is that correct? I do not think that this is a problem but we should keep it in mind. |
BTW I might have messed up the dataset of Model 2. That could be the problem of my runs. master with this PR merged works fine for @GitPaean |
After discussion with @joakim-hove we have come to the conclusion that the code in opm-output is the one that should be changed. The underlying problem is, as described in @joakim-hove's third point above, that "The report step is used for two conceptually different things", both to get the well schedule and to identify the restart point via file name or sequence number. Those two numbers should be different for all cases but the initial state. Continuing with the 5-step example, the current master would pass 0, 1, ..., 5 to the output-writer, which is correct for the sequence numbers but fails when it gets the schedule (off by one, except for the initial call). This patch does the opposite error: it passes 0, 0, 1, ..., 4 to the output-writer, which gives the right schedule but the sequence number will be wrong in the output file(s). The proper solution is therefore to change opm-output so that it will subtract one from the sequence number before getting the schedule, unless already zero. With that change, the current master branch will give the correct result and this PR should be closed. We should keep it open until such a change has been implemented and tested to work though, since it provides a workaround (if you can live with wrong filenames/sequence numbers). @joakim-hove has taken on the task of drafting the fix in opm-output. |
IMHO this statement is not true. The reportstep number is not changed by my PR. I just correct the number used in ParallelDebugOutput to query the wellstate from the schedule (this is where your 0, 0, ... applies). I cannot see how changes in opm-output will fix the problem in ParallelDebugOutput. What am I missing? |
I do not think that this PR is a work around for the problem that @joakim-hove wants to solve. These seem unconnected: |
After off-line communication I have realized that this fix is needed in addition to the changes proposed for opm-output. I therefore intend to merge this unless someone disagree strongly (and soon...). |
When running in parallel a well state object with the well information of the whole grid needs to constructed to gather the information from all processes. Previously, this was done with the report step exported by thetimer. This was wrong for the following reason:
The output occurs after solving the time step and the timer is already incremented. This means that we constructed the well state for gathering the data for the next report step, already. Unfortunately, at that step some wells that we have computed results for might have been shut. In that case an exception with message "global state does not contain well ..." was thrown.
This problem occured for Model number 2 and might have been due to shut wells because of banned cross flow. With this PR Model number 2 runs through again with 2 processes.
With this PR we use the last report step if this is not an initial write and not a substep. In addition we pass the report step (instead of the substep number) to OutputWriter::writeTimeStep as that is needed according to the documentation.