[DEFECT] Raven will not produce the required csv if the number of failures description lines exceed 200 #1082

Jimmy-INL · 2019-10-28T02:40:25Z

1000 runs of RELAP are expected to have about 10% failed runs. The resulting failure descriptions can push the "run complete" statement out of the last 200 lines of the log, so the parser does not find it and RAVEN does not proceed.

Describe the defect
There should be a smarter way of dumping errors, or a separate log for them.

What did you expect to see happen?

I expected the output csv to be produced.

What did you see instead?

Nothing was produced.

Do you have a suggested fix for the development team?

I removed the 200 lines limit of the search in the RAVEN parser.
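
A minimal sketch of why the tail-limited search misses the statement and what removing the limit changes. This is not the RAVEN parser itself: the function name `found_marker`, the marker text, and the synthetic log are all hypothetical, chosen only to illustrate the behavior described above.

```python
from collections import deque
from typing import Iterable, Optional


def found_marker(lines: Iterable[str], marker: str, tail: Optional[int] = 200) -> bool:
    """Return True if `marker` appears in the last `tail` lines.

    With tail=None the whole log is searched (the "limit removed" case).
    """
    # deque(maxlen=tail) keeps only the final `tail` lines of the log.
    window = lines if tail is None else deque(lines, maxlen=tail)
    return any(marker in line for line in window)


# Hypothetical log: the completion statement followed by 250 failure descriptions.
log = ["Run complete"] + [f"run {i} failed" for i in range(250)]

limited = found_marker(log, "Run complete", tail=200)   # marker lies outside the window
unlimited = found_marker(log, "Run complete", tail=None)  # marker is found
```

Searching the whole log trades a bounded read for a scan that grows with the log size, which may matter for very long outputs.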

For Change Control Board: Issue Review

This review should occur before any development is performed as a response to this issue.

1. Is it tagged with a type: defect or task?
2. Is it tagged with a priority: critical, normal or minor?
3. If it will impact requirements or requirements tests, is it tagged with requirements?
4. If it is a defect, can it cause wrong results for users? If so an email needs to be sent to the users.
5. Is a rationale provided? (Such as explaining why the improvement is needed or why current code is wrong.)

For Change Control Board: Issue Closure

This review should occur when the issue is imminently going to be closed.

1. If the issue is a defect, is the defect fixed?
2. If the issue is a defect, is the defect tested for in the regression test system? (If not explain why not.)
3. If the issue can impact users, has an email to the users group been written (the email should specify if the defect impacts stable or master)?
4. If the issue is a defect, does it impact the latest release branch? If yes, is there any issue tagged with release (create if needed)?
5. If the issue is being closed without a pull request, has an explanation of why it is being closed been provided?

PaulTalbot-INL · 2020-04-02T14:28:50Z

From what I understand, this is the RAVEN code interface issue, where we use the last handful of lines from the code run to determine whether the run was a success or failure; however, especially on the HPC, the error and warning prints at the end of the file might be long enough that we don't detect a successful run.

Potential fixes are to use RAVEN's return code to determine failure/success, and possibly assure that we exit with a meaningful return code depending on how the step finishes. For example, MonteCarlo with missing samples isn't a failure necessarily, while Grid with missing samples is.
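
A rough sketch of the return-code idea, under stated assumptions. The helper names `exit_code_for` and `run_succeeded` are hypothetical and not part of RAVEN; the rule that only Grid treats missing samples as a failure is taken from the example in the comment above and is not a confirmed policy.

```python
import subprocess
import sys
from typing import Sequence


def exit_code_for(sampler: str, missing_samples: int) -> int:
    """Hypothetical policy: missing samples fail a Grid run but not a MonteCarlo run."""
    if missing_samples > 0 and sampler == "Grid":
        return 1
    return 0


def run_succeeded(command: Sequence[str]) -> bool:
    """Judge success from the child's return code instead of scanning its output."""
    completed = subprocess.run(command, capture_output=True, text=True)
    return completed.returncode == 0


# Stand-in child processes that exit with a chosen code.
ok = run_succeeded([sys.executable, "-c", "import sys; sys.exit(0)"])
bad = run_succeeded([sys.executable, "-c", "import sys; sys.exit(3)"])
```

With this approach the amount of error and warning output at the end of the log no longer affects whether the run is detected as successful.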

alfoa · 2021-01-21T19:36:52Z

Checklist passed.
Closure approved via #1409

Closes #1082 Closes #368 Closes #1212 (detecting RAVEN code interface failures)
Closes #1413 (memory debugging tool)
Closes #1414 (expected meta keys from SVL)
Closes #1416 (maxCycles in Interpolated ROMCollection)
Closes #1417 (grad point perturbation distance scaling)
Closes #1418 (nyquist length adjusting for ARMA in ROMCollection)
Introduces and partially addresses #1412 (gradient points can sample outside constraints)
Introduces and partially addresses #1415 (HPC interfaces)

What are the significant changes in functionality due to this change request?

Major Changes

- Allows CodeInterfaces to set whether they print failed runs to screen or not via CodeInterface.printFailedRuns, which defaults to True.
- Added "successful run" file creation for RAVEN for failure checking in the RAVEN CodeInterface; this replaces the "read the last 500 lines of screen output" approach that was used previously.
- Memory profiler decorator added for RAVEN run debugging.
- Extension of SupervisedLearning to allow "expected meta keys" to be declared.
- Constraint handling added to GradientApproximation methods within GradientDescent optimization, assuring sample points are not violating boundary, explicit, and implicit constraints. This has been placed in the FiniteDifference class, and may need reworking or inclusion in other sibling classes to be suitably general.
- Sawtooth qsub command script added. This needs to become a sort of "HPC interface" tool instead of an ad-hoc script manipulation.
- Debugging utility in utils.Debugging added to allow analyzing the size of Python objects.

Minor Changes

- Disables screen printing of failed runs for the RAVEN code interface, for clarity (nested inner RAVEN failed runs within the outer RAVEN screen output are very difficult to follow).
- Minor adjustments for the RAVEN code interface to set up head and remote nodes for use with RAY.
- Many tweaks to JobHandler for working with RAY.
- Improved traceback reporting for failures in EnsembleModel configuration, allowing for additional clarity on model failures (especially ExternalModels).
- maxCycles added as an option to ROMCollection objects to limit the number of generated "years", mostly for debugging purposes.
- Improved modularity for boundary checking in the RavenSampled optimizer.
- New gradDistanceScalar option for the GradientApproximater class, allowing fine control over how far a gradient evaluation point should be from an optimal point.
- initialStepScale added to StepManipulator children to allow user manipulation of the initial step size in relative terms.
- nyquistScalar added to ARMA to allow finite control over which Fourier bases are included in the segments and which are considered global.
- Added an error when segmenting a ROMCollection results in unequal segments. This needs some design considerations to decide how to handle successfully.
- Retention of all ROMCollection.Clustered segment ROMs has been disabled, for performance considerations; this also disables random sampling from the clusters to generate new signals. The right solution is probably to generate distributions of characteristics that can then be sampled from to generate representative ROMs on demand.
- CashFlow updated.

Tests Added

- Test for gradient descent optimization attempting to sample outside of boundary/explicit/implicit constraints.
- ARMA test added for Interpolated ROMCollection sampling between multiple setpoint years.

Co-authored-by: Dylan McDowell <dylanjm@users.noreply.github.com>
Co-authored-by: alfoa <andrea.alfonsi@inl.gov>
Co-authored-by: Joshua J. Cogliati <joshua.cogliati@inl.gov>
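
The "successful run" file approach mentioned in the change description can be sketched roughly as follows. This is an illustration only, not the code merged in #1409: the marker file name and the three helper functions are hypothetical.

```python
import tempfile
from pathlib import Path

MARKER_NAME = "run_succeeded.marker"  # hypothetical file name


def clear_marker(work_dir: Path) -> None:
    """Remove a stale marker before launching, so an old success is not reused."""
    (work_dir / MARKER_NAME).unlink(missing_ok=True)


def mark_success(work_dir: Path) -> Path:
    """Called by the inner run as its last action when it finishes cleanly."""
    marker = work_dir / MARKER_NAME
    marker.write_text("success\n")
    return marker


def check_success(work_dir: Path) -> bool:
    """Called by the outer run: success is the presence of the marker file."""
    return (work_dir / MARKER_NAME).is_file()


with tempfile.TemporaryDirectory() as tmp:
    work = Path(tmp)
    clear_marker(work)
    before = check_success(work)  # no marker yet
    mark_success(work)
    after = check_success(work)   # marker written
```

Because the check depends only on whether the file exists, it is independent of how many failure or warning lines are printed to the screen.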

Jimmy-INL added priority_normal defect labels Oct 28, 2019

Jimmy-INL mentioned this issue Apr 13, 2020

[WIP] Looking for run complete in the whole log not only the last 200 lines #1212

Closed

9 tasks

PaulTalbot-INL mentioned this issue Jan 19, 2021

2020-12 HERON milestone changes #1409

Merged

20 tasks

alfoa added the RAVENv2.1 All tasks and defects that will go in RAVEN v2.1 label Jan 21, 2021

alfoa closed this as completed in #1409 Jan 25, 2021
