[DEFECT] Raven will not produce the required csv if the number of failures description lines exceed 200 #1082
Comments
As I understand it, this is a RAVEN code interface issue: we use the last handful of lines from the code run to determine whether the run succeeded or failed. However, especially on the HPC, the error and warning prints at the end of the file can be long enough that we fail to detect a successful run. Potential fixes are to use RAVEN's return code to determine failure or success, and possibly to ensure that we exit with a meaningful return code depending on how the step finishes. For example, MonteCarlo with missing samples isn't necessarily a failure, while Grid with missing samples is.
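The return-code idea above could look roughly like the following. This is only a sketch, not RAVEN's actual implementation: `run_succeeded`, `exit_code_for_step`, and the `TOLERATES_MISSING` set are hypothetical names chosen for illustration, and the rule for which samplers tolerate missing samples is taken only from the MonteCarlo/Grid example in this comment.

```python
import subprocess
import sys

# Hypothetical policy: samplers for which missing samples are not a failure.
TOLERATES_MISSING = {"MonteCarlo"}


def exit_code_for_step(sampler_type, missing_samples):
    """Return 0 if the step should count as successful, 1 otherwise."""
    if missing_samples == 0 or sampler_type in TOLERATES_MISSING:
        return 0
    return 1


def run_succeeded(command):
    """Judge success only by the exit status, never by the screen output."""
    result = subprocess.run(command, capture_output=True, text=True)
    return result.returncode == 0


# A child process that prints many warning lines but exits with status 0.
noisy_ok = run_succeeded(
    [sys.executable, "-c", "print('warning\\n' * 1000); raise SystemExit(0)"]
)
# A child process that exits with a non-zero status.
quiet_fail = run_succeeded([sys.executable, "-c", "raise SystemExit(1)"])
```

With this approach the amount of warning or error text printed at the end of a run no longer affects the success check.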
Checklist passed.
Closes #1082
Closes #368
Closes #1212 (detecting RAVEN code interface failures)
Closes #1413 (memory debugging tool)
Closes #1414 (expected meta keys from SVL)
Closes #1416 (maxCycles in Interpolated ROMCollection)
Closes #1417 (grad point perturbation distance scaling)
Closes #1418 (nyquist length adjusting for ARMA in ROMCollection)
Introduces and partially addresses #1412 (gradient points can sample outside constraints)
Introduces and partially addresses #1415 (HPC interfaces)

What are the significant changes in functionality due to this change request?

Major Changes
- Allows CodeInterfaces to set whether they print failed runs to screen via CodeInterface.printFailedRuns, which defaults to True.
- Added "successful run" file creation for RAVEN, used for failure checking in the RAVEN CodeInterface; this replaces the "read the last 500 lines of screen output" approach that was used previously.
- Memory profiler decorator added for RAVEN run debugging.
- Extension of SupervisedLearning to allow "expected meta keys" to be declared.
- Constraint handling added to GradientApproximation methods within GradientDescent optimization, ensuring sample points do not violate boundary, explicit, and implicit constraints. This has been placed in the FiniteDifference class, and may need reworking or inclusion in other sibling classes to be suitably general.
- Sawtooth qsub command script added. This needs to become a sort of "HPC interface" tool instead of an ad-hoc script manipulation.
- Debugging utility in utils.Debugging added to allow analyzing the size of Python objects.

Minor Changes
- Disables screen printing of failed runs for the RAVEN code interface, for clarity (nested inner RAVEN failed runs within the outer RAVEN screen output are very difficult to follow).
- Minor adjustments for the RAVEN code interface to set up head and remote nodes for use with RAY.
- Many tweaks to JobHandler for working with RAY.
- Improved traceback reporting for failures in EnsembleModel configuration, allowing for additional clarity on model failures (especially ExternalModels).
- maxCycles added as an option to ROMCollection objects to limit the number of generated "years", mostly for debugging purposes.
- Improved modularity for boundary checking in the RavenSampled optimizer.
- New gradDistanceScalar option for the GradientApproximater class, allowing fine control over how far a gradient evaluation point should be from an optimal point.
- initialStepScale added to StepManipulator children to allow user manipulation of the initial step size in relative terms.
- nyquistScalar added to ARMA to allow finite control over which Fourier bases are included in the segments and which are considered global.
- Added an error when segmenting a ROMCollection results in unequal segments. This needs some design considerations to decide how to handle successfully.
- Retention of all ROMCollection.Clustered segment ROMs has been disabled, for performance considerations; this also disables random sampling from the clusters to generate new signals. The right solution is probably to generate distributions of characteristics that can then be sampled from to generate representative ROMs on demand.
- CashFlow updated.

Tests Added
- Test for gradient descent optimization attempting to sample outside of boundary/explicit/implicit constraints.
- ARMA test added for Interpolated ROMCollection sampling between multiple setpoint years.

Co-authored-by: Dylan McDowell <dylanjm@users.noreply.github.com>
Co-authored-by: alfoa <andrea.alfonsi@inl.gov>
Co-authored-by: Joshua J. Cogliati <joshua.cogliati@inl.gov>
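The "successful run" file approach mentioned in the major changes can be illustrated with a short sketch. This is an assumption-laden illustration, not the code merged here: the marker filename `run_complete.marker` and the helper names `clear_marker`, `mark_success`, and `finished_ok` are hypothetical.

```python
import tempfile
from pathlib import Path

MARKER_NAME = "run_complete.marker"  # hypothetical filename


def clear_marker(workdir):
    """Remove any stale marker before a run starts."""
    marker = Path(workdir, MARKER_NAME)
    if marker.exists():
        marker.unlink()


def mark_success(workdir):
    """Called by the inner run as its final action on success."""
    Path(workdir, MARKER_NAME).write_text("success\n")


def finished_ok(workdir):
    """Called by the outer run: success means the marker file exists."""
    return Path(workdir, MARKER_NAME).is_file()


with tempfile.TemporaryDirectory() as workdir:
    clear_marker(workdir)
    before = finished_ok(workdir)   # no marker yet
    mark_success(workdir)
    after = finished_ok(workdir)    # marker written by the "inner run"
```

Because the check depends only on the presence of a file, it is independent of how many lines of screen output the inner run produces.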
1000 runs of RELAP are expected to have about 10% failed runs. This might cause the parser not to find the run-complete statement in the last 200 lines, which prevents it from proceeding.
Describe the defect
There should be a smarter way of dumping errors, or a separate log should be created for them.
What did you expect to see happen?
I expected the output csv to be produced.
What did you see instead?
Nothing was produced.
Do you have a suggested fix for the development team?
I removed the 200-line limit on the search in the RAVEN parser.
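To illustrate why the limit matters, here is a minimal sketch comparing a tail-limited search with a search over the whole output. The function names and the "Run complete" statement are hypothetical stand-ins, not the actual RAVEN parser code or message.

```python
from collections import deque


def found_in_tail(lines, statement, tail=200):
    """Search only the last `tail` lines of the output."""
    return any(statement in line for line in deque(lines, maxlen=tail))


def found_anywhere(lines, statement):
    """Search the entire output, with no line limit."""
    return any(statement in line for line in lines)


# Simulated output: the completion statement followed by 250 failure lines.
log = ["Run complete"] + [f"failure description {i}" for i in range(250)]

tail_result = found_in_tail(log, "Run complete")
full_result = found_anywhere(log, "Run complete")
```

In this simulated log the tail-limited search misses the completion statement, while the unlimited search finds it.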
Describe how to Reproduce
Steps to reproduce the behavior:
Screenshots and Input Files
Please attach the input file(s) that generate this error. The simpler the input, the faster we can find the issue.
Platform (please complete the following information):
For Change Control Board: Issue Review
This review should occur before any development is performed as a response to this issue.
For Change Control Board: Issue Closure
This review should occur when the issue is imminently going to be closed.