[DEFECT] Raven will not produce the required csv if the number of failures description lines exceed 200 #1082

Closed
10 tasks done
Jimmy-INL opened this issue Oct 28, 2019 · 2 comments · Fixed by #1409
Labels
defect priority_normal RAVENv2.1 All tasks and defects that will go in RAVEN v2.1

Comments

@Jimmy-INL
Collaborator

Jimmy-INL commented Oct 28, 2019


Of 1000 RELAP runs, about 10% are expected to fail. The failure descriptions printed for those runs can push the run-complete statement out of the last 200 lines of output, so the parser does not find it and cannot proceed.

Describe the defect
There should be a smarter way of dumping errors, or the errors should go to a separate log.

What did you expect to see happen?

I expected the output csv to be produced.

What did you see instead?

Nothing was produced.

Do you have a suggested fix for the development team?

I removed the 200-line limit on the search in the RAVEN parser.
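The failure mode and the suggested fix can be illustrated with a minimal sketch. This is not RAVEN's parser: the marker text `run complete`, the function name `found_completion`, and the simulated output are all assumptions made for illustration.

```python
from collections import deque
from typing import Iterable, Optional

# Hypothetical marker text; the exact statement the RAVEN parser looks for
# is not given in this issue.
COMPLETION_MARKER = "run complete"


def found_completion(lines: Iterable[str], tail_limit: Optional[int] = 200) -> bool:
    """Return True if the marker appears in the last `tail_limit` lines.

    With tail_limit=None the whole output is searched, which mirrors the
    suggested fix of removing the line limit.
    """
    # deque(..., maxlen=n) keeps only the final n items of the iterable.
    window = lines if tail_limit is None else deque(lines, maxlen=tail_limit)
    return any(COMPLETION_MARKER in line for line in window)


# Simulated output: the completion statement followed by 250 lines
# describing failed runs (more than the 200-line window).
output = ["setup", COMPLETION_MARKER] + [
    f"failed run {i}: description" for i in range(250)
]

limited = found_completion(output, tail_limit=200)
unlimited = found_completion(output, tail_limit=None)
print(limited, unlimited)
```

In this sketch the limited search misses the marker because 250 trailing lines exceed the window, while the unlimited search finds it at the cost of scanning the full output.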


For Change Control Board: Issue Review

This review should occur before any development is performed as a response to this issue.

  • 1. Is it tagged with a type: defect or task?
  • 2. Is it tagged with a priority: critical, normal or minor?
  • 3. If it will impact requirements or requirements tests, is it tagged with requirements?
  • 4. If it is a defect, can it cause wrong results for users? If so an email needs to be sent to the users.
  • 5. Is a rationale provided? (Such as explaining why the improvement is needed or why current code is wrong.)

For Change Control Board: Issue Closure

This review should occur when the issue is imminently going to be closed.

  • 1. If the issue is a defect, is the defect fixed?
  • 2. If the issue is a defect, is the defect tested for in the regression test system? (If not explain why not.)
  • 3. If the issue can impact users, has an email to the users group been written (the email should specify if the defect impacts stable or master)?
  • 4. If the issue is a defect, does it impact the latest release branch? If yes, is there any issue tagged with release (create if needed)?
  • 5. If the issue is being closed without a pull request, has an explanation of why it is being closed been provided?
@PaulTalbot-INL
Collaborator

From what I understand, this is an issue in the RAVEN code interface, where we use the last handful of lines from the code run to determine whether the run was a success or a failure. Especially on the HPC, however, the error and warning prints at the end of the file might be long enough that we fail to detect a successful run.

Potential fixes are to use RAVEN's return code to determine failure or success, and possibly to ensure that we exit with a meaningful return code depending on how the step finishes. For example, MonteCarlo with missing samples isn't necessarily a failure, while Grid with missing samples is.
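A rough sketch of the return-code idea follows. The policy table `TOLERATES_MISSING` and the helpers `exit_code_for_step` and `run_succeeded` are hypothetical names invented here, not part of RAVEN; the child processes are stand-in Python one-liners rather than real RAVEN runs.

```python
import subprocess
import sys
from typing import List

# Hypothetical policy: which samplers tolerate missing samples.
# Taken from the example in the comment above, not from RAVEN's code.
TOLERATES_MISSING = {"MonteCarlo": True, "Grid": False}


def exit_code_for_step(sampler: str, missing_samples: int) -> int:
    """Return 0 for success, 1 if missing samples count as a failure."""
    if missing_samples == 0:
        return 0
    # Unknown samplers are treated conservatively as failures.
    return 0 if TOLERATES_MISSING.get(sampler, False) else 1


def run_succeeded(command: List[str]) -> bool:
    """Judge success by the child's return code instead of its screen output."""
    completed = subprocess.run(command, capture_output=True, text=True)
    return completed.returncode == 0


# Stand-in child processes that exit with a known code.
ok = run_succeeded([sys.executable, "-c", "import sys; sys.exit(0)"])
bad = run_succeeded([sys.executable, "-c", "import sys; sys.exit(1)"])
print(ok, bad)
```

The return code does not depend on how many warning or error lines were printed, which is the property the tail-of-output check lacks.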

@alfoa
Collaborator

alfoa commented Jan 21, 2021

Checklist passed.
Closure approved via #1409

@alfoa alfoa added the RAVENv2.1 All tasks and defects that will go in RAVEN v2.1 label Jan 21, 2021
alfoa added a commit that referenced this issue Jan 25, 2021
Closes #1082 Closes #368 Closes #1212 (detecting RAVEN code interface failures)
Closes #1413 (memory debugging tool)
Closes #1414 (expected meta keys from SVL)
Closes #1416 (maxCycles in Interpolated ROMCollection)
Closes #1417 (grad point perturbation distance scaling)
Closes #1418 (nyquist length adjusting for ARMA in ROMCollection)
Introduces and partially addresses #1412 (gradient points can sample outside constraints)
Introduces and partially addresses #1415 (HPC interfaces)
What are the significant changes in functionality due to this change request?
Major Changes

    Allows CodeInterfaces to set whether they print failed runs to screen or not via CodeInterface.printFailedRuns, which defaults to True
    Added "successful run" file creation for RAVEN for failure checking in the RAVEN CodeInterface; this replaces the "read the last 500 lines of screen output" approach that was used previously
    Memory profiler decorator added for RAVEN run debugging
    Extension of SupervisedLearning to allow "expected meta keys" to be declared
    Constraint handling added to GradientApproximation methods within GradientDescent optimization, ensuring that sample points do not violate boundary, explicit, or implicit constraints. This has been placed in the FiniteDifference class, and may need reworking or inclusion in other sibling classes to be suitably general.
    Sawtooth qsub command script added. This needs to become a sort of "HPC interface" tool instead of an ad-hoc script manipulation.
    Debugging utility in utils.Debugging added to allow analyzing the size of Python objects.
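The "successful run" file approach listed above could look roughly like the following. This is a sketch under stated assumptions: the file name `run_succeeded.marker` and the helper functions are hypothetical, and the commit message does not say how RAVEN names or writes its file.

```python
import tempfile
from pathlib import Path

# Hypothetical file name; the actual name used by RAVEN is not stated here.
SUCCESS_MARKER = "run_succeeded.marker"


def clear_marker(working_dir: Path) -> None:
    """Remove a stale marker so an earlier success cannot mask a new failure."""
    (working_dir / SUCCESS_MARKER).unlink(missing_ok=True)  # Python 3.8+


def mark_success(working_dir: Path) -> None:
    """Called by the inner run as its last action when it finishes cleanly."""
    (working_dir / SUCCESS_MARKER).write_text("success\n")


def check_success(working_dir: Path) -> bool:
    """Called by the outer code interface instead of scanning screen output."""
    return (working_dir / SUCCESS_MARKER).is_file()


with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp)
    clear_marker(run_dir)
    before = check_success(run_dir)  # no marker yet: treated as failed
    mark_success(run_dir)
    after = check_success(run_dir)   # marker present: treated as successful
print(before, after)
```

Because the check looks only for the file, its result is independent of how many failure descriptions the run printed.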

Minor Changes

    Disables screen printing of failed runs for RAVEN code interface, for clarity (nested inner RAVEN failed runs within the outer RAVEN screen output is very difficult to follow)
    Minor adjustments for RAVEN code interface to set up head and remote nodes for use with RAY
    Many tweaks to JobHandler for working with RAY
    Improved traceback reporting for failures in EnsembleModel configuration, allowing for additional clarity on model failures (especially ExternalModels)
    maxCycles added as option to ROMCollection objects to limit the number of generated "years", mostly for debugging purposes
    Improved modularity for boundary checking in the RavenSampled optimizer
    New gradDistanceScalar option for GradientApproximater class allowing fine control over how far a gradient evaluation point should be from an optimal point.
    initialStepScale added to StepManipulator children to allow user manipulation of the initial step size in relative terms.
    nyquistScalar added to ARMA to allow fine control over which Fourier bases are included in the segments and which are considered global
    Added an error when segmenting a ROMCollection results in unequal segments. This needs some design considerations to decide how to handle successfully.
    Retention of all ROMCollection.Clustered segment ROMs has been disabled, for performance considerations; this also disables random sampling from the clusters to generate new signals. The right solution is probably to generate distributions of characteristics that can then be sampled from to generate representative ROMs on demand.
    CashFlow updated.

Tests Added

    Test for gradient descent optimization attempting to sample outside of boundary/explicit/implicit constraints
    ARMA test added for Interpolated ROMCollection sampling between multiple setpoint years.


Co-authored-by: Dylan McDowell <dylanjm@users.noreply.github.com>
Co-authored-by: alfoa <andrea.alfonsi@inl.gov>
Co-authored-by: Joshua J. Cogliati <joshua.cogliati@inl.gov>
3 participants