Improve handling of failed evaluations #154

Merged
merged 46 commits into from
Feb 12, 2024

Conversation

AngelFP
Member

@AngelFP AngelFP commented Dec 11, 2023

Addresses #143, #144.

This PR adds a status parameter to Trial, which can be one of CANDIDATE, RUNNING, COMPLETED or FAILED. This status is also used to inform Ax of whether a trial has failed, so that the surrogate model can handle it properly. In the Ax Service generators, failed trials can be labeled as either FAILED or ABANDONED, where the latter means that the failed trial will not be suggested again. This behavior is controlled by the new parameter abandon_failed_trials (True by default).
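As a rough illustration of the status handling described above, the sketch below models the four states with a plain Python enum and shows how a failed trial might be relabeled for an Ax-based generator. The names `TrialStatus` and `abandon_failed_trials` come from this PR; the helper `label_for_ax` and the enum member values are hypothetical and are not taken from the actual implementation.

```python
from enum import Enum


class TrialStatus(Enum):
    """Possible states of a trial (member values are illustrative only)."""

    CANDIDATE = "candidate"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"


def label_for_ax(status: TrialStatus, abandon_failed_trials: bool = True) -> str:
    """Hypothetical helper: choose the label reported to Ax for a trial.

    A failed trial is reported as ABANDONED (never suggested again) when
    `abandon_failed_trials` is True, and as FAILED otherwise.
    """
    if status is TrialStatus.FAILED:
        return "ABANDONED" if abandon_failed_trials else "FAILED"
    return status.name
```

With the default setting, `label_for_ax(TrialStatus.FAILED)` yields the ABANDONED label; passing `abandon_failed_trials=False` keeps the trial labeled as FAILED.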

With the proposed implementation, a trial is considered failed if either of these two conditions is met:

  1. LibEnsemble reports that the submitted task has failed. This only applies to the TemplateEvaluator.
  2. The evaluation returns NaN for the value of any of the objectives. This applies to all evaluators.

Case 2 also covers evaluation or analysis functions that fail to provide a value for an objective. Previously, such an objective would be returned with a value of 0, which could confuse the optimizer.
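The NaN-based failure check in case 2 can be sketched as follows. This is a minimal, standard-library illustration under stated assumptions, not the code in this PR: the functions `evaluate` and `has_failed` are hypothetical, outputs are held in a dictionary rather than the real output array, and the analysis function is assumed to return a mapping from objective name to a float.

```python
import math


def evaluate(analysis_func, objective_names):
    """Run a (hypothetical) analysis function and collect objective values.

    Every objective is prefilled with NaN, so a value that is never
    provided stays NaN instead of silently becoming 0.
    """
    output = {name: math.nan for name in objective_names}
    try:
        output.update(analysis_func())
    except Exception:
        # The analysis itself failed: keep the NaN placeholders.
        pass
    return output


def has_failed(output, objective_names):
    """A trial counts as failed if any objective value is NaN."""
    return any(math.isnan(output[name]) for name in objective_names)
```

An analysis function that raises, or that returns no entry for one of the objectives, leaves a NaN behind and is therefore flagged by `has_failed`, whereas a prefilled 0 would be indistinguishable from a genuine result.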

Changes

  • Implement new TrialStatus class.
  • Add status property to Trial and other related methods.
  • Add trial_status to history.
  • Prefill output array with NaNs.
  • Handle FAILED trials in the Ax generators. By default they are set as ABANDONED so that they are not suggested again. This behavior can be controlled with the abandon_failed_trials argument.
  • Remove unnecessary variables in sim_specs["out"].
  • In the generator, distinguish between completed and evaluated trials. An evaluated trial is one whose evaluation has completed or failed.
  • Apply a workaround to prevent the cwd from changing when running with threading comms.
  • Add option to mark trials as failed after completion (addresses #144, "Add option to remove evaluations").
  • Add new tests.
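The cwd workaround in the list above amounts to remembering the working directory before a call that may change it and restoring it afterwards. The sketch below shows one generic way to do this with a context manager. The name `preserve_cwd` is hypothetical, and the PR itself stores the cwd and sets it again at the end of `run` rather than using a context manager.

```python
import os
from contextlib import contextmanager


@contextmanager
def preserve_cwd():
    """Restore the current working directory when the block exits."""
    cwd = os.getcwd()
    try:
        yield
    finally:
        # Runs even if the wrapped code raises, so the cwd is always reset.
        os.chdir(cwd)
```

Any code run inside `with preserve_cwd():` can change directory freely; the original directory is restored once the block exits.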

AngelFP and others added 30 commits November 10, 2023 17:35
@AngelFP AngelFP added the enhancement New feature or request label Dec 12, 2023
@AngelFP AngelFP changed the title [WIP] Improve handling of failed evaluations Improve handling of failed evaluations Dec 15, 2023
@RemiLehe
Collaborator

RemiLehe commented Feb 7, 2024

Thanks for this PR @AngelFP. Could you fix the conflicts with the main branch?

@AngelFP
Member Author

AngelFP commented Feb 8, 2024

> Thanks for this PR @AngelFP. Could you fix the conflicts with the main branch?

Good you noticed that. Conflicts solved :)

# is changed to the exploration directory after the call to `libE`.
# As a workaround, the cwd is stored and then set again at the end of
# `run`.
cwd = os.getcwd()
Collaborator

Should we bring this up to the libEnsemble team?

Member Author

Good point. I opened a new issue Libensemble/libensemble#1244

@RemiLehe RemiLehe merged commit 48d42fc into main Feb 12, 2024
8 checks passed
@RemiLehe RemiLehe deleted the feature/failed_trials branch February 12, 2024 20:45