Updating failure rate message and accounting (facebook#2723)
Summary:
Pull Request resolved: facebook#2723

This diff clarifies in failure-rate-exceeded error messages that "abandoned" trials also count toward the failure rate, so that users can diagnose the error on their own.

In addition, the diff changes the denominator of the failure-rate computation to consider only trials with a `terminal` status.
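
The effect of the denominator change can be illustrated with a minimal, self-contained sketch. This is not the Ax implementation: the `Status` enum, the `TERMINAL` and `BAD` sets, and the `failure_rate` helper are hypothetical stand-ins for `TrialStatus`, `status.is_terminal`, and the scheduler's internal accounting, and the zero-denominator guard is an assumption added for the example.

```python
from enum import Enum
from typing import Dict


class Status(Enum):
    """Toy trial statuses (hypothetical stand-in for Ax's TrialStatus)."""

    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"
    ABANDONED = "abandoned"


# Assumption for this sketch: these statuses count as terminal.
TERMINAL = {Status.COMPLETED, Status.FAILED, Status.ABANDONED}
# Failed and abandoned trials both count as "bad", as the new message states.
BAD = {Status.FAILED, Status.ABANDONED}


def failure_rate(trials: Dict[int, Status], num_preexisting: int) -> float:
    """Bad trials divided by terminal trials, ignoring pre-existing trials."""
    in_scheduler = [s for idx, s in trials.items() if idx >= num_preexisting]
    num_terminal = sum(1 for s in in_scheduler if s in TERMINAL)
    num_bad = sum(1 for s in in_scheduler if s in BAD)
    # Guard added for the sketch: no terminal trials means nothing to rate yet.
    return num_bad / num_terminal if num_terminal else 0.0


trials = {
    0: Status.COMPLETED,
    1: Status.FAILED,
    2: Status.ABANDONED,
    3: Status.COMPLETED,
    4: Status.RUNNING,
    5: Status.RUNNING,
}

# Old denominator: all trials, including the two that are still running.
old_rate = 2 / len(trials)
# New denominator: only the four trials that reached a terminal status.
new_rate = failure_rate(trials, num_preexisting=0)
```

With two bad trials out of six total but only four terminal, the old computation gives one third while the new one gives one half, so still-running trials no longer dilute the failure rate.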

Differential Revision: D61914570
SebastianAment authored and facebook-github-bot committed Aug 28, 2024
1 parent 373fe81 commit 560a650
Showing 1 changed file with 11 additions and 7 deletions.
18 changes: 11 additions & 7 deletions ax/service/scheduler.py
@@ -77,8 +77,9 @@
 """
 FAILURE_EXCEEDED_MSG = (
     "Failure rate exceeds the tolerated trial failure rate of {f_rate} (at least "
-    "{n_failed} out of first {n_ran} trials failed). Checks are triggered both at "
-    "the end of a optimization and if at least {min_failed} trials have failed."
+    "{n_failed} out of first {n_ran} trials failed or abandoned). Checks are triggered both at "
+    "the end of a optimization and if at least {min_failed} trials have either failed, "
+    "or have been abandoned, potentially automatically due to issues with the trial."
 )


@@ -850,13 +851,16 @@ def error_if_failure_rate_exceeded(self, force_check: bool = False) -> None:
         ):
             return

-        num_ran_in_scheduler = (
-            len(self.experiment.trials) - self._num_preexisting_trials
+        num_ran_in_scheduler = sum(
+            1
+            for idx, t in self.experiment.trials.items()
+            if idx >= self._num_preexisting_trials and t.status.is_terminal
         )

         failure_rate_exceeded = (
-            num_bad_in_scheduler / num_ran_in_scheduler
-        ) > self.options.tolerated_trial_failure_rate
+            (num_bad_in_scheduler / num_ran_in_scheduler)
+            > self.options.tolerated_trial_failure_rate
+        )

         if failure_rate_exceeded:
             if self._num_trials_bad_due_to_err > num_bad_in_scheduler / 2:
