-
-
Notifications
You must be signed in to change notification settings - Fork 86
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make running trials in timeline plot visible #731
Make running trials in timeline plot visible #731
Conversation
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## main #731 +/- ##
==========================================
+ Coverage 67.20% 68.22% +1.02%
==========================================
Files 35 35
Lines 2293 2329 +36
==========================================
+ Hits 1541 1589 +48
+ Misses 752 740 -12 ☔ View full report in Codecov by Sentry. |
The last version could get maxRunDuration of zero when we have waiting or failed trials without any complete or pruned trials. It leads to false in the judge of hasRunning. However, in this case, we should be able to say that we have running trials. I fixed this issue in this commit.
b8f9ae5
to
895e327
Compare
@keisuke-umezawa Could you review this PR? |
I checked the behavior with the following (checking the dashboard while the following is running to check the green bar grows over time): import time
import optuna
def objective(trial: optuna.Trial) -> float:
x = trial.suggest_float("x", -5, 5)
time.sleep(30)
return x
if __name__ == "__main__":
study = optuna.create_study(storage="sqlite:///demo.db")
study.optimize(objective, n_trials=100) Another check (check dashboard after the run finishes so that we can see that the green bar does not squash the other bars after a while): def create_study_with_various_trials(storage: str) -> None:
rng = np.random.RandomState(42)
def _objective_intermediate_values(trial: optuna.Trial) -> float:
x = trial.suggest_categorical("x", choices=["a", "b", "c"])
rnd = rng.random()
if rnd < 0.25:
trial.report(trial.number, step=0)
trial.report(trial.number + 1, step=1)
elif rnd < 0.5:
trial.report(trial.number, step=0)
raise optuna.TrialPruned()
elif rnd < 0.75:
raise optuna.TrialPruned()
else:
raise ValueError("Unexpected Error.")
return 0.0
study = optuna.create_study(study_name="various-trials", storage=storage)
study.optimize(_objective_intermediate_values, n_trials=40, catch=(Exception,))
study.enqueue_trial({"x": "a"})
study.ask({"x": CategoricalDistribution(["a", "b", "c"])})
if __name__ == "__main__":
create_study_with_various_trials(storage="sqlite:///demo.db") |
const start = t.datetime_start?.getTime() ?? new Date().getTime() | ||
const now = new Date().getTime() | ||
// This is an ad-hoc handling to check if the trial is running. | ||
return now - start < maxRunDuration * 5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[question] Why do we need those lines for ad-hoc handling?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For me, it can be just simply with const hasRunning = trials.some((t) => t.state === runningKey)
, but do you have some reasons that you need to write those lines?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can remove it as well, but then the problem is that the timeline for RUNNING
trials, which was not somehow killed properly (it is basically preemption kill, which often happens in our internal cluster) then the timeline plot for the other trials will not be visible because they will be squashed by the running trials, which are not really running because of preemption kill.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left come questions and comments. Could you check them?
const completes = bars.map((b, i) => b.datetime_complete ?? starts[i]) | ||
const runDurations = bars.map((b) => { | ||
const start = b.datetime_start?.getTime() ?? new Date().getTime() | ||
const complete = b.datetime_complete?.getTime() ?? start |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about this since we can use the current max time for running trials?
const complete = b.datetime_complete?.getTime() ?? maxDatetime.getTime()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
start
is necessary for waiting trials.
return Math.max( | ||
1, | ||
!isRunning ? complete - start : maxDatetime.getTime() - start | ||
) | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[question] Why do we need to get max here? If we fix the code as I said in the above, those lines can be just complete - start
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For me, it seems that we can use the following lines:
const runDurations = bars.map((b, i) => {
const start = starts[i].getTime()
const complete = b.datetime_complete?.getTime() ?? maxDatetime.getTime()
return complete - start
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The lower bound 1
is necessary to be able to recognize at least bars exist in case runDurations=0
.
Otherwise, I think your code is better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, actually, const complete = b.datetime_complete?.getTime() ?? maxDatetime.getTime()
is incorrect when we think of trialState===WAITING
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I understand that you want to show waiting
and in some cases also running
trials.
In the optuna, it uses the current time for complete if a trial does not have it when it is waiting
or running
.
ref: https://github.com/optuna/optuna/blob/master/optuna/visualization/_timeline.py#L85-L86 Are you planning to make back port to optuna itself?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did the backport here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I wrongly added approved. I changed it to ng 🙇
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for working on it!
I just add a nits comment. If you fix it, you can merge it.
Btw, it makes some differences with optuna itself, and it is recommended to make back port to optuna.
return Math.max( | ||
1, | ||
!isRunning ? complete - start : maxDatetime.getTime() - start | ||
) | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I understand that you want to show waiting
and in some cases also running
trials.
In the optuna, it uses the current time for complete if a trial does not have it when it is waiting
or running
.
ref: https://github.com/optuna/optuna/blob/master/optuna/visualization/_timeline.py#L85-L86 Are you planning to make back port to optuna itself?
@keisuke-umezawa I made a backport to Optuna! Are there any other things I should do? |
@nabenabe0928 |
Contributor License Agreement
This repository (
optuna-dashboard
) and Goptuna share common code.This pull request may therefore be ported to Goptuna.
Make sure that you understand the consequences concerning licenses and check the box below if you accept the term before creating this pull request.
Reference Issues/PRs
This PR makes the running trials in the timeline plot visible.
Currently, running trials are handled as a run duration of zero.
For this reason, we cannot see running trials in the timeline plot although it actually exists.
In this PR, I aim to make them appear in the timeline plot.
Plus, I revised the code so that trials, which were killed before they completed, will handle the max time in the plot as the max completed/pruned/failed timing.
What does this implement/fix? Explain your changes.
This PR makes the following changes: