-
Notifications
You must be signed in to change notification settings - Fork 7k
[train] update failure policy log message #58274
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[train] update failure policy log message #58274
Conversation
Signed-off-by: Matthew Deng <matthew.j.deng@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request successfully improves the readability of the failure policy log message by reformatting it into a multi-line string. The change is clear and effective. I have provided one suggestion to use a single multi-line f-string, which could make the code that generates the message even more readable and idiomatic.
| f"[FailurePolicy] {decision.value}\n" | ||
| f" Source: {error_source}\n" | ||
| f" Error count: {error_count} (max allowed: {retry_limit})\n\n" | ||
| f"{training_failed_error}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For improved readability and to make the multi-line string structure more explicit, consider using a single triple-quoted f-string. This avoids repeating f"" and using explicit newline characters (\n), making the log message template easier to read and maintain.
f"""[FailurePolicy] {decision.value}
Source: {error_source}
Error count: {error_count} (max allowed: {retry_limit})
{training_failed_error}"""## Description Update formatting of FailurePolicy log message to be more readable. ## Additional information **Before:** ``` [FailurePolicy] Decision: FailureDecision.RAISE, Error source: controller, Error count / maximum errors allowed: 1/0, Error: Training failed due to controller error: Worker group is not active. Call WorkerGroup.create() to create a new worker group. ``` **After:** ``` [FailurePolicy] RAISE Source: controller Error count: 1 (max allowed: 0) Training failed due to controller error: Worker group is not active. Call WorkerGroup.create() to create a new worker group. ``` Signed-off-by: Matthew Deng <matthew.j.deng@gmail.com>
## Description Update formatting of FailurePolicy log message to be more readable. ## Additional information **Before:** ``` [FailurePolicy] Decision: FailureDecision.RAISE, Error source: controller, Error count / maximum errors allowed: 1/0, Error: Training failed due to controller error: Worker group is not active. Call WorkerGroup.create() to create a new worker group. ``` **After:** ``` [FailurePolicy] RAISE Source: controller Error count: 1 (max allowed: 0) Training failed due to controller error: Worker group is not active. Call WorkerGroup.create() to create a new worker group. ``` Signed-off-by: Matthew Deng <matthew.j.deng@gmail.com>
## Description Update formatting of FailurePolicy log message to be more readable. ## Additional information **Before:** ``` [FailurePolicy] Decision: FailureDecision.RAISE, Error source: controller, Error count / maximum errors allowed: 1/0, Error: Training failed due to controller error: Worker group is not active. Call WorkerGroup.create() to create a new worker group. ``` **After:** ``` [FailurePolicy] RAISE Source: controller Error count: 1 (max allowed: 0) Training failed due to controller error: Worker group is not active. Call WorkerGroup.create() to create a new worker group. ``` Signed-off-by: Matthew Deng <matthew.j.deng@gmail.com> Signed-off-by: Aydin Abiar <aydin@anyscale.com>
## Description Update formatting of FailurePolicy log message to be more readable. ## Additional information **Before:** ``` [FailurePolicy] Decision: FailureDecision.RAISE, Error source: controller, Error count / maximum errors allowed: 1/0, Error: Training failed due to controller error: Worker group is not active. Call WorkerGroup.create() to create a new worker group. ``` **After:** ``` [FailurePolicy] RAISE Source: controller Error count: 1 (max allowed: 0) Training failed due to controller error: Worker group is not active. Call WorkerGroup.create() to create a new worker group. ``` Signed-off-by: Matthew Deng <matthew.j.deng@gmail.com> Signed-off-by: Future-Outlier <eric901201@gmail.com>
Description
Update formatting of FailurePolicy log message to be more readable.
Additional information
Before:
After: