-
Notifications
You must be signed in to change notification settings - Fork 7k
[Data] - Preserve existing handlers for logging - don't use dictConfig
#57169
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Data] - Preserve existing handlers for logging - don't use dictConfig
#57169
Conversation
Signed-off-by: Goutam V. <goutam@anyscale.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request effectively addresses the issue of configuring logging at module import time by deferring it until a ray.data operation is active. The use of a thread-safe, double-checked locking pattern in configure_logging ensures that initialization is performed lazily and idempotently, which is a robust approach. The changes correctly remove the eager configuration from ray/data/__init__.py and introduce the lazy call in register_dataset_logger. I have one suggestion to further improve the implementation by moving environment variable access out of the critical section to minimize lock contention.
Signed-off-by: Goutam <goutam@anyscale.com>
e074bc9 to
7804296
Compare
Signed-off-by: Goutam <goutam@anyscale.com>
Signed-off-by: Goutam <goutam@anyscale.com>
Signed-off-by: Goutam <goutam@anyscale.com>
Signed-off-by: Goutam <goutam@anyscale.com>
|
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request addresses a critical issue where log handlers were being closed during reconfiguration, potentially causing log messages to be lost. The solution of saving and restoring handler states is a good approach. My main feedback is that the current implementation overlooks the root logger, which could still be affected by the original issue. I've added a specific comment with a suggestion to include the root logger in the preservation logic.
Signed-off-by: Goutam <goutam@anyscale.com>
Signed-off-by: Goutam <goutam@anyscale.com>
omatthew98
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some of the code is a little unintuitive so maybe try to capture some more info in comments, but overall lgtm.
Signed-off-by: Goutam <goutam@anyscale.com>
Signed-off-by: Goutam <goutam@anyscale.com>
Signed-off-by: Goutam <goutam@anyscale.com>
Signed-off-by: Goutam <goutam@anyscale.com>
Signed-off-by: Goutam <goutam@anyscale.com>
dictConfig
python/ray/data/_internal/logging.py
Outdated
| logger = logging.getLogger(logger_name) | ||
| if not logger.handlers: | ||
| loggers_to_configure[logger_name] = logger_config |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a comment to explain how there could be a logger w/ no handlers
…ig` (ray-project#57169) <!-- Thank you for your contribution! Please review https://github.com/ray-project/ray/blob/master/CONTRIBUTING.rst before opening a pull request. --> <!-- Please add a reviewer to the assignee section when you create a PR. If you don't have the access to it, we will shortly find a reviewer and assign them to your PR. --> ## Why are these changes needed? So the way the logger works is log() -> buffer -> flush() -> target (file/console/remote_uri) On shutdown (triggered by logging.config.dictConfig), the MemoryHandler is closed which sets the output target to None When the target is set to None, the logs in the buffer have nowhere to be flushed, so they're retained in memory, hence it accumulates to no end. Solution: Temporarily detach the handlers for loggers (preserve attributes like `level`, `target`, `filter` and `formatters`) and restore them after calling `dictConfig()` Based off of https://github.com/ray-project/ray/pull/48958/files#diff-d5e32f872cf7b123f0899714728ac7fc0916a5609844a7d1bf7317d98094778c (Ray Core - Ray Data log fix) ## Related issue number <!-- For example: "Closes ray-project#1234" --> ## Checks - [x] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR. - [x] I've run pre-commit jobs to lint the changes in this PR. ([pre-commit setup](https://docs.ray.io/en/latest/ray-contribute/getting-involved.html#lint-and-formatting)) - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [x] Unit tests - [ ] Release tests - [ ] This PR is not tested :( --------- Signed-off-by: Goutam V. <goutam@anyscale.com> Signed-off-by: Goutam <goutam@anyscale.com>
…ig` (ray-project#57169) <!-- Thank you for your contribution! Please review https://github.com/ray-project/ray/blob/master/CONTRIBUTING.rst before opening a pull request. --> <!-- Please add a reviewer to the assignee section when you create a PR. If you don't have the access to it, we will shortly find a reviewer and assign them to your PR. --> ## Why are these changes needed? So the way the logger works is log() -> buffer -> flush() -> target (file/console/remote_uri) On shutdown (triggered by logging.config.dictConfig), the MemoryHandler is closed which sets the output target to None When the target is set to None, the logs in the buffer have nowhere to be flushed, so they're retained in memory, hence it accumulates to no end. Solution: Temporarily detach the handlers for loggers (preserve attributes like `level`, `target`, `filter` and `formatters`) and restore them after calling `dictConfig()` Based off of https://github.com/ray-project/ray/pull/48958/files#diff-d5e32f872cf7b123f0899714728ac7fc0916a5609844a7d1bf7317d98094778c (Ray Core - Ray Data log fix) ## Related issue number <!-- For example: "Closes ray-project#1234" --> ## Checks - [x] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR. - [x] I've run pre-commit jobs to lint the changes in this PR. ([pre-commit setup](https://docs.ray.io/en/latest/ray-contribute/getting-involved.html#lint-and-formatting)) - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [x] Unit tests - [ ] Release tests - [ ] This PR is not tested :( --------- Signed-off-by: Goutam V. <goutam@anyscale.com> Signed-off-by: Goutam <goutam@anyscale.com> Signed-off-by: xgui <xgui@anyscale.com>
…ig` (#57169) <!-- Thank you for your contribution! Please review https://github.com/ray-project/ray/blob/master/CONTRIBUTING.rst before opening a pull request. --> <!-- Please add a reviewer to the assignee section when you create a PR. If you don't have the access to it, we will shortly find a reviewer and assign them to your PR. --> ## Why are these changes needed? So the way the logger works is log() -> buffer -> flush() -> target (file/console/remote_uri) On shutdown (triggered by logging.config.dictConfig), the MemoryHandler is closed which sets the output target to None When the target is set to None, the logs in the buffer have nowhere to be flushed, so they're retained in memory, hence it accumulates to no end. Solution: Temporarily detach the handlers for loggers (preserve attributes like `level`, `target`, `filter` and `formatters`) and restore them after calling `dictConfig()` Based off of https://github.com/ray-project/ray/pull/48958/files#diff-d5e32f872cf7b123f0899714728ac7fc0916a5609844a7d1bf7317d98094778c (Ray Core - Ray Data log fix) ## Related issue number <!-- For example: "Closes #1234" --> ## Checks - [x] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR. - [x] I've run pre-commit jobs to lint the changes in this PR. ([pre-commit setup](https://docs.ray.io/en/latest/ray-contribute/getting-involved.html#lint-and-formatting)) - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [x] Unit tests - [ ] Release tests - [ ] This PR is not tested :( --------- Signed-off-by: Goutam V. <goutam@anyscale.com> Signed-off-by: Goutam <goutam@anyscale.com> Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
…ig` (ray-project#57169) <!-- Thank you for your contribution! Please review https://github.com/ray-project/ray/blob/master/CONTRIBUTING.rst before opening a pull request. --> <!-- Please add a reviewer to the assignee section when you create a PR. If you don't have the access to it, we will shortly find a reviewer and assign them to your PR. --> ## Why are these changes needed? So the way the logger works is log() -> buffer -> flush() -> target (file/console/remote_uri) On shutdown (triggered by logging.config.dictConfig), the MemoryHandler is closed which sets the output target to None When the target is set to None, the logs in the buffer have nowhere to be flushed, so they're retained in memory, hence it accumulates to no end. Solution: Temporarily detach the handlers for loggers (preserve attributes like `level`, `target`, `filter` and `formatters`) and restore them after calling `dictConfig()` Based off of https://github.com/ray-project/ray/pull/48958/files#diff-d5e32f872cf7b123f0899714728ac7fc0916a5609844a7d1bf7317d98094778c (Ray Core - Ray Data log fix) ## Related issue number <!-- For example: "Closes ray-project#1234" --> ## Checks - [x] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR. - [x] I've run pre-commit jobs to lint the changes in this PR. ([pre-commit setup](https://docs.ray.io/en/latest/ray-contribute/getting-involved.html#lint-and-formatting)) - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [x] Unit tests - [ ] Release tests - [ ] This PR is not tested :( --------- Signed-off-by: Goutam V. <goutam@anyscale.com> Signed-off-by: Goutam <goutam@anyscale.com>
…ig` (ray-project#57169) <!-- Thank you for your contribution! Please review https://github.com/ray-project/ray/blob/master/CONTRIBUTING.rst before opening a pull request. --> <!-- Please add a reviewer to the assignee section when you create a PR. If you don't have the access to it, we will shortly find a reviewer and assign them to your PR. --> ## Why are these changes needed? So the way the logger works is log() -> buffer -> flush() -> target (file/console/remote_uri) On shutdown (triggered by logging.config.dictConfig), the MemoryHandler is closed which sets the output target to None When the target is set to None, the logs in the buffer have nowhere to be flushed, so they're retained in memory, hence it accumulates to no end. Solution: Temporarily detach the handlers for loggers (preserve attributes like `level`, `target`, `filter` and `formatters`) and restore them after calling `dictConfig()` Based off of https://github.com/ray-project/ray/pull/48958/files#diff-d5e32f872cf7b123f0899714728ac7fc0916a5609844a7d1bf7317d98094778c (Ray Core - Ray Data log fix) ## Related issue number <!-- For example: "Closes ray-project#1234" --> ## Checks - [x] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR. - [x] I've run pre-commit jobs to lint the changes in this PR. ([pre-commit setup](https://docs.ray.io/en/latest/ray-contribute/getting-involved.html#lint-and-formatting)) - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [x] Unit tests - [ ] Release tests - [ ] This PR is not tested :( --------- Signed-off-by: Goutam V. <goutam@anyscale.com> Signed-off-by: Goutam <goutam@anyscale.com> Signed-off-by: Aydin Abiar <aydin@anyscale.com>
…ig` (ray-project#57169) <!-- Thank you for your contribution! Please review https://github.com/ray-project/ray/blob/master/CONTRIBUTING.rst before opening a pull request. --> <!-- Please add a reviewer to the assignee section when you create a PR. If you don't have the access to it, we will shortly find a reviewer and assign them to your PR. --> ## Why are these changes needed? So the way the logger works is log() -> buffer -> flush() -> target (file/console/remote_uri) On shutdown (triggered by logging.config.dictConfig), the MemoryHandler is closed which sets the output target to None When the target is set to None, the logs in the buffer have nowhere to be flushed, so they're retained in memory, hence it accumulates to no end. Solution: Temporarily detach the handlers for loggers (preserve attributes like `level`, `target`, `filter` and `formatters`) and restore them after calling `dictConfig()` Based off of https://github.com/ray-project/ray/pull/48958/files#diff-d5e32f872cf7b123f0899714728ac7fc0916a5609844a7d1bf7317d98094778c (Ray Core - Ray Data log fix) ## Related issue number <!-- For example: "Closes ray-project#1234" --> ## Checks - [x] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR. - [x] I've run pre-commit jobs to lint the changes in this PR. ([pre-commit setup](https://docs.ray.io/en/latest/ray-contribute/getting-involved.html#lint-and-formatting)) - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [x] Unit tests - [ ] Release tests - [ ] This PR is not tested :( --------- Signed-off-by: Goutam V. <goutam@anyscale.com> Signed-off-by: Goutam <goutam@anyscale.com> Signed-off-by: Future-Outlier <eric901201@gmail.com>
Why are these changes needed?
So the way the logger works is log() -> buffer -> flush() -> target (file/console/remote_uri)
On shutdown (triggered by logging.config.dictConfig), the MemoryHandler is closed which sets the output target to None
When the target is set to None, the logs in the buffer have nowhere to be flushed, so they're retained in memory, hence it accumulates to no end.
Solution:
Temporarily detach the handlers for loggers (preserve attributes like
level,target,filterandformatters) and restore them after callingdictConfig()Based off of https://github.com/ray-project/ray/pull/48958/files#diff-d5e32f872cf7b123f0899714728ac7fc0916a5609844a7d1bf7317d98094778c (Ray Core - Ray Data log fix)
Related issue number
Checks
git commit -s) in this PR.method in Tune, I've added it in
doc/source/tune/api/under thecorresponding
.rstfile.