[Core] Enable log rotation for Raylet and GCS server #26121
Assigning to @rkooo567 because he looked into this in the past and has the most context.
Thanks for the PR! It is indeed an important feature (and I've heard of some issues regarding disk usage!). A couple of high-level comments before diving deep:
No problem.
In some cases, logs are printed to stderr or stdout. Cases I can think of are:
Although stderr and stdout logs are uncommon, I do think they are helpful when debugging issues.
This is the default behavior of Ray C++ logging. In the case of multi-node Python tests, I guess log rotation won't work well because multiple processes share the same log filename pattern. However, I haven't read and tested the internal logic of |
After reading the implementation of log rotation in spdlog (https://github.com/gabime/spdlog/blob/v1.x/include/spdlog/sinks/rotating_file_sink-inl.h), I don't think multiple processes with the same filename pattern will work well.
@rkooo567 Any more high-level comments before I update the PR?
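The concern about a shared filename pattern can be illustrated with a simplified model of index-based rotation. This is a Python sketch, not spdlog's code: the helper names are hypothetical, and it assumes each rotation renames name.log to name.1.log, name.1.log to name.2.log, and so on, with no coordination between processes.

```python
from __future__ import annotations

from pathlib import PurePath


def rotated_name(filename: str, index: int) -> str:
    """Return the assumed name of the index-th rotated file (index 0 is the live file)."""
    if index == 0:
        return filename
    path = PurePath(filename)
    # Insert the index between the stem and the extension: raylet.log -> raylet.1.log
    return str(path.with_name(f"{path.stem}.{index}{path.suffix}"))


def rotation_renames(filename: str, max_files: int) -> list[tuple[str, str]]:
    """List the (source, target) renames one rotation would attempt, oldest file first."""
    return [
        (rotated_name(filename, i - 1), rotated_name(filename, i))
        for i in range(max_files, 0, -1)
    ]


def touched_files(filename: str, max_files: int) -> set[str]:
    """Collect every file name that a single rotation reads or overwrites."""
    return {name for pair in rotation_renames(filename, max_files) for name in pair}


# Two processes that both log to raylet.log rename exactly the same files.
shared_overlap = touched_files("raylet.log", 2) & touched_files("raylet.log", 2)

# With the pid embedded in the name (pids 101 and 202 are made up), nothing overlaps.
per_pid_overlap = touched_files("raylet_101.log", 2) & touched_files("raylet_202.log", 2)
```

Under these assumptions, `shared_overlap` contains every file in the rotation chain, so two unsynchronized processes could rename or truncate each other's files, while `per_pid_overlap` is empty.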
Yeah, I think it generally looks good. A couple of comments:
I will start the full review soon!
Also, @wuisawesome @ckw017 would there be any compatibility issue with the Loki integration?
Also, for compatibility (since lots of people will still do cat gcs_server.out), can you write a comment at the top of the .out log file telling readers to see the .log file instead?
Promtail/Loki should be fine with log rotations.
Sorry for the delay. I think the code itself is okay, but it is a pretty big backward-incompatible change (e.g., it can easily break anyone who scrapes logs now). Let's actually do the API review here. Or is it possible to keep the basic log files as raylet.out & gcs_server.out instead of raylet_[pid].log? A potential alternative file naming is:
raylet.out -> same as the current raylet_[pid].log
raylet.err -> same as now
and something else for redirecting stdout.
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.
Hi again! The issue will be closed because there has been no more activity in the 14 days since the last message. Please feel free to reopen or open a new issue if you'd still like it to be addressed. Again, you can always ask for help on our discussion forum or Ray's public slack channel. Thanks again for opening the issue!
@rkooo567
Hmm, I actually found that we explicitly mention that backward compatibility of the file names is not maintained, so it is probably okay. We still need a doc change (I've seen many users reading this log when they debug, and changing the name can break their log scraping, depending on how it is implemented). I think we still need an API approval. cc @pcmoritz.
Sure.
Signed-off-by: Qing Wang <kingchin1218@gmail.com>
@ray-project/ray-docs for docs owner approval.
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.
|
Hi again! The issue will be closed because there has been no more activity in the 14 days since the last message. Please feel free to reopen or open a new issue if you'd still like it to be addressed. Again, you can always ask for help on our discussion forum or Ray's public slack channel. Thanks again for opening the issue! |
Why are these changes needed?
This PR adds log rotation functionality for Raylet and GCS server by setting a non-empty log_dir when invoking ray::StartRayLog. This is critical for production environments to avoid logs taking too much disk space.
After this PR, raylet_{pid}.log / gcs_server_{pid}.log and potentially raylet_{pid}.{number}.log / gcs_server_{pid}.{number}.log are created as well as the existing .out and .err files. See the sample below.
TODO:
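For anyone scraping these logs, the naming scheme described above could be matched along these lines. This is a hedged Python sketch rather than anything shipped with the PR: the `RotatedLog` type and `parse_log_name` helper are hypothetical, and it assumes that {pid} and {number} are plain decimal integers, which the description does not state explicitly.

```python
import re
from typing import NamedTuple, Optional


class RotatedLog(NamedTuple):
    component: str  # "raylet" or "gcs_server"
    pid: int        # process id embedded in the file name
    index: int      # 0 for the live file, N for {component}_{pid}.N.log


# Matches raylet_{pid}.log, raylet_{pid}.{number}.log and the gcs_server equivalents.
_LOG_NAME = re.compile(r"(raylet|gcs_server)_(\d+)(?:\.(\d+))?\.log")


def parse_log_name(name: str) -> Optional[RotatedLog]:
    """Parse a rotated log file name, or return None if it does not match the scheme."""
    match = _LOG_NAME.fullmatch(name)
    if match is None:
        return None
    component, pid, index = match.groups()
    return RotatedLog(component, int(pid), int(index) if index else 0)
```

The existing .out and .err files deliberately fall outside the pattern, so a scraper built this way would need to keep handling them separately.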
Related issue number
Checks
scripts/format.sh to lint the changes in this PR.