[SPARK-3562]Periodic cleanup event logs #2471
|
Can one of the admins verify this patch? |
|
i strongly suggest against duplicating functionality that is already provided by the system where these logs are written. however, if you proceed with this, the logic for triggering a clean needs to be improved. first, interrupts to sleep need to be considered. second, a clean should occur on startup and at each interval, otherwise it may never occur at all. third, if cleaning may be an expensive operation, it is more desirable to trigger it at a known, predictable off-peak time instead of X seconds since startup (the default is every day, keyed off startup). if there's special handling that should be done with these log files, i'd suggest a log clean utility that can be triggered by cron. |
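One way to address the first two points is to hand the timing to a `ScheduledExecutorService` rather than a hand-written sleep loop. The sketch below is not the patch's code: `PeriodicCleanSketch`, `start`, and the `clean` callback are hypothetical names. An initial delay of zero gives the clean-on-startup behaviour, and there is no `Thread.sleep` left to be interrupted.

```scala
import java.util.concurrent.{Executors, ScheduledExecutorService, TimeUnit}

object PeriodicCleanSketch {
  // Hypothetical helper: runs `clean` once immediately, then again
  // `intervalMs` after each run finishes.
  def start(intervalMs: Long)(clean: () => Unit): ScheduledExecutorService = {
    val pool = Executors.newSingleThreadScheduledExecutor()
    val task = new Runnable {
      override def run(): Unit =
        try clean()
        catch {
          // An exception escaping run() would cancel every later run,
          // so report it and carry on.
          case e: Exception => System.err.println(s"clean failed: $e")
        }
    }
    // Initial delay 0: the first clean happens at startup.
    pool.scheduleWithFixedDelay(task, 0L, intervalMs, TimeUnit.MILLISECONDS)
    pool
  }
}
```

This does not cover the third point: running at a fixed off-peak time would need an initial delay computed from the wall clock, which is left out here.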
a.k.a.
Duration(1, TimeUnit.DAYS).toSeconds()
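For reference, a minimal sketch of that suggestion using `scala.concurrent.duration.Duration`. The object and value names are hypothetical, and in Scala the `toSeconds` accessor is written without parentheses.

```scala
import java.util.concurrent.TimeUnit
import scala.concurrent.duration.Duration

object CleanIntervalSketch {
  // One day expressed through Duration instead of a hand-multiplied constant.
  val defaultIntervalSeconds: Long = Duration(1, TimeUnit.DAYS).toSeconds
}
```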
|
I think there's a very unlikely race in your code: it's possible, if things are messed up just right, that the reader thread might try to read a log file that is being deleted by the cleaner thread. I believe that the code will handle that correctly, but it doesn't hurt to check. @mattf don't know what you mean by "functionality that is already provided by the system". I'm not aware of HDFS having any way to automatically do housekeeping of old files. |
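A sketch of the kind of check meant here, using the local filesystem through `java.nio.file` as a stand-in for HDFS (an assumption, since the real reader goes through the Hadoop client, which reports a missing file with its own exception). `readIfPresent` is a hypothetical helper that treats a file deleted between listing and reading as "nothing to read" rather than a failure.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, NoSuchFileException, Path}

object RaceTolerantReadSketch {
  // Returns None when the file vanished before it could be opened,
  // for example because a cleaner thread deleted it in the meantime.
  def readIfPresent(path: Path): Option[String] =
    try Some(new String(Files.readAllBytes(path), StandardCharsets.UTF_8))
    catch { case _: NoSuchFileException => None }
}
```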
This looks a lot like logCheckingThread. Maybe it makes sense to make an abstract parent thread that both of these extend.
|
Thanks for your opinions, @vanzin @andrewor14. I have changed the code according to your suggestions. |
a system approach means using something like logrotate or a cleaner process that's run from cron. such an approach is beneficial in a number of ways, including reducing the complexity of spark by not duplicating functionality that's already available in spark's environment - akin to using a standard library for i/o instead of interacting w/ devices directly. in this case the context for the environment is the system, where you'll find things like logrotate and cron readily available. as for rotating logs in hdfs - i wouldn't expect hdfs to provide such a feature, because the feature serves a specific use case on top of hdfs. some searching suggests that there are a few solutions available for doing rotation or pruning of files in hdfs and points out that rotating/pruning/cleaning/purging can be done remotely and independently from spark since hdfs is distributed. |
The only thing you can really use system utilities for is cron, which is the least important part of this change. Really, this is not an expensive process that will bring down the HDFS server, and it's scheduled to run at very long intervals. The constant polling for new logs is orders of magnitude more disruptive than this cleanup thread. AFAIK, logrotate doesn't work on HDFS. Now you'd be asking for people to set up the NFS bridge or even fuse-hdfs just to clean up Spark event log files. Finally, Spark theoretically supports Windows. This is a simple way to achieve compatibility with that. And it doesn't require people to set things up outside of their Spark ecosystem, meaning it's easier to maintain. |
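To illustrate why each pass is cheap, the core of an age-based cleanup is a single filter over the directory listing. The sketch below is an assumption about the shape of that logic, not the patch itself: `LogInfo`, `expired`, and `maxAgeMs` are hypothetical names, and the listing and deletion calls against the filesystem are left out.

```scala
object ExpiredLogsSketch {
  // Hypothetical record for one event log: its path and last-modified time.
  final case class LogInfo(path: String, lastModifiedMs: Long)

  // Keeps only the logs whose age exceeds maxAgeMs at time nowMs.
  def expired(logs: Seq[LogInfo], nowMs: Long, maxAgeMs: Long): Seq[LogInfo] =
    logs.filter(log => nowMs - log.lastModifiedMs > maxAgeMs)
}
```

Because the selection is a pure function, the same code could back either an in-process cleaner or a standalone utility run from cron.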
You should use a thread factory so that you can set the thread name and daemon status. See com.google.common.util.concurrent.ThreadFactoryBuilder.
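A dependency-free sketch of what such a factory does, written against plain `java.util.concurrent` so it runs without Guava. Guava's `ThreadFactoryBuilder` covers the same ground through `setNameFormat` and `setDaemon`; the object and method names below are hypothetical.

```scala
import java.util.concurrent.ThreadFactory
import java.util.concurrent.atomic.AtomicInteger

object NamedDaemonFactorySketch {
  // Builds threads named "<prefix>-0", "<prefix>-1", ... and marks them as
  // daemon threads so they do not keep the JVM alive on their own.
  def namedDaemonFactory(prefix: String): ThreadFactory = new ThreadFactory {
    private val counter = new AtomicInteger(0)
    override def newThread(r: Runnable): Thread = {
      val t = new Thread(r, s"$prefix-${counter.getAndIncrement()}")
      t.setDaemon(true)
      t
    }
  }
}
```

The resulting factory can be passed to `Executors.newSingleThreadScheduledExecutor(factory)`.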
Ah, another thing: you should override stop() and shut down this executor cleanly (it's mostly a "best effort" thing, but still).
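A best-effort shutdown along these lines might look like the following sketch. The `stop` helper and its timeout parameter are assumptions for illustration, not the provider's actual method.

```scala
import java.util.concurrent.{ExecutorService, TimeUnit}

object CleanShutdownSketch {
  // Stops accepting new work, waits up to timeoutMs for running tasks, and
  // falls back to interrupting them. Returns whether the pool had fully
  // terminated by the time this method returns.
  def stop(pool: ExecutorService, timeoutMs: Long): Boolean = {
    pool.shutdown()
    try {
      if (!pool.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS)) {
        pool.shutdownNow()
      }
    } catch {
      case _: InterruptedException =>
        pool.shutdownNow()
        // Preserve the interrupt for the caller.
        Thread.currentThread().interrupt()
    }
    pool.isTerminated
  }
}
```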
|
@viper-kun mostly good, just a few minor things left as far as I'm concerned. |
|
@vanzin. is it ok to go? |
nit: add $dir to the log message, in case it does not show up in the exception.
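As a small illustration of the nit, Scala string interpolation puts the directory into the message whether or not the exception text mentions it. The helper name and message wording here are hypothetical.

```scala
object LogMessageSketch {
  // Always names the directory, even if the exception message omits it.
  def cleanFailureMessage(dir: String, cause: Throwable): String =
    s"Exception while cleaning event logs in $dir: ${cause.getMessage}"
}
```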
|
@viper-kun lgtm, but you'll need to get the attention of a committer. :-) |
|
@vanzin @andrewor14. is it ok to go? |
|
@vanzin @andrewor14 @srowen . is it ok to go? |
minor: you could use Utils.namedThreadFactory() here (just noticed that method the other day).
|
LGTM. Everybody else is kinda busy with releases so I doubt they'll look at this in the next several days... |
|
@vanzin Is this patch ok to merge? |
|
I'm not a committer so I can't merge the patch. But it has merge conflicts now, so that at least needs to be fixed. |
|
@vanzin = =! I got it, sigh~ |
|
I have filed a new PR, #4214. |
|
@viper-kun could you close this one in that case? thanks! |
link #2391