-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-30481][DOCS][FOLLOWUP] Document event log compaction into new section of monitoring.md #27398
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-30481][DOCS][FOLLOWUP] Document event log compaction into new section of monitoring.md #27398
Conversation
|
Test build #117556 has finished for PR 27398 at commit
|
|
Test build #117557 has finished for PR 27398 at commit
|
|
Thank you for attaching the screenshot, @HeartSaVioR . |
|
Test build #117623 has finished for PR 27398 at commit
|
docs/monitoring.md
Outdated
|
|
||
| When the compaction happens, History Server lists all the available event log files, and considers the event log files older than | ||
| retained log files as a target of compaction. For example, if the application A has 5 event log files and | ||
| <code>spark.history.fs.eventLog.rolling.maxFilesToRetain</code> is set to 2, first 3 log files will be selected to be compacted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/set to 2, first 3/set to 2, then the first 3/
docs/monitoring.md
Outdated
| retained log files as a target of compaction. For example, if the application A has 5 event log files and | ||
| <code>spark.history.fs.eventLog.rolling.maxFilesToRetain</code> is set to 2, first 3 log files will be selected to be compacted. | ||
|
|
||
| Once it selects the files, it analyzes these files to figure out which events can be excluded, and rewrites these files |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No specific recommendation but I have the feeling that files term is a little bit overused here. Maybe some rephrasing would be good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll try to rephrase - maybe we can refer as "target" or "candidates" instead of "files".
docs/monitoring.md
Outdated
| <code>spark.history.fs.eventLog.rolling.maxFilesToRetain</code> is set to 2, first 3 log files will be selected to be compacted. | ||
|
|
||
| Once it selects the files, it analyzes these files to figure out which events can be excluded, and rewrites these files | ||
| into one compact file with discarding some events. Once rewriting is done, original log files will be deleted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
original log files will be deleted
Maybe worth to mention its best effort? Not yet gone through the merged functionality so just asking what will happen such garbage in the next compaction round?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah it wouldn't matter for the entire logic as listing event log files would take the "last" compact file, and the right side of event log files. But we don't retry deleting them. I'd agree to worth to mention the deletion is best effort.
docs/monitoring.md
Outdated
| * Events for the executor which is terminated | ||
| * Events for the SQL execution which is finished, and related job/stage/tasks events | ||
|
|
||
| but the details can be changed afterwards. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but the details can be changed afterwards
Maybe not needed to mention this. Until history server is able to read the files in a backward compatible way (and can provide correct UI result) it's not really relevant whether it's changed between versions or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK agreed. Once we change the logic we may just need to change here.
docs/monitoring.md
Outdated
| but the details can be changed afterwards. | ||
|
|
||
| Please note that Spark History Server may not compact the old event log files if figures out not a lot of space | ||
| would be reduced during compaction. For streaming query (including Structured Streaming) we normally expect compaction |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not yet see why including Structured Streaming is needed? That's the main streaming engine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK looks redundant. Thanks!
docs/monitoring.md
Outdated
| will run as each micro-batch will trigger one or more jobs which will be finished shortly, but compaction won't run | ||
| in many cases for batch query. | ||
|
|
||
| Please also note that this is a new feature introduced in Spark 3.0, and may not be completely stable. In some circumstance, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/In some circumstance/Under certain circumstances?
docs/monitoring.md
Outdated
|
|
||
| Please also note that this is a new feature introduced in Spark 3.0, and may not be completely stable. In some circumstance, | ||
| the compaction may exclude more events than you expect, leading some UI issues on History Server for the application. | ||
| Use with caution. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/Use with caution/Use it with caution/
|
|
||
| but the details can be changed afterwards. | ||
|
|
||
| Please note that Spark History Server may not compact the old event log files if figures out not a lot of space |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This paragraph not really answers to me why not always compact.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already described which events are the candidates in above, so it's saying "Please note that Spark History Server may not compact the old event log files if figures out not a lot of space would be reduced during compaction because these event log files majorly fill with running jobs or SQL executions."
Does it answer your question?
docs/monitoring.md
Outdated
|
|
||
| ### Applying compaction of old event log files | ||
|
|
||
| A long-running streaming application can bring a huge single event log file which may cost a lot to maintain and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume this works for any application, not just streaming? If so perhaps we change the wording a little to say for large application, and give streaming as example. I used to see long running graph processing applications create huge log files as well.
I also think we should describe what compaction is here up front because if I just read this, I would assume its some sort of compression, but really its throwing away parts of the files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume this works for any application, not just streaming?
You're right. I focused too much on the target of compaction which is most likely streaming application, but in this sentence it's not only for streaming.
I also think we should describe what compaction is here up front
Uh, actually we don't have explicit section for rolling event log, hence I feel it's good to explain what's rolling event log first, and what is "compaction". Otherwise maybe good to have individual section for rolling event log?
docs/monitoring.md
Outdated
| logs, via setting the configuration <code>spark.history.fs.eventLog.rolling.maxFilesToRetain</code> on the | ||
| Spark History Server. | ||
|
|
||
| When the compaction happens, History Server lists all the available event log files, and considers the event log files older than |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"older than retained log files" .. This wording is confusing me a bit because you say "older". Below the config is maxFilesToRetain, is not date based, its a number of files. Am I missing something here or should the wording be somethign like like considers the event log files when there are more than spark.history.fs.eventLog.rolling.maxFilesToRetain? Or is the intention that when there are more than spark.history.fs.eventLog.rolling.maxFilesToRetain, then it looks at the oldest of those log files?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I chose "older" because the target event log files are created "before" than the event log files which will be retained - so that's semantically date based. It might be better to explain it with "index" which is clearer, but requires another explanation on "index".
docs/monitoring.md
Outdated
| logs, via setting the configuration <code>spark.history.fs.eventLog.rolling.maxFilesToRetain</code> on the | ||
| Spark History Server. | ||
|
|
||
| When the compaction happens, History Server lists all the available event log files, and considers the event log files older than |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: "the History Server"
docs/monitoring.md
Outdated
| * Events for the executor which is terminated | ||
| * Events for the SQL execution which is finished, and related job/stage/tasks events | ||
|
|
||
| but the details can be changed afterwards. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we add in a sentence saying once the events are removed you will no longer be able to view in UI - or if there is anything consequence of this happening.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought the effect is intuitive as we "exclude" events during rewriting, but if explicitly mentioning would make it clearer, let's do it.
|
Thanks for the detailed reviews and sorry to not address review comments so far. There're some other tasks on my plate for now and I couldn't focus on this. As the review comments are not only pointing typo/syntax but also pointing the content as well, I may need to find enough block of time to go through. I'll try to find it in next week and update here. Thanks again! |
|
Test build #118487 has finished for PR 27398 at commit
|
|
@dongjoon-hyun @tgravescs @gaborgsomogyi I've reflected review comments and updated the PR description. As these comments are not only about typos or syntaxes, I guess my reflections may not be enough. Please take a second look and provide suggestions. Thanks in advance! |
841d4d0 to
803663f
Compare
|
Test build #118488 has finished for PR 27398 at commit
|
|
Coming back to this but just re-installing jekyll to test it... |
gaborgsomogyi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM from my side, only nits. (After some struggle I've installed the tools on my new machine)
docs/monitoring.md
Outdated
| let you have rolling event log files instead of single huge event log file which may help some scenarios on its own, | ||
| but it still doesn't help you reducing the overall size of logs. | ||
|
|
||
| Spark History Server can apply 'compaction' on the rolling event log files to reduce the overall size of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Not sure what's the intention with this ‘ sign here. Maybe compaction word is not so special that it must be highlighted. This applies more places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's used as emphasizing "what" we are explaining here, but maybe not a big deal. Will remove.
docs/monitoring.md
Outdated
| Once it selects the target, it analyzes them to figure out which events can be excluded, and rewrites them | ||
| into one compact file with discarding events which are decided to exclude. | ||
|
|
||
| The compaction tries to exclude the events which point to the outdated things like jobs, and so on. As of now, below describes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: s/events which point to the outdated things like jobs/events which point to outdated data
|
Test build #118601 has finished for PR 27398 at commit
|
|
@dongjoon-hyun @tgravescs Kindly reminder. |
|
thanks for pinging me will hopefully look later today or tomorrow |
tgravescs
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good, thanks @HeartSaVioR
|
@dongjoon-hyun |
dongjoon-hyun
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, LGTM. Thank you all.
Merged to master/3.0.
…section of monitoring.md ### What changes were proposed in this pull request? This is a FOLLOW-UP PR for review comment on #27208 : #27208 (review) This PR documents a new feature `Eventlog Compaction` into the new section of `monitoring.md`, as it only has one configuration on the SHS side and it's hard to explain everything on the description on the single configuration. ### Why are the changes needed? Event log compaction lacks the documentation for what it is and how it helps. This PR will explain it. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Built docs via jekyll. > change on the new section <img width="951" alt="Screen Shot 2020-02-16 at 2 23 18 PM" src="https://user-images.githubusercontent.com/1317309/74599587-eb9efa80-50c7-11ea-942c-f7744268e40b.png"> > change on the table <img width="1126" alt="Screen Shot 2020-01-30 at 5 08 12 PM" src="https://user-images.githubusercontent.com/1317309/73431190-2e9c6680-4383-11ea-8ce0-815f10917ddd.png"> Closes #27398 from HeartSaVioR/SPARK-30481-FOLLOWUP-document-new-feature. Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com> (cherry picked from commit 02f8165) Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
|
Thanks all for reviewing and merging! |
…section of monitoring.md ### What changes were proposed in this pull request? This is a FOLLOW-UP PR for review comment on apache#27208 : apache#27208 (review) This PR documents a new feature `Eventlog Compaction` into the new section of `monitoring.md`, as it only has one configuration on the SHS side and it's hard to explain everything on the description on the single configuration. ### Why are the changes needed? Event log compaction lacks the documentation for what it is and how it helps. This PR will explain it. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Built docs via jekyll. > change on the new section <img width="951" alt="Screen Shot 2020-02-16 at 2 23 18 PM" src="https://user-images.githubusercontent.com/1317309/74599587-eb9efa80-50c7-11ea-942c-f7744268e40b.png"> > change on the table <img width="1126" alt="Screen Shot 2020-01-30 at 5 08 12 PM" src="https://user-images.githubusercontent.com/1317309/73431190-2e9c6680-4383-11ea-8ce0-815f10917ddd.png"> Closes apache#27398 from HeartSaVioR/SPARK-30481-FOLLOWUP-document-new-feature. Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
What changes were proposed in this pull request?
This is a FOLLOW-UP PR for review comment on #27208 : #27208 (review)
This PR documents a new feature
Eventlog Compactioninto the new section ofmonitoring.md, as it only has one configuration on the SHS side and it's hard to explain everything on the description on the single configuration.Why are the changes needed?
Event log compaction lacks the documentation for what it is and how it helps. This PR will explain it.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Built docs via jekyll.