@HeartSaVioR
Contributor

@HeartSaVioR HeartSaVioR commented Jan 30, 2020

What changes were proposed in this pull request?

This is a FOLLOW-UP PR for review comment on #27208 : #27208 (review)

This PR documents the new Eventlog Compaction feature in a new section of monitoring.md. The feature has only one configuration on the SHS (Spark History Server) side, and it's hard to explain everything in the description of that single configuration.

Why are the changes needed?

Event log compaction lacks documentation on what it is and how it helps. This PR adds that explanation.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Built docs via jekyll.

Change on the new section:

[screenshot: Screen Shot 2020-02-16 at 2 23 18 PM]

Change on the table:

[screenshot: Screen Shot 2020-01-30 at 5 08 12 PM]

@SparkQA

SparkQA commented Jan 30, 2020

Test build #117556 has finished for PR 27398 at commit 49f51ff.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 30, 2020

Test build #117557 has finished for PR 27398 at commit 90a3f82.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HeartSaVioR
Contributor Author

cc. @vanzin @squito @gaborgsomogyi

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-30481][CORE][FOLLOWUP] Document event log compaction into new section of monitoring.md [SPARK-30481][DOCS][FOLLOWUP] Document event log compaction into new section of monitoring.md Jan 30, 2020
@dongjoon-hyun
Member

Thank you for attaching the screenshot, @HeartSaVioR .

@SparkQA

SparkQA commented Jan 31, 2020

Test build #117623 has finished for PR 27398 at commit 3e74e05.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


When the compaction happens, History Server lists all the available event log files, and considers the event log files older than
retained log files as a target of compaction. For example, if the application A has 5 event log files and
<code>spark.history.fs.eventLog.rolling.maxFilesToRetain</code> is set to 2, first 3 log files will be selected to be compacted.

Contributor

s/set to 2, first 3/set to 2, then the first 3/

retained log files as a target of compaction. For example, if the application A has 5 event log files and
<code>spark.history.fs.eventLog.rolling.maxFilesToRetain</code> is set to 2, first 3 log files will be selected to be compacted.

Once it selects the files, it analyzes these files to figure out which events can be excluded, and rewrites these files

Contributor

No specific recommendation, but I have the feeling that the term "files" is a little overused here. Maybe some rephrasing would be good.

Contributor Author

I'll try to rephrase; maybe we can refer to them as "target" or "candidates" instead of "files".

<code>spark.history.fs.eventLog.rolling.maxFilesToRetain</code> is set to 2, first 3 log files will be selected to be compacted.

Once it selects the files, it analyzes these files to figure out which events can be excluded, and rewrites these files
into one compact file with discarding some events. Once rewriting is done, original log files will be deleted.

Contributor

> original log files will be deleted

Maybe worth mentioning that it's best effort? I haven't gone through the merged functionality yet, so just asking: what will happen to such garbage in the next compaction round?

Contributor Author

@HeartSaVioR HeartSaVioR Feb 16, 2020

Yeah, it doesn't matter for the overall logic, since listing event log files takes the "last" compact file and the event log files to the right of it. But we don't retry deleting them. I agree it's worth mentioning that the deletion is best effort.
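
A minimal sketch of that listing behavior, for illustration only and not the History Server's actual code. It assumes each event log file can be described by a rolling index plus a flag saying whether it is a compact file, and that a compact file carries the index of the last original file it replaced. The `LogFile` type and `files_to_replay` function are hypothetical names.

```python
from typing import List, NamedTuple


class LogFile(NamedTuple):
    """Hypothetical description of one event log file."""
    index: int      # rolling index of the file
    compact: bool   # True if this file is the output of a compaction


def files_to_replay(listing: List[LogFile]) -> List[LogFile]:
    """Return the last compact file and everything to the right of it.

    Leftover originals whose deletion failed sit to the left of the last
    compact file, so they are simply ignored.
    """
    # Sort by index; for equal indices the original sorts before the compact file.
    ordered = sorted(listing, key=lambda f: (f.index, f.compact))
    compact_positions = [i for i, f in enumerate(ordered) if f.compact]
    if not compact_positions:
        return ordered  # nothing has been compacted yet
    return ordered[compact_positions[-1]:]


# Originals 1-3 were compacted into a compact file with index 3,
# but deleting the originals failed.
listing = [
    LogFile(1, False), LogFile(2, False), LogFile(3, False),
    LogFile(3, True), LogFile(4, False), LogFile(5, False),
]
print([(f.index, f.compact) for f in files_to_replay(listing)])
# [(3, True), (4, False), (5, False)]
```

Under these assumptions the undeleted originals only waste storage; they do not change what is read.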

* Events for the executor which is terminated
* Events for the SQL execution which is finished, and related job/stage/tasks events

but the details can be changed afterwards.

Contributor

> but the details can be changed afterwards

Maybe there's no need to mention this. As long as the History Server can read the files in a backward compatible way (and can provide a correct UI result), it's not really relevant whether it changes between versions.

Contributor Author

OK, agreed. Once we change the logic, we can just update this section.

but the details can be changed afterwards.

Please note that Spark History Server may not compact the old event log files if figures out not a lot of space
would be reduced during compaction. For streaming query (including Structured Streaming) we normally expect compaction

Contributor

I don't yet see why "including Structured Streaming" is needed. That's the main streaming engine.

Contributor Author

OK looks redundant. Thanks!

will run as each micro-batch will trigger one or more jobs which will be finished shortly, but compaction won't run
in many cases for batch query.

Please also note that this is a new feature introduced in Spark 3.0, and may not be completely stable. In some circumstance,

Contributor

s/In some circumstance/Under certain circumstances?


Please also note that this is a new feature introduced in Spark 3.0, and may not be completely stable. In some circumstance,
the compaction may exclude more events than you expect, leading some UI issues on History Server for the application.
Use with caution.

Contributor

s/Use with caution/Use it with caution/


but the details can be changed afterwards.

Please note that Spark History Server may not compact the old event log files if figures out not a lot of space

Contributor

This paragraph doesn't really tell me why it doesn't always compact.

Contributor Author

We already described above which events are the candidates, so it's effectively saying: "Please note that Spark History Server may not compact the old event log files if it figures out that not a lot of space would be reduced during compaction, because these event log files are mostly filled with running jobs or SQL executions."

Does it answer your question?


### Applying compaction of old event log files

A long-running streaming application can bring a huge single event log file which may cost a lot to maintain and

Contributor

I assume this works for any application, not just streaming? If so, perhaps we change the wording a little to say it's for large applications, and give streaming as an example. I used to see long-running graph processing applications create huge log files as well.
I also think we should describe what compaction is here up front, because if I just read this, I would assume it's some sort of compression, but really it's throwing away parts of the files.

Contributor Author

> I assume this works for any application, not just streaming?

You're right. I focused too much on the most likely target of compaction, which is streaming applications, but this sentence is not only about streaming.

> I also think we should describe what compaction is here up front

Uh, actually we don't have an explicit section for rolling event logs, so I feel it's good to first explain what a rolling event log is, and then what "compaction" is. Otherwise, maybe it would be good to have a separate section for rolling event logs?

logs, via setting the configuration <code>spark.history.fs.eventLog.rolling.maxFilesToRetain</code> on the
Spark History Server.

When the compaction happens, History Server lists all the available event log files, and considers the event log files older than

Contributor

"older than retained log files": this wording confuses me a bit because you say "older". The config below is maxFilesToRetain, which is not date based; it's a number of files. Am I missing something here, or should the wording be something like "considers the event log files when there are more than spark.history.fs.eventLog.rolling.maxFilesToRetain"? Or is the intention that when there are more than spark.history.fs.eventLog.rolling.maxFilesToRetain, it looks at the oldest of those log files?

Contributor Author

I chose "older" because the target event log files are created "before" the event log files which will be retained, so that's semantically date based. It might be better to explain it with "index", which is clearer, but that requires another explanation of what "index" means.
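
To make the index-based reading concrete, here is a minimal sketch (not the History Server's code) that reproduces the example from the doc text: 5 event log files with `spark.history.fs.eventLog.rolling.maxFilesToRetain` set to 2. The function name and the plain integer indices are assumptions made for illustration, and the sketch assumes a positive retain count.

```python
from typing import List, Tuple


def select_compaction_targets(
    file_indices: List[int], max_files_to_retain: int
) -> Tuple[List[int], List[int]]:
    """Split rolled event log file indices into (targets, retained).

    The newest `max_files_to_retain` files are retained; every file with a
    lower index (that is, an "older" file) becomes a compaction target.
    """
    ordered = sorted(file_indices)
    split = max(len(ordered) - max_files_to_retain, 0)
    return ordered[:split], ordered[split:]


targets, retained = select_compaction_targets([1, 2, 3, 4, 5], 2)
print(targets)   # [1, 2, 3]
print(retained)  # [4, 5]
```

Read this way, "older" and "lower index" select the same files, as long as indices increase in creation order.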

logs, via setting the configuration <code>spark.history.fs.eventLog.rolling.maxFilesToRetain</code> on the
Spark History Server.

When the compaction happens, History Server lists all the available event log files, and considers the event log files older than

Contributor

nit: "the History Server"

* Events for the executor which is terminated
* Events for the SQL execution which is finished, and related job/stage/tasks events

but the details can be changed afterwards.

Contributor

Should we add a sentence saying that once the events are removed you will no longer be able to view them in the UI, or whatever other consequence this has?

Contributor Author

I thought the effect was intuitive since we "exclude" events during rewriting, but if mentioning it explicitly makes it clearer, let's do it.
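
For illustration only, the "exclude during rewriting" idea can be sketched as a two-pass filter: the first pass finds which jobs have finished, and the second pass rewrites the events without anything that belongs to those jobs. The event shape (`type`, `job_id`) and the type names below are hypothetical and are not Spark's real listener events.

```python
from typing import Dict, List, Set

# Hypothetical event shape, e.g. {"type": "JobStart", "job_id": 1}.
Event = Dict[str, object]


def finished_job_ids(events: List[Event]) -> Set[object]:
    """First pass: collect the ids of jobs that have an end event."""
    return {e["job_id"] for e in events if e.get("type") == "JobEnd" and "job_id" in e}


def rewrite_without_finished_jobs(events: List[Event]) -> List[Event]:
    """Second pass: keep only events that are not tied to a finished job."""
    finished = finished_job_ids(events)
    return [e for e in events if "job_id" not in e or e["job_id"] not in finished]


events = [
    {"type": "ApplicationStart"},
    {"type": "JobStart", "job_id": 1},
    {"type": "JobEnd", "job_id": 1},
    {"type": "JobStart", "job_id": 2},  # still running, so it is kept
]
kept = rewrite_without_finished_jobs(events)
print([e["type"] for e in kept])  # ['ApplicationStart', 'JobStart']
```

After such a rewrite, job 1 can no longer be shown in the UI because none of its events remain, which is the consequence the sentence would spell out.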

@HeartSaVioR
Contributor Author

Thanks for the detailed reviews, and sorry for not addressing the review comments so far. There are some other tasks on my plate for now and I couldn't focus on this.

As the review comments point out not only typos/syntax but also issues with the content, I need to find a large enough block of time to go through them. I'll try to find it next week and update here. Thanks again!

@SparkQA

SparkQA commented Feb 16, 2020

Test build #118487 has finished for PR 27398 at commit 841d4d0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HeartSaVioR
Contributor Author

@dongjoon-hyun @tgravescs @gaborgsomogyi

I've addressed the review comments and updated the PR description. Since these comments are not only about typos or syntax, my changes may not be enough. Please take a second look and provide suggestions. Thanks in advance!

@HeartSaVioR HeartSaVioR force-pushed the SPARK-30481-FOLLOWUP-document-new-feature branch from 841d4d0 to 803663f Compare February 16, 2020 05:29
@SparkQA

SparkQA commented Feb 16, 2020

Test build #118488 has finished for PR 27398 at commit 803663f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gaborgsomogyi
Contributor

Coming back to this but just re-installing jekyll to test it...

Contributor

@gaborgsomogyi gaborgsomogyi left a comment

LGTM from my side, only nits. (After some struggle I've installed the tools on my new machine)

let you have rolling event log files instead of single huge event log file which may help some scenarios on its own,
but it still doesn't help you reducing the overall size of logs.

Spark History Server can apply 'compaction' on the rolling event log files to reduce the overall size of

Contributor

Nit: Not sure what the intention is with this sign here. Maybe the word "compaction" is not so special that it must be highlighted. This applies in more places.

Contributor Author

That's used to emphasize "what" we are explaining here, but it's not a big deal. Will remove.

Once it selects the target, it analyzes them to figure out which events can be excluded, and rewrites them
into one compact file with discarding events which are decided to exclude.

The compaction tries to exclude the events which point to the outdated things like jobs, and so on. As of now, below describes

Contributor

@gaborgsomogyi gaborgsomogyi Feb 17, 2020

Nit: s/events which point to the outdated things like jobs/events which point to outdated data

@SparkQA

SparkQA commented Feb 17, 2020

Test build #118601 has finished for PR 27398 at commit a95a4a4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HeartSaVioR
Contributor Author

@dongjoon-hyun @tgravescs Kind reminder.

@tgravescs
Contributor

Thanks for pinging me, will hopefully look later today or tomorrow.

Contributor

@tgravescs tgravescs left a comment

looks good, thanks @HeartSaVioR

@HeartSaVioR
Contributor Author

@dongjoon-hyun
Could you please help finalize the review? I guess it might not be possible to get this reviewed by @vanzin in time. I'd be happy to follow your comment on toning down the warning if you still think it's too aggressive.

Member

@dongjoon-hyun dongjoon-hyun left a comment

+1, LGTM. Thank you all.
Merged to master/3.0.

dongjoon-hyun pushed a commit that referenced this pull request Feb 25, 2020
…section of monitoring.md

### What changes were proposed in this pull request?

This is a FOLLOW-UP PR for review comment on #27208 : #27208 (review)

This PR documents a new feature `Eventlog Compaction` into the new section of `monitoring.md`, as it only has one configuration on the SHS side and it's hard to explain everything on the description on the single configuration.

### Why are the changes needed?

Event log compaction lacks the documentation for what it is and how it helps. This PR will explain it.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Built docs via jekyll.

> change on the new section

<img width="951" alt="Screen Shot 2020-02-16 at 2 23 18 PM" src="https://user-images.githubusercontent.com/1317309/74599587-eb9efa80-50c7-11ea-942c-f7744268e40b.png">

> change on the table

<img width="1126" alt="Screen Shot 2020-01-30 at 5 08 12 PM" src="https://user-images.githubusercontent.com/1317309/73431190-2e9c6680-4383-11ea-8ce0-815f10917ddd.png">

Closes #27398 from HeartSaVioR/SPARK-30481-FOLLOWUP-document-new-feature.

Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(cherry picked from commit 02f8165)
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
@HeartSaVioR
Contributor Author

Thanks all for reviewing and merging!

@HeartSaVioR HeartSaVioR deleted the SPARK-30481-FOLLOWUP-document-new-feature branch February 26, 2020 00:14
sjincho pushed a commit to sjincho/spark that referenced this pull request Apr 15, 2020