Skip to content

Conversation

@HeartSaVioR
Copy link
Contributor

@HeartSaVioR HeartSaVioR commented Jan 10, 2020

What changes were proposed in this pull request?

This patch addresses adding event filter to handle SQL related events. This patch is next task of SPARK-29779 (#27085), please refer the description of PR #27085 to see overall rationalization of this patch.

Below functionalities will be addressed in later parts:

  • integrate compaction into FsHistoryProvider
  • documentation about new configuration

Why are the changes needed?

One of major goal of SPARK-28594 is to prevent the event logs to become too huge, and SPARK-29779 achieves the goal. We've got another approach in prior, but the old approach required models in both KVStore and live entities to guarantee compatibility, while they're not designed to do so.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Added UTs.

@HeartSaVioR
Copy link
Contributor Author

Will rebase once SPARK-29779 is merged. For now, the only effective commit is the last, c72fd1f.

@SparkQA
Copy link

SparkQA commented Jan 10, 2020

Test build #116461 has finished for PR 27164 at commit c72fd1f.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HeartSaVioR HeartSaVioR changed the title [WIP][SPARK-30479][SQL] Apply compaction of event log to SQL events [SPARK-30479][SQL] Apply compaction of event log to SQL events Jan 11, 2020
@HeartSaVioR
Copy link
Contributor Author

OK Just rebased this. cc. @vanzin @gaborgsomogyi

@SparkQA
Copy link

SparkQA commented Jan 11, 2020

Test build #116524 has finished for PR 27164 at commit 5f37b64.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member

Retest this please.

@SparkQA
Copy link

SparkQA commented Jan 12, 2020

Test build #116543 has finished for PR 27164 at commit 5f37b64.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HeartSaVioR
Copy link
Contributor Author

retest this, please

@SparkQA
Copy link

SparkQA commented Jan 13, 2020

Test build #116585 has finished for PR 27164 at commit 5f37b64.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HeartSaVioR
Copy link
Contributor Author

retest this, please

@SparkQA
Copy link

SparkQA commented Jan 13, 2020

Test build #116605 has finished for PR 27164 at commit 5f37b64.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HeartSaVioR
Copy link
Contributor Author

retest this, please

@SparkQA
Copy link

SparkQA commented Jan 13, 2020

Test build #116637 has finished for PR 27164 at commit 5f37b64.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HeartSaVioR
Copy link
Contributor Author

pyspark.mllib.tests.test_streaming_algorithms.StreamingLinearRegressionWithTests.test_train_prediction
pyspark.mllib.tests.test_streaming_algorithms.StreamingLogisticRegressionWithSGDTests.test_parameter_accuracy
pyspark.mllib.tests.test_streaming_algorithms.StreamingLogisticRegressionWithSGDTests.test_training_and_prediction

Not relevant failures.

@HeartSaVioR
Copy link
Contributor Author

retest this, please

@SparkQA
Copy link

SparkQA commented Jan 14, 2020

Test build #116665 has finished for PR 27164 at commit 5f37b64.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 14, 2020

Test build #116680 has finished for PR 27164 at commit 01ece29.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin
Copy link
Contributor

vanzin commented Jan 15, 2020

Looks ok to me. Will leave here until tomorrow to see if anyone comments.

@vanzin
Copy link
Contributor

vanzin commented Jan 15, 2020

Merging to master.

@vanzin vanzin closed this in e751bc6 Jan 15, 2020
@HeartSaVioR
Copy link
Contributor Author

Thanks for reviewing and merging!

@HeartSaVioR HeartSaVioR deleted the SPARK-30479 branch January 15, 2020 21:12
vanzin pushed a commit that referenced this pull request Jan 29, 2020
…Server

### What changes were proposed in this pull request?

This patch addresses remaining functionality on event log compaction: integrate compaction into FsHistoryProvider.

This patch is next task of SPARK-30479 (#27164), please refer the description of PR #27085 to see overall rationalization of this patch.

### Why are the changes needed?

One of major goal of SPARK-28594 is to prevent the event logs to become too huge, and SPARK-29779 achieves the goal. We've got another approach in prior, but the old approach required models in both KVStore and live entities to guarantee compatibility, while they're not designed to do so.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Added UT.

Closes #27208 from HeartSaVioR/SPARK-30481.

Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com>
Signed-off-by: Marcelo Vanzin <vanzin@apache.org>
private val _stageToRDDs = new mutable.HashMap[Int, Set[Int]]
private val stages = new mutable.HashSet[Int]

def liveSQLExecutions: Set[Long] = _liveExecutionToJobs.keySet.toSet
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

quick question: is this non-private only because of the testing purpose?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Maybe we could reduce down the scope to package private.

* between finished job and live job without relation of SQL execution.
*/
private[spark] class SQLEventFilterBuilder extends SparkListener with EventFilterBuilder {
private val _liveExecutionToJobs = new mutable.HashMap[Long, mutable.Set[Int]]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

another nit or quick question. Why did you use leading underscore? Seems the name doesn't conflict.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was conflicted with below methods, and I renamed the methods as the purpose of methods changed slightly. Revisiting this now, looks like these fields don't need to have underscore.

_liveJobs: Set[Int],
_liveStages: Set[Int],
_liveTasks: Set[Long],
_liveRDDs: Set[Int])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here too. The naming here seems confusing ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They're just temporal variables and the only purpose is to pass to the constructor of JobEventFilter. As they don't conflict with parameters on the constructor of JobEventFilter, I wouldn't mind removing underscore.

@HeartSaVioR
Copy link
Contributor Author

Thanks for the post-hoc review. I'm seeing review comments from all of three PRs - let me deal with all of reviews at once.

@HyukjinKwon
Copy link
Member

Thanks @HeartSaVioR!

dongjoon-hyun pushed a commit that referenced this pull request Jan 31, 2020
…ts on post-hoc review

### What changes were proposed in this pull request?

This PR reflects review comments on post-hoc review among PRs for SPARK-29779 (#27085), SPARK-30479 (#27164). The list of review comments this PR addresses are below:

* #27085 (comment)
* #27164 (comment)
* #27164 (comment)
* #27164 (comment)

I also applied review comments to the CORE module (BasicEventFilterBuilder.scala) as well, as the review comments for SQL/core module (SQLEventFilterBuilder.scala) can be applied there as well.

### Why are the changes needed?

There're post-hoc reviews on PRs for such issues, like links in above section.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Existing UTs.

Closes #27414 from HeartSaVioR/SPARK-28869-SPARK-29779-SPARK-30479-FOLLOWUP-posthoc-reviews.

Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants