
@jinxing64

What changes were proposed in this pull request?

This PR is not intended for merging. It is a measurement for #16867, which stores successful taskIds in a TreeSet (successfulTaskIdsSet), so the time complexity of getting the median duration in checkSpeculatableTasks is O(n/2).
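To make the O(n/2) claim concrete, here is a minimal sketch of reading a median out of a sorted set by walking its iterator to the middle. It is not the code from #16867: the `Entry` type, the comparator, and the choice of index n/2 as the median are assumptions made for illustration only.

```scala
import java.util.{Comparator, TreeSet}

object TreeSetMedianSketch {
  // Hypothetical element type: (duration in ms, taskId). Keeping the taskId
  // in the key keeps tasks with equal durations distinct, because a TreeSet
  // silently drops elements that compare as equal.
  type Entry = (Long, Long)

  // Orders entries by duration first, then by taskId.
  private val byDurationThenId: Comparator[Entry] = new Comparator[Entry] {
    override def compare(a: Entry, b: Entry): Int = {
      val c = java.lang.Long.compare(a._1, b._1)
      if (c != 0) c else java.lang.Long.compare(a._2, b._2)
    }
  }

  def newSet(): TreeSet[Entry] = new TreeSet[Entry](byDurationThenId)

  // Walks the already sorted set to its middle element: about n/2 iterator
  // steps, with no sorting at query time.
  def medianDuration(set: TreeSet[Entry]): Long = {
    require(!set.isEmpty, "no successful tasks recorded")
    val it = set.iterator()
    var steps = set.size() / 2
    while (steps > 0) { it.next(); steps -= 1 }
    it.next()._1
  }
}
```

The sorting cost is paid at insertion time (O(log n) per finished task), so each median query only pays for the walk.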

@SparkQA

SparkQA commented Mar 1, 2017

Test build #73654 has finished for PR 17112 at commit 6825bd7.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jinxing64
Author

jinxing64 commented Mar 1, 2017

The added unit test "Measurement for SPARK-16929." is the measurement.
In TaskSetManagerSuite.scala line 1049, if newAlgorithm=true, successfulTaskIdsSet is used to get the median duration; if newAlgorithm=false, the old algorithm (Arrays.sort) is used.
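For reference, the sort-based path that newAlgorithm=false exercises can be sketched as below. This is an illustration, not a copy of TaskSetManager: the object name, the defensive `clone()`, and the use of index n/2 as the median are assumptions.

```scala
import java.util.Arrays

object SortMedianSketch {
  // Copies the durations, sorts the copy, and returns the middle element.
  // Every call pays for a full O(n log n) sort.
  def medianDuration(durations: Array[Long]): Long = {
    require(durations.nonEmpty, "no successful tasks recorded")
    val sorted = durations.clone()
    Arrays.sort(sorted)
    sorted(sorted.length / 2)
  }
}
```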

I measured the time spent getting the median duration at TaskSetManager.scala line 957.
With tasksNum=1000 (TaskSetManagerSuite.scala line 1043), I ran the test multiple times; the results are below:

newAlgorithm | time cost (5 runs)
false        | 5ms, 3ms, 4ms, 3ms, 3ms
true         | 2ms, 4ms, 2ms, 2ms, 3ms

If tasksNum=100000:

newAlgorithm | time cost (5 runs)
false        | 107ms, 109ms, 103ms, 100ms, 107ms
true         | 17ms, 14ms, 14ms, 13ms, 14ms

If tasksNum=150000:

newAlgorithm | time cost (5 runs)
false        | 133ms, 146ms, 127ms, 163ms, 114ms
true         | 14ms, 13ms, 15ms, 16ms, 14ms

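As we can see, the new algorithm (TreeSet) performs better than the old algorithm (Arrays.sort) at larger task counts; at tasksNum=1000 the two are within a few milliseconds of each other. When tasksNum=100000, Arrays.sort costs 100ms or more every time, while the new algorithm always stays below 20ms.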
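A comparison of this kind can be reproduced outside Spark with a small harness like the one below. It is only a sketch under assumptions: the object and method names are hypothetical, durations are random values rather than real task durations, and the set is filled before timing starts (on the assumption that it is maintained as tasks finish, so only the median lookup is timed). Single-run timings on the JVM are noisy, so the numbers will not match the ones above.

```scala
import java.util.{Arrays, TreeSet}

object MedianTimingSketch {
  // Runs `body` once and returns its result with the elapsed time in ms.
  def timed[A](body: => A): (A, Double) = {
    val start = System.nanoTime()
    val result = body
    (result, (System.nanoTime() - start) / 1e6)
  }

  // Returns (median via sort, ms) and (median via TreeSet walk, ms).
  def compare(tasksNum: Int, seed: Long = 42L): ((Long, Double), (Long, Double)) = {
    val rng = new scala.util.Random(seed)
    val durations = Array.fill(tasksNum)(rng.nextInt(10000).toLong)

    // Filled before timing. Packing (duration, index) into one Long keeps
    // equal durations distinct while preserving the duration order.
    val set = new TreeSet[java.lang.Long]()
    durations.zipWithIndex.foreach { case (d, i) =>
      set.add(java.lang.Long.valueOf(d * tasksNum + i))
    }

    val bySort = timed {
      val sorted = durations.clone()
      Arrays.sort(sorted)
      sorted(sorted.length / 2)
    }
    val byTreeSet = timed {
      val it = set.iterator()
      var steps = set.size() / 2
      while (steps > 0) { it.next(); steps -= 1 }
      it.next().longValue() / tasksNum // unpack the duration
    }
    (bySort, byTreeSet)
  }
}
```

Calling `MedianTimingSketch.compare(100000)` returns both medians with their timings; the two medians should agree, which is a quick check that the two paths measure the same thing.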

@srowen
Member

srowen commented Mar 1, 2017

Put [WIP] in the title for clarity.

@jinxing64 jinxing64 changed the title Measurement for SPARK-16929. [WIP] Measurement for SPARK-16929. Mar 2, 2017
@SparkQA

SparkQA commented Mar 4, 2017

Test build #73889 has finished for PR 17112 at commit 61b96ff.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

jinxing added 11 commits March 11, 2017 09:10
* SPARK-16929: (178 commits)
  mod
  Refine test.
  scheduleAtFixedRate -> scheduleWithFixedDelay
  Change back to scheduleAtFixedRate
  Change some comment and unit tests.
  scheduleAtFixedRate -> scheduleWithFixedDelay
  Get rid of 'remove' and fix doc in MedianHeap
  [SPARK-16929] Improve performance when check speculatable tasks.
  [SPARK-19891][SS] Await Batch Lock notified on stream execution exit
  [SPARK-19008][SQL] Improve performance of Dataset.map by eliminating boxing/unboxing
  [SPARK-19886] Fix reportDataLoss if statement in SS KafkaSource
  [SPARK-19611][SQL] Introduce configurable table schema inference
  [SPARK-12334][SQL][PYSPARK] Support read from multiple input paths for orc file in DataFrameReader.orc
  [SPARK-19861][SS] watermark should not be a negative time.
  [SPARK-19715][STRUCTURED STREAMING] Option to Strip Paths in FileSource
  [SPARK-19793] Use clock.getTimeMillis when mark task as finished in TaskSetManager.
  [SPARK-19757][CORE] DriverEndpoint#makeOffers race against CoarseGrainedSchedulerBackend#killExecutors
  [SPARK-19561][SQL] add int case handling for TimestampType
  [SPARK-19763][SQL] qualified external datasource table location stored in catalog
  [SPARK-19859][SS][FOLLOW-UP] The new watermark should override the old one.
  ...
@jinxing64 jinxing64 force-pushed the SPARK-16929-measurement branch from 61b96ff to cfc7e33 Compare March 18, 2017 02:08
@SparkQA

SparkQA commented Mar 18, 2017

Test build #74765 has finished for PR 17112 at commit cfc7e33.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jinxing64 jinxing64 closed this Apr 4, 2017