
Conversation

Contributor

@alkis alkis commented Feb 22, 2023

What changes were proposed in this pull request?

Reimplement PercentileHeap so that:

  • the percentile value is always in topHeap, which speeds up percentile access
  • the heaps are rebalanced more efficiently: on each insertion we check which heap should grow and rebalance based on the target heap sizes
  • the heaps are Java PriorityQueues without comparators, since comparator call overhead slows down poll/offer by more than 2x; a max-heap is implemented instead by offering/polling the negated values (see the sketch after this list)
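
For illustration, here is a minimal sketch of the idea, assuming hypothetical names PercentileHeapSketch, botHeap, and topHeap; it is not the PR's code, and it omits the middle-averaging behavior discussed in the review below.

    import java.util.{PriorityQueue => JPQ}

    // Comparator-less heaps: botHeap stores negated values, so a plain min-heap behaves as a
    // max-heap of the numbers below the percentile; topHeap holds the rest, so the percentile
    // value is always at topHeap.peek(). Assumes 0 <= percentage < 1.
    class PercentileHeapSketch(percentage: Double = 0.5) {
      private val botHeap = new JPQ[Double]()  // negated values: max-heap of the smaller numbers
      private val topHeap = new JPQ[Double]()  // min-heap of the larger numbers

      def size: Int = botHeap.size + topHeap.size
      def percentile: Double = topHeap.peek()  // assumes at least one element was inserted

      def insert(x: Double): Unit = {
        // How many elements should sit below the percentile once x is included.
        val targetBotSize = (percentage * (size + 1)).toInt
        if (topHeap.isEmpty || x >= topHeap.peek()) topHeap.offer(x) else botHeap.offer(-x)
        // Rebalance toward the target split; at most one element moves per insertion.
        if (botHeap.size > targetBotSize) topHeap.offer(-botHeap.poll())
        else if (botHeap.size < targetBotSize) botHeap.offer(-topHeap.poll())
      }
    }

Usage would mirror how the benchmark further down exercises the real class: construct with the desired percentage (e.g. 0.95), then interleave insert and percentile calls.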

Why are the changes needed?

PercentileHeap is heavy weight enough to cause scheduling delays if inserted inside the scheduler loop.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Added more extensive unit tests.

@github-actions github-actions bot added the CORE label Feb 22, 2023
@alkis alkis force-pushed the faster-percentile-heap branch from 993a724 to 3ebd388 Compare February 22, 2023 10:44
alkis and others added 2 commits February 22, 2023 12:41
…eHeap.scala

Co-authored-by: Wenchen Fan <cloud0fan@gmail.com>
Contributor

mridulm commented Feb 22, 2023

Can we minimize the diff to this file? A large fraction of it is whitespace changes and renames ... I will take a look at the changes as well.

Also, given this is an optimization change, can you include a benchmark to quantify the impact?

 */
def percentile(): Double = {
  if (isEmpty) throw new NoSuchElementException("empty")
  topHeap.head
Contributor

@mridulm mridulm Feb 22, 2023

When used as a median heap (which is what PercentileHeap replaced), the expectation is to return either the middle element (when the size is odd) or the average of the two elements around the middle (when the size is even).

We are changing that behavior here.

Contributor Author

@alkis alkis Feb 23, 2023

It shouldn't matter in practice. This is used as a median heap most of the time, and the values around the median are typically very close.

In theory, I believe both conventions are in use (averaging the two middle elements on an even count, or just picking one of them), so in principle this is correct as well.
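
For a concrete (hypothetical) illustration of the two conventions on an even-sized input:

    val s = Seq(1.0, 2.0, 3.0, 4.0).sorted
    val upperMiddle    = s(s.length / 2)                                 // 3.0: pick a single middle element
    val averagedMiddle = (s(s.length / 2 - 1) + s(s.length / 2)) / 2     // 2.5: average the two middle elements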

Contributor

When making performance-related changes, let us avoid behavior changes.
If we want to make a behavior change, that can be a follow-up item and discussed on its own merits.

Contributor Author

Done.

Note that doing the average of the middle elements is 10% slower than not doing the average.
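
A hedged sketch of what that averaging could look like for the median case, reusing the illustrative botHeap/topHeap layout from the sketch in the description (not the exact code that landed in the PR):

    // Median case (percentage = 0.5): when the count is even the two heaps are balanced, so
    // average the max of botHeap (stored negated) with the min of topHeap; for an odd count
    // the middle element is topHeap's head.
    def medianWithAverage(): Double = {
      if (size % 2 == 0) (-botHeap.peek() + topHeap.peek()) / 2
      else topHeap.peek()
    }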

*/
private[this] val smallerHalf = PriorityQueue.empty[Double](ord)
private[this] val topHeap = PriorityQueue.empty[Double](Ordering[Double].reverse)
private[this] val botHeap = PriorityQueue.empty[Double](Ordering[Double])
Contributor

maybe we can keep calling them smallerHalf and largerHalf

Contributor Author

@alkis alkis Feb 23, 2023

The reason I renamed them to top/bot is that they are antonyms of the same length, which keeps the code aligned and easier to read; smaller/larger do not line up.

I could rename them to small/large if you feel strongly about it.

That said, since the class is practically reimplemented, could I exercise the rewriter's right to choose the names? :-)

Contributor

I agree with @cloud-fan: let us keep the variable names the same - the meaning is effectively the same.

Contributor Author

I made the change to smallHeap and largeHeap even though I disagree. I will leave this here for the record.

  • The old names do not match the implementation. They made sense when this was a median heap, because they talk about halves (smallerHalf and largerHalf), but now that this is a percentile heap we no longer have halves.
  • small/large is confusing. When one reads smallHeap or smallerHeap, it is unclear whether the heap is small (has few elements) or contains small numbers. On the other hand, botHeap or bottomHeap is unambiguous: it is the heap with the small numbers.

Contributor Author

alkis commented Feb 23, 2023

Also, given this is an optimization change, can you include a benchmark to quantify the impact?

I did the benchmarking live in a cluster. Profiles before the change show ~1% of scheduler time spent in PercentileHeap operations; profiles after the change do not show PercentileHeap operations at all.

Contributor Author

alkis commented Feb 23, 2023

Can we minimize the diff to this file? A large fraction of it is whitespace changes and renames ... I will take a look at the changes as well.

Can you treat it as a new implementation? There are only about 15 lines (L55-L70) that matter in the new implementation - the code inside insert. Outside of insert there is nothing meaty or interesting.

Contributor

mridulm commented Feb 23, 2023

I did the benchmarking live in a cluster. Profiles before the change show ~1% of scheduler time spent in PercentileHeap operations; profiles after the change do not show PercentileHeap operations at all.

Can you add a benchmark to the PR, with before-and-after results in the description or as a comment? Thanks!
For example, take a look at core/src/test/scala/org/apache/spark/MapStatusesSerDeserBenchmark.scala

Contributor Author

alkis commented Feb 23, 2023

I ran this benchmark offline:

  test("benchmark") {
    val input: Seq[Int] = 0 until 1000
    val numRuns = 1000

    def kernel(): Long = {
      val shuffled = Random.shuffle(input).toArray
      val start = System.nanoTime()
      val h = new PercentileHeap(0.95)
      shuffled.foreach { x =>
        h.insert(x)
        for (_ <- 0 until input.length) h.percentile
      }
      System.nanoTime() - start
    }
    for (_ <- 0 until numRuns) kernel()  // warmup

    var elapsed: Long = 0
    for (_ <- 0 until numRuns) elapsed += kernel()
    val perOp = elapsed / (numRuns * input.length)
    println(s"$perOp ns per op on heaps of size ${input.length}")
  }

Results:

    BEFORE 3886 ns per op on heaps of size 1000
    AFTER  1703 ns per op on heaps of size 1000 (with scala PriorityQueue) 
    AFTER    36 ns per op on heaps of size 1000 (with java PriorityQueue)

(yes, the 100x improvement is not a typo)

I left this test in the PR instead of a full-blown benchmark.

@alkis alkis force-pushed the faster-percentile-heap branch from 1800926 to 59b4a01 Compare February 23, 2023 09:59
Contributor Author

alkis commented Feb 23, 2023

I updated the implementation and the description. TL;DR: I now use a comparator-less Java PriorityQueue, for a total ~100x speedup over the original implementation.

@mridulm good call on the benchmark: in my internal tests I had a hand-rolled heap implementation that was even faster than the Java one. If not for the benchmark, I wouldn't have noticed how much slower Scala's priority queue is compared to Java's.

@HyukjinKwon HyukjinKwon changed the title [SPARK-42528] Optimize PercentileHeap [SPARK-42528][CORE] Optimize PercentileHeap Feb 24, 2023
Contributor

@mridulm mridulm left a comment

Thanks for working on this, @alkis!

@cloud-fan
Contributor

The failed HealthTrackerIntegrationSuite is definitely unrelated. I'm merging this to master, thanks!

@cloud-fan cloud-fan closed this in 0b8234d Feb 27, 2023