@liancheng (Contributor) commented Oct 20, 2016

What changes were proposed in this pull request?

This is another attempt at PR #15517, which aims to fix the exponential slowdown in query planning time. It's still a PoC; I'm opening this PR to see whether Jenkins complains.

This PR adds a new method, Dataset.cached, which returns a new Dataset whose logical plan is a cached version of the current Dataset's logical plan, so that the cached sub-plan tree can be truncated.

The existing Dataset.cache() method doesn't fit because it mutates the internal state of the current Dataset.
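To illustrate why truncating the plan matters, here is a toy Scala model; it is not Spark code. It assumes planning cost is roughly proportional to the number of nodes a full traversal of the logical plan visits, and it mirrors the benchmark's four joins per iteration, each referencing the previous plan. `Plan`, `Scan`, `Join`, `CachedRelation` and `PlanGrowth` are hypothetical names invented for this sketch; `CachedRelation` only stands in for a leaf that hides the subtree it was built from.

```scala
// Toy model only: these classes are hypothetical stand-ins, not Spark's LogicalPlan API.
sealed trait Plan {
  // Number of nodes a full tree traversal (e.g. one analyzer rule pass) would visit.
  def size: Long
}
final case class Scan(name: String) extends Plan { val size: Long = 1L }
final case class Join(left: Plan, right: Plan) extends Plan {
  val size: Long = 1L + left.size + right.size
}
// Hypothetical cached leaf: the subtree it came from is no longer part of the plan.
final case class CachedRelation(id: Int) extends Plan { val size: Long = 1L }

object PlanGrowth {
  // Mirrors plan.join(plan, ...) x4: four joins that reference `p` five times in total.
  def step(p: Plan): Plan = (1 to 4).foldLeft(p)((acc, _) => Join(acc, p))

  // Without truncation, each iteration builds on the previous full plan tree.
  def sizesWithoutTruncation(iterations: Int): Seq[Long] =
    (1 to iterations).scanLeft(Scan("t"): Plan)((p, _) => step(p)).tail.map(_.size)

  // With truncation, each iteration builds on a single cached leaf.
  def sizesWithTruncation(iterations: Int): Seq[Long] =
    (1 to iterations).map(i => step(CachedRelation(i)).size)

  def main(args: Array[String]): Unit = {
    println(sizesWithoutTruncation(5).mkString(", ")) // 9, 49, 249, 1249, 6249
    println(sizesWithTruncation(5).mkString(", "))    // 9, 9, 9, 9, 9
  }
}
```

Under these assumptions the untruncated size follows s(n+1) = 5 * s(n) + 4, so it grows about 5x per iteration, while a plan built on a cached leaf stays at 9 nodes no matter how many iterations came before.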

The microbenchmark results are essentially the same as those described in #15517.

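// Assumes a SparkSession named `spark` (as in spark-shell) and the `Dataset.cached` method added by this PR.
import spark.implicits._

(0 until 1000).foldLeft(Seq(1, 2, 3).toDS) { (plan, iteration) =>
  val start = System.currentTimeMillis()
  // Join the previous iteration's result with itself four times on the "value" column.
  val result = plan.join(plan, "value").join(plan, "value").join(plan, "value").join(plan, "value")

  // Note that we are calling `.cached` instead of `.cache()` here.
  val cached = result.cached
  println(s"Iteration $iteration takes time ${System.currentTimeMillis() - start} ms")
  cached.as[Int]
}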
Iteration 0 takes time 7 ms
Iteration 1 takes time 44 ms
Iteration 2 takes time 35 ms
Iteration 3 takes time 29 ms
Iteration 4 takes time 43 ms
Iteration 5 takes time 28 ms
Iteration 6 takes time 29 ms
Iteration 7 takes time 29 ms
Iteration 8 takes time 28 ms
Iteration 9 takes time 33 ms
...
Iteration 990 takes time 166 ms
Iteration 991 takes time 167 ms
Iteration 992 takes time 174 ms
Iteration 993 takes time 170 ms
Iteration 994 takes time 181 ms
Iteration 995 takes time 173 ms
Iteration 996 takes time 182 ms
Iteration 997 takes time 171 ms
Iteration 998 takes time 180 ms
Iteration 999 takes time 170 ms

How was this patch tested?

N/A. Existing tests should be enough.

@SparkQA commented Oct 20, 2016

Test build #67251 has finished for PR 15565 at commit 0fea96b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@liancheng (Contributor, Author)

Closing this in favor of #15651.

@liancheng closed this on Oct 27, 2016