[SPARK-34309][BUILD][CORE][SQL][K8S]Use Caffeine instead of Guava Cache #31517

LuciferYang · 2021-02-08T05:44:46Z

What changes were proposed in this pull request?

There are 3 ways to use Guava cache in spark code:

Loadingcache is the main way to use Guava cache in spark code and the key usages are as follows:
a. LoadingCache with maximumsize data eviction policy, such as appCache in ApplicationCache, cache in Codegenerator
b. LoadingCache with maximumWeight data eviction policy, such as shuffleIndexCache in ExternalShuffleBlockResolver
c. LoadingCache with 'expireAfterWrite' data eviction policy, such as tableRelationCache in SessionCatalog
ManualCache is another way to use Guava cache in spark code and the key usage is cache in SharedInMemoryCache, it use to caches partition file statuses in memory
The last use way is hadoopJobMetadata in SparkEnv, it uses Guava Cache to build a soft-reference map.

The goal of this pr is use Caffeine instead of Guava Cache because Caffeine is faster than Guava Cache from benchmarks, the main changes as follows:

Add Caffeine deps to maven pom.xml
Use Caffeine instead of Guava LoadingCache, ManualCache and soft-reference map in SparkEnv
Add LocalCacheBenchmark to compare performance of Loadingcache between Guava Cache and Caffeine

Why are the changes needed?

Caffeine is faster than Guava Cache from benchmarks

Does this PR introduce any user-facing change?

No

How was this patch tested?

Pass the Jenkins or GitHub Action
Add LocalCacheBenchmark to compare performance of Loadingcache between Guava Cache and Caffeine

…k into guava-cache-to-caffeine

core/src/main/scala/org/apache/spark/deploy/history/ApplicationCache.scala

core/src/test/scala/org/apache/spark/deploy/history/ApplicationCacheSuite.scala

core/pom.xml

SparkQA · 2021-02-08T08:27:06Z

Test build #135009 has finished for PR 31517 at commit 4761a5b.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2021-02-08T08:29:40Z

Test build #135006 has finished for PR 31517 at commit 0c5382a.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2021-02-08T09:17:10Z

Test build #135011 has finished for PR 31517 at commit 4b49b84.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

Ngone51 · 2021-02-08T12:37:01Z

Just out of curiosity, how much performance improvement we can get by using Caffeine? Do we have a rough number?

LuciferYang · 2021-02-09T02:22:57Z

@Ngone51 I have tested some benchmark data in SPARK-34309, but I haven't further tested it with spark scenarios.

Further progress may be delayed, the Spring Festival holiday is coming soon in China, haha :)

dongjoon-hyun · 2021-03-20T04:44:05Z

Gentle ping, @LuciferYang .

SparkQA · 2021-07-29T06:26:26Z

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46319/

SparkQA · 2021-07-29T07:01:56Z

Test build #141806 has finished for PR 31517 at commit 6c74fc6.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
trait AlterTableColumnCommand extends UnaryCommand
case class AlterTableAddColumns(
case class AlterTableReplaceColumns(

SparkQA · 2021-07-29T09:06:41Z

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46331/

SparkQA · 2021-07-29T11:03:17Z

Test build #141818 has finished for PR 31517 at commit 5a75b2c.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2021-08-02T03:30:53Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46445/

SparkQA · 2021-08-02T04:05:09Z

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46445/

SparkQA · 2021-08-02T11:06:03Z

Test build #141935 has finished for PR 31517 at commit 706a68d.

This patch fails from timeout after a configured wait of 500m.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2021-08-04T05:39:01Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46531/

SparkQA · 2021-08-04T06:15:19Z

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46531/

SparkQA · 2021-08-04T07:38:52Z

Test build #142020 has finished for PR 31517 at commit 81f863f.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

holdenk · 2021-08-04T19:03:45Z

Merged to the current branch targeting 3.3.0. Thanks everyone for taking the time to review the PR & special thanks to @LuciferYang for sticking with this for several months.

LuciferYang · 2021-08-05T02:24:48Z

thanks all ~

dongjoon-hyun · 2021-08-05T02:26:47Z

Thank you, @LuciferYang and @holdenk and all!

gatorsmile · 2021-08-11T03:29:09Z

Sorry for my late review.

@LuciferYang Since it adds the new dependences, could you show us the benefits besides the one in the micro-benchmark? Yes, Caffeine is faster than Guava Cache from benchmarks, but does it affect our performance if it is not being used in any hot path? Maybe I am wrong, could you show me an example?

LuciferYang · 2021-08-11T05:10:10Z

Thank you for your review @gatorsmile ~

On the one hand, I think the main purpose of using Guava cache in the code is to solve the problem of hot path access, so I want to recommend a faster library assuming I understand it correctly , there are some other benchmark results in SPARK-34309.

On the other hand, Spark is still using guava 14.0.1, which is a 2013 version and the latest version is 30.1. There are some known issues related to Guava Cache(such as google/guava#1761, google/guava#1337 and so on), and I also found that some JIRA wanted to upgrade the Guava version(like SPARK-33090，SPARK-32502, SPARK-23897, SPARK-25762 and so on) but didn't completed (I'm very sorry I didn't study the reasons for why they weren't completed, maybe it's related to some cascading dependencies?), so I try to use a new Cache library to get rid of the dependence on Guava cache and it's easier to upgrade it to fix some possible issues

Does the current use of Caffeine have any negative effects? If it exists, we can revert this pr.

HyukjinKwon · 2021-08-11T07:37:13Z

The benchmark, etc. here look promising. My only concern is that hidden behaviour changes by switching the library. Do they claim compatibility vs Guava at least?

Does the current use of Caffeine have any negative effects?

So I felt like this had to be investigated first.

LuciferYang · 2021-08-11T08:13:29Z

The benchmark, etc. here look promising. My only concern is that hidden behaviour changes by switching the library. Do they claim compatibility vs Guava at least?

@HyukjinKwon As far as I know, there are some differences in exception handling, for example:

Guava Cache

   * @throws ExecutionException if a checked exception was thrown while loading the value
   * @throws UncheckedExecutionException if an unchecked exception was thrown while loading the
   *     value
   * @throws ExecutionError if an error was thrown while loading the value
   */
  V get(K key) throws ExecutionException;

Caffeine

   * @throws NullPointerException if the specified key is null
   * @throws IllegalStateException if the computation detectably attempts a recursive update to this
   *         cache that would otherwise never complete
   * @throws CompletionException if a checked exception was thrown while loading the value
   * @throws RuntimeException or Error if the {@link CacheLoader} does so, in which case the mapping
   *         is left unestablished
   */
  @Nullable
  V get(@NonNull K key);

But in fact, @ben-manes also provided CaffeinatedGuava to solve these differences, but we didn't use CaffeinatedGuava at @holdenk 's suggestion.

Maybe I don't know enough. @ben-manes Could you help me give more information?

ben-manes · 2021-08-11T09:07:39Z

The choice on what is best for this project is yours, and Guava's is certainly a fine cache. If there is no performance, feature, or stability gain then leaving it as is would be a reasonable, conservative choice. I'll provide some background and project information, but I can't speak to what is best for your project.

History

I coauthored Guava's cache, where Charles and I tried to port algorithmic ideas from my earlier CLHM project into Bob Lee's MapMaker class. This was challenging because Bob made design decisions that I disagreed with, as he was very enthusiastic about soft references as an ideal caching strategy and optimized for that (hence replacing the decorator-based ReferenceMap in Google Collections). While this appears optimal in a microbenchmark, the GC impact made it unusable in practice (e.g. AdWords' SRE team vocally complained). We introduced a concurrent LRU, rewrote expiration to O(1), and designed an API with feedback from Josh Bloch. There were major performance flaws from the original structure that I couldn't fix, but this was in the Java 5 time period (mostly Java 4 idiomatically) so it was still quite good. The API, feature set, and best-in-class performance on low core servers was a step forward. It replaced all of the ad hoc Java caches at Google, which was the main goal for a Guava as a standard internal library. By Java 8 everyone who worked on it had moved on, ConcurrentHashMap was completely rewritten, and the language had evolved. Caffeine is a complete rewrite with a design biased towards size-eviction (not reference), an updated Guava-like API, and no time pressures to allow for exploration of data structures & algorithms.

Compatibility

The Guava adapters are supported by a port of Guava's unit tests, minus assertions that inspect internal state. It passes the behavioral checks as implemented, though the tests are not exhaustive.

Caffeine's test suite is parameterized with test methods declaring their specification constraints. This allows us to brute force runs of every acceptable configuration, which helps detect if features interact badly. This also lets us run the tests against Guava through reverse adapters, thereby providing a reference implementation. In sum the entire test suite runs ~6.8 million scenarios over 1.5 hours on CI.

There are small differences. Some are bugs found in Guava, not all of which are fixed. Others are improvements, such as linearizability and a different eviction policy. The most important changes for migration is discussed on the wiki.

Tests and static analyzers don't catch all mistakes. There has been enough usage to uncover and fix many issues. At this point the Guava team does not actively maintain its cache and recommends Caffeine, which they prefer to use internally. Since I haven't been there since Guava's heyday, the switch means that I had little influence on that.

Recommendations

That's not my place. I can only say that the library tries to be a modern version of Guava's cache, is actively maintained, and enough bugs have been fixed that users have ensured a mature codebase. The design differences overall means that Caffeine is usually much better, but tradeoffs means Guava might be slightly ahead in a few narrow cases. What makes sense for this project and its maintainability is your team's decision.

HyukjinKwon · 2021-08-11T09:43:42Z

Thanks @ben-manes.

zsxwing · 2021-08-12T07:15:30Z

@ben-manes thanks for the context and building for the great project.

Does the current use of Caffeine have any negative effects? If it exists, we can revert this pr.

@LuciferYang Caffeine is definitely an amazing and popular project. But when we decide whether we should add a library , especially a popular library, to Spark, we also need to think about the impact to the downstream libraries and Spark applications. As Caffeine is pretty popular, many downstream libraries and Spark users probably are using it. If Spark adds it to its classpath, it will force them to use the same Caffeine version as Spark. If someone would like to use a different Caffeine version, the work would be non trivial since they need to shade their Caffeine version.

Do you have any examples that we can get Caffeine's benefits from hot path, so that we can use that to make the tradeoff?

LuciferYang · 2021-08-12T08:01:15Z

@LuciferYang Caffeine is definitely an amazing and popular project. But when we decide whether we should add a library , especially a popular library, to Spark, we also need to think about the impact to the downstream libraries and Spark applications. As Caffeine is pretty popular, many downstream libraries and Spark users probably are using it. If Spark adds it to its classpath, it will force them to use the same Caffeine version as Spark. If someone would like to use a different Caffeine version, the work would be non trivial since they need to shade their Caffeine version.

Can we solve this problem by shade + relocation, just like now Spark uses Guava and Jetty now. If we accept it, I can do this work.

Do you have any examples that we can get Caffeine's benefits from hot path, so that we can use that to make the tradeoff?

For example, FileStatusCache on driver side of a resident process(like thrift-server). Of course, the improvement will not be very significant because Guava Cache is not bad.

Overall, I think both are OK. If need to revert the code, please tell me.I am not entangled and opposed to this. I can also be responsible for completing this work :)

I just want to mention it again. As I said above, we are still using Guava 14.0.1, which is a 2013 version. Some fixed about Guava Cache and other Guava components are still hidden in Spark, and I found many JIRA want to upgrade it which seems to have not been completed. And Caffeine is a component that only focuses on cache, so it is relatively easy to upgrade it :)

### What changes were proposed in this pull request? This PR upgrades Caffeine to `2.9.2`. Caffeine was introduced in SPARK-34309 (#31517). At the time that PR was opened, the latest version of caffeine was `2.9.1` but now `2.9.2` is available. ### Why are the changes needed? `2.9.2` have the following improvements (https://github.com/ben-manes/caffeine/releases/tag/v2.9.2). * Fixed reading an intermittent null weak/soft value during a concurrent write * Fixed extraneous eviction when concurrently removing a collected entry after a writer resurrects it with a new mapping * Fixed excessive retries of discarding an expired entry when the fixed duration period is extended, thereby resurrecting it ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? CIs. Closes #33772 from sarutak/upgrade-caffeine-2.9.2. Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com> Signed-off-by: Kousuke Saruta <sarutak@oss.nttdata.com>

…Guava Cache" ### What changes were proposed in this pull request? This pr revert the change of SPARK-34309, includes: - #31517 - #33772 ### Why are the changes needed? 1. No really performance improvement in Spark 2. Added an additional dependency ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Pass the Jenkins or GitHub Action Closes #33784 from LuciferYang/revert-caffeine. Authored-by: yangjie01 <yangjie01@baidu.com> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>

LuciferYang added 11 commits February 1, 2021 15:41

use caffeine instead of guava cache

d4e3901

fix compile error

a3d794a

remove some change

85dc7b9

fix exception type

399f5b2

use cache writer to sync mode

dbb46dc

remove listener

01f99fe

add comments

18813e9

fix test suites

3f13689

Merge branch 'upmaster' into guava-cache-to-caffeine

39032e6

Merge branch 'guava-cache-to-caffeine' of github.com:LuciferYang/spar…

674af31

…k into guava-cache-to-caffeine

update deps

5f08d0f

LuciferYang changed the title ~~[SPARK-34309][CORE][SQL] Use Caffeine instead of Guava Cache~~ [WIP][SPARK-34309][CORE][SQL] Use Caffeine instead of Guava Cache Feb 8, 2021

LuciferYang marked this pull request as draft February 8, 2021 05:44

github-actions bot added BUILD CORE KUBERNETES SQL labels Feb 8, 2021

fix unused import

0c5382a

LuciferYang commented Feb 8, 2021

View reviewed changes

core/src/main/scala/org/apache/spark/deploy/history/ApplicationCache.scala Outdated Show resolved Hide resolved

LuciferYang commented Feb 8, 2021

View reviewed changes

core/src/test/scala/org/apache/spark/deploy/history/ApplicationCacheSuite.scala Outdated Show resolved Hide resolved

fix import

4761a5b

pan3793 reviewed Feb 8, 2021

View reviewed changes

core/pom.xml Outdated Show resolved Hide resolved

fix redundant space

4b49b84

LuciferYang changed the title ~~[WIP][SPARK-34309][CORE][SQL] Use Caffeine instead of Guava Cache~~ [WIP][SPARK-34309][BUILD][CORE][SQL] [K8S]Use Caffeine instead of Guava Cache Feb 9, 2021

Merge branch 'upmaster' into guava-cache-to-caffeine

5a75b2c

Merge branch 'upmaster' into guava-cache-to-caffeine

706a68d

Merge branch 'upmaster' into guava-cache-to-caffeine

81f863f

asfgit closed this in 01cf6f4 Aug 4, 2021

LuciferYang deleted the guava-cache-to-caffeine branch August 11, 2021 05:12

sarutak mentioned this pull request Aug 18, 2021

[SPARK-34309][BUILD][FOLLOWUP] Upgrade Caffeine to 2.9.2 #33772

Closed

LuciferYang mentioned this pull request Aug 19, 2021

Revert "[SPARK-34309][BUILD][CORE][SQL][K8S] Use Caffeine instead of Guava Cache" #33784

Closed

[SPARK-34309][BUILD][CORE][SQL][K8S]Use Caffeine instead of Guava Cache #31517

[SPARK-34309][BUILD][CORE][SQL][K8S]Use Caffeine instead of Guava Cache #31517

Uh oh!

Conversation

LuciferYang commented Feb 8, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?

Uh oh!

Uh oh!

Uh oh!

Uh oh!

SparkQA commented Feb 8, 2021

Uh oh!

SparkQA commented Feb 8, 2021

Uh oh!

SparkQA commented Feb 8, 2021

Uh oh!

Ngone51 commented Feb 8, 2021

Uh oh!

LuciferYang commented Feb 9, 2021

Uh oh!

dongjoon-hyun commented Mar 20, 2021

Uh oh!

SparkQA commented Jul 29, 2021

Uh oh!

SparkQA commented Jul 29, 2021

Uh oh!

SparkQA commented Jul 29, 2021

Uh oh!

SparkQA commented Jul 29, 2021

Uh oh!

SparkQA commented Aug 2, 2021

Uh oh!

SparkQA commented Aug 2, 2021

Uh oh!

SparkQA commented Aug 2, 2021

Uh oh!

SparkQA commented Aug 4, 2021

Uh oh!

SparkQA commented Aug 4, 2021

Uh oh!

SparkQA commented Aug 4, 2021

Uh oh!

holdenk commented Aug 4, 2021

Uh oh!

LuciferYang commented Aug 5, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dongjoon-hyun commented Aug 5, 2021

Uh oh!

gatorsmile commented Aug 11, 2021

Uh oh!

LuciferYang commented Aug 11, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

HyukjinKwon commented Aug 11, 2021

Uh oh!

LuciferYang commented Aug 11, 2021

Uh oh!

ben-manes commented Aug 11, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

History

Compatibility

Recommendations

Uh oh!

HyukjinKwon commented Aug 11, 2021

Uh oh!

zsxwing commented Aug 12, 2021

Uh oh!

LuciferYang commented Aug 12, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

LuciferYang commented Feb 8, 2021 •

edited

Loading

LuciferYang commented Aug 5, 2021 •

edited

Loading

LuciferYang commented Aug 11, 2021 •

edited

Loading

ben-manes commented Aug 11, 2021 •

edited

Loading

LuciferYang commented Aug 12, 2021 •

edited

Loading