Use soft values in the EntityCache; approximate entity size #490
base: main
Conversation
+ value.getEntity().getProperties().length()
+ value.getEntity().getInternalProperties().length();
The assumption here is wrong. There is no guarantee that one character is only one byte; even with "compact strings", it's likely off by a factor of 2.
It's an approximate size. Even if it's off by a factor of 10, that should be okay. A 1GB cache is greatly preferable to an unbounded cache.
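To make the approximation concrete, here is a minimal standalone sketch of the weight calculation under discussion. The overhead constant and the use of property-string lengths mirror the diff above, but the class and method names here are hypothetical, not Polaris code:

```java
public class EntityWeightEstimate {
    // Assumed fixed per-entity cost beyond the property strings (mirrors the
    // APPROXIMATE_ENTITY_OVERHEAD constant in the diff).
    static final int APPROXIMATE_ENTITY_OVERHEAD = 1000;

    // Approximate weight: overhead plus the lengths of both property strings.
    // Note that String.length() counts UTF-16 code units, not bytes, so this
    // can undercount the real heap footprint by roughly 2x.
    static int approximateWeight(String properties, String internalProperties) {
        return APPROXIMATE_ENTITY_OVERHEAD
            + properties.length()
            + internalProperties.length();
    }

    public static void main(String[] args) {
        System.out.println(approximateWeight("{\"key\":\"value\"}", "{}")); // 1017
    }
}
```

The point being argued: even if the per-entity estimate is off by a constant factor, the cache's total size still scales with entity size instead of being unbounded.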
public static final long WEIGHT_PER_MB = 1024 * 1024;

/* Represents the approximate size of an entity beyond the properties */
private static final int APPROXIMATE_ENTITY_OVERHEAD = 1000;
Where does the value 1000 come from?
See this old comment -- the apparent assumption was that entities are generally ~1KB.
This means in the worst case, we will not get a larger number of entries due to this change. It should strictly be smaller.
@@ -828,7 +828,7 @@ void dropEntity(List<PolarisEntityCore> catalogPath, PolarisEntityCore entityToD
   }

   /** Grant a privilege to a catalog role */
-  void grantPrivilege(
+  public void grantPrivilege(
The changes here look unrelated?
They are not; the EntityCacheTest was moved into a new package, and so these methods were no longer callable from it if they remained package-private.
.expireAfterAccess(1, TimeUnit.HOURS) // Expire entries after 1 hour of no access
.removalListener(removalListener) // Set the removal listener
.softValues() // Account for memory pressure
Either there's a weigher or there are soft references; the latter is IMHO a bad choice, because it can likely ruin the efficiency of the eviction.
The docs seem to say you can do both, WDYM?
As for soft values being a bad choice, in my experience this is quite a common practice and the degradation we're likely to see is far less than the degradation we'd see in a situation where heap space gets exhausted and we do not have soft values.
Assuming there is not significant memory pressure, my expectation is that GC should more or less ignore the SoftReference objects.
If there's a specific performance concern that can be borne out in a benchmark, we should address it.
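To illustrate the GC point with plain JDK types, here is a small sketch using java.lang.ref.SoftReference directly (this is not Polaris or Caffeine code, just the mechanism those libraries build on):

```java
import java.lang.ref.SoftReference;

public class SoftValueSketch {
    // A softly referenced value is only cleared by the GC when the heap is
    // close to exhaustion; absent memory pressure it behaves like a strong
    // reference for cache purposes.
    static boolean stillReachable(SoftReference<byte[]> ref) {
        return ref.get() != null;
    }

    public static void main(String[] args) {
        SoftReference<byte[]> cached = new SoftReference<>(new byte[1024]);
        // With no memory pressure, the value is still present.
        System.out.println(stillReachable(cached)); // true
    }
}
```

Under pressure, the GC is guaranteed to clear all soft references before throwing OutOfMemoryError, which is the degradation-over-crash trade-off argued for above.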
this.byId =
    Caffeine.newBuilder()
        .maximumSize(100_000) // Set maximum size to 100,000 elements
        .maximumWeight(100 * EntityWeigher.WEIGHT_PER_MB) // Goal is ~100MB
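Assuming one weight unit approximates one byte, the hard-coded limits work out roughly as follows (standalone arithmetic, not project code):

```java
public class CacheLimitMath {
    public static void main(String[] args) {
        long weightPerMb = 1024 * 1024;
        long maxWeight = 100 * weightPerMb;  // the ~100MB goal
        long assumedEntitySize = 1000;       // the ~1KB-per-entity assumption
        System.out.println(maxWeight);       // 104857600
        // At ~1KB per entity, the weight cap admits on the order of 100k
        // entries, which lines up with the existing maximumSize limit.
        System.out.println(maxWeight / assumedEntitySize); // 104857
    }
}
```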
While we're here - why is this always hard coded and not configurable? Or at least determined based on the actual heap size (which is not as easy as it sounds)?
This is a good callout. My concern with making it configurable is that the limit is close to arbitrary.
As you noted in another comment, we have a "goal" of 100MB but that could really be 200MB or 1MB depending on how eviction happens with soft values and the weights. So while raising the limit definitely gets you a bigger cache, it's quite opaque as to what value you would actually want to set.
Also, the EntityCache is ideally very transparent and not something a typical end user would want to concern themselves with.
In light of the above I kept it hard-coded for now, but if you feel strongly that we should make it a featureConfiguration, I am comfortable doing so.
Let's defer this change until the bigger CDI refactoring and potential follow-ups have been done.
To me, this seems quite unrelated to the CDI refactoring. Clearly, we cannot pause all development on the project due to a potential conflict with another existing PR. If there's an especially nasty conflict you're worried about here, can you highlight it? Maybe we can try to minimize the potential problem.
Sure, nobody says we shall pause everything. But how the cache is used is heavily influenced by the CDI work. I still have concerns about the heap and CPU pressure in this PR and #433 and how it's persisted. Also, having a weigher meant to limit heap pressure but having its calculation be largely off is IMO not good. It's also that the design has hard limits, meaning the cache cannot be as effective as it could be, and at the same time there can be an unbounded number of
Can you clarify this point? To me, this seems like a strict improvement over the current behavior.
I disagree that it's "largely off" to a degree that matters, which I articulated in a comment above. Let's keep discussing in more detail there if you'd like. For this thread, I would just ask: Is the current behavior better? It seems to me that cache that is bounded to an imprecise number of bytes is vastly better than a totally unbounded cache.
This is true today. I'm not intending to change this behavior in this PR, only to make a best-effort attempt to limit how large the cache will grow (in bytes). If you feel that making the cache size configurable is imperative I would be happy to review a PR to that effect or to piggyback that functionality here if you prefer.
Today, this is not the case. The number of instances is bounded.
I don't fully understand this point. Can you clarify? I am eager to see our first release, but I don't think its delay means we should put off impactful work.
2x+ is largely off, no? String.length() does not return the number of bytes.
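The code-unit/byte distinction is easy to see with the JDK directly; a short sketch:

```java
import java.nio.charset.StandardCharsets;

public class LengthVsBytes {
    public static void main(String[] args) {
        String s = "entity";
        // length() counts UTF-16 code units, not bytes.
        System.out.println(s.length()); // 6
        // In UTF-16 every one of these characters occupies 2 bytes.
        System.out.println(s.getBytes(StandardCharsets.UTF_16LE).length); // 12
        // In UTF-8, ASCII characters occupy 1 byte each.
        System.out.println(s.getBytes(StandardCharsets.UTF_8).length); // 6
    }
}
```

The in-heap representation adds its own wrinkle: with compact strings (JEP 254, JDK 9+), Latin-1-only strings are stored one byte per char and everything else two, so `length()` can be anywhere from exact to 2x under the true char-array footprint.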
It's not unbounded at this point - 100,000 * 1kB is ~100MB - that's acceptable. But the intent of this PR is to add really large and uncompressed object trees. So for me it seems that the effort has the potential to make Polaris unstable and/or Caffeine's eviction algorithm way less efficient.
It's related to pushing for this change - we still have time to design things properly in Apache Polaris.
Nothing in the PR claims that it does. The length of the string does scale linearly with the number of bytes, though, so a doubling of the length limit will ~double the number of bytes in the cache.
This is not correct. The cache is currently bounded to 100k objects, but it does not account for their size. The cache's size, in bytes, is currently unbounded.
Description
The EntityCache currently uses a hard-coded limit of 100k entities, which is apparently intended to keep the cache around 100MB based on the assumption that entities are ~1KB.
If that assumption is not true, or if there is not even 100MB of free memory, we should not let the cache use up too much memory.
This PR adjusts the cache to use soft values to allow GC to clean up entries when memory pressure becomes too great.
Type of change
How Has This Been Tested?
Added a new test in EntityCacheTest.
Checklist: