Better defaults for DocumentSubsetBitsetCache #49260
Pinging @elastic/es-security (:Security/Authorization)
Broadly, there are two options for cache sizing:

1. Small maximum size with a long TTL
2. Large maximum size with a short TTL

(Small with a short TTL is also an option, but is not particularly helpful, and large with a long TTL just ends up holding too much memory for too long.) When adding this cache, I opted for approach (1) based on the premise that some users/roles would only be used a few times a week. I think it makes sense to switch the defaults to large-weight + short-TTL, because that matches the use cases where the cache is most needed.
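The trade-off between the two sizing strategies can be illustrated with a minimal cache that is bounded both by total weight and by entry age. This is a hypothetical sketch (names like `SizeTtlCache` are invented for illustration), not the actual `DocumentSubsetBitsetCache` implementation, which builds on Elasticsearch's own cache infrastructure:

```java
import java.util.LinkedHashMap;

// Minimal sketch of a cache bounded by total weight (bytes) and by a TTL.
// "Large weight + short TTL" means maxWeight is generous but entries are
// dropped quickly; "small weight + long TTL" is the opposite trade-off.
final class SizeTtlCache<K, V> {
    private static final class Entry<V> {
        final V value;
        final long weight;
        final long insertedAtMillis;
        Entry(V value, long weight, long insertedAtMillis) {
            this.value = value;
            this.weight = weight;
            this.insertedAtMillis = insertedAtMillis;
        }
    }

    private final long maxWeight;
    private final long ttlMillis;
    private long currentWeight = 0;
    // Access-order map gives a simple LRU iteration order for eviction.
    private final LinkedHashMap<K, Entry<V>> map = new LinkedHashMap<>(16, 0.75f, true);

    SizeTtlCache(long maxWeight, long ttlMillis) {
        this.maxWeight = maxWeight;
        this.ttlMillis = ttlMillis;
    }

    synchronized void put(K key, V value, long weight, long nowMillis) {
        Entry<V> old = map.remove(key);
        if (old != null) currentWeight -= old.weight;
        // A single entry larger than the whole cache cannot be stored
        // (the real cache logs a warning in this situation).
        if (weight > maxWeight) return;
        map.put(key, new Entry<>(value, weight, nowMillis));
        currentWeight += weight;
        // Evict least-recently-used entries until back under the weight limit.
        var it = map.entrySet().iterator();
        while (currentWeight > maxWeight && it.hasNext()) {
            currentWeight -= it.next().getValue().weight;
            it.remove();
        }
    }

    synchronized V get(K key, long nowMillis) {
        Entry<V> e = map.get(key);
        if (e == null) return null;
        if (nowMillis - e.insertedAtMillis > ttlMillis) { // expired: drop it
            map.remove(key);
            currentWeight -= e.weight;
            return null;
        }
        return e.value;
    }
}
```

With a large `maxWeight`, expensive-to-build entries survive long enough to be reused; the short `ttlMillis` then ensures rarely used entries do not pin memory indefinitely.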
+1
The Document Level Security BitSet Cache (see #43669) had a default configuration of "small size, long lifetime". However, this is not a very useful default as the cache is most valuable for BitSets that take a long time to construct, which is (generally speaking) the same ones that operate over a large number of documents and contain many bytes. This commit changes the cache to be "large size, short lifetime" so that it can hold bitsets representing billions of documents, but releases memory quickly. The new defaults are 10% of heap, and 2 hours. This also adds some logging when a single BitSet exceeds the size of the cache and when the cache is full. Resolves: #49260
Backport of: elastic#50535
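The resulting defaults can be overridden per node in `elasticsearch.yml`. The setting names below match the documented Elasticsearch DLS settings; the values are illustrative, not recommendations:

```yaml
# Maximum memory the DLS bitset cache may use on this node
# (default after this change: 10% of the JVM heap).
xpack.security.dls.bitset.cache.size: 200mb

# Time-to-live for cached bitsets (default after this change: 2h).
xpack.security.dls.bitset.cache.ttl: 2h
```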
Initially the DLS cache was unbounded. This was not due to carelessness, but because Lucene assumes that accessing live docs is cheap, so we didn't want it to run in linear time on a cache miss. Unfortunately, the ability to use templated role queries means you can have an unbounded number of bitsets in the cache, so we had to roll back that decision and move to a size-bounded cache in order to avoid out-of-memory errors. We got some feedback that this has considerably slowed down some clusters, and that the default of `50MB` that this cache may use is way too low. We should probably move to a number that is more generous, and that also depends on the available memory on the node, e.g. `10%`?

I wonder whether we should also decrease the default TTL. The TTL is important on this cache because it helps avoid keeping the cache full all the time when some role queries are rarely used, which is probably typical when templated role queries are used. But the current default of 1 week means that someone running only one query every week would always use memory for this cache that might be better spent elsewhere. What about decreasing this default to maybe a couple of hours?
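A heap-relative default like the `10%` suggested above could be derived in the JVM as follows. This is a hypothetical sketch (`HeapPercentage` and `percentOfHeap` are invented names); Elasticsearch has its own settings infrastructure for memory-size values, and this is not it:

```java
// Sketch: computing a cache-size default as a percentage of the JVM heap.
final class HeapPercentage {
    static long percentOfHeap(int percent) {
        long maxHeap = Runtime.getRuntime().maxMemory(); // effectively -Xmx, in bytes
        // Divide before multiplying to avoid overflow on very large heaps.
        return maxHeap / 100 * percent;
    }
}
```

Tying the limit to the heap means nodes with more memory automatically get a larger cache, instead of everyone sharing one fixed `50MB` ceiling.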