
KAFKA-13973: Fix inflated block cache metrics #14317

Merged
cadonna merged 5 commits into apache:trunk from nicktelford:KAFKA-13973
Sep 18, 2023
Conversation

@nicktelford
Contributor

All block cache metrics are being multiplied by the total number of column families. In a RocksDBTimestampedStore, we have 2 column families (the default, and the timestamped values), which causes all block cache metrics in these stores to become doubled.

The cause is that our metrics recorder uses getAggregatedLongProperty to fetch block cache metrics. getAggregatedLongProperty queries the property on each column family in the database, and sums the results.

Since we always configure all column families to share the same block cache, that causes the same block cache to be queried multiple times for its metrics, with the results added together, effectively multiplying the real value by the total number of column families.

To fix this, we should simply use getLongProperty, which queries a single column family (the default one). Since all column families share the same block cache, querying just one of them will give us the correct metrics for that shared block cache.

Note: the same block cache is shared among all column families of a store irrespective of whether the user has configured a shared block cache across multiple stores.
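The inflation described above can be sketched with a small Java model. The class and method names below are hypothetical stand-ins, not the RocksDB API: two column families share one block cache, so summing the per-CF property (as `getAggregatedLongProperty` does) doubles the reported value, while querying a single CF (as `getLongProperty` does) reports it correctly.

```java
import java.util.List;

// Hypothetical model of a block cache shared by multiple column families.
class BlockCache {
    final long usageBytes;
    BlockCache(long usageBytes) { this.usageBytes = usageBytes; }
}

// Each column family reports the usage of the cache it points at.
class ColumnFamily {
    final BlockCache cache;
    ColumnFamily(BlockCache cache) { this.cache = cache; }
    long getLongProperty() { return cache.usageBytes; }
}

public class SharedCacheMetrics {
    // Mimics getAggregatedLongProperty: sums the property over every CF.
    static long aggregated(List<ColumnFamily> cfs) {
        return cfs.stream().mapToLong(ColumnFamily::getLongProperty).sum();
    }

    // Mimics the fix (getLongProperty): query only the default CF.
    static long single(List<ColumnFamily> cfs) {
        return cfs.get(0).getLongProperty();
    }

    public static void main(String[] args) {
        BlockCache shared = new BlockCache(1024);
        // RocksDBTimestampedStore has 2 CFs (default + timestamped values),
        // both configured with the same cache.
        List<ColumnFamily> cfs =
                List.of(new ColumnFamily(shared), new ColumnFamily(shared));
        System.out.println(aggregated(cfs)); // 2048 -- doubled
        System.out.println(single(cfs));     // 1024 -- correct
    }
}
```

With N column families sharing one cache, the aggregated value is N times the true value, which is exactly the doubling observed for the two-CF timestamped store.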

```diff
  // BigInteger and construct the object from the byte representation of the value
  result = new BigInteger(1, longToBytes(
-     valueProvider.db.getAggregatedLongProperty(ROCKSDB_PROPERTIES_PREFIX + propertyName)
+     valueProvider.db.getLongProperty(ROCKSDB_PROPERTIES_PREFIX + propertyName)
```
Member

This indeed seems a bug! Thanks for finding and fixing it!

```diff
  // BigInteger and construct the object from the byte representation of the value
  result = result.add(new BigInteger(1, longToBytes(
-     valueProvider.db.getAggregatedLongProperty(ROCKSDB_PROPERTIES_PREFIX + propertyName)
+     valueProvider.db.getLongProperty(ROCKSDB_PROPERTIES_PREFIX + propertyName)
```
Member

How do you know that column families share the same block cache?

Member

In this blog post, it says that:

> Each column family can have its own block cache but may also be shared between column families or multiple database instances

We need to understand how we can discover both cases and compute the metric accordingly.

Member

I think you are right that we share the same cache across the two column families. We create and add a cache to an options object and pass the same options object to both column families. I am waiting for confirmation from the speedb chat.

Contributor Author

Yeah, so block caches are configurable per-CF in both RocksDB and Speedb, but the specific implementation in `RocksDBTimestampedStore` always assigns the same block cache to both column families. In fact, the design of `RocksDBStore` makes it difficult to assign different block caches to column families.

The only way an implementation could use different block caches for different column families is if `RocksDBStore` were extended and overridden such that the `TableFormatConfig` is provided per-CF. In that case, such a custom implementation would also necessitate a custom `RocksDBMetricsRecorder`, as it is tightly coupled to `RocksDBStore` anyway.

Member

I got confirmation: the column families share the cache.

Member

Btw: Maybe speedb-io/speedb#583 (comment) helps?

@cadonna
Member

cadonna commented Sep 5, 2023

@nicktelford Could you please add new unit tests for the case of multiple column families and adapt existing unit tests to your fix?

@nicktelford
Contributor Author

> @nicktelford Could you please add new unit tests for the case of multiple column families and adapt existing unit tests to your fix?

No existing tests appear to be affected by this bug/change, but I can most likely add a test that verifies the bug and fix.

@cadonna
Member

cadonna commented Sep 5, 2023

I think this part of the unit tests should be affected since the called method changes:

```java
when(dbToAdd1.getAggregatedLongProperty(ROCKSDB_PROPERTIES_PREFIX + propertyName)).thenReturn(recordedValue);
```

Our tests were verifying the use of `getAggregatedLongProperty`, when
they should instead use `getLongProperty` for _only_ block cache
metrics.
@nicktelford
Contributor Author

> I think this part of the unit tests should be affected since the called method changes:
>
> `when(dbToAdd1.getAggregatedLongProperty(ROCKSDB_PROPERTIES_PREFIX + propertyName)).thenReturn(recordedValue);`

Whoops! I must have missed those test failures. I've pushed an update to that test suite. Do you think this is sufficient, or should there be more tests?

Member

@cadonna cadonna left a comment

@nicktelford Thanks for the update!

Could you also look into why we did not catch this bug in `RocksDBMetricsIntegrationTest` or other metrics integration tests, and add tests if needed?

@nicktelford
Contributor Author

> @nicktelford Thanks for the update!
>
> Could you also look into why we did not catch this bug in `RocksDBMetricsIntegrationTest` or other metrics integration tests, and add tests if needed?

@cadonna Looks like that test doesn't verify the value of those metrics; it only checks the number of metrics registered under each name (i.e. that the expected metrics are registered and available, but not what they are).

@cadonna
Member

cadonna commented Sep 5, 2023

@nicktelford Yes, you are right about the existing tests! Sorry, I was in a hurry and only skimmed them and missed that fact.
I would like to have some tests that verify the behavior with a single column family (non-timestamped RocksDB) and multiple column families (timestamped RocksDB), so that we get a signal when something changes on the RocksDB side and to kind of document the behavior.

Tests that `RocksDBStore` and `RocksDBTimestampedStore` produce the
expected values in their block cache metrics.

The test has been verified to fail without the fix provided by apache#14317,
and pass with the fix applied.

Note: the primary constructor of `RocksDBTimestampedStore` was made
`public` to match the visibility of the same constructor on the parent
`RocksDBStore`.
@nicktelford
Contributor Author

@cadonna I've added a test now that I think tests this suitably. Crucially, the test fails without this fix applied, and passes with this fix applied, as you would expect.

Member

@cadonna cadonna left a comment

Thanks for the updates, @nicktelford !

I have one ask.

For the rest the PR looks good to me!

Comment on lines 31 to 33
```java
import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;
```
Member

We are migrating to JUnit 5. New tests should use JUnit 5. Could you please adapt the test?

Contributor Author

I've pushed this change, although it's not particularly pretty: JUnit 5 doesn't have support for Parameterized tests where the test parameters are shared among all tests in the class. This issue is tracked here: junit-team/junit-framework#944

I suspect this might be an issue for various other tests in the codebase as they get migrated to JUnit 5.

For now, we have to set up and tear down the state store on every test method, which slows it down a bit.

Member

@cadonna cadonna left a comment

LGTM!

@nicktelford Thanks again!

@cadonna cadonna merged commit f041efa into apache:trunk Sep 18, 2023
@nicktelford nicktelford deleted the KAFKA-13973 branch September 18, 2023 16:25
mjsax pushed a commit that referenced this pull request Oct 19, 2023
Reviewers: Matthias J. Sax <matthias@confluent.io>, Bruno Cadonna <cadonna@apache.org>
@mjsax
Member

mjsax commented Oct 19, 2023

Cherry-picked this to the 3.6 branch. Also tried to cherry-pick to 3.5, but the `RocksDBMetricsRecorderGaugesTest` test fails on 3.5 -- not sure why yet.

cadonna pushed a commit to cadonna/kafka that referenced this pull request Oct 19, 2023
cadonna pushed a commit that referenced this pull request Oct 20, 2023
mjsax pushed a commit to confluentinc/kafka that referenced this pull request Nov 22, 2023
Cerchie pushed a commit to Cerchie/kafka that referenced this pull request Feb 22, 2024