KAFKA-13973: Fix inflated block cache metrics (#14317)
Conversation
All block cache metrics are being multiplied by the total number of column families. In a `RocksDBTimestampedStore`, we have 2 column families (the default, and the timestamped values), which causes all block cache metrics in these stores to be doubled. The cause is that our metrics recorder uses `getAggregatedLongProperty` to fetch block cache metrics. `getAggregatedLongProperty` queries the property on each column family in the database and sums the results. Since we always configure all column families to share the same block cache, the same block cache is queried multiple times for its metrics, with the results added together, effectively multiplying the real value by the total number of column families. To fix this, we should simply use `getLongProperty`, which queries a single column family (the default one). Since all column families share the same block cache, querying just one of them gives us the correct metrics for that shared block cache. Note: the same block cache is shared among all column families of a store irrespective of whether the user has configured a shared block cache across multiple stores.
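The double-counting can be illustrated with a small stdlib-only model (the class and method names below are stand-ins for the real RocksDB JNI API, not the actual Kafka Streams code): summing a per-column-family query over column families that all point at one shared cache multiplies the cache's true value by the number of column families.

```java
import java.util.List;

public class BlockCacheMetricModel {
    // Model of a block cache; 4096 is an arbitrary example usage value.
    static class Cache {
        long usage = 4096;
    }

    // Model of a column family holding a reference to a (possibly shared) cache.
    static class ColumnFamily {
        final Cache cache;
        ColumnFamily(Cache cache) { this.cache = cache; }
        // Stand-in for getLongProperty: reads the metric via one column family.
        long getLongProperty() { return cache.usage; }
    }

    // Stand-in for getAggregatedLongProperty: query every CF and sum the results.
    static long getAggregatedLongProperty(List<ColumnFamily> cfs) {
        return cfs.stream().mapToLong(ColumnFamily::getLongProperty).sum();
    }

    public static void main(String[] args) {
        Cache shared = new Cache();
        // A RocksDBTimestampedStore has two column families sharing one cache.
        List<ColumnFamily> cfs =
            List.of(new ColumnFamily(shared), new ColumnFamily(shared));
        System.out.println(getAggregatedLongProperty(cfs)); // doubled value
        System.out.println(cfs.get(0).getLongProperty());   // correct value
    }
}
```

With two column families the aggregated query reports 8192 for a cache whose real usage is 4096, which is exactly the inflation the fix removes.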
```diff
  // BigInteger and construct the object from the byte representation of the value
  result = new BigInteger(1, longToBytes(
-     valueProvider.db.getAggregatedLongProperty(ROCKSDB_PROPERTIES_PREFIX + propertyName)
+     valueProvider.db.getLongProperty(ROCKSDB_PROPERTIES_PREFIX + propertyName)
```
This indeed seems a bug! Thanks for finding and fixing it!
```diff
  // BigInteger and construct the object from the byte representation of the value
  result = result.add(new BigInteger(1, longToBytes(
-     valueProvider.db.getAggregatedLongProperty(ROCKSDB_PROPERTIES_PREFIX + propertyName)
+     valueProvider.db.getLongProperty(ROCKSDB_PROPERTIES_PREFIX + propertyName)
```
How do you know that column families share the same block cache?
In this blog post, it says that:

> Each column family can have its own block cache but may also be shared between column families or multiple database instances
We need to understand how we can discover both cases and compute the metric accordingly.
I think you are right that we share the same cache across the two column families. We create a cache, add it to an options object, and pass that same options object to both column families. I am waiting for confirmation from the Speedb chat.
Yeah, so block caches are configurable per-CF in both RocksDB and Speedb, but the specific implementation in `RocksDBTimestampedStore` always assigns the same block cache to both column families. In fact, the design of `RocksDBStore` makes it difficult to assign different block caches to column families.
The only way an implementation could have different block caches for different column families is if `RocksDBStore` were extended and overridden such that the `TableFormatConfig` is provided per-CF. In that case, such a custom implementation would also necessitate a custom `RocksDBMetricsRecorder`, as it's tightly coupled to `RocksDBStore` anyway.
I got confirmation: the column families share the cache.
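The sharing described above can be reduced to a reference-identity check. This is a minimal model, not the real store code: the `ColumnFamilyOptions` class below is a stand-in for the options object that `RocksDBStore` builds once and reuses for every column family descriptor.

```java
public class SharedCacheCheck {
    // Stand-in for a block cache instance.
    static class Cache {}

    // Stand-in for the options object carrying the cache (illustrative names).
    static class ColumnFamilyOptions {
        final Cache blockCache;
        ColumnFamilyOptions(Cache blockCache) { this.blockCache = blockCache; }
    }

    public static void main(String[] args) {
        Cache cache = new Cache();
        ColumnFamilyOptions options = new ColumnFamilyOptions(cache);
        // Both column families are opened with the SAME options object,
        // so their block caches are the same instance.
        ColumnFamilyOptions defaultCf = options;
        ColumnFamilyOptions timestampedCf = options;
        System.out.println(defaultCf.blockCache == timestampedCf.blockCache);
    }
}
```

Because the caches are identical by reference, reading the metric through any single column family yields the full, correct value for the shared cache.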
@nicktelford Could you please add new unit tests for the case of multiple column families and adapt existing unit tests to your fix?
No existing tests appear to be affected by this bug/change, but I can most likely add a test that verifies the bug and fix.
I think this part of the unit tests should be affected, since the called method changes:
Our tests were verifying the use of `getAggregatedLongProperty`, when they should instead expect `getLongProperty` for block cache metrics _only_.
Whoops! I must have missed those test failures. I've pushed an update to that test suite. Do you think this is sufficient, or should there be more tests?
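The distinction the updated tests encode is that only block-cache-wide properties switch to the single-CF query; per-CF properties are still summed across column families. A rough sketch of that selection logic (the property list here is an illustrative subset, not the recorder's actual list):

```java
import java.util.Set;

public class PropertyQuerySelector {
    // Illustrative block-cache-wide properties; the real recorder's set may differ.
    static final Set<String> BLOCK_CACHE_PROPERTIES = Set.of(
        "block-cache-usage", "block-cache-pinned-usage", "block-cache-capacity");

    // Block cache properties describe a structure shared by all column families,
    // so they must be read once (getLongProperty); per-CF properties such as
    // memtable sizes are still summed across CFs (getAggregatedLongProperty).
    static String queryMethodFor(String propertyName) {
        return BLOCK_CACHE_PROPERTIES.contains(propertyName)
            ? "getLongProperty"
            : "getAggregatedLongProperty";
    }

    public static void main(String[] args) {
        System.out.println(queryMethodFor("block-cache-usage"));
        System.out.println(queryMethodFor("size-all-mem-tables"));
    }
}
```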
cadonna
left a comment
There was a problem hiding this comment.
@nicktelford Thanks for the update!
Could you also look into why we did not catch this bug in `RocksDBMetricsIntegrationTest` or other metrics integration tests, and add tests if needed?
.../java/org/apache/kafka/streams/state/internals/metrics/RocksDBMetricsRecorderGaugesTest.java
@cadonna Looks like that test doesn't verify the value of those metrics; it only checks the number of metrics registered under each name (i.e. that the expected metrics are registered and available, but not what they are).
@nicktelford Yes, you are right about the existing tests! Sorry, I was in a hurry and only skimmed them and missed that fact.
Tests that `RocksDBStore` and `RocksDBTimestampedStore` produce the expected values in their block cache metrics. The test has been verified to fail without the fix provided by apache#14317, and pass with the fix applied. Note: the primary constructor of `RocksDBTimestampedStore` was made `public` to match the visibility of the same constructor on the parent `RocksDBStore`.
@cadonna I've added a test now that I think tests this suitably. Crucially, the test fails without this fix applied and passes with it applied, as you would expect.
```java
import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;
```
We are migrating to JUnit 5. New tests should use JUnit 5. Could you please adapt the test?
I've pushed this change, although it's not particularly pretty: JUnit 5 doesn't support parameterized tests where the test parameters are shared among all tests in the class. This issue is tracked here: junit-team/junit-framework#944
I suspect this might be an issue for various other tests in the codebase as they get migrated to JUnit 5.
For now, we have to set up and tear down the state store on every test method, which slows it down a bit.
cadonna
left a comment
LGTM!
@nicktelford Thanks again!
All block cache metrics are being multiplied by the total number of column families. In a `RocksDBTimestampedStore`, we have 2 column families (the default, and the timestamped values), which causes all block cache metrics in these stores to be doubled. The cause is that our metrics recorder uses `getAggregatedLongProperty` to fetch block cache metrics. `getAggregatedLongProperty` queries the property on each column family in the database and sums the results. Since we always configure all column families to share the same block cache, the same block cache is queried multiple times for its metrics, with the results added together, effectively multiplying the real value by the total number of column families. To fix this, we should simply use `getLongProperty`, which queries a single column family (the default one). Since all column families share the same block cache, querying just one of them gives us the correct metrics for that shared block cache. Note: the same block cache is shared among all column families of a store irrespective of whether the user has configured a shared block cache across multiple stores. Reviewers: Matthias J. Sax <matthias@confluent.io>, Bruno Cadonna <cadonna@apache.org>
Cherry-picked this to