[enhance](iceberg) Refactor Iceberg metadata cache structure and add table cache test cases#59716
run external
run buildall
TPC-H: Total hot run time: 32189 ms
TPC-DS: Total hot run time: 173580 ms
1574f3a to 2391454
run buildall
TPC-H: Total hot run time: 32499 ms
ClickBench: Total hot run time: 28.32 s
run buildall
TPC-H: Total hot run time: 32768 ms
ClickBench: Total hot run time: 28.25 s
run buildall
TPC-H: Total hot run time: 33103 ms
ClickBench: Total hot run time: 28.19 s
Pull request overview
This PR refactors the Iceberg metadata cache structure to improve code organization, reduce memory overhead, and add comprehensive test coverage for table cache behavior. The main improvements include consolidating three separate caches into two, implementing lazy loading for snapshot cache, and fixing a spelling error in a method name.
Changes:
- Introduced `IcebergTableCacheValue` to encapsulate table metadata with lazy-loaded snapshot cache
- Removed redundant `snapshotListCache` and `snapshotCache`, consolidating them into `IcebergTableCacheValue`
- Renamed `getLastedIcebergSnapshot` to `getLatestIcebergSnapshot` (spelling correction)
- Added comprehensive test suite covering DML operations, schema changes, and partition evolution
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| IcebergTableCacheValue.java | New class implementing lazy-loaded snapshot cache with thread-safe double-checked locking |
| IcebergMetadataCache.java | Simplified cache structure from 3 caches to 2; removed snapshot-specific caches and cache statistics |
| IcebergUtils.java | Refactored method signatures to support passing Table instances; renamed methods for consistency |
| IcebergExternalCatalog.java | Removed deprecated ICEBERG_SNAPSHOT_META_CACHE_TTL_SECOND property |
| IcebergExternalTable.java | Updated to use new simplified cache API methods |
| IcebergDlaTable.java | Updated to use new cache API for Hive-based Iceberg tables |
| HMSExternalTable.java | Updated to use renamed snapshot cache methods |
| test_iceberg_table_cache.groovy | Comprehensive new test covering cache behavior with DML, DDL, and partition operations |
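
The file summary describes `IcebergTableCacheValue` as a lazy-loaded snapshot cache guarded by double-checked locking. The following is a minimal sketch of that pattern only, not the PR's actual class: `LazyTableCacheValue`, its type parameters, and the `Supplier`-based loader are hypothetical names chosen for illustration.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for IcebergTableCacheValue; names and types are illustrative only.
final class LazyTableCacheValue<T, S> {
    private final T table;                     // loaded eagerly when the cache entry is created
    private final Supplier<S> snapshotLoader;  // assumed loader; runs at most once if it returns non-null
    private volatile S snapshotCacheValue;     // volatile is what makes double-checked locking safe

    LazyTableCacheValue(T table, Supplier<S> snapshotLoader) {
        this.table = table;
        this.snapshotLoader = snapshotLoader;
    }

    T getTable() {
        return table;
    }

    S getSnapshotCacheValue() {
        S local = snapshotCacheValue;          // first check, without taking the lock
        if (local == null) {
            synchronized (this) {
                local = snapshotCacheValue;    // second check, under the lock
                if (local == null) {
                    local = snapshotLoader.get();
                    snapshotCacheValue = local;
                }
            }
        }
        return local;
    }
}
```

Callers that only need the table never trigger the loader; the first call to `getSnapshotCacheValue()` pays the loading cost and later calls return the stored value. Note that in this sketch a loader returning `null` would be retried on every call.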
run buildall
TPC-H: Total hot run time: 32384 ms
ClickBench: Total hot run time: 28.46 s
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
…table cache test cases (#59716)

### What problem does this PR solve?

## Description

### Changes

This PR refactors the Iceberg metadata cache structure to improve code organization and adds comprehensive test cases for table cache behavior.

### Main Changes

#### 1. Refactored IcebergMetadataCache

- Introduced `IcebergTableCacheValue` to encapsulate table-related metadata
- Removed redundant `snapshotListCache` and `snapshotCache`
- Merged snapshot information into `IcebergTableCacheValue` with lazy loading
- Simplified cache structure from 3 separate caches to 2: `tableCache` and `viewCache`

**Before:**

```java
private LoadingCache<IcebergMetadataCacheKey, List<Snapshot>> snapshotListCache;
private LoadingCache<IcebergMetadataCacheKey, Table> tableCache;
private LoadingCache<IcebergMetadataCacheKey, IcebergSnapshotCacheValue> snapshotCache;
```

**After:**

```java
private LoadingCache<IcebergMetadataCacheKey, IcebergTableCacheValue> tableCache;
private LoadingCache<IcebergMetadataCacheKey, View> viewCache;
```

#### 2. Lazy Loading for Snapshot Cache

- Snapshot cache is now loaded on-demand through `IcebergTableCacheValue.getSnapshotCacheValue()`
- Reduced unnecessary memory footprint for queries that don't require snapshot information
- Snapshot information is mainly used for MTMV scenarios

#### 3. Simplified Cache API

- `getIcebergTable()`: Returns the Table object directly from `IcebergTableCacheValue`
- `getSnapshotCache()`: Returns snapshot cache value with lazy loading
- `getSnapshotList()`: Returns snapshot list from the Table object

#### 4. Test Cases

- Added comprehensive test case `test_iceberg_table_cache` to verify cache behavior
- Tests cover both cache-enabled and cache-disabled scenarios
- Validated external modifications (INSERT, DELETE, UPDATE, schema changes) are properly handled

### Benefits

| Aspect | Improvement |
|--------|-------------|
| **Memory Usage** | Reduced by eliminating duplicate caching of snapshot information |
| **Code Structure** | Cleaner with single `IcebergTableCacheValue` instead of multiple separate caches |
| **Performance** | Better with lazy loading of snapshot cache only when needed |
| **Maintainability** | Simpler cache management logic |

### Test Results

- Added regression test: `test_iceberg_table_cache.groovy`
- Tests validate cache behavior with TTL and external modifications
- Verified cache invalidation works correctly with `REFRESH TABLE`
- Test scenarios include:
  - DML operations (INSERT, DELETE, UPDATE, INSERT OVERWRITE)
  - Schema changes (ADD/DROP/RENAME COLUMN, ALTER COLUMN TYPE)
  - Partition evolution (ADD/DROP/REPLACE PARTITION FIELD)

### Related Files

**Core Changes:**

- `IcebergMetadataCache.java` - Refactored cache structure
- `IcebergTableCacheValue.java` - New class to encapsulate table metadata
- `IcebergExternalCatalog.java` - Updated cache-related configurations

**Tests:**

- `test_iceberg_table_cache.groovy` - Comprehensive cache behavior tests
- `Suite.groovy` - Updated `getSparkIcebergContainerName()` implementation
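
To illustrate how a single table cache can serve the table, the snapshot list, and a derived snapshot value, and why an external modification stays invisible until the entry is invalidated, here is a rough sketch. It assumes a plain `ConcurrentHashMap` in place of the real `LoadingCache`, and `SketchTable`, `SketchCacheValue`, and `SketchMetadataCache` are hypothetical simplifications, not classes from the PR.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical, simplified table model: a name plus its snapshot ids in commit order.
final class SketchTable {
    final String name;
    final List<Long> snapshotIds;

    SketchTable(String name, List<Long> snapshotIds) {
        this.name = name;
        this.snapshotIds = Collections.unmodifiableList(new ArrayList<>(snapshotIds));
    }
}

// One cache entry per table; the latest-snapshot value is derived on first request.
final class SketchCacheValue {
    private final SketchTable table;
    private Long latestSnapshotId;   // guarded by "this"; null when the table has no snapshots
    private boolean computed;        // guarded by "this"

    SketchCacheValue(SketchTable table) {
        this.table = table;
    }

    SketchTable getTable() {
        return table;
    }

    synchronized Long getLatestSnapshotId() {
        if (!computed) {
            List<Long> ids = table.snapshotIds;
            latestSnapshotId = ids.isEmpty() ? null : ids.get(ids.size() - 1);
            computed = true;
        }
        return latestSnapshotId;
    }
}

// Assumed facade: every accessor goes through the same cache entry.
final class SketchMetadataCache {
    private final Map<String, SketchCacheValue> tableCache = new ConcurrentHashMap<>();
    private final Function<String, SketchTable> loader;   // stands in for loading from the catalog

    SketchMetadataCache(Function<String, SketchTable> loader) {
        this.loader = loader;
    }

    SketchTable getTable(String key) {
        return entry(key).getTable();
    }

    List<Long> getSnapshotList(String key) {
        return getTable(key).snapshotIds;   // read from the cached table, no separate cache
    }

    Long getLatestSnapshotId(String key) {
        return entry(key).getLatestSnapshotId();
    }

    // Models what a refresh would do in this sketch: drop the entry so the next read reloads.
    void invalidate(String key) {
        tableCache.remove(key);
    }

    private SketchCacheValue entry(String key) {
        return tableCache.computeIfAbsent(key, k -> new SketchCacheValue(loader.apply(k)));
    }
}
```

In this sketch, a snapshot committed externally after the first read is not visible through `getSnapshotList` or `getLatestSnapshotId` until `invalidate` is called for that key. The sketch has no TTL, so it does not model time-based expiry.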