Generate optimized SQL query to improve performance of find_chunks_by_dedup_key method #9318
base: master
Conversation
Walkthrough
Switched chunk deduplication lookup from MongoDB to PostgreSQL in MDStore, changed MapServer to pass base64 dedup keys (strings) instead of Buffers, added a …

Changes

Sequence Diagram(s): (mermaid diagram omitted)
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Pre-merge checks and finishing touches: ✅ Passed checks (3 passed)
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/server/object_services/md_store.js (1)
1540-1543: Update JSDoc type to match actual parameter type.

The JSDoc indicates `@param {nb.DBBuffer[]} dedup_keys`, but the implementation now expects base64 strings (as passed from `map_server.js`). Update the type to reflect the actual usage:

```diff
 /**
  * @param {nb.Bucket} bucket
- * @param {nb.DBBuffer[]} dedup_keys
+ * @param {string[]} dedup_keys - base64 encoded dedup keys
  * @returns {Promise<nb.ChunkSchemaDB[]>}
  */
```
🧹 Nitpick comments (2)
src/server/object_services/map_server.js (1)
88-93: Consider using `filter` + `map` for a more functional approach.

The implementation is correct and aligns with the PostgreSQL path that expects base64 strings. The optional chaining (`chunk?.digest_b64`) provides null safety.

A more concise alternative using functional patterns:

```diff
-        const dedup_keys = [];
-        chunks.forEach(chunk => {
-            if (chunk?.digest_b64) {
-                dedup_keys.push(chunk.digest_b64);
-            }
-        });
+        const dedup_keys = chunks
+            .map(chunk => chunk?.digest_b64)
+            .filter(Boolean);
```

This is optional and the current implementation works correctly.
src/test/integration_tests/db/test_md_store.js (1)
402-418: Good edge case coverage for empty dedup_key array.

This test ensures that passing an empty array returns an empty result without errors, which validates the `FALSE AND data ? 'dedup_key'` branch in the SQL query.

Consider adding a test case for chunks that don't have a `dedup_key` field, to ensure they are properly excluded from results; a possible shape for such a test is sketched below.
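For reference, a minimal sketch of such a test. The helper names (`MDStore.instance`, `make_md_id`, `insert_chunks`) and the chunk fields are assumptions modeled on the surrounding tests in `test_md_store.js`, not a verified drop-in addition:

```js
// Sketch only: helpers and chunk fields are assumptions based on the
// surrounding tests, adjust to the fixtures used in this suite.
const mocha = require('mocha');
const assert = require('assert');
const { MDStore } = require('../../../server/object_services/md_store');

mocha.it('excludes chunks that have no dedup_key', async function() {
    const md_store = MDStore.instance();
    const system_id = md_store.make_md_id();
    const bucket = { _id: md_store.make_md_id(), system: { _id: system_id } };
    // Insert a chunk without any dedup_key field (minimal illustrative fields).
    const chunk_without_key = {
        _id: md_store.make_md_id(),
        system: system_id,
        bucket: bucket._id,
        size: 1,
        frag_size: 1,
    };
    await md_store.insert_chunks([chunk_without_key]);
    const res = await md_store.find_chunks_by_dedup_key(
        bucket,
        [Buffer.from('some-digest').toString('base64')]
    );
    // The keyless chunk must never show up in the result.
    assert(res.every(c => String(c._id) !== String(chunk_without_key._id)));
});
```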
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- src/sdk/nb.d.ts (1 hunks)
- src/server/object_services/map_server.js (1 hunks)
- src/server/object_services/md_store.js (2 hunks)
- src/test/integration_tests/db/test_md_store.js (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
src/test/**/*.*
⚙️ CodeRabbit configuration file
src/test/**/*.*: Ensure that the PR includes tests for the changes.
Files:
src/test/integration_tests/db/test_md_store.js
🧠 Learnings (2)
📓 Common learnings
Learnt from: naveenpaul1
Repo: noobaa/noobaa-core PR: 9182
File: src/upgrade/upgrade_scripts/5.20.0/remove_mongo_pool.js:9-17
Timestamp: 2025-08-08T13:12:46.728Z
Learning: In upgrade script src/upgrade/upgrade_scripts/5.20.0/remove_mongo_pool.js for noobaa-core, rely on structural detection (e.g., pool.mongo_info, and resource_type === 'INTERNAL') with name-prefix fallback for removing legacy mongo/internal pools, instead of depending solely on config.INTERNAL_STORAGE_POOL_NAME or config.DEFAULT_POOL_NAME. Handle multi-system stores and remove all matching pools in one change.
Learnt from: naveenpaul1
Repo: noobaa/noobaa-core PR: 9182
File: src/upgrade/upgrade_scripts/5.20.0/remove_mongo_pool.js:6-22
Timestamp: 2025-08-11T06:12:12.318Z
Learning: In the noobaa-core upgrade script src/upgrade/upgrade_scripts/5.20.0/remove_mongo_pool.js, bucket migration from the internal mongo pool to a new default pool is planned to be handled in separate future PRs with comprehensive testing, rather than being included directly in the pool removal script.
📚 Learning: 2025-08-08T13:12:46.728Z
Learnt from: naveenpaul1
Repo: noobaa/noobaa-core PR: 9182
File: src/upgrade/upgrade_scripts/5.20.0/remove_mongo_pool.js:9-17
Timestamp: 2025-08-08T13:12:46.728Z
Learning: In upgrade script src/upgrade/upgrade_scripts/5.20.0/remove_mongo_pool.js for noobaa-core, rely on structural detection (e.g., pool.mongo_info, and resource_type === 'INTERNAL') with name-prefix fallback for removing legacy mongo/internal pools, instead of depending solely on config.INTERNAL_STORAGE_POOL_NAME or config.DEFAULT_POOL_NAME. Handle multi-system stores and remove all matching pools in one change.
Applied to files:
src/server/object_services/md_store.js
🧬 Code graph analysis (3)
src/server/object_services/md_store.js (1)
src/util/postgres_client.js (6)
`require` (13-13), `require` (14-14), `require` (26-26), `query` (326-326), `query` (331-331), `res` (258-258)
src/test/integration_tests/db/test_md_store.js (1)
src/server/object_services/md_store.js (2)
`config` (29-29), `assert` (8-8)
src/server/object_services/map_server.js (1)
src/test/integration_tests/db/test_md_store.js (2)
`chunk` (386-394), `chunk` (405-413)
🔇 Additional comments (3)
src/sdk/nb.d.ts (1)
771-771: LGTM!

The addition of the `schema` property to the `DBCollection` interface is necessary to support the `decode_json` usage in `find_chunks_by_dedup_key`. The `any` type is consistent with the existing patterns in this interface.

src/test/integration_tests/db/test_md_store.js (1)
383-400: Test coverage looks good for the PostgreSQL path.

The test correctly validates that:
- The result is an array
- At least one chunk is returned
- The frag ID matches the inserted chunk
The bucket mock structure `{ _id, system: { _id } }` aligns with the expected parameters in `find_chunks_by_dedup_key`.

src/server/object_services/md_store.js (1)
1545-1567: SQL query implementation looks correct with parameterized queries.

The implementation:

- Uses `ANY($3)` for array membership, which is more efficient than `IN` for PostgreSQL
- Uses parameterized queries, preventing SQL injection
- Properly handles the empty `dedup_keys` array case with `FALSE AND data ? 'dedup_key'`
- Sorts by `_id DESC` to utilize the primary key index

Two observations:

- Silent error handling: returning an empty array on error could mask legitimate issues. Consider logging at a higher severity or re-throwing certain errors:

```js
} catch (err) {
    dbg.error('Error while finding chunks by dedup_key. error is ', err);
    // Consider: throw err; or at least return based on error type
    return [];
}
```

- Null handling: line 1556 checks `data->'deleted' IS NULL OR data->'deleted' = 'null'::jsonb`. This handles both missing keys and JSON null values, which is appropriate for JSONB columns.
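For orientation, a condensed sketch of the lookup shape these comments describe. The collection helpers (`executeSQL`, `schema`) and `decode_json` follow the snippet quoted later in this review; the table name, the JSONB filters on `system`/`bucket`, and the exact SQL text are illustrative assumptions and may differ from the real method:

```js
// Condensed sketch of the reviewed lookup, not the verbatim implementation.
const { decode_json } = require('../../util/postgres_client');

async function find_chunks_by_dedup_key_sketch(chunks_col, bucket, dedup_keys) {
    const has_keys = dedup_keys.length > 0;
    const query = `
        SELECT data FROM ${chunks_col.name}
        WHERE data->>'system' = $1
          AND data->>'bucket' = $2
          AND ${has_keys ?
              `data ? 'dedup_key' AND data->>'dedup_key' = ANY($3)` :
              `FALSE AND data ? 'dedup_key'`}
          AND (data->'deleted' IS NULL OR data->'deleted' = 'null'::jsonb)
        ORDER BY _id DESC`;
    const values = [String(bucket.system._id), String(bucket._id)];
    if (has_keys) values.push(dedup_keys); // the whole array binds to $3
    try {
        const res = await chunks_col.executeSQL(query, values);
        // The real method also loads blocks for the chunks before returning.
        return res?.rows?.map(row => decode_json(chunks_col.schema, row.data)) ?? [];
    } catch (err) {
        // Silent fallback, mirroring the reviewed code; see the note above
        // about possibly re-throwing instead.
        return [];
    }
}
```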
Signed-off-by: Karthik P S <karthikperla2000@gmail.com>
20920f6 to 3eaf530
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/server/object_services/md_store.js (1)
1539-1543: Update JSDoc to reflect the changed parameter type.

The JSDoc declares `dedup_keys` as `nb.DBBuffer[]`, but the PR changes this to `string[]` (base64-encoded). The implementation at line 1551 passes dedup_keys directly to a SQL query expecting string values for the ANY operator.

Update the JSDoc to match the actual signature:

```diff
 /**
  * @param {nb.Bucket} bucket
- * @param {nb.DBBuffer[]} dedup_keys
+ * @param {string[]} dedup_keys - Base64-encoded deduplication keys
  * @returns {Promise<nb.ChunkSchemaDB[]>}
  */
```
♻️ Duplicate comments (1)
src/server/object_services/md_store.js (1)
16-16: Import path convention.

The import path includes the `.js` extension. Per Node.js conventions, the extension can be omitted:

```diff
-const { decode_json } = require('../../util/postgres_client.js');
+const { decode_json } = require('../../util/postgres_client');
```
🧹 Nitpick comments (1)
src/sdk/nb.d.ts (1)
773-773: Schema field addition looks good.

The new `schema: any` field enables PostgreSQL JSON decoding in md_store.js. While `any` provides no type safety, it's a pragmatic choice for now.

For future improvement, consider typing this field more specifically, perhaps as a generic parameter based on the collection's schema type.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- src/sdk/nb.d.ts (1 hunks)
- src/server/object_services/map_server.js (1 hunks)
- src/server/object_services/md_store.js (2 hunks)
- src/test/integration_tests/db/test_md_store.js (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- src/test/integration_tests/db/test_md_store.js
- src/server/object_services/map_server.js
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: naveenpaul1
Repo: noobaa/noobaa-core PR: 9182
File: src/upgrade/upgrade_scripts/5.20.0/remove_mongo_pool.js:9-17
Timestamp: 2025-08-08T13:12:46.728Z
Learning: In upgrade script src/upgrade/upgrade_scripts/5.20.0/remove_mongo_pool.js for noobaa-core, rely on structural detection (e.g., pool.mongo_info, and resource_type === 'INTERNAL') with name-prefix fallback for removing legacy mongo/internal pools, instead of depending solely on config.INTERNAL_STORAGE_POOL_NAME or config.DEFAULT_POOL_NAME. Handle multi-system stores and remove all matching pools in one change.
📚 Learning: 2025-08-08T13:12:46.728Z
Learnt from: naveenpaul1
Repo: noobaa/noobaa-core PR: 9182
File: src/upgrade/upgrade_scripts/5.20.0/remove_mongo_pool.js:9-17
Timestamp: 2025-08-08T13:12:46.728Z
Learning: In upgrade script src/upgrade/upgrade_scripts/5.20.0/remove_mongo_pool.js for noobaa-core, rely on structural detection (e.g., pool.mongo_info, and resource_type === 'INTERNAL') with name-prefix fallback for removing legacy mongo/internal pools, instead of depending solely on config.INTERNAL_STORAGE_POOL_NAME or config.DEFAULT_POOL_NAME. Handle multi-system stores and remove all matching pools in one change.
Applied to files:
src/server/object_services/md_store.js
🧬 Code graph analysis (1)
src/server/object_services/md_store.js (1)
src/util/postgres_client.js (3)
`query` (326-326), `query` (331-331), `res` (258-258)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Build Noobaa Image
- GitHub Check: run-jest-unit-tests
- GitHub Check: run-package-lock-validation
🔇 Additional comments (2)
src/server/object_services/md_store.js (2)
1545-1556: Query construction is well-implemented.

The SQL query correctly:
- Uses parameterized queries ($1, $2, $3) to prevent SQL injection
- Employs the ANY operator for array comparison (per PR objectives)
- Sorts by the `_id` primary key column to leverage indexing (per PR objectives)
- Follows patterns consistent with other queries in the codebase
These changes align with the stated performance goals of reducing query size and improving execution time.
1549-1554: Verify the dedup_keys parameter is properly formatted as an array for the ANY() operator.

The code uses PostgreSQL's `ANY($3)` operator, which requires the bound parameter to be a proper array type. When using node-postgres, the parameter must be passed as an array object. Ensure that `dedup_keys` is consistently an array before being pushed to `values` at line 1551; this includes edge cases where dedup_keys might be an empty array (which the code already handles) or single-item arrays. The JSONB operators (`data ? 'dedup_key'` and `data ->>`) are standard PostgreSQL syntax and should work correctly. See the standalone illustration below.
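As a standalone illustration of that point, a sketch using the pg client directly (outside MDStore); the table and column names are placeholders, not the actual MDStore schema:

```js
// The parameter bound to ANY($1) must itself be a JS array; node-postgres
// serializes it into a PostgreSQL array value.
const { Client } = require('pg');

async function find_by_keys(keys) {
    const client = new Client(); // connection settings taken from PG* env vars
    await client.connect();
    // One placeholder regardless of how many keys are passed: a single-item
    // array binds exactly the same way as a long one, so the statement text
    // (and its cached plan) never changes with the key count.
    const res = await client.query(
        `SELECT _id, data FROM datachunks WHERE data->>'dedup_key' = ANY($1)`,
        [keys] // note: [keys], not ...keys -- the array is the one bound value
    );
    await client.end();
    return res.rows;
}

// find_by_keys(['a2V5MQ==']);             // single key
// find_by_keys(['a2V5MQ==', 'a2V5Mg==']); // many keys, same SQL text
// (the empty-array case is short-circuited separately in the reviewed code)
```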
```js
try {
    const res = await this._chunks.executeSQL(query, values);
    const chunks = res?.rows.map(row => decode_json(this._chunks.schema, row.data));
    await this.load_blocks_for_chunks(chunks);
    return chunks;
} catch (err) {
    dbg.error('Error while finding chunks by dedup_key. error is ', err);
    return [];
}
```
Improve optional chaining for safety.

Line 1560 uses incomplete optional chaining:

```js
const chunks = res?.rows.map(row => decode_json(this._chunks.schema, row.data));
```

If `res` is defined but `res.rows` is undefined, calling `.map()` on undefined throws; and if `res` itself is undefined, the expression short-circuits to undefined, which then flows into `load_blocks_for_chunks`. Fix with:

```diff
-const chunks = res?.rows.map(row => decode_json(this._chunks.schema, row.data));
+const chunks = res?.rows?.map(row => decode_json(this._chunks.schema, row.data)) || [];
```

This ensures `chunks` is always an array, even if `res` or `res.rows` is undefined.
🤖 Prompt for AI Agents
In src/server/object_services/md_store.js around lines 1558 to 1566, the mapping
uses incomplete optional chaining which can throw if res or res.rows is
undefined; replace the mapping with a safe expression that always produces an
array (e.g. const chunks = (res?.rows?.map(row =>
decode_json(this._chunks.schema, row.data))) ?? [];), then call await
this.load_blocks_for_chunks(chunks); and return chunks so chunks is guaranteed
to be an array even when the query returns no result.
Describe the Problem
The Mongo-to-Postgres query converter uses the IN operator for arrays. In PostgreSQL, using the ANY operator improves performance: the query text stays smaller and constant in size, which also allows for plan caching. The sort is also applied on the '_id' JSON field of the data column, even though an _id column (the primary key) is already present. The sketch below illustrates the difference.
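To make the difference concrete, a hedged sketch (table and column names are illustrative, not the converter's actual output): with IN, every array element needs its own placeholder, so the SQL text grows and changes with the array length; with ANY, the whole array binds to a single placeholder, and the sort can use the real _id primary key column instead of the JSON field.

```js
// Illustrative only: table/column names are placeholders, not the actual
// Mongo-to-Postgres converter output.
const keys = ['a2V5MQ==', 'a2V5Mg==', 'a2V5Mw=='];

// IN: one placeholder per element, so the statement text differs for every
// array length, which defeats plan caching and bloats the query.
const in_sql = `SELECT data FROM datachunks
    WHERE data->>'dedup_key' IN (${keys.map((_, i) => `$${i + 1}`).join(', ')})
    ORDER BY data->>'_id' DESC`;          // sorts on the JSON field
const in_values = keys;

// ANY: a single placeholder bound to the whole array, so the statement text
// is identical for any number of keys, and the sort uses the indexed column.
const any_sql = `SELECT data FROM datachunks
    WHERE data->>'dedup_key' = ANY($1)
    ORDER BY _id DESC`;                   // sorts on the _id primary key
const any_values = [keys];
```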
Explain the Changes
Issues: Fixed #xxx / Gap #xxx
Testing Instructions:
Summary by CodeRabbit
Improvements
Tests