@naveenpaul1 naveenpaul1 commented Sep 10, 2025

Describe the Problem

Log-based deletion sync fails between AWS Namespacestore buckets.

Explain the Changes

  1. Updated the log_object body handling and changed the error-code check to err?.name
  2. The S3 client is now fetched from noobaa_s3_client.get_s3_client_v3_params()
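A minimal sketch of the error-check change described above (assumption: SDK v3 service errors carry the error code on `err.name`, where v2 used `err.code`; the function name here is illustrative, not the actual helper):

```javascript
// v2 style was: err.code === 'NotFound' — this no longer matches SDK v3 errors,
// which expose the code on the `name` property instead.
function is_not_found(err) {
    return err?.name === 'NotFound';
}

// Simulated v3 error: the code lives on `name`
const v3_err = Object.assign(new Error('no such key'), { name: 'NotFound' });
console.log(is_not_found(v3_err));    // true
console.log(is_not_found(undefined)); // false — optional chaining keeps this safe
```

The optional chaining (`err?.name`) also makes the check safe when the error object is missing entirely.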

Issues: Fixed #xxx / Gap #xxx

  1. https://issues.redhat.com/browse/DFBUGS-3890

Testing Instructions:

  1. Please refer to the issue https://issues.redhat.com/browse/DFBUGS-3890
  • Doc added/updated
  • Tests added

Summary by CodeRabbit

  • Bug Fixes

    • More reliable retrieval and parsing of S3 logs; reduced intermittent failures and quieter handling of "NotFound" errors.
  • Refactor

    • Replaced underlying S3 client and moved log parsing to asynchronous, string-based processing for improved stability and performance.
  • Chores

    • Updated tests and internal configuration to align with the new client and async processing flow.


coderabbitai bot commented Sep 10, 2025

Walkthrough

Replace AWS SDK v2 S3 usage with NooBaa S3 v3 client, convert log retrieval/parsing to async string-based processing, update S3 client construction plumbing, and adjust NotFound detection in replication utilities and related tests.
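The v2-to-v3 body-read change in this walkthrough can be sketched as follows (the mock response below stands in for an SDK v3 GetObject result; names are illustrative):

```javascript
// In SDK v2, s3.getObject(params).promise() resolved with a Buffer Body that
// could be read synchronously. In v3, Body is a stream wrapper that must be
// converted asynchronously via transformToString().
async function read_log_body(get_object_res) {
    return get_object_res.Body.transformToString();
}

// Mock of a v3 GetObject response shape (real code would call s3.getObject)
const mock_v3_res = {
    Body: { transformToString: async () => 'REST.DELETE.OBJECT key1\n' },
};

read_log_body(mock_v3_res).then(s => console.log(s.trim())); // prints "REST.DELETE.OBJECT key1"
```

This is why `aws_parse_log_object` had to become async: the body is no longer available until the stream transform resolves.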

Changes

  • Replication log parser (S3 v3 + async parsing)
    src/server/bg_services/replication_log_parser.js
    Replace aws-sdk with noobaa_s3_client.get_s3_client_v3_params; remove .promise() calls; use Body.transformToString() and await it; change aws_parse_log_object to async accepting a log string (signature changed) and update internal helpers.
  • Cloud S3 connection helper
    src/util/cloud_utils.js
    Use noobaa_s3_client.get_s3_client_v3_params(...) in set_noobaa_s3_connection instead of creating an AWS v2 client directly; preserve endpoint/credentials validation and the returned client.
  • Replication utilities error handling
    src/server/utils/replication_utils.js
    Change NotFound detection from err.code === 'NotFound' to err?.name === 'NotFound'; log err?.name alongside the error object.
  • Unit tests (interface update)
    src/test/unit_tests/internal/test_bucket_log_based_replication.test.js
    Update tests to pass raw log strings (instead of { Body: '...' }), add the extra parameter for the new aws_parse_log_object signature, and await the async parser calls.
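As a rough illustration of the new string-based flow the tests now exercise (the regex and helper below are hypothetical, not the actual parser, and assume the AWS server access log format with `REST.<OP>.OBJECT <key>` fields):

```javascript
// Hypothetical line-by-line parse of a raw access-log string, mirroring the
// shape of what the async parser now receives after Body.transformToString().
async function parse_log_string(log_string) {
    const entries = [];
    for (const line of log_string.split('\n')) {
        if (!line.trim()) continue;
        // AWS server access log: owner bucket [time] ... REST.<OP>.OBJECT <key> ...
        const m = line.match(/REST\.(\w+)\.OBJECT\s+(\S+)/);
        if (m) entries.push({ action: m[1].toLowerCase(), key: m[2] });
    }
    return entries;
}

const sample =
    'aaa test.bucket [13/Feb/2023:09:08:56 +0000] REST.DELETE.OBJECT key1 ...\n' +
    'aaa test.bucket [13/Feb/2023:09:09:00 +0000] REST.PUT.OBJECT key2 ...\n';

parse_log_string(sample).then(e => console.log(JSON.stringify(e)));
// prints [{"action":"delete","key":"key1"},{"action":"put","key":"key2"}]
```

Passing a raw string also explains why the tests dropped the `{ Body: '...' }` wrapper: the body transform now happens before parsing.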

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor BG as BG Service
  participant RLP as ReplicationLogParser
  participant S3 as NooBaa S3 (SDK v3)

  rect rgb(237,243,254)
    note right of S3: Client created via noobaa_s3_client.get_s3_client_v3_params
  end

  BG->>RLP: get_aws_log_candidates()
  RLP->>S3: listObjectsV2(params)
  S3-->>RLP: object list
  loop for each object
    RLP->>S3: getObject(key)
    S3-->>RLP: Body (stream)
    RLP->>RLP: await Body.transformToString()
    RLP->>RLP: await aws_parse_log_object(log_string, ...)
  end
  RLP-->>BG: candidates

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested reviewers

  • Neon-White
  • jackyalbo
  • tangledbytes

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 62.50%, below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve coverage.
✅ Passed checks (2 passed)
  • Description Check ✅ Passed: Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed: The title clearly indicates an SDK-related fix for log replication, matching the changes to S3 client usage and log parsing in the diff; it is relevant and concise, though slightly repetitive (uses "SDK" twice) and could be more specific.
✨ Finishing touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 66a215e and e8e5927.

📒 Files selected for processing (4)
  • src/server/bg_services/replication_log_parser.js (8 hunks)
  • src/server/utils/replication_utils.js (1 hunks)
  • src/test/unit_tests/internal/test_bucket_log_based_replication.test.js (5 hunks)
  • src/util/cloud_utils.js (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • src/server/utils/replication_utils.js
  • src/util/cloud_utils.js
  • src/server/bg_services/replication_log_parser.js
  • src/test/unit_tests/internal/test_bucket_log_based_replication.test.js
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: run-jest-unit-tests
  • GitHub Check: Build Noobaa Image
  • GitHub Check: run-package-lock-validation



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
src/util/cloud_utils.js (1)

188-198: Switching to noobaa_s3_client v3 params here looks right; tls: false is correct for INTERNAL S3.

  • Matches the repo-wide v3 migration and returns a compatible S3 client.
  • Using get_requestHandler_with_suitable_agent(endpoint) is appropriate.
  • Per prior learning, INTERNAL 'api: s3' endpoints are non-secure, so tls: false is expected.
🧹 Nitpick comments (3)
src/server/bg_services/replication_log_parser.js (3)

271-276: ResponseContentType is unnecessary for getObject here.

It doesn’t affect the returned Body and can be dropped to avoid confusion.

-        const res = await s3.getObject({
+        const res = await s3.getObject({
             Bucket: bucket,
             Key: key,
-            ResponseContentType: 'json'
         });

293-295: Add a safe fallback when Body.transformToString is unavailable; specify encoding.

Some runtimes or polyfills may not expose transformToString; also make encoding explicit.

-async function aws_parse_log_object(logs, log_object, sync_deletions, obj_prefix_filter) {
-    const log_string = await log_object.Body.transformToString();
+async function aws_parse_log_object(logs, log_object, sync_deletions, obj_prefix_filter) {
+    if (!log_object?.Body) {
+        dbg.error('aws_parse_log_object: missing Body on log object');
+        return;
+    }
+    const toStringFromBody = async body => {
+        if (typeof body.transformToString === 'function') return body.transformToString('utf-8');
+        // Fallback: collect stream
+        return new Promise((resolve, reject) => {
+            const chunks = [];
+            body.on('data', c => chunks.push(Buffer.isBuffer(c) ? c : Buffer.from(c)));
+            body.on('end', () => resolve(Buffer.concat(chunks).toString('utf-8')));
+            body.on('error', reject);
+        });
+    };
+    const log_string = await toStringFromBody(log_object.Body);

374-385: SignatureVersion correctly honored; no fixes needed
The wrapper’s should_use_sdk_v2 checks params.signatureVersion === 'v2' and falls back to the AWS SDK v2 client as intended.
• Optional: add a tls: endpoint.startsWith('https://') property for explicit clarity.
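The reviewer's optional suggestion can be sketched like this (endpoint values are made up for illustration; the params object shape is a simplification of what the wrapper builds):

```javascript
// Derive the tls flag explicitly from the endpoint scheme instead of
// relying on the wrapper's defaults.
const s3_params_for = endpoint => ({
    endpoint,
    tls: endpoint.startsWith('https://'),
});

console.log(s3_params_for('https://s3.example.com').tls);  // true
console.log(s3_params_for('http://s3.internal:6001').tls); // false
```

This makes the TLS decision visible at the call site, which is the "explicit clarity" the comment is after.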

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8cd5678 and 76640b4.

📒 Files selected for processing (3)
  • src/server/bg_services/replication_log_parser.js (8 hunks)
  • src/server/utils/replication_utils.js (1 hunks)
  • src/util/cloud_utils.js (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-06-17T12:59:51.543Z
Learnt from: naveenpaul1
PR: noobaa/noobaa-core#9042
File: src/util/cloud_utils.js:183-194
Timestamp: 2025-06-17T12:59:51.543Z
Learning: In the set_noobaa_s3_connection function in src/util/cloud_utils.js, internal NooBaa S3 endpoints (filtered by 'api': 's3', 'kind': 'INTERNAL') will always be non-secure, so tls: false is the correct setting despite the conditional https:// scheme logic.

Applied to files:

  • src/util/cloud_utils.js
🧬 Code graph analysis (2)
src/util/cloud_utils.js (3)
src/server/bg_services/replication_log_parser.js (1)
  • noobaa_s3_client (13-13)
src/sdk/namespace_s3.js (1)
  • noobaa_s3_client (14-14)
src/sdk/object_sdk.js (1)
  • noobaa_s3_client (28-28)
src/server/bg_services/replication_log_parser.js (3)
src/util/cloud_utils.js (5)
  • noobaa_s3_client (16-16)
  • require (9-9)
  • require (13-13)
  • require (14-14)
  • s3 (71-87)
src/server/system_services/account_server.js (10)
  • noobaa_s3_client (26-26)
  • require (13-13)
  • require (23-23)
  • require (24-24)
  • require (25-25)
  • s3 (1075-1075)
  • params (472-472)
  • params (873-873)
  • params (1137-1137)
  • cloud_utils (16-16)
src/server/system_services/bucket_server.js (6)
  • noobaa_s3_client (41-41)
  • require (19-19)
  • require (40-40)
  • s3 (454-454)
  • s3 (1304-1304)
  • cloud_utils (25-25)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Build Noobaa Image
  • GitHub Check: run-jest-unit-tests
  • GitHub Check: run-package-lock-validation
🔇 Additional comments (4)
src/server/bg_services/replication_log_parser.js (4)

13-14: Importing noobaa_s3_client for v3 S3 is consistent with the migration.

No issues.


63-64: Awaiting aws_parse_log_object is necessary after making it async.

Good catch to prevent races and partial parsing.


229-235: Doc update to v3 S3 type is accurate.

Minor but correct.


252-253: Switch to await s3.listObjectsV2(params) aligns with v3 S3 API.

  • Using StartAfter with a normalized prefix is fine when ContinuationToken is absent.
  • MaxKeys: 1 matches the loop’s throttled scan logic.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/test/unit_tests/internal/test_bucket_log_based_replication.test.js (2)

260-260: Fix concatenated log lines; they’re missing newlines and break parsing/counts.

Two log entries are glued on one line. Insert explicit newlines so each entry is parsed independently.

-            aaa test.bucket [12/Feb/2023:09:08:28 +0000] ... TLSv1.2 - -            aaa test.bucket [13/Feb/2023:09:08:56 +0000] ...
+            aaa test.bucket [12/Feb/2023:09:08:28 +0000] ... TLSv1.2 - -
+            aaa test.bucket [13/Feb/2023:09:08:56 +0000] ...

-            aaa test.bucket [13/Feb/2023:16:08:56 +0000] ... TLSv1.2 - -            aaa test.bucket [13/Feb/2023:15:08:28 +0000] ... - -
-            aaa test.bucket [13/Feb/2023:15:08:28 +0000] ... TLSv1.2 - -
+            aaa test.bucket [13/Feb/2023:16:08:56 +0000] ... TLSv1.2 - -
+            aaa test.bucket [13/Feb/2023:15:08:28 +0000] ... - -
+            aaa test.bucket [13/Feb/2023:15:08:28 +0000] ... TLSv1.2 - -

After fixing, please re-evaluate the exact expected logs.length. Consider asserting candidates instead of raw log count for stability.

Also applies to: 263-263


273-279: create_candidates returns a log per key, not an array; indexing with [0] is wrong.

This makes the assertion ineffective and can mask bugs.

-        for (const key in candidates) {
-            if (candidates[key][0]) {
-                expect(candidates[key][0].action).toEqual(action_candidates[key]);
-            }
-        }
+        for (const key of Object.keys(candidates)) {
+            expect(candidates[key].action).toEqual(action_candidates[key]);
+        }
🧹 Nitpick comments (2)
src/server/bg_services/replication_log_parser.js (2)

289-292: JSDoc nit: parameter description doesn’t match type.

It’s a string now, not an “AWS log object”.

- * @param {String} log_string  - AWS log object
+ * @param {string} log_string  - Raw AWS server access log text

271-276: Minor: ResponseContentType: 'json' on GetObject is unnecessary.

We read a text body and parse it ourselves; setting a response content type isn’t needed and may confuse future readers.

-        const res = await s3.getObject({
-            Bucket: bucket,
-            Key: key,
-            ResponseContentType: 'json'
-        });
+        const res = await s3.getObject({ Bucket: bucket, Key: key });
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 76640b4 and 5b5d651.

📒 Files selected for processing (4)
  • src/server/bg_services/replication_log_parser.js (8 hunks)
  • src/server/utils/replication_utils.js (1 hunks)
  • src/test/unit_tests/internal/test_bucket_log_based_replication.test.js (4 hunks)
  • src/util/cloud_utils.js (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/util/cloud_utils.js
  • src/server/utils/replication_utils.js
🧰 Additional context used
📓 Path-based instructions (1)
src/test/**/*.*

⚙️ CodeRabbit configuration file

src/test/**/*.*: Ensure that the PR includes tests for the changes.

Files:

  • src/test/unit_tests/internal/test_bucket_log_based_replication.test.js
🧠 Learnings (1)
📚 Learning: 2025-06-17T12:59:51.543Z
Learnt from: naveenpaul1
PR: noobaa/noobaa-core#9042
File: src/util/cloud_utils.js:183-194
Timestamp: 2025-06-17T12:59:51.543Z
Learning: In the set_noobaa_s3_connection function in src/util/cloud_utils.js, internal NooBaa S3 endpoints (filtered by 'api': 's3', 'kind': 'INTERNAL') will always be non-secure, so tls: false is the correct setting despite the conditional https:// scheme logic.

Applied to files:

  • src/server/bg_services/replication_log_parser.js
🧬 Code graph analysis (1)
src/server/bg_services/replication_log_parser.js (2)
src/util/cloud_utils.js (5)
  • noobaa_s3_client (16-16)
  • require (9-9)
  • require (13-13)
  • require (14-14)
  • s3 (71-87)
src/server/system_services/account_server.js (10)
  • noobaa_s3_client (26-26)
  • require (13-13)
  • require (23-23)
  • require (24-24)
  • require (25-25)
  • s3 (1075-1075)
  • params (472-472)
  • params (873-873)
  • params (1137-1137)
  • cloud_utils (16-16)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Build Noobaa Image
  • GitHub Check: run-package-lock-validation
  • GitHub Check: run-jest-unit-tests
🔇 Additional comments (1)
src/server/bg_services/replication_log_parser.js (1)

374-384: Confirm TLS handling for internal NooBaa S3 endpoints.

Per our prior learning, internal NooBaa S3 endpoints are non-secure (tls: false). Ensure get_s3_client_v3_params/get_requestHandler_with_suitable_agent enforce that for INTERNAL endpoints; otherwise explicitly set it here.

We used your stored preference from “Learnings”: internal NooBaa S3 uses non-TLS by design.

Signed-off-by: Naveen Paul <napaul@redhat.com>
@naveenpaul1 naveenpaul1 merged commit 6d6d17b into noobaa:master Sep 15, 2025
18 of 19 checks passed