Skip to content

Conversation

chschneider
Copy link

@chschneider chschneider commented Sep 1, 2025

Fixes a memory leak of 128k per zstd_uncompress/zstd_uncompress_dict call (see issue #93)

Summary by CodeRabbit

  • Bug Fixes

    • Resolved a memory usage regression during Zstandard decompression, ensuring no net memory growth after a compress/decompress cycle.
    • Improved memory efficiency and stability for Zstandard operations.
  • Tests

    • Added a regression test that verifies memory usage remains unchanged before and after a Zstandard compress/decompress cycle.

@coderabbitai
Copy link

coderabbitai bot commented Sep 1, 2025

Walkthrough

Introduces a PHPT test verifying no net memory growth across a zstd compress/decompress cycle and updates decompression buffer management in zstd.c to use resizing (erealloc) for the output buffer instead of fresh allocations in specific paths.

Changes

Cohort / File(s) Summary
Decompression buffer management
zstd.c
Replaces output buffer allocations in zstd_uncompress and zstd_uncompress_dict from emalloc(size) to erealloc(ctx.output.dst, size), retaining initial allocation elsewhere; surrounding logic and error paths unchanged.
Memory regression test
tests/memory_001.phpt
Adds test that measures memory before/after zstd_compress and zstd_uncompress of a 1000-byte repeated string, asserting zero delta.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant T as PHPT Test
    participant P as PHP zstd Extension
    participant Z as zstd_uncompress(...)
    Note over T: Measure memory start
    T->>P: zstd_compress("a" x 1000)
    T->>P: zstd_uncompress(compressed)
    activate P
    P->>Z: Decompress request
    activate Z
    alt Output buffer needs growth
        Z->>Z: erealloc(ctx.output.dst, newSize)
        Note right of Z: Resize existing buffer in place
    else Sufficient capacity
        Z->>Z: Use existing buffer
    end
    Z-->>P: Decompressed data
    deactivate Z
    P-->>T: Return data
    deactivate P
    Note over T: Measure memory end and compute delta
Loading

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

Poem

A nibble of bytes, a squeeze of zstd air,
I hop through buffers with careful repair.
No crumbs left behind, no memory to bloat,
I resize my nest—so sleek, so remote.
Compress, decompress—zero change, I boast!
A tidy warren from coast to coast. 🐇💾

✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbitai ignore or @coderabbit ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
zstd.c (1)

1376-1385: Dictionary zend_string not released in output handler path (memory leak).

php_zstd_output_handler_context_start() loads a zend_string dict and passes it into php_zstd_context_create_compress(), but never releases the zend_string. Compare with php_stream_zstd_opener(), which explicitly zend_string_release(dict). Please free it after creating the context.

Apply:

 static zend_result php_zstd_output_handler_context_start(php_zstd_context *ctx)
 {
   int level = PHP_ZSTD_G(output_compression_level);
   if (!zstd_check_compress_level(level) || level == 0) {
     level = ZSTD_CLEVEL_DEFAULT;
   }

   zend_string *dict = php_zstd_output_handler_load_dict(ctx);
   if (dict == NULL) {
     PHP_ZSTD_G(compression_coding) &= ~PHP_ZSTD_ENCODING_DCZ;
   }
   if (!PHP_ZSTD_G(compression_coding)) {
     return FAILURE;
   }

-  return php_zstd_context_create_compress(ctx, level, dict);
+  zend_result res = php_zstd_context_create_compress(ctx, level, dict);
+  if (dict) {
+    zend_string_release(dict);
+  }
+  return res;
 }
🧹 Nitpick comments (2)
zstd.c (1)

510-516: Guard against frames larger than SIZE_MAX on 32-bit builds.

size is uint64_t but ZSTD_outBuffer.size is size_t. If the frame advertises > SIZE_MAX, erealloc will truncate on 32-bit. Add a sanity check to fail early on such inputs.

 size = ZSTD_getFrameContentSize(ZSTR_VAL(input), ZSTR_LEN(input));
 if (size == ZSTD_CONTENTSIZE_ERROR) {
   ZSTD_WARNING("it was not compressed by zstd");
   RETURN_FALSE;
 } else if (size == ZSTD_CONTENTSIZE_UNKNOWN) {
   size = ZSTD_DStreamOutSize();
+} else if (size > SIZE_MAX) {
+  ZSTD_WARNING("frame content size exceeds platform limits");
+  RETURN_FALSE;
 }
tests/memory_001.phpt (1)

1-11: Strengthen the regression test; also cover dict path.

Asserting exact 0 can be brittle across engines. Consider looping to amplify leaks and adding a dict case; assert the delta stays very small (or 0 if you prefer). Example:

 --FILE--
 <?php
-$start = memory_get_usage();
-zstd_uncompress(zstd_compress(str_repeat('a', 1000)));
-$end = memory_get_usage();
-echo ($end - $start) . "\n";
+$payload = str_repeat('a', 1000);
+$start = memory_get_usage();
+for ($i = 0; $i < 5; $i++) {
+    $c = zstd_compress($payload);
+    zstd_uncompress($c);
+}
+$end = memory_get_usage();
+echo ($end - $start) . "\n";
 ?>
 --EXPECT--
-0
+0

Optionally add a second PHPT exercising zstd_uncompress_dict() similarly.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled
  • Linear integration is disabled

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 1577e5d and dacc2b3.

📒 Files selected for processing (2)
  • tests/memory_001.phpt (1 hunks)
  • zstd.c (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/memory_001.phpt (1)
zstd.stub.php (2)
  • zstd_uncompress_add (49-49)
  • Context (85-87)
🔇 Additional comments (2)
zstd.c (2)

530-533: Leak fix via erealloc is correct.

Switching to erealloc reuses the preallocated ZSTD_DStreamOutSize() buffer instead of orphaning it, eliminating the 128 KiB per-call leak. ctx->output.dst is initialized in php_zstd_context_create_decompress and freed in php_zstd_context_free, so lifecycle stays sound.


649-652: Same leak fix for dict path — LGTM.

Mirroring the erealloc change in zstd_uncompress_dict closes the identical leak here as well.

@kjdev kjdev merged commit 762835c into kjdev:master Sep 1, 2025
95 checks passed
@kjdev
Copy link
Owner

kjdev commented Sep 1, 2025

special thanks.

@chschneider chschneider deleted the fix-memory-leak-uncompress branch September 1, 2025 21:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants