Inplace Composite and ScalarLoop Ops with multiple outputs #1322

Merged: 5 commits into pymc-devs:main on Apr 8, 2025

Conversation

@ricardoV94 (Member) commented on Mar 25, 2025

Closes #138

This allows inplacing an arbitrary number of inputs in Elemwise Composite (i.e., fused) Ops. The restriction existed because, in the original codegen, writing to an aliased output could affect the computation of other outputs that still referenced the just-overwritten aliased input.

The fix was trivial: only use the output names at the end, after all nodes are computed. Until then, values are stored in temporary variables, as already happens for regular intermediate nodes in the Composite.
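For illustration, here is a minimal sketch (not code from this PR; the specific graph is made up) of the user-visible effect: a fused multi-output Elemwise graph whose Composite may now inplace more than one output. Whether the two outputs actually end up in a single Composite, and how many get inplaced, depends on the fusion and inplace rewrites of the chosen mode.

```python
import pytensor
import pytensor.tensor as pt

x = pt.vector("x")
y = pt.vector("y")
# Two elemwise outputs sharing inputs; FAST_RUN may fuse them into one Composite
out1 = pt.exp(x) + y
out2 = pt.log1p(x) * y

fn = pytensor.function([x, y], [out1, out2], mode="FAST_RUN")

# Inplaced nodes advertise a destroy_map of the form {output_index: [input_indices]};
# with this change a multi-output Composite can list several inplaced outputs.
for node in fn.maker.fgraph.apply_nodes:
    if node.op.destroy_map:
        print(node.op, node.op.destroy_map)
```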

The same was done for the ScalarLoop. The final outputs are only assigned once the inner loop has finished, so there is no risk of aliasing too soon.
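For reference, a rough ScalarLoop sketch (assuming the init/update/constant keyword arguments shown below; not code from this PR): the carried value lives in temporaries while the loop runs, and the output storage is written only after the last iteration, which is what makes inplacing the outputs safe.

```python
from pytensor import function
from pytensor.scalar import float64, int64
from pytensor.scalar.loop import ScalarLoop

n_steps = int64("n_steps")
x0 = float64("x0")
const = float64("const")

# One loop step: x <- x + const, repeated n_steps times
loop_op = ScalarLoop(init=[x0], update=[x0 + const], constant=[const])
x_final = loop_op(n_steps, x0, const)

fn = function([n_steps, x0, const], x_final)
print(fn(5, 0.0, 1.0))  # expected 5.0 after five "+ const" steps
```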

A typo-bug in the Numba vectorize code is also fixed.


📚 Documentation preview 📚: https://pytensor--1322.org.readthedocs.build/en/1322/

@ricardoV94 added labels bug, enhancement, numba, performance, memory_optimization on Mar 25, 2025
@ricardoV94 (Member, Author) commented:
Marked as TODO because I think the ScalarLoop can already be inplaced as well, since it uses temporary variables for the inner carry. Just need to add a test.

codecov bot commented Mar 25, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 82.04%. Comparing base (8a7356c) to head (d03dc8a).
Report is 19 commits behind head on main.

Additional details and impacted files


@@            Coverage Diff             @@
##             main    #1322      +/-   ##
==========================================
+ Coverage   81.98%   82.04%   +0.05%     
==========================================
  Files         188      203      +15     
  Lines       48489    48837     +348     
  Branches     8673     8689      +16     
==========================================
+ Hits        39756    40067     +311     
- Misses       6582     6619      +37     
  Partials     2151     2151              
Files with missing lines (Coverage Δ):
pytensor/link/numba/dispatch/vectorize_codegen.py 92.03% <ø> (+0.88%) ⬆️
pytensor/scalar/basic.py 80.48% <100.00%> (-0.04%) ⬇️
pytensor/scalar/loop.py 94.37% <100.00%> (+4.25%) ⬆️
pytensor/tensor/rewriting/elemwise.py 91.02% <100.00%> (-0.11%) ⬇️

... and 29 files with indirect coverage changes


@ricardoV94 changed the title from "Inplace Composite Ops with multiple outputs" to "Inplace Composite and ScalarLoop Ops with multiple outputs" on Mar 26, 2025
@ricardoV94 ricardoV94 requested a review from Copilot March 26, 2025 21:56
@Copilot Copilot AI left a comment


Pull Request Overview

This PR adds support for inplacing multiple outputs in Composite and ScalarLoop operations while also fixing a typo in the Numba vectorize code. Key changes include:

  • New tests in various test suites (scalar, numba, and tensor) to verify the inplacing behavior of multi-output composite ops.
  • Updates to the ScalarLoop op in pytensor/scalar/loop.py to allow passing extra keyword arguments (and corresponding changes in clone).
  • Minor API adjustments in the numba dispatch code and candidate outputs handling in elemwise rewriting.

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.

Summary per file:
tests/scalar/test_loop.py Adds tests for elemwise inplacing with multiple outputs.
tests/link/numba/test_elemwise.py Updates tests to cover inplace output type and multi-output inplacing.
tests/tensor/rewriting/test_elemwise.py Adjusts multi-output inplacing tests with new parametrization.
pytensor/scalar/loop.py Modifies the __init__ and clone methods to accept extra kwargs.
pytensor/scalar/basic.py Updates super() calls and bumps the c_code_cache_version_outer.
pytensor/tensor/rewriting/elemwise.py Removes candidate_input_idxs and uses a direct range for candidate outputs.
pytensor/link/numba/dispatch/vectorize_codegen.py Fixes a typo by replacing _get_value() with _getvalue().
Comments suppressed due to low confidence (2)

pytensor/tensor/rewriting/elemwise.py:165

  • The candidate_input_idxs abstraction was removed in favor of using 'range(len(node.outputs))'. Please double-check that this change does not inadvertently allow in-place modifications on outputs that should remain protected.
candidate_outputs = [i for i in range(len(node.outputs)) if i not in baseline]

pytensor/link/numba/dispatch/vectorize_codegen.py:268

  • The call to _getvalue() replaces the previous _get_value() method; confirm that this API change is consistent across all relevant objects.
outputs[inplace_idx]._getvalue()

@ricardoV94 ricardoV94 marked this pull request as ready for review March 27, 2025 10:42
@Copilot Copilot AI left a comment


Pull Request Overview

This PR enables inplacing for Composite and ScalarLoop operations with multiple outputs by deferring output assignment until all node computations are complete. Key changes include:

  • Adjusting in-place optimizations in Composite and ScalarLoop ops.
  • Updating tests to pass explicit dtypes and include trust_input flags.
  • Refining code generation and cloning logic in scalar and tensor rewriting components.

Reviewed Changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated no comments.

Summary per file:
tests/tensor/test_math_scipy.py Added trust_input and explicit dtype parameters in function calls.
tests/tensor/rewriting/test_elemwise.py Modified test for multi-output inplacing with parametrized linker.
tests/scalar/test_loop.py Enhanced tests for in-place behavior in ScalarLoop ops.
tests/link/numba/test_elemwise.py Adjusted test naming and added new test for multiple inplaced outputs.
pytensor/tensor/rewriting/elemwise.py Simplified candidate output indexing for inplacing.
pytensor/scalar/loop.py Updated clone signature, in-place node code generation, and renaming for clarity.
pytensor/scalar/basic.py Revised __str__ method formatting for cleaner output.
pytensor/link/numba/dispatch/vectorize_codegen.py Fixed a typo in the function call accessing a value.
Comments suppressed due to low confidence (3)

tests/tensor/rewriting/test_elemwise.py:1127

  • The new assertion for 'destroy_map' is stricter than before and assumes that the first output is always the one inplaced. Please verify that this change covers all valid scenarios for multi-output composites.
assert destroy_map == {0: [0]}

tests/scalar/test_loop.py:301

  • Using 'is' to verify in-place aliasing for numpy arrays might lead to false negatives if equivalent but non-identical objects are returned. Confirm that this identity check is necessary and robust for in-place testing.
assert xv_res is (n_test, x0v_test, y0v_test, cv_test)[mutate_arg_idx]

pytensor/scalar/loop.py:308

  • The newly introduced assignment block that maps carry variables to outputs should be reviewed to ensure it does not cause redundant or conflicting assignments in complex loop scenarios.
            _c_code += f"{out} = {carry};\n"

@jessegrabowski (Member) left a comment


lgtm

@@ -115,7 +116,7 @@ def fgraph(self):
         self._fgraph = fgraph
         return self._fgraph
 
-    def clone(self):
+    def clone(self, name=None, **kwargs):
jessegrabowski (Member) commented on this change:

Unrelated, but I just checked and the signature of clone varies quite a bit across the codebase. That's pretty maddening!

ricardoV94 (Member, Author) replied:

Yeah, I don't think it's standardized, unlike node.clone.

@@ -431,11 +431,13 @@ def test_gammaincc_ddk_performance(benchmark):
     x = vector("x")
 
     out = gammaincc(k, x)
-    grad_fn = function([k, x], grad(out.sum(), wrt=[k]), mode="FAST_RUN")
+    grad_fn = function(
jessegrabowski (Member) commented on this change:

What's the story with this test? The changes seem unrelated to the rest of the PR.

ricardoV94 (Member, Author) replied:

To better benchmark the Op I set trust_input=True, which required being careful with the dtypes. The ScalarLoop is used in the gradient of gammaincc.
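For context, a minimal sketch of that benchmarking pattern (the input values are made up, and the trust_input flag is set on the compiled function here since that works across PyTensor versions): with input validation skipped, the caller must pass arrays that already have the declared dtypes.

```python
import numpy as np
from pytensor import function, grad
from pytensor.tensor import vector
from pytensor.tensor.math import gammaincc

k = vector("k")
x = vector("x")
out = gammaincc(k, x)

grad_fn = function([k, x], grad(out.sum(), wrt=[k]), mode="FAST_RUN")
grad_fn.trust_input = True  # skip input validation for tighter benchmarking

# With validation off, dtypes must match the symbolic inputs exactly
# (float64 with the default floatX).
k_val = np.linspace(1.0, 10.0, 100)
x_val = np.linspace(0.1, 5.0, 100)
grad_fn(k_val, x_val)
```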

@ricardoV94 ricardoV94 merged commit 3c66aa6 into pymc-devs:main Apr 8, 2025
73 checks passed
Labels: bug, enhancement, memory_optimization, numba, performance
Projects: None yet
Development

Successfully merging this pull request may close these issues.

Implement specialized inplace rewriter for Elemwise Composite Ops
2 participants