-
Notifications
You must be signed in to change notification settings - Fork 129
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Inplace Composite and ScalarLoop Ops with multiple outputs #1322
Conversation
Marked as TODO as I think the ScalarLoop can already be inplaced as well as it uses temporary variables for the inner carry. Just need to add a test. |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #1322 +/- ##
==========================================
+ Coverage 81.98% 82.04% +0.05%
==========================================
Files 188 203 +15
Lines 48489 48837 +348
Branches 8673 8689 +16
==========================================
+ Hits 39756 40067 +311
- Misses 6582 6619 +37
Partials 2151 2151
🚀 New features to boost your workflow:
|
11bb648
to
32dcec9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds support for inplacing multiple outputs in Composite and ScalarLoop operations while also fixing a typo in the Numba vectorize code. Key changes include:
- New tests in various test suites (scalar, numba, and tensor) to verify the inplacing behavior of multi-output composite ops.
- Updates to the ScalarLoop op in pytensor/scalar/loop.py to allow passing extra keyword arguments (and corresponding changes in clone).
- Minor API adjustments in the numba dispatch code and candidate outputs handling in elemwise rewriting.
Reviewed Changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.
Show a summary per file
File | Description |
---|---|
tests/scalar/test_loop.py | Adds tests for elemwise inplacing with multiple outputs. |
tests/link/numba/test_elemwise.py | Updates tests to cover inplace output type and multi-output inplacing. |
tests/tensor/rewriting/test_elemwise.py | Adjusts multi-output inplacing tests with new parametrization. |
pytensor/scalar/loop.py | Modifies init and clone methods to accept extra kwargs. |
pytensor/scalar/basic.py | Updates super() calls and bumps the c_code_cache_version_outer. |
pytensor/tensor/rewriting/elemwise.py | Removes candidate_input_idxs and uses a direct range for candidate outputs. |
pytensor/link/numba/dispatch/vectorize_codegen.py | Fixes a typo by replacing _get_value() with _getvalue(). |
Comments suppressed due to low confidence (2)
pytensor/tensor/rewriting/elemwise.py:165
- The candidate_input_idxs abstraction was removed in favor of using 'range(len(node.outputs))'. Please double-check that this change does not inadvertently allow in–place modifications on outputs that should remain protected.
candidate_outputs = [i for i in range(len(node.outputs)) if i not in baseline]
pytensor/link/numba/dispatch/vectorize_codegen.py:268
- The call to _getvalue() replaces the previous _get_value() method; confirm that this API change is consistent across all relevant objects.
outputs[inplace_idx]._getvalue()
32dcec9
to
8df9789
Compare
8df9789
to
5fc2cb8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR enables inplacing for Composite and ScalarLoop operations with multiple outputs by deferring output assignment until all node computations are complete. Key changes include:
- Adjusting in-place optimizations in Composite and ScalarLoop ops.
- Updating tests to pass explicit dtypes and include trust_input flags.
- Refining code generation and cloning logic in scalar and tensor rewriting components.
Reviewed Changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated no comments.
Show a summary per file
File | Description |
---|---|
tests/tensor/test_math_scipy.py | Added trust_input and explicit dtype parameters in function calls. |
tests/tensor/rewriting/test_elemwise.py | Modified test for multi-output inplacing with parametrized linker. |
tests/scalar/test_loop.py | Enhanced tests for in-place behavior in ScalarLoop ops. |
tests/link/numba/test_elemwise.py | Adjusted test naming and added new test for multiple inplaced outputs. |
pytensor/tensor/rewriting/elemwise.py | Simplified candidate output indexing for inplacing. |
pytensor/scalar/loop.py | Updated clone signature, in-place node code generation, and renaming for clarity. |
pytensor/scalar/basic.py | Revised str method formatting for cleaner output. |
pytensor/link/numba/dispatch/vectorize_codegen.py | Fixed a typo in the function call accessing a value. |
Comments suppressed due to low confidence (3)
tests/tensor/rewriting/test_elemwise.py:1127
- The new assertion for 'destroy_map' is stricter than before and assumes that the first output is always the one inplaced. Please verify that this change covers all valid scenarios for multi-output composites.
assert destroy_map == {0: [0]}
tests/scalar/test_loop.py:301
- Using 'is' to verify in-place aliasing for numpy arrays might lead to false negatives if equivalent but non-identical objects are returned. Confirm that this identity check is necessary and robust for in-place testing.
assert xv_res is (n_test, x0v_test, y0v_test, cv_test)[mutate_arg_idx]
pytensor/scalar/loop.py:308
- The newly introduced assignment block that maps carry variables to outputs should be reviewed to ensure it does not cause redundant or conflicting assignments in complex loop scenarios.
_c_code += f"{out} = {carry};\n"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
@@ -115,7 +116,7 @@ def fgraph(self): | |||
self._fgraph = fgraph | |||
return self._fgraph | |||
|
|||
def clone(self): | |||
def clone(self, name=None, **kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unrelated but I just checked and the signature of clone
varies quite a bit over the codebase. That's pretty maddening!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I don't think it's standardized, unlike node.clone
@@ -431,11 +431,13 @@ def test_gammaincc_ddk_performance(benchmark): | |||
x = vector("x") | |||
|
|||
out = gammaincc(k, x) | |||
grad_fn = function([k, x], grad(out.sum(), wrt=[k]), mode="FAST_RUN") | |||
grad_fn = function( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the story with this test? Changes seem unrelated to the rest of the PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To better benchmark the Op I set trust_input=True
, which required I be careful with the dtypes. The ScalarLoop is used in the gradient of gammaincc
Closes #138
This allows inplacing arbitrary number of inputs in Elemwise Composite (i.e., fused) Ops. The restriction was there because in the original codegen writing to the aliased outputs could affect the computations of other outputs that still referenced the just overridden aliased inputs.
The fix was trivial, only use the output names in the end after all nodes are computed. Until then the variables are stored in temporary variables as already happens for regular intermediate nodes in the Composite.
Did the same for
ScalarLoop
. The final outputs are only defined once the inner loop is finished, so no worries about aliasing too soon.Also a typo-bug in the Numba vectorize code that's fixed.
📚 Documentation preview 📚: https://pytensor--1322.org.readthedocs.build/en/1322/