Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Batched Mode : Issue with Conditional Assignment of Closure With Varying Weight #1637

Merged

Conversation

danieldresser-ie
Copy link
Contributor

@danieldresser-ie danieldresser-ie commented Jan 14, 2023

I might need some help again to actually fix this one.

I've added a test where Ci is set to an emission closure with a varying weight on one side of a conditional, and a uniform weight on the other. In batched execution with optimization on, this results in corrupted closure pointers.

I've tracked the trigger down to the final case in RuntimeOptimizer::peephole2, which converts from a closure multiplied to a weight to a closure-with-scale constructor. If I comment out this one optimization, the tests pass, but I don't think the bug is in this function ... as far as I can tell, this optimization is the only way to call the closure-with-scale constructor, so the optimization is probably fine, but something else is failing to handle the closure-with-scale constructor - or the resulting wide closure pointer, where half of it points closures with uniform weights and half points to closures with varying weights?

Checklist:

  • I have read the contribution guidelines.
  • I have previously submitted a Contributor License Agreement.
  • I have updated the documentation, if applicable.
  • I have ensured that the change is tested somewhere in the testsuite (adding new test cases if necessary).
  • My code follows the prevailing code style of this project.

@danieldresser-ie danieldresser-ie force-pushed the closureConditional branch 2 times, most recently from 9dca197 to e9e7707 Compare January 16, 2023 23:04
@danieldresser-ie
Copy link
Contributor Author

OK, I think I have actually figured out the fix: in batched_llvm_gen, llvm_gen_closure explicitly loops through each lane of the closure pointers, but there didn't seem to be anything to prevent it from running on lanes that aren't part of the mask. I added a test_mask_lane to the existing branch that checks for null closures, and that was enough to get the test passing ( though definitely make sure that someone who knows this codebase better than I looks at this before merging it ).

llvm::Value* cond = rop.ll.op_and(
rop.ll.op_ne(rop.ll.ptr_cast(comp_ptr, rop.ll.type_void_ptr()),
rop.ll.void_ptr_null()),
rop.ll.test_mask_lane(mask, rop.ll.constant(lane_index)));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

" rop.ll.constant(lane_index)" isn't wrong, but you should be able to pass lane_index in directly without creating the constant.

@AlexMWells
Copy link
Contributor

This appears more correct. Initially lanes may not of been allocated so the nullptr test would of prevented execution, but as soon as the same symbol gets populated from different masks (as it does in this if/else case) then nullptr is no longer enough. So yes checking if the lane is active is correct.

@AlexMWells
Copy link
Contributor

Looking at the test example,
if (u > 0.5) {
Ci = u * emission();
} else {
Ci = color(1, 0, 0) * emission();
}

And following the code, I see the underlying I see the underlying closure_component_allot would get called twice, once for each branch, the resulting wide Ci symbol would have pointers to two different blocks, which I suppose is fine. Perhaps a bit wasteful, but fine. Might be worth reviewing who/how that closure memory gets cleaned up/resused.

@AlexMWells
Copy link
Contributor

Excellent find, please add more complex unit tests to more fully exercise complex closure behaviors

Copy link
Contributor

@AlexMWells AlexMWells left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@danieldresser-ie
Copy link
Contributor Author

I made a very minor correction, replacing rop.ll.constant(lane_index) with merely lane_index as Alex pointed out.

I also tried to understand Alex's comment about checking up on how the memory gets released, and whether having different lanes pointing to different blocks would create issues. As far as I can tell, closures are just allocated from a SimplePool which never frees any elements, it only has a clear() method to start back at the beginning reusing all memory ... since there is no tracking of individual elements, I don't see any problems arising from pointing to two different blocks ( though it is a little bit inefficient ).

@johnhaddon
Copy link
Contributor

Is this fix likely to make it into an official release soonish? We'd love to include it in the next major version of Gaffer so we can enable batched mode for our OSL-based geometry and image processing nodes. Does anything else need to happen from our side?

@AlexMWells AlexMWells requested a review from lgritz January 25, 2023 16:25
Signed-off-by: Daniel Dresser <danield@image-engine.com>
@lgritz lgritz force-pushed the closureConditional branch from 2c92ef8 to 29a8ab5 Compare January 25, 2023 19:00
@lgritz
Copy link
Collaborator

lgritz commented Jan 25, 2023

The Mac CI failure appears unrelated to these changes, so I'm going to merge.

@lgritz lgritz merged commit 5dad507 into AcademySoftwareFoundation:main Jan 25, 2023
@lgritz
Copy link
Collaborator

lgritz commented Jan 25, 2023

@johnhaddon Merged now, I will immediately backport to the release branch. I can cut a tagged release soon, but want to get a couple other outstanding PRs merged finalized and merged first.

@lgritz
Copy link
Collaborator

lgritz commented Jan 25, 2023

Actually, I just noticed that it's already the 25th (my how time gets away from us). There's already a release scheduled for Feb 1. @johnhaddon is that adequate, or do you need a tagged release before then?

@johnhaddon
Copy link
Contributor

Thanks for the speedy response @lgritz! Feb 1st will be great for us...

lgritz pushed a commit to lgritz/OpenShadingLanguage that referenced this pull request Jan 28, 2023
…n#1637)

Signed-off-by: Daniel Dresser <danield@image-engine.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants