[Relay] PlanDevices supports 'free' on_device annotations #9693
This is in support of apache#9613, which allows PlanDevices to be run after lowering so as to flow memory constraints in and out of PrimFuncs. That requires a way to insert device_copies when the memory scopes chosen during separate lowering of fused primitive functions clash, but otherwise to avoid device_copies when scopes can be chosen so as to avoid them.

We support that by generalizing the "on_device" annotation to allow the device constraint to be independently controlled for its 'body' and 'result'.

    # Standard user annotation: body is constrained to S
    on_device(body, S)
    # Used by PlanDevices to 'fix' expression to S
    # (was is_fixed=True)
    on_device(body, S, constrain_result=True)
    # Used by PlanDevices to indicate a device_copy can be
    # inserted if necessary.
    on_device(body, S, constrain_body=False)
    # Supported, but currently has no use.
    on_device(body, S, constrain_result=True, constrain_body=False)

A few extra odds 'n ends collected along the way:
- Some CallLowered cleanup which I found useful.
- The usual extra debugging output needed as I debugged. In return I removed some particularly verbose logging I'd added while tracking down unexpected object copies.
- Clean up warnings from clang-12 as I touch files.
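To make the four flag combinations concrete, here is a minimal sketch in plain Python. It is a toy model, not the TVM implementation: the `OnDevice` dataclass, `implied_constraints`, and `needs_device_copy` are hypothetical names introduced for illustration, and the defaults (`constrain_body=True`, `constrain_result=False`) are inferred from the annotation forms listed above.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class OnDevice:
    """Toy stand-in for an on_device annotation (hypothetical, not the TVM class)."""
    body: str
    scope: str
    constrain_result: bool = False  # assumed default, inferred from the PR text
    constrain_body: bool = True     # assumed default, inferred from the PR text


def implied_constraints(ann: OnDevice) -> Tuple[Optional[str], Optional[str]]:
    """Return (body_scope, result_scope); None means 'left for the planner to choose'."""
    body_scope = ann.scope if ann.constrain_body else None
    result_scope = ann.scope if ann.constrain_result else None
    return body_scope, result_scope


def needs_device_copy(body_scope: str, result_scope: str) -> bool:
    """Once both sides are resolved, a copy is only needed if they differ."""
    return body_scope != result_scope


standard = OnDevice("x", "S")                          # body constrained to S
fixed = OnDevice("x", "S", constrain_result=True)      # was is_fixed=True
free = OnDevice("x", "S", constrain_body=False)        # the 'free' form
result_only = OnDevice("x", "S", constrain_result=True, constrain_body=False)
```

In this model the 'free' form constrains neither side, so the planner can unify the body and the result with their surroundings independently and fall back to a device_copy only when the two resolved scopes differ.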
Overall, LGTM! I left some suggestions to improve comments and some clarifying questions for my own edification.
<< "Cannot constrain intermediate result of nested on_device calls to different SEScopes";
}
// We can now ignore the intermediate constraints, if any.
return OnDevice(props.body, (constrain_inner || constrain_outer) ? outer : inner,
If the inner is constrained, we return outer?
Yeah that doesn't look right does it? Fixed.
Actually it is correct -- added a comment to explain.
Thanks @electriclilies and @jroesch, merged your post-submit comments into the sequel.
…straints. (#9613)

* [Relay] Re-run PlanDevices after LowerTE to flow new memory scope constraints.

This PR:
1) Makes PlanDevices consider lowered calls when solving device domain constraints.
2) Connects the storage scopes on PrimFunc parameters (encoded in the storage_scope fields of the PointerType type annotations on their Buffer data Vars) to the memory_scope fields of the SEScopes which PlanDevices unifies over.
3) Allows new device_copies to be inserted on the arguments and results of lowered calls so as to account for any memory scope mismatches which are now apparent.

[device_planner.cc has main changes, rest is secondary.]

In the short term we'd like to use this machinery to flow memory scope choices made during lowering back out into the overall Relay program. In the longer term we'd also like to be able to use memory scopes to influence the lowering of yet-to-be-lowered functions (or lowered functions which have yet to be scheduled, a distinction now possible with TensorIR).

- Memory scope constraints can flow both out of and in to PrimFuncs introduced by LowerTE. In TIR memory scopes are represented by 'storage scopes' on the PointerType type annotations on TIR Buffer data variables.
- It is straightforward to extract memory scopes from PrimFuncs by looking at the PrimFunc's buffer_map. We do this in 'phase 1' of PlanDevices, which collects all the device constraints implied by
- However, pushing memory constraints in to PrimFuncs is more challenging due to buffer aliasing. This aspect is still experimental.
- Allow device_copies to be inserted for both arguments and results of PrimFunc calls, on the assumption PlanDevices has already established a consistent device assignment prior to lowering and any new mismatch is required to match up memory scopes. We use the new 'free' on_device annotations to implement this.

Coming along for the ride:
- To make unit tests of mixed Relay/TIR functions possible, I needed to be able to supply a checked_type to GlobalVar, since that's currently the only way to give a Relay type to PrimFuncs.
- Use GenSym to get unique var names in ANF & partial eval so it is easier to diff debug output between passes and connect program fragments back into the overall program. Relying on pretty-printing to automagically unique-ify var names is certainly cute, but until we have better span support it is very hard to work with.
- Realized both dead_code.cc and fold_constant.cc would happily move values into a different lexical virtual device context, since device_planner.cc was being 'clever' and eliding on_devices for let-bound values when there's no change. Fixed so that every let-bound value has an on_device. Will be much better after apache/tvm-rfcs#45 is implemented.
- Make build -Werror clean for clang-12 (mostly move fixups).
- Address post-submit comments from #9693.

* [checkpoint] thread safe GenSym

* [Relay] PlanDevices supports 'free' on_device annotations

* [checkpoint] unused var
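As a rough illustration of the buffer_map point in the commit message above, the sketch below walks a PrimFunc-like structure and reads one memory scope per parameter. Everything here is a hypothetical stand-in written in plain Python: the `PointerType`, `Var`, `Buffer`, and `PrimFunc` dataclasses only mirror the shape described in the text and are not the TVM classes, and treating an empty string as "no constraint" is an assumption of this toy.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class PointerType:
    element_type: str
    storage_scope: str = ""  # assumption: "" means unconstrained in this toy


@dataclass(frozen=True)
class Var:
    name: str
    type_annotation: Optional[PointerType] = None


@dataclass(frozen=True)
class Buffer:
    data: Var  # the data Var carries the PointerType annotation


@dataclass
class PrimFunc:
    params: List[str]
    buffer_map: Dict[str, Buffer] = field(default_factory=dict)


def param_memory_scopes(func: PrimFunc) -> List[str]:
    """Collect one memory scope per parameter, reading it off the buffer_map."""
    scopes = []
    for param in func.params:
        buf = func.buffer_map.get(param)
        annotation = buf.data.type_annotation if buf is not None else None
        # Parameters without a buffer (e.g. scalars) contribute no constraint.
        scopes.append(annotation.storage_scope if annotation is not None else "")
    return scopes


func = PrimFunc(
    params=["a", "b", "n"],
    buffer_map={
        "a": Buffer(Var("a_data", PointerType("float32", "global"))),
        "b": Buffer(Var("b_data", PointerType("float32", "texture"))),
    },
)
scopes = param_memory_scopes(func)
```

Reading scopes out this way only covers the "out of PrimFuncs" direction; the commit message notes that pushing constraints in is harder because of buffer aliasing, which this sketch does not attempt.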