-
Notifications
You must be signed in to change notification settings - Fork 6.8k
Simplify creation of NodeEntry instances and use emplace_back #14095
Conversation
@mxnet-label-bot add [pr-work-in-progress] |
Does this change improve the time of scheduling launches to the engine? |
@ptrendx thanks for your comment, would still need to measure |
b4f1e57
to
ca7dde0
Compare
@larroy Could you please fix the CI failures? |
@larroy Is this PR still WIP ? |
43b24ab
to
9046556
Compare
966fb9e
to
287c7e3
Compare
@ptrendx suggestions on measurements? |
aab4834
to
1b93548
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the patch. Could you please report the difference that this change is making through experiment?
@szha what would you suggest? |
@szha the initial motivation is to lean out node creation and for correctness since there are difficult to catch bugs depending on Node initialization. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for answering my question. LGTM!
@szha Performance is the same, so motivation is correctness and ease of graph building: Tested in a tight loop:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PR description says it's optimization so I'm expecting some performance difference. I'm not sure what you mean by "correctness" or "ease". Was it not working as expected? If so, is there a test for it?
The "ease of building graph" is probably easier for people to appreciate. It would be great if the change in recommended way of constructing nodes could be documented somewhere.
Moving a shared pointer is faster than copying. When using move in the microbenchmark is 10% faster. Overall is not much and in the big picture won't make a difference, but these small inefficiencies compound together and make the code in general slower in a way that you can't profile as it's scatered all over. What changes exactly are you suggesting? to the title of the PR? is not clear to me, please help clarify. It can also cause contention between threads due to the atomic lock.
|
The information in the PR suggests this patch is premature optimization as it's not making any difference in speed. If any of the symptoms can be captured in a test, please do so. Otherwise, if it's a usability improvement, then documenting the suggested use would be helpful. |
I just provided a microbenchmark where you can see it's 10% faster. Usually in C++ we try to avoid doing more work than necessary such as copying a shared_ptr when is not necessary. Do you disagree with that? It creates inefficiencies compounded that don't show in a profiler. It also makes it easier and less verbose to add to an nnvm graph. |
I don't think is premature optimization, is generally agreed that is better to emplace_back rather than push_back and create additional copies, it also makes less verbose to add nodes. Why you call it a premature optimization? What exact changes you are suggesting? |
I'm worried that it would be premature because of the lack of the measurable improvements in the problem space what mxnet concerns. |
@szha I changed the title according to your request, are we good to merge? |
Thanks for documenting the change. This PR needs one more rebase. |
apache/tvm#2576 Making copies of shared_ptr is more expensive than moving. This PR reduces lock contention by using move semantics in NNVM nodes making also more convenient to construct NodeEntry classes in the code due to the added ctors. Update NDarray with NodeEntry constructors and refine initializer lists. Sync gradient.cc with tvm
@szha Any more comments, or are we ready to go? |
…#14095) * Optimize move semantics of NodeEntry apache/tvm#2576 Making copies of shared_ptr is more expensive than moving. This PR reduces lock contention by using move semantics in NNVM nodes making also more convenient to construct NodeEntry classes in the code due to the added ctors. Update NDarray with NodeEntry constructors and refine initializer lists. Sync gradient.cc with tvm * Remove additional calls to NodeEntry in emplace_back * refine patch * Fix lint
Description
Reduce copying of shared_ptr. In a microbenchmark is 10% faster. But the main motivation is simplifying creation of nnvm graph on operators.
imperative.cc will be addressed in a different refactor.
Checklist
Essentials
Please feel free to remove inapplicable items for your PR.