[NPUW] Extend remote memory check in set_tensor() with sliced tensor case #32025
Conversation
```cpp
if (tensor->is_continuous()) {
    m_input_allocated.insert(tensor->data());
} else {
    LOG_WARN("Strided remote tensor is not supported on the device! Expect worse performance due "
```
Ok, in the case of a strided tensor, shouldn't we clear m_input_allocated, i.e. undo what we did in line 198?
Good point. It doesn't occur in the use case I'm considering, but in general it could happen. I've removed the old check.
e9ab56b
dmatveev left a comment:
Some post-merge comments here
```cpp
// ZeroRemoteTensor and ZeroHostTensor should guarantee the correct memory allocation
m_input_allocated.insert(tensor->data());
// Also we can get a strided remote tensor. In this case the copy cannot be avoided for now.
if (m_npuw_model->global_mem_device() == "NPU") {
```
not sure if this check is right here.
```cpp
auto remote_ctx =
    m_npuw_model->get_plugin()->get_core()->get_default_context(m_npuw_model->global_mem_device())._ptr;
```
Having this on every set_tensor may be expensive.
Follow-up on #31792