[Metal] Fix bad stream after interrupted tuning session #8244
Merged
echuraev force-pushed the echuraev/fix_bad_stream branch from 823e1a9 to 59803ac on June 11, 2021 14:27
jwfromm approved these changes on Jun 15, 2021
LGTM, thanks!
@echuraev please fix the conflict.
echuraev force-pushed the echuraev/fix_bad_stream branch from 59803ac to b8f45d2 on June 16, 2021 05:13
@echuraev sorry, please try running CI again; you've hit a known flaky issue: #8140 (comment)
After an interrupted tuning session, the stream object may have been released without a new one being created. In that case it was not possible to run a new Metal task on the device without restarting the RPC application. This change adds a global function `metal.ResetGlobalState`, which should be called in the RPC application when the connection is closed. The function reinitializes the streams of the Metal devices, which guarantees that a new RPC session works with valid streams.
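The failure mode and the reset described above can be illustrated with a small, self-contained Python model. This is only a sketch under stated assumptions, not TVM's Objective-C++ implementation: `FakeStream`, `FakeMetalWorkspace`, and `on_rpc_connection_closed` are hypothetical names invented for illustration, and the only behavior taken from the description is that streams are reinitialized when the RPC connection closes.

```python
class FakeStream:
    """Stand-in for a per-device stream; not a real Metal or TVM object."""

    def __init__(self, device_id):
        self.device_id = device_id
        self.released = False

    def release(self):
        self.released = True

    def run(self, task_name):
        # A released stream can no longer execute work.
        if self.released:
            raise RuntimeError(f"bad stream on device {self.device_id}")
        return f"{task_name} ran on device {self.device_id}"


class FakeMetalWorkspace:
    """Hypothetical process-wide state: one default stream per device."""

    def __init__(self, num_devices):
        self.num_devices = num_devices
        self.default_streams = [FakeStream(i) for i in range(num_devices)]

    def reset_global_state(self):
        # Models the idea behind metal.ResetGlobalState: drop whatever the
        # previous session left behind and create fresh streams.
        for stream in self.default_streams:
            stream.release()
        self.default_streams = [FakeStream(i) for i in range(self.num_devices)]


def on_rpc_connection_closed(workspace):
    """Hypothetical hook an RPC application would call on disconnect."""
    workspace.reset_global_state()
```

In this model, a stream released by an interrupted session raises on `run`, while calling `on_rpc_connection_closed` first makes the next session start from fresh streams without restarting the process.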
- Rename function `GetStream` -> `CastStreamOrGetCurrent`
- Add several checks on the device id
- When `SetStream` is called with nullptr, the default stream is associated with the device.
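The refactored behavior listed above can be sketched in Python as follows. This is a hedged model rather than the actual metal_device_api code: `StreamTable` and its method signatures are assumptions made for illustration, `None` plays the role of nullptr, and strings stand in for stream handles.

```python
class StreamTable:
    """Hypothetical model of per-device stream bookkeeping (not TVM's code)."""

    def __init__(self, num_devices):
        self.default_streams = [f"default-stream-{i}" for i in range(num_devices)]
        self.current_streams = list(self.default_streams)

    def _check_device_id(self, device_id):
        # Models the added device id checks.
        if not 0 <= device_id < len(self.default_streams):
            raise ValueError(f"invalid device id: {device_id}")

    def set_stream(self, device_id, stream):
        self._check_device_id(device_id)
        # Passing None (nullptr in the original) selects the default stream.
        if stream is None:
            stream = self.default_streams[device_id]
        self.current_streams[device_id] = stream

    def cast_stream_or_get_current(self, stream, device_id):
        # An explicit stream wins; otherwise use the device's current stream.
        if stream is not None:
            return stream
        self._check_device_id(device_id)
        return self.current_streams[device_id]
```

The name `cast_stream_or_get_current` mirrors the rename: the function either uses the handle it was given or falls back to the current stream of the device, which makes the fallback explicit at every call site.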
echuraev force-pushed the echuraev/fix_bad_stream branch from b8f45d2 to 0fa2aea on June 17, 2021 07:08
ylc pushed a commit to ylc/tvm that referenced this pull request on Sep 29, 2021
zxy844288792 pushed a commit to zxy844288792/tvm that referenced this pull request on Mar 4, 2022
After an interrupted tuning session, the stream object may have been released without a new one being created. In that case it was not possible to run a new Metal task on the device without restarting the RPC application.

Created a global function `metal.ResetGlobalState`, which should be called in the RPC application when the connection is closed. In this function, we reinitialize the streams of the Metal devices, which guarantees that a new RPC session will work with the correct streams.

Also refactored the code of metal_device_api:
- Renamed function `GetStream` -> `CastStreamOrGetCurrent`
- Added several checks on the device id
- When `SetStream` is used with nullptr, the default stream is associated with the device.