[Metal] Fix bad stream after interrupted tuning session #8244

Merged 2 commits on Jun 18, 2021

echuraev
Contributor

After an interrupted tuning session, the stream object could be left
released without a new one being created. When that happened, no new
Metal task could run on the device until the RPC application was
restarted.

This PR adds a global function `metal.ResetGlobalState`, which the RPC
application should call when a connection is closed. The function
reinitializes the streams of the Metal devices, so each new RPC session
starts with valid streams.
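The reset-on-disconnect idea can be sketched in isolation. The block below is a simplified model, not the TVM implementation: `Stream`, `MetalState`, `InterruptSession`, `CanRunTask`, and `OnConnectionClosed` are hypothetical names invented for illustration, and only the name `ResetGlobalState` comes from this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a per-device Metal command queue.
struct Stream {
  bool released = false;  // set when an interrupted session tears it down
};

// Hypothetical model of the per-process Metal state held by an RPC server.
class MetalState {
 public:
  explicit MetalState(std::size_t num_devices) : streams_(num_devices) {
    ResetGlobalState();
  }

  // Recreate one fresh stream per device, discarding whatever was there.
  void ResetGlobalState() {
    for (auto& s : streams_) s = std::make_shared<Stream>();
  }

  // Simulate an interrupted tuning session releasing a device's stream.
  void InterruptSession(std::size_t device_id) {
    streams_.at(device_id)->released = true;
  }

  // A task can run only while the device's stream is still usable.
  bool CanRunTask(std::size_t device_id) const {
    return !streams_.at(device_id)->released;
  }

 private:
  std::vector<std::shared_ptr<Stream>> streams_;
};

// Hypothetical hook an RPC application would invoke when a connection closes.
inline void OnConnectionClosed(MetalState& state) { state.ResetGlobalState(); }
```

In this model, a device whose stream was released stays unusable until `OnConnectionClosed` runs, which mirrors why the reset is tied to the end of an RPC connection rather than to the start of a task.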

This PR also refactors metal_device_api:

  • Rename the function GetStream to CastStreamOrGetCurrent.
  • Add several checks on the device id.
  • Calling SetStream with nullptr now associates the default stream
    with the device.
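The three refactoring points above can be illustrated with a small stand-alone sketch. This is an assumption-laden model rather than the code in metal_device_api: the `Stream` and `DeviceStreams` types, the `CheckDevice` helper, the use of exceptions for invalid ids, and the exact signatures are hypothetical; only the names `SetStream` and `CastStreamOrGetCurrent` come from the PR description.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a Metal command queue.
struct Stream {
  int id;
};

// Hypothetical per-process table of default and current streams.
class DeviceStreams {
 public:
  explicit DeviceStreams(std::size_t num_devices) {
    default_.reserve(num_devices);  // no reallocation after this point
    for (std::size_t i = 0; i < num_devices; ++i) {
      default_.push_back(Stream{static_cast<int>(i)});
    }
    current_.resize(num_devices);
    for (std::size_t i = 0; i < num_devices; ++i) current_[i] = &default_[i];
  }

  // current_ holds pointers into default_, so copying would dangle.
  DeviceStreams(const DeviceStreams&) = delete;
  DeviceStreams& operator=(const DeviceStreams&) = delete;

  // Passing nullptr re-associates the device with its default stream.
  void SetStream(int device_id, Stream* stream) {
    CheckDevice(device_id);
    current_[device_id] = stream != nullptr ? stream : &default_[device_id];
  }

  // Cast an explicit handle if one is given, else return the current stream.
  Stream* CastStreamOrGetCurrent(void* stream, int device_id) {
    CheckDevice(device_id);
    if (stream != nullptr) return static_cast<Stream*>(stream);
    return current_[device_id];
  }

 private:
  // Reject ids outside the range of known devices.
  void CheckDevice(int device_id) const {
    if (device_id < 0 ||
        static_cast<std::size_t>(device_id) >= default_.size()) {
      throw std::out_of_range("invalid Metal device id");
    }
  }

  std::vector<Stream> default_;
  std::vector<Stream*> current_;
};
```

The new function name describes both branches of the lookup: an explicit handle is cast and returned, and a null handle falls back to whichever stream is current for that device.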


Contributor

@jwfromm jwfromm left a comment

LGTM, thanks!

@masahi
Member

masahi commented Jun 15, 2021

@echuraev please fix the conflict.

@masahi
Member

masahi commented Jun 16, 2021

@echuraev sorry, please try running CI again; you've hit a known flaky issue: #8140 (comment)

@masahi masahi merged commit 77536da into apache:main Jun 18, 2021
@echuraev echuraev deleted the echuraev/fix_bad_stream branch September 24, 2021 10:37
ylc pushed a commit to ylc/tvm that referenced this pull request Sep 29, 2021
zxy844288792 pushed a commit to zxy844288792/tvm that referenced this pull request Mar 4, 2022