GPU resources not released after killing actor #1065

@ericl

Description

The script below crashes with:

Exception: Could not find a node with enough GPUs or other resources to create this actor. The local scheduler information is [ {'ClientType': 'local_scheduler', 'Deleted': False, 'DBClientID': '31dc437d6df69857fea7a9eb6f04004421039e18', 'AuxAddress': '127.0.0.1:37853', 'NumCPUs': 32.0, 'NumGPUs': 1.0, 'LocalSchedulerSocketName': '/tmp/scheduler9534802'}].
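
import ray
import time

# Actor that reserves the only GPU on the node.
@ray.remote(num_gpus=1)
class Actor(object):
  def __init__(self):
    pass

ray.init(num_gpus=1)
a = Actor.remote()
# Terminate the actor; its GPU should be returned to the scheduler.
a.__ray_terminate__.remote(a._ray_actor_id.id())

# Give the scheduler time to process the termination.
time.sleep(5)
a = Actor.remote()  # crashes: not enough GPUs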
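
To make the expected behavior concrete, here is a minimal sketch of the resource accounting I would expect, written without Ray. The `ResourcePool` class and its methods are hypothetical names for illustration only, not Ray's scheduler API; the point is just that terminating an actor should return its GPUs to the pool.

```python
class ResourcePool:
    """Toy stand-in for a scheduler's GPU bookkeeping (hypothetical, not Ray's API)."""

    def __init__(self, num_gpus):
        self.total_gpus = num_gpus
        self.in_use = {}  # actor id -> number of GPUs held

    def available(self):
        return self.total_gpus - sum(self.in_use.values())

    def create_actor(self, actor_id, num_gpus):
        if num_gpus > self.available():
            raise RuntimeError("Could not find a node with enough GPUs")
        self.in_use[actor_id] = num_gpus

    def terminate_actor(self, actor_id):
        # Release the reservation; this is the step that appears
        # not to happen in the report above.
        self.in_use.pop(actor_id, None)


pool = ResourcePool(num_gpus=1)
pool.create_actor("a1", num_gpus=1)
pool.terminate_actor("a1")
pool.create_actor("a2", num_gpus=1)  # succeeds because a1's GPU was returned
```

If `terminate_actor` skipped the release, the second `create_actor` call would raise, which mirrors the exception shown above.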

cc @stephanie-wang @robertnishihara

Labels: bug