Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ray] Can RAY pause and continue tasks distributed to the cluster's nodes? #8263

Open
zewenli98 opened this issue May 1, 2020 · 1 comment
Labels
enhancement Request for new feature and/or capability P3 Issue moderate in impact or severity

Comments

@zewenli98
Copy link
Contributor

What is your question?

I'm not sure if RAY has the ability to pause and continue tasks distributed to the cluster's nodes.

For example, if there is a task running in my cluster, but at this point, a higher-priority task has to be executed immediately. I wonder if RAY is able to pause the low-priority task and then switch to execute the high-priority one. After it's finished, the low-priority one continues to run.

Any help would be appreciated!!!

@zewenli98 zewenli98 added the question Just a question :) label May 1, 2020
@rkooo567
Copy link
Contributor

rkooo567 commented May 7, 2020

It is not supported by our APIs. You can probably use AsyncActor to implement this in application level, but it could be hard. The code below is a brief idea.

@ray.remote
class TaskRunner:
    def __init__(self):
        self.pause = True
        self.resume_context = {}
        self.task_id = "hash_value"

   async def run_task(self, id=None):
      assert self.pause == True
      self.pause = False
      if not id:
          await self.run({})
      else:
          self.task_id = id
          await self.run(self.resume_context[id])

    async def run(self, context):
        while True:
            if self.pause:
                self.resume_context[self.task_id] = context
                return
            # do something and store state in resume_context
            await asyncio.sleep(0)

     async def pause_request(self):
        self.pause = True
        return self.task_id

t = TaskRunner.remote()
t.run_task.remote()
paused_task_id = t.pause_request.remote()
t.run_task.remote()
t.run_task.remote(id=paused_task_id)

@rkooo567 rkooo567 added core P3 Issue moderate in impact or severity enhancement Request for new feature and/or capability and removed question Just a question :) labels May 27, 2020
@ericl ericl removed the core label Jan 20, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Request for new feature and/or capability P3 Issue moderate in impact or severity
Projects
None yet
Development

No branches or pull requests

3 participants