
Multiple ray Actors on a single machine fight for CPUs with PyTorch #3609

Closed
EricSteinberger opened this issue Dec 22, 2018 · 2 comments
Labels
question Just a question :)

Comments


EricSteinberger commented Dec 22, 2018

System information

Describe the problem

I want to run a ray program with many 2-CPU actors on a single m5.24xlarge instance on AWS to avoid network communication delays, but ray becomes very slow when PyTorch calls are executed concurrently by multiple actors on the same machine. I tested this on my local machine and on 2 remote Ubuntu machines, and the behavior is the same on all of them.

In the System Monitor, I can see all CPUs climbing to close to 100% even when the actor is limited to 1 CPU (that is, when running just 1 actor, of course!).
I am not sure whether this is a ray or a PyTorch problem, but I hope someone can help.

Note: on many separate AWS m5.large instances (each with 2 CPUs, i.e. one actor per machine), my program scales very well, so that is not the cause.

Below is toy code that, when run on a single multi-CPU machine, runs slower when the jobs are split across 5 actors than when a single actor does all of them:

import time

import ray
import torch


class NeuralNet(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.l = torch.nn.Linear(1000, 2048)
        self.l2 = torch.nn.Linear(2048, 2)

    def forward(self, x):
        return self.l2(self.l(x))


@ray.remote(num_cpus=1)
class TestActor:
    def __init__(self):
        self.net = NeuralNet()
        self.crit = torch.nn.MSELoss()

    def do_torch_stuff(self, batch_size):
        # Forward pass on a random batch; the output is discarded.
        self.net(torch.rand((batch_size, 1000)))


def _parallel_on_5_actors():
    t0 = time.time()

    ray.init()
    acs = [TestActor.remote() for _ in range(5)]
    for _ in range(1000):
        ray.get([ac.do_torch_stuff.remote(10) for ac in acs])

    print("With 5 actors: ", time.time() - t0)


def _all_on_1_actor():
    t0 = time.time()

    ray.init()
    ac = TestActor.remote()
    for _ in range(5000):
        ray.get(ac.do_torch_stuff.remote(10))

    print("With 1 actor: ", time.time() - t0)


if __name__ == '__main__':
    # Run only one of the two; each calls ray.init(), which may only be called once per process.
    _all_on_1_actor()  # ~10 sec on my machine
    # _parallel_on_5_actors()  # ~18 sec on my machine. Shouldn't this be ~2 sec?!?
ericl (Contributor) commented Dec 22, 2018

PyTorch is already parallelizing using multiple threads internally. When you have multiple processes, this can cause excessive thrashing from context switching.

It looks like PyTorch doesn't let you set the number of threads explicitly (pytorch/pytorch#975).
However, setting OMP_NUM_THREADS=1 prior to starting ray should work.
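
For reference, a minimal sketch (not from the original thread) of how this suggestion could be applied to the toy script above. The variable is set at the very top of the driver script, before torch is imported and before ray.init() starts the worker processes, so the driver and the actors should all inherit it:

import os
os.environ["OMP_NUM_THREADS"] = "1"  # limit each process to a single OpenMP thread

import ray
import torch  # torch picks up OMP_NUM_THREADS when its native libraries initialize

ray.init()
# ... create the TestActor handles and call do_torch_stuff.remote(...) as in the snippet above ...

Equivalently, the variable can be exported in the shell before launching the script, e.g. OMP_NUM_THREADS=1 python benchmark.py (benchmark.py is just a placeholder file name).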

ericl added the question (Just a question :)) label Dec 22, 2018
EricSteinberger (Author) commented

Thank you for your quick response! That does the job!
