Description
System information
- OS Platform and Distribution: Ubuntu 16.04; Ubuntu 18.04; Amazon Linux
- Ray installed from (source or binary): pip and wheel (see #3520)
- Ray version: 0.4.0; 0.6.0 (see #3520)
- Python version: 3.6.6 for ray 0.4.0; 3.7 for ray 0.6.0 (see #3520)
Describe the problem
I want to run a ray program with many 2-CPU actors on a single m5.24xlarge instance on AWS to avoid network communication delays, but ray gets horribly slow when PyTorch calls are executed concurrently by multiple actors on the same machine. I tested this on my local machine and on two remote Ubuntu machines, and it holds for all of them.
In the System Monitor, I can see all CPUs shoot up to close to 100% even when the actor is limited to 1 CPU (that is, when running just a single actor, of course!).
I am not sure whether this is a ray or a PyTorch problem, but I hope someone can help.
Note: across many separate AWS m5.large instances (each has 2 CPUs, i.e. one actor per machine), my program scales very well, so the program itself is not the cause; the slowdown only appears when the actors share a machine.
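One thing worth checking (my assumption, not something confirmed in this report): PyTorch's intra-op thread pool. By default each process spawns roughly as many threads as there are cores, so several 1-CPU actors on one machine can oversubscribe the CPUs. A minimal probe, using a hypothetical ThreadProbe actor:

import ray
import torch

@ray.remote(num_cpus=1)
class ThreadProbe:
    def threads(self):
        # Number of threads PyTorch will use for intra-op parallelism
        # inside this actor's worker process.
        return torch.get_num_threads()

ray.init()
probe = ThreadProbe.remote()
print(ray.get(probe.threads.remote()))  # often equals the core count, not 1

If this prints the full core count for every actor, the actors' thread pools are competing for the same cores, which would match the 100%-CPU observation above.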
Here is toy code that, when run on a single multi-CPU machine, runs slower when the work is split among 5 actors than when a single actor does all of it:
import time

import ray
import torch


class NeuralNet(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.l = torch.nn.Linear(1000, 2048)
        self.l2 = torch.nn.Linear(2048, 2)

    def forward(self, x):
        return self.l2(self.l(x))


@ray.remote(num_cpus=1)
class TestActor:
    def __init__(self):
        self.net = NeuralNet()
        self.crit = torch.nn.MSELoss()

    def do_torch_stuff(self, batch_size):
        # A single forward pass; the result is discarded on purpose.
        self.net(torch.rand(batch_size, 1000))


def _parallel_on_5_actors():
    t0 = time.time()
    ray.init()
    acs = [TestActor.remote() for _ in range(5)]
    for _ in range(1000):
        ray.get([ac.do_torch_stuff.remote(10) for ac in acs])
    print("With 5 actors: ", time.time() - t0)


def _all_on_1_actor():
    t0 = time.time()
    ray.init()
    ac = TestActor.remote()
    for _ in range(5000):
        ray.get(ac.do_torch_stuff.remote(10))
    print("With 1 actor: ", time.time() - t0)


if __name__ == '__main__':
    _all_on_1_actor()  # ~10 sec on my machine
    # _parallel_on_5_actors()  # ~18 sec on my machine. Should be ~2 sec?!?!
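If thread oversubscription is indeed the problem, a possible mitigation (a sketch under that assumption, not a confirmed fix) is to pin each actor's PyTorch to a single thread via torch.set_num_threads(1) in the actor's __init__, or by setting OMP_NUM_THREADS=1 in the environment before starting the workers:

@ray.remote(num_cpus=1)
class SingleThreadedTestActor:  # hypothetical variant of TestActor above
    def __init__(self):
        # Restrict PyTorch's intra-op parallelism to one thread, so five
        # of these actors use five cores total instead of all fighting
        # over every core on the machine.
        torch.set_num_threads(1)
        self.net = NeuralNet()
        self.crit = torch.nn.MSELoss()

    def do_torch_stuff(self, batch_size):
        self.net(torch.rand(batch_size, 1000))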