Description
System information
- OS Platform and Distribution: Ubuntu 16.04; Ubuntu 18.04; Amazon Linux
- Ray installed from (source or binary): pip and wheel (see #3520)
- Ray version: 0.4.0; 0.6.0 (see #3520)
- Python version: 3.6.6 for ray 0.4.0; 3.7 for ray 0.6.0 (see #3520)
Describe the problem
I want to run a ray program with many 2-CPU actors on a single m5.24xlarge instance on AWS to avoid network communication delays, but ray gets horribly slow when PyTorch calls are executed concurrently by multiple actors on the same machine. I tested this on my local machine and on two remote Ubuntu machines, and it holds for all of them.
In the System Monitor, I can see all CPUs shoot up to close to 100% even when the actor is limited to 1 CPU (that is, when running just a single actor, of course!).
I am not sure whether this is a ray or a PyTorch problem, but I hope someone can help.
Note: across many separate AWS m5.large instances (each has 2 CPUs, i.e. one actor per machine), my program scales very well, so the program itself is not the cause; the slowdown only appears when the actors share a machine.
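One thing worth checking (my assumption, not something confirmed in this report): PyTorch's intra-op thread pool. By default each process spawns roughly as many threads as there are cores, so several 1-CPU actors on one machine can oversubscribe the CPUs. A minimal probe, using a hypothetical ThreadProbe actor:

import ray
import torch

@ray.remote(num_cpus=1)
class ThreadProbe:
    def threads(self):
        # Number of threads PyTorch will use for intra-op parallelism
        # inside this actor's worker process.
        return torch.get_num_threads()

ray.init()
probe = ThreadProbe.remote()
print(ray.get(probe.threads.remote()))  # often equals the core count, not 1

If this prints the full core count for every actor, the actors' thread pools are competing for the same cores, which would match the 100%-CPU observation above.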
Here is toy code that, when run on a single multi-CPU machine, runs slower when the work is split among 5 actors than when a single actor does all of it:
import time

import ray
import torch


class NeuralNet(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.l = torch.nn.Linear(1000, 2048)
        self.l2 = torch.nn.Linear(2048, 2)

    def forward(self, x):
        return self.l2(self.l(x))


@ray.remote(num_cpus=1)
class TestActor:
    def __init__(self):
        self.net = NeuralNet()
        self.crit = torch.nn.MSELoss()

    def do_torch_stuff(self, batch_size):
        # A single forward pass; the result is discarded on purpose.
        self.net(torch.rand(batch_size, 1000))


def _parallel_on_5_actors():
    t0 = time.time()
    ray.init()
    acs = [TestActor.remote() for _ in range(5)]
    for _ in range(1000):
        ray.get([ac.do_torch_stuff.remote(10) for ac in acs])
    print("With 5 actors: ", time.time() - t0)


def _all_on_1_actor():
    t0 = time.time()
    ray.init()
    ac = TestActor.remote()
    for _ in range(5000):
        ray.get(ac.do_torch_stuff.remote(10))
    print("With 1 actor: ", time.time() - t0)


if __name__ == '__main__':
    _all_on_1_actor()  # ~10 sec on my machine
    # _parallel_on_5_actors()  # ~18 sec on my machine. Should be ~2 sec?!?!
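If thread oversubscription is indeed the problem, a possible mitigation (a sketch under that assumption, not a confirmed fix) is to pin each actor's PyTorch to a single thread via torch.set_num_threads(1) in the actor's __init__, or by setting OMP_NUM_THREADS=1 in the environment before starting the workers:

@ray.remote(num_cpus=1)
class SingleThreadedTestActor:  # hypothetical variant of TestActor above
    def __init__(self):
        # Restrict PyTorch's intra-op parallelism to one thread, so five
        # of these actors use five cores total instead of all fighting
        # over every core on the machine.
        torch.set_num_threads(1)
        self.net = NeuralNet()
        self.crit = torch.nn.MSELoss()

    def do_torch_stuff(self, batch_size):
        self.net(torch.rand(batch_size, 1000))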