Hi, I am trying to run 4 jobs, each with 2 processes, on 4x 12GB GPUs (using docker_run.py --headless, built from Dockerfile_tiffany).
I assumed that each job would run on its own isolated GPU, so that the memory usage on each GPU would be roughly the same.
But I find this is not the case: each GPU is using a different amount of memory.
It seems that whenever a new job is created, some GPU memory is allocated on GPU #0, even though I specified a GPU other than 0 by setting:
--which_gpu 1 --sem_gpu_id 1 --sem_seg_gpu 1 --depth_gpu 1
Why is this happening, and how can I balance the GPU memory load for better utilization?
(Maybe some part of the code is running on the default GPU, GPU #0?)
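One workaround I am considering (untested) is to hide all but the intended GPU from each job before PyTorch initializes CUDA, so that even a hard-coded cuda:0 inside the code would land on that job's own card. A minimal sketch of the idea, assuming each job is launched with its physical GPU index as the first argument:

```python
# Hypothetical per-job launcher sketch (not part of FILM): restrict CUDA's view
# of the devices before any CUDA context is created, so that "cuda:0" inside
# this job maps to the physical GPU assigned to it.
import os
import sys

job_gpu = sys.argv[1] if len(sys.argv) > 1 else "0"  # e.g. "1" for the second card

# Set before any CUDA work happens (safest: before importing torch), so the
# setting is also inherited by forked/spawned worker processes.
os.environ["CUDA_VISIBLE_DEVICES"] = job_gpu

import torch  # imported after the env var on purpose

print(torch.cuda.device_count())      # expected: 1
print(torch.cuda.get_device_name(0))  # the physical GPU selected above
```

The full log follows.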
bash run_tests_unseen.sh
Running PID: 8992
Waiting PIDS: 8992
Auto GPU config:
Number of processes: 5
Number of processes on GPU 0: 2
Number of processes per GPU: 1
Auto GPU config:
Number of processes: 5
Number of processes on GPU 0: 2
Number of processes per GPU: 1
dn is first_run_0
Auto GPU config:
Number of processes: 5
Number of processes on GPU 0: 2
Number of processes per GPU: 1
Auto GPU config:
Number of processes: 5
Number of processes on GPU 0: 2
Number of processes per GPU: 1
Found path: /root/.ai2thor/releases/thor-201909061227-Linux64/thor-201909061227-Linux64
Mono path[0] = '/root/.ai2thor/releases/thor-201909061227-Linux64/thor-201909061227-Linux64_Data/Managed'
Mono config path = '/root/.ai2thor/releases/thor-201909061227-Linux64/thor-201909061227-Linux64_Data/Mono/etc'
Found path: /root/.ai2thor/releases/thor-201909061227-Linux64/thor-201909061227-Linux64
Mono path[0] = '/root/.ai2thor/releases/thor-201909061227-Linux64/thor-201909061227-Linux64_Data/Managed'
Mono config path = '/root/.ai2thor/releases/thor-201909061227-Linux64/thor-201909061227-Linux64_Data/Mono/etc'
Preloaded 'ScreenSelector.so'
Display 0 '0': 1024x768 (primary device).
Display 1 '1': 1024x768 (secondary device).
Display 2 '2': 1024x768 (secondary device).
Display 3 '3': 1024x768 (secondary device).
Logging to /root/.config/unity3d/Allen Institute for Artificial Intelligence/AI2-Thor/Player.log
Preloaded 'ScreenSelector.so'
Display 0 '0': 1024x768 (primary device).
Display 1 '1': 1024x768 (secondary device).
Display 2 '2': 1024x768 (secondary device).
Display 3 '3': 1024x768 (secondary device).
Logging to /root/.config/unity3d/Allen Institute for Artificial Intelligence/AI2-Thor/Player.log
ThorEnv started.
ThorEnv started.
instruction goal is examine a bowl by the light of a lamp
self.goal_idx2cat is {0: 'Knife', 1: 'SinkBasin', 2: 'ArmChair', 3: 'BathtubBasin', 4: 'Bed', 5: 'Cabinet', 6: 'Cart', 7: 'CoffeeMachine', 8: 'CoffeeTable', 9: 'CounterTop', 10: 'Desk', 11: 'DiningTable', 12: 'Drawer', 13: 'Dresser', 14: 'Fridge', 15: 'GarbageCan', 16: 'Microwave', 17: 'Ottoman', 18: 'Safe', 19: 'Shelf', 20: 'SideTable', 21: 'Sofa', 22: 'StoveBurner', 23: 'TVStand', 24: 'Toilet', 29: 'Bowl', 30: 'FloorLamp', 34: 'None'}
Resetting ThorEnv
instruction goal is examine a grey bowl in the light of a lamp
self.goal_idx2cat is {0: 'Knife', 1: 'SinkBasin', 2: 'ArmChair', 3: 'BathtubBasin', 4: 'Bed', 5: 'Cabinet', 6: 'Cart', 7: 'CoffeeMachine', 8: 'CoffeeTable', 9: 'CounterTop', 10: 'Desk', 11: 'DiningTable', 12: 'Drawer', 13: 'Dresser', 14: 'Fridge', 15: 'GarbageCan', 16: 'Microwave', 17: 'Ottoman', 18: 'Safe', 19: 'Shelf', 20: 'SideTable', 21: 'Sofa', 22: 'StoveBurner', 23: 'TVStand', 24: 'Toilet', 25: 'Bowl', 26: 'FloorLamp', 34: 'None'}
Resetting ThorEnv
Task: Examine a grey bowl in the light of a lamp.
Running PID: 9240
Waiting PIDS: 8992 9240
Auto GPU config:
Number of processes: 5
Number of processes on GPU 0: 2
Number of processes per GPU: 1
Auto GPU config:
Number of processes: 5
Number of processes on GPU 0: 2
Number of processes per GPU: 1
dn is first_run_1
Auto GPU config:
Number of processes: 5
Number of processes on GPU 0: 2
Number of processes per GPU: 1
Auto GPU config:
Number of processes: 5
Number of processes on GPU 0: 2
Number of processes per GPU: 1
Found path: /root/.ai2thor/releases/thor-201909061227-Linux64/thor-201909061227-Linux64
Mono path[0] = '/root/.ai2thor/releases/thor-201909061227-Linux64/thor-201909061227-Linux64_Data/Managed'
Mono config path = '/root/.ai2thor/releases/thor-201909061227-Linux64/thor-201909061227-Linux64_Data/Mono/etc'
Found path: /root/.ai2thor/releases/thor-201909061227-Linux64/thor-201909061227-Linux64
Mono path[0] = '/root/.ai2thor/releases/thor-201909061227-Linux64/thor-201909061227-Linux64_Data/Managed'
Mono config path = '/root/.ai2thor/releases/thor-201909061227-Linux64/thor-201909061227-Linux64_Data/Mono/etc'
Preloaded 'ScreenSelector.so'
Display 0 '0': 1024x768 (primary device).
Display 1 '1': 1024x768 (secondary device).
Display 2 '2': 1024x768 (secondary device).
Display 3 '3': 1024x768 (secondary device).
Preloaded 'ScreenSelector.so'
Logging to /root/.config/unity3d/Allen Institute for Artificial Intelligence/AI2-Thor/Player.log
Display 0 '0': 1024x768 (primary device).
Display 1 '1': 1024x768 (secondary device).
Display 2 '2': 1024x768 (secondary device).
Display 3 '3': 1024x768 (secondary device).
Logging to /root/.config/unity3d/Allen Institute for Artificial Intelligence/AI2-Thor/Player.log
ThorEnv started.
ThorEnv started.
Running PID: 9492
Waiting PIDS: 8992 9240 9492
instruction goal is move two dog sculptures to the coffee table
instruction goal is grab the grey bowl on the corner table turn on the lamp
self.goal_idx2cat is {0: 'Knife', 1: 'SinkBasin', 2: 'ArmChair', 3: 'BathtubBasin', 4: 'Bed', 5: 'Cabinet', 6: 'Cart', 7: 'CoffeeMachine', 8: 'CoffeeTable', 9: 'CounterTop', 10: 'Desk', 11: 'DiningTable', 12: 'Drawer', 13: 'Dresser', 14: 'Fridge', 15: 'GarbageCan', 16: 'Microwave', 17: 'Ottoman', 18: 'Safe', 19: 'Shelf', 20: 'SideTable', 21: 'Sofa', 22: 'StoveBurner', 23: 'TVStand', 24: 'Toilet', 29: 'Statue', 34: 'None'}
self.goal_idx2cat is {0: 'Knife', 1: 'SinkBasin', 2: 'ArmChair', 3: 'BathtubBasin', 4: 'Bed', 5: 'Cabinet', 6: 'Cart', 7: 'CoffeeMachine', 8: 'CoffeeTable', 9: 'CounterTop', 10: 'Desk', 11: 'DiningTable', 12: 'Drawer', 13: 'Dresser', 14: 'Fridge', 15: 'GarbageCan', 16: 'Microwave', 17: 'Ottoman', 18: 'Safe', 19: 'Shelf', 20: 'SideTable', 21: 'Sofa', 22: 'StoveBurner', 23: 'TVStand', 24: 'Toilet', 25: 'Bowl', 26: 'FloorLamp', 34: 'None'}
Resetting ThorEnv
Resetting ThorEnv
Auto GPU config:
Number of processes: 5
Number of processes on GPU 0: 2
Number of processes per GPU: 1
Auto GPU config:
Number of processes: 5
Number of processes on GPU 0: 2
Number of processes per GPU: 1
dn is first_run_2
Auto GPU config:
Number of processes: 5
Number of processes on GPU 0: 2
Number of processes per GPU: 1
Auto GPU config:
Number of processes: 5
Number of processes on GPU 0: 2
Number of processes per GPU: 1
Found path: /root/.ai2thor/releases/thor-201909061227-Linux64/thor-201909061227-Linux64
Mono path[0] = '/root/.ai2thor/releases/thor-201909061227-Linux64/thor-201909061227-Linux64_Data/Managed'
Mono config path = '/root/.ai2thor/releases/thor-201909061227-Linux64/thor-201909061227-Linux64_Data/Mono/etc'
Preloaded 'ScreenSelector.so'
Display 0 '0': 1024x768 (primary device).
Display 1 '1': 1024x768 (secondary device).
Display 2 '2': 1024x768 (secondary device).
Display 3 '3': 1024x768 (secondary device).
Logging to /root/.config/unity3d/Allen Institute for Artificial Intelligence/AI2-Thor/Player.log
Found path: /root/.ai2thor/releases/thor-201909061227-Linux64/thor-201909061227-Linux64
Mono path[0] = '/root/.ai2thor/releases/thor-201909061227-Linux64/thor-201909061227-Linux64_Data/Managed'
Mono config path = '/root/.ai2thor/releases/thor-201909061227-Linux64/thor-201909061227-Linux64_Data/Mono/etc'
Preloaded 'ScreenSelector.so'
Display 0 '0': 1024x768 (primary device).
Display 1 '1': 1024x768 (secondary device).
Display 2 '2': 1024x768 (secondary device).
Display 3 '3': 1024x768 (secondary device).
Logging to /root/.config/unity3d/Allen Institute for Artificial Intelligence/AI2-Thor/Player.log
ThorEnv started.
ThorEnv started.
Task: Move two dog sculptures to the coffee table.
Running PID: 9741
Waiting PIDS: 8992 9240 9492 9741
Auto GPU config:
Number of processes: 5
Number of processes on GPU 0: 2
Number of processes per GPU: 1
Auto GPU config:
Number of processes: 5
Number of processes on GPU 0: 2
Number of processes per GPU: 1
dn is first_run_3
instruction goal is to move two statues to the living room table
self.goal_idx2cat is {0: 'Knife', 1: 'SinkBasin', 2: 'ArmChair', 3: 'BathtubBasin', 4: 'Bed', 5: 'Cabinet', 6: 'Cart', 7: 'CoffeeMachine', 8: 'CoffeeTable', 9: 'CounterTop', 10: 'Desk', 11: 'DiningTable', 12: 'Drawer', 13: 'Dresser', 14: 'Fridge', 15: 'GarbageCan', 16: 'Microwave', 17: 'Ottoman', 18: 'Safe', 19: 'Shelf', 20: 'SideTable', 21: 'Sofa', 22: 'StoveBurner', 23: 'TVStand', 24: 'Toilet', 29: 'Statue', 34: 'None'}
Resetting ThorEnv
instruction goal is put 2 dog decorations front to back on the edge of the right side of the table
self.goal_idx2cat is {0: 'Knife', 1: 'SinkBasin', 2: 'ArmChair', 3: 'BathtubBasin', 4: 'Bed', 5: 'Cabinet', 6: 'Cart', 7: 'CoffeeMachine', 8: 'CoffeeTable', 9: 'CounterTop', 10: 'Desk', 11: 'DiningTable', 12: 'Drawer', 13: 'Dresser', 14: 'Fridge', 15: 'GarbageCan', 16: 'Microwave', 17: 'Ottoman', 18: 'Safe', 19: 'Shelf', 20: 'SideTable', 21: 'Sofa', 22: 'StoveBurner', 23: 'TVStand', 24: 'Toilet', 25: 'Statue', 34: 'None'}
Resetting ThorEnv
Auto GPU config:
Number of processes: 5
Number of processes on GPU 0: 2
Number of processes per GPU: 1
Auto GPU config:
Number of processes: 5
Number of processes on GPU 0: 2
Number of processes per GPU: 1
Found path: /root/.ai2thor/releases/thor-201909061227-Linux64/thor-201909061227-Linux64
Mono path[0] = '/root/.ai2thor/releases/thor-201909061227-Linux64/thor-201909061227-Linux64_Data/Managed'
Mono config path = '/root/.ai2thor/releases/thor-201909061227-Linux64/thor-201909061227-Linux64_Data/Mono/etc'
Preloaded 'ScreenSelector.so'
Display 0 '0': 1024x768 (primary device).
Display 1 '1': 1024x768 (secondary device).
Display 2 '2': 1024x768 (secondary device).
Display 3 '3': 1024x768 (secondary device).
Logging to /root/.config/unity3d/Allen Institute for Artificial Intelligence/AI2-Thor/Player.log
Found path: /root/.ai2thor/releases/thor-201909061227-Linux64/thor-201909061227-Linux64
Mono path[0] = '/root/.ai2thor/releases/thor-201909061227-Linux64/thor-201909061227-Linux64_Data/Managed'
Mono config path = '/root/.ai2thor/releases/thor-201909061227-Linux64/thor-201909061227-Linux64_Data/Mono/etc'
Preloaded 'ScreenSelector.so'
Display 0 '0': 1024x768 (primary device).
Display 1 '1': 1024x768 (secondary device).
Display 2 '2': 1024x768 (secondary device).
Display 3 '3': 1024x768 (secondary device).
Logging to /root/.config/unity3d/Allen Institute for Artificial Intelligence/AI2-Thor/Player.log
Task: Examine a bowl by the light of a lamp.
ThorEnv started.
ThorEnv started.
Task: Put 2 dog decorations front to back on the edge of the right side of the table.
Process ForkServerProcess-1:
Traceback (most recent call last):
File "/custom/conda/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/custom/conda/lib/python3.9/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/home/hongin/FILM/envs/utils/vector_env.py", line 179, in _worker_env
env = env_fn(*env_fn_args)
File "/home/hongin/FILM/envs/__init__.py", line 122, in make_env_fn_alfred
env = Sem_Exp_Env_Agent_Thor(args, scene_names, rank)
File "/home/hongin/FILM/agents/sem_exp_thor.py", line 77, in __init__
self.seg = SemgnetationHelper(self)
File "/home/hongin/FILM/models/segmentation/segmentation_helper.py", line 27, in __init__
self.sem_seg_model_alfw_large = load_pretrained_model('models/segmentation/maskrcnn_alfworld/receps_lr5e-3_003.pth', torch.device("cuda:0" if args.cuda else "cpu"), 'recep')
File "/home/hongin/FILM/models/segmentation/alfworld_mrcnn.py", line 90, in load_pretrained_model
mask_rcnn.load_state_dict(torch.load(path, map_location=device))
File "/custom/conda/lib/python3.9/site-packages/torch/serialization.py", line 592, in load
return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
File "/custom/conda/lib/python3.9/site-packages/torch/serialization.py", line 851, in _load
result = unpickler.load()
File "/custom/conda/lib/python3.9/site-packages/torch/serialization.py", line 843, in persistent_load
load_tensor(data_type, size, key, _maybe_decode_ascii(location))
File "/custom/conda/lib/python3.9/site-packages/torch/serialization.py", line 832, in load_tensor
loaded_storages[key] = restore_location(storage, location)
File "/custom/conda/lib/python3.9/site-packages/torch/serialization.py", line 812, in restore_location
return default_restore_location(storage, str(map_location))
File "/custom/conda/lib/python3.9/site-packages/torch/serialization.py", line 175, in default_restore_location
result = fn(storage, location)
File "/custom/conda/lib/python3.9/site-packages/torch/serialization.py", line 157, in _cuda_deserialize
return obj.cuda(device)
File "/custom/conda/lib/python3.9/site-packages/torch/_utils.py", line 80, in _cuda
return new_type(self.size()).copy_(self, non_blocking)
File "/custom/conda/lib/python3.9/site-packages/torch/cuda/__init__.py", line 484, in _lazy_new
return super(_CudaBase, cls).__new__(cls, *args, **kwargs)
RuntimeError: CUDA error: out of memory
Process ForkServerProcess-2:
Traceback (most recent call last):
File "/custom/conda/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/custom/conda/lib/python3.9/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/home/hongin/FILM/envs/utils/vector_env.py", line 179, in _worker_env
env = env_fn(*env_fn_args)
File "/home/hongin/FILM/envs/__init__.py", line 122, in make_env_fn_alfred
env = Sem_Exp_Env_Agent_Thor(args, scene_names, rank)
File "/home/hongin/FILM/agents/sem_exp_thor.py", line 77, in __init__
self.seg = SemgnetationHelper(self)
File "/home/hongin/FILM/models/segmentation/segmentation_helper.py", line 27, in __init__
self.sem_seg_model_alfw_large = load_pretrained_model('models/segmentation/maskrcnn_alfworld/receps_lr5e-3_003.pth', torch.device("cuda:0" if args.cuda else "cpu"), 'recep')
File "/home/hongin/FILM/models/segmentation/alfworld_mrcnn.py", line 90, in load_pretrained_model
mask_rcnn.load_state_dict(torch.load(path, map_location=device))
File "/custom/conda/lib/python3.9/site-packages/torch/serialization.py", line 592, in load
return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
File "/custom/conda/lib/python3.9/site-packages/torch/serialization.py", line 851, in _load
result = unpickler.load()
File "/custom/conda/lib/python3.9/site-packages/torch/serialization.py", line 843, in persistent_load
load_tensor(data_type, size, key, _maybe_decode_ascii(location))
File "/custom/conda/lib/python3.9/site-packages/torch/serialization.py", line 832, in load_tensor
loaded_storages[key] = restore_location(storage, location)
File "/custom/conda/lib/python3.9/site-packages/torch/serialization.py", line 812, in restore_location
return default_restore_location(storage, str(map_location))
File "/custom/conda/lib/python3.9/site-packages/torch/serialization.py", line 175, in default_restore_location
result = fn(storage, location)
File "/custom/conda/lib/python3.9/site-packages/torch/serialization.py", line 157, in _cuda_deserialize
return obj.cuda(device)
File "/custom/conda/lib/python3.9/site-packages/torch/_utils.py", line 80, in _cuda
return new_type(self.size()).copy_(self, non_blocking)
File "/custom/conda/lib/python3.9/site-packages/torch/cuda/__init__.py", line 484, in _lazy_new
return super(_CudaBase, cls).__new__(cls, *args, **kwargs)
RuntimeError: CUDA error: out of memory
Traceback (most recent call last):
File "/home/hongin/FILM/main.py", line 831, in <module>
main()
File "/home/hongin/FILM/main.py", line 116, in main
envs = make_vec_envs(args)
File "/home/hongin/FILM/envs/__init__.py", line 15, in make_vec_envs
envs = construct_envs_alfred(args)
File "/home/hongin/FILM/envs/__init__.py", line 137, in construct_envs_alfred
envs = VectorEnv(make_env_fn=make_env_fn_alfred,
File "/home/hongin/FILM/envs/utils/vector_env.py", line 149, in __init__
self.observation_spaces = [
File "/home/hongin/FILM/envs/utils/vector_env.py", line 150, in <listcomp>
read_fn() for read_fn in self._connection_read_fns
File "/custom/conda/lib/python3.9/multiprocessing/connection.py", line 255, in recv
buf = self._recv_bytes()
File "/custom/conda/lib/python3.9/multiprocessing/connection.py", line 419, in _recv_bytes
buf = self._recv(4)
File "/custom/conda/lib/python3.9/multiprocessing/connection.py", line 384, in _recv
chunk = read(handle, remaining)
ConnectionResetError: [Errno 104] Connection reset by peer
Exception ignored in: <function VectorEnv.__del__ at 0x7f67ef0f1ee0>
Traceback (most recent call last):
File "/home/hongin/FILM/envs/utils/vector_env.py", line 767, in __del__
self.close()
File "/home/hongin/FILM/envs/utils/vector_env.py", line 567, in close
write_fn((CLOSE_COMMAND, None))
File "/custom/conda/lib/python3.9/multiprocessing/connection.py", line 211, in send
self._send_bytes(_ForkingPickler.dumps(obj))
File "/custom/conda/lib/python3.9/multiprocessing/connection.py", line 416, in _send_bytes
self._send(header + buf)
File "/custom/conda/lib/python3.9/multiprocessing/connection.py", line 373, in _send
n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe
Process ForkServerProcess-1:
Traceback (most recent call last):
File "/home/hongin/FILM/agents/sem_exp_thor.py", line 168, in load_initial_scene
obs, info = self.setup_scene(traj_data, task_type, r_idx, self.args)
File "/home/hongin/FILM/agents/sem_exp_thor.py", line 335, in setup_scene
obs, seg_print = self._preprocess_obs(obs)
File "/home/hongin/FILM/agents/sem_exp_thor.py", line 1358, in _preprocess_obs
sem_seg_pred = self.seg.get_sem_pred(rgb.astype(np.uint8)) #(300, 300, num_cat)
File "/home/hongin/FILM/models/segmentation/segmentation_helper.py", line 241, in get_sem_pred
self.get_instance_mask_seg_alfworld_both()
File "/home/hongin/FILM/models/segmentation/segmentation_helper.py", line 114, in get_instance_mask_seg_alfworld_both
results_large = self.sem_seg_model_alfw_large(im_tensors)[0]
File "/custom/conda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/custom/conda/lib/python3.9/site-packages/torchvision/models/detection/generalized_rcnn.py", line 94, in forward
features = self.backbone(images.tensors)
File "/custom/conda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/custom/conda/lib/python3.9/site-packages/torchvision/models/detection/backbone_utils.py", line 44, in forward
x = self.body(x)
File "/custom/conda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/custom/conda/lib/python3.9/site-packages/torchvision/models/_utils.py", line 63, in forward
x = module(x)
File "/custom/conda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/custom/conda/lib/python3.9/site-packages/torch/nn/modules/container.py", line 119, in forward
input = module(input)
File "/custom/conda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/custom/conda/lib/python3.9/site-packages/torchvision/models/resnet.py", line 133, in forward
out = self.bn3(out)
File "/custom/conda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/custom/conda/lib/python3.9/site-packages/torchvision/ops/misc.py", line 96, in forward
return x * scale + bias
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 11.91 GiB total capacity; 486.21 MiB already allocated; 9.12 MiB free; 492.00 MiB reserved in total by PyTorch)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/custom/conda/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/custom/conda/lib/python3.9/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/home/hongin/FILM/envs/utils/vector_env.py", line 222, in _worker_env
obs, info, actions_dict = env.load_initial_scene()
File "/home/hongin/FILM/agents/sem_exp_thor.py", line 187, in load_initial_scene
obs = np.zeros(self.obs.shape)
AttributeError: 'NoneType' object has no attribute 'shape'
Traceback (most recent call last):
File "/home/hongin/FILM/main.py", line 831, in <module>
main()
File "/home/hongin/FILM/main.py", line 121, in main
obs, infos, actions_dicts = envs.load_initial_scene()
File "/home/hongin/FILM/envs/__init__.py", line 71, in load_initial_scene
obs, info, actions_dict = self.venv.load_initial_scene()
File "/home/hongin/FILM/envs/utils/vector_env.py", line 475, in load_initial_scene
results.append(read_fn())
File "/custom/conda/lib/python3.9/multiprocessing/connection.py", line 255, in recv
buf = self._recv_bytes()
File "/custom/conda/lib/python3.9/multiprocessing/connection.py", line 419, in _recv_bytes
buf = self._recv(4)
File "/custom/conda/lib/python3.9/multiprocessing/connection.py", line 388, in _recv
raise EOFError
EOFError
Exception ignored in: <function VectorEnv.__del__ at 0x7f18a0f87ee0>
Traceback (most recent call last):
File "/home/hongin/FILM/envs/utils/vector_env.py", line 767, in __del__
self.close()
File "/home/hongin/FILM/envs/utils/vector_env.py", line 564, in close
read_fn()
File "/custom/conda/lib/python3.9/multiprocessing/connection.py", line 255, in recv
buf = self._recv_bytes()
File "/custom/conda/lib/python3.9/multiprocessing/connection.py", line 419, in _recv_bytes
buf = self._recv(4)
File "/custom/conda/lib/python3.9/multiprocessing/connection.py", line 388, in _recv
raise EOFError
EOFError:
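Looking at the traceback, segmentation_helper.py (line 27) passes torch.device("cuda:0" if args.cuda else "cpu") into load_pretrained_model, so each job's Mask R-CNN weights seem to go to GPU 0 no matter what --sem_seg_gpu is set to. If that is indeed the cause, building the device from the configured index might be enough; a rough sketch (untested, and the exact attribute name on args is my guess from the flag name):

```python
import torch

def pick_device(use_cuda: bool, gpu_id: int) -> torch.device:
    """Device the segmentation model should live on.

    gpu_id would come from the --sem_seg_gpu flag; falling back to CPU mirrors
    the existing `if args.cuda else "cpu"` behaviour.
    """
    return torch.device(f"cuda:{gpu_id}" if use_cuda else "cpu")

# In segmentation_helper.py the call could then become something like:
#   device = pick_device(args.cuda, args.sem_seg_gpu)
#   self.sem_seg_model_alfw_large = load_pretrained_model(
#       'models/segmentation/maskrcnn_alfworld/receps_lr5e-3_003.pth',
#       device, 'recep')
```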
Plus, when using multiple Xorg processes (several startx.py instances on different displays), the semantic pictures are not visualized properly ($ALFRED_ROOT/pictures/tests_unseen/first_run_0/Sem/Sem_*.png).
However, when using a single Xorg process (only one startx.py), the semantic pictures are visualized properly.
For reference:
My bash script: run_tests_unseen.sh
My startx script: run_xserver.sh
When loaded to only GPU 0
When loaded to only GPU 1
When loaded to all 4 GPUs (before the CUDA OOM error)
Log when running on all GPUs