CUDA out of memory #4

Open
Roger-GOAT opened this issue Feb 25, 2024 · 2 comments

@Roger-GOAT

Hi, great work!
When I run

prior_network = pd.read_csv('./network_mouse.csv')
data = cf.data_preparation(adata, prior_network)
[0] - Data loading and preprocessing...
Consider the input data with 1 lineages:
  Lineage - all:
    337 extra edges (Spearman correlation > 0.6) are added into the prior gene interaction network.
    Total number of edges: 3537148.
    n_genes × n_cells = 12335 × 1822

%%time
CUDA = '0'
cefcon_results_dict = {}
for li, data_li in data.items():
    # We suggest setting up multiple repeats to minimize the randomness of the computation.
    cefcon_GRN_model = cf.NetModel(epochs=350, repeats=3, seed=-1, cuda=CUDA)
    cefcon_GRN_model.run(data_li)

    cefcon_results = cefcon_GRN_model.get_cefcon_results(edge_threshold_avgDegree=8)
    cefcon_results_dict[li] = cefcon_results

I get an error like this:

[1] - Constructing cell-lineage-specific GRN...
Lineage - all: 
Warning: Auxiliary gene scores (e.g., differential expression level) are not considered!
0%|                                                                       | 0/350 [00:00<?, ?it/s]
---------------------------------------------------------------------------
OutOfMemoryError                          Traceback (most recent call last)
File <timed exec>:5

File ~/run/miniconda3/envs/CEFCON/lib/python3.10/site-packages/cefcon/cell_lineage_GRN.py:370, in NetModel.run(self, adata, showProgressBar)
 368 with trange(self.epochs, ncols=100) as t:
 369     for epoch in t:
--> 370         loss = self.__train(data, DGI_model, optimizer)
 371         t.set_description('  Iter: {}/{}'.format(rep + 1, self.repeats))
 372         if epoch < self.epochs - 1:

File ~/run/miniconda3/envs/CEFCON/lib/python3.10/site-packages/cefcon/cell_lineage_GRN.py:326, in NetModel.__train(data, model, optimizer)
 324 model.train()
 325 optimizer.zero_grad()
--> 326 pos_z, neg_z, summary = model(data)
 327 loss = model.loss(pos_z, neg_z, summary)
 328 loss.backward()

File ~/run/miniconda3/envs/CEFCON/lib/python3.10/site-packages/torch/nn/modules/module.py:1194, in Module._call_impl(self, *input, **kwargs)
1190 # If we don't have any hooks, we want to skip the rest of the logic in
1191 # this function, and just call forward.
1192 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1193         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1194     return forward_call(*input, **kwargs)
1195 # Do not call functions when jit is used
1196 full_backward_hooks, non_full_backward_hooks = [], []

File ~/run/miniconda3/envs/CEFCON/lib/python3.10/site-packages/torch_geometric/nn/models/deep_graph_infomax.py:52, in DeepGraphInfomax.forward(self, *args, **kwargs)
  49 def forward(self, *args, **kwargs) -> Tuple[Tensor, Tensor, Tensor]:
  50     """Returns the latent space for the input arguments, their
  51     corruptions and their summary representation."""
---> 52     pos_z = self.encoder(*args, **kwargs)
  54     cor = self.corruption(*args, **kwargs)
  55     cor = cor if isinstance(cor, tuple) else (cor, )

File ~/run/miniconda3/envs/CEFCON/lib/python3.10/site-packages/torch/nn/modules/module.py:1194, in Module._call_impl(self, *input, **kwargs)
1190 # If we don't have any hooks, we want to skip the rest of the logic in
1191 # this function, and just call forward.
1192 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1193         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1194     return forward_call(*input, **kwargs)
1195 # Do not call functions when jit is used
1196 full_backward_hooks, non_full_backward_hooks = [], []

File ~/run/miniconda3/envs/CEFCON/lib/python3.10/site-packages/cefcon/cell_lineage_GRN.py:222, in GRN_Encoder.forward(self, data)
 220 for norm, attn_in, attn_out, ffn in self.layers:
 221     x = norm(x)
--> 222     x_in, att_weights_in_ = attn_in(x, edge_index, x_auxiliary, return_attention_weights=True)
 223     x_out, att_weights_out_ = attn_out(x, edge_index, x_auxiliary, return_attention_weights=True)
 224     x = ffn(torch.cat((self.act(x_in), self.act(x_out)), 1))

File ~/run/miniconda3/envs/CEFCON/lib/python3.10/site-packages/torch/nn/modules/module.py:1194, in Module._call_impl(self, *input, **kwargs)
1190 # If we don't have any hooks, we want to skip the rest of the logic in
1191 # this function, and just call forward.
1192 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1193         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1194     return forward_call(*input, **kwargs)
1195 # Do not call functions when jit is used
1196 full_backward_hooks, non_full_backward_hooks = [], []

File ~/run/miniconda3/envs/CEFCON/lib/python3.10/site-packages/cefcon/cell_lineage_GRN.py:111, in GraphAttention_layer.forward(self, x, edge_index, x_auxiliary, return_attention_weights)
 109     x_norm_l = F.normalize(x_l, p=2., dim=-1)
 110     x_norm_r = F.normalize(x_r, p=2., dim=-1)
--> 111     out = self.propagate(edge_index, x=(x_l, x_r), x_norm=(x_norm_l, x_norm_r),
 112                          x_auxiliary=x_auxiliary, size=None)
 113 else:  # SD
 114     out = self.propagate(edge_index, x=(x_l, x_r), x_norm=None,
 115                          x_auxiliary=x_auxiliary, size=None)

File ~/run/miniconda3/envs/CEFCON/lib/python3.10/site-packages/torch_geometric/nn/conv/message_passing.py:455, in MessagePassing.propagate(self, edge_index, size, **kwargs)
 452     for arg in decomp_args:
 453         kwargs[arg] = decomp_kwargs[arg][i]
--> 455 coll_dict = self._collect(self._user_args, edge_index, size,
 456                           kwargs)
 458 msg_kwargs = self.inspector.distribute('message', coll_dict)
 459 for hook in self._message_forward_pre_hooks.values():

File ~/run/miniconda3/envs/CEFCON/lib/python3.10/site-packages/torch_geometric/nn/conv/message_passing.py:329, in MessagePassing._collect(self, args, edge_index, size, kwargs)
 327         if isinstance(data, Tensor):
 328             self._set_size(size, dim, data)
--> 329             data = self._lift(data, edge_index, dim)
 331         out[arg] = data
 333 if is_torch_sparse_tensor(edge_index):

File ~/run/miniconda3/envs/CEFCON/lib/python3.10/site-packages/torch_geometric/nn/conv/message_passing.py:276, in MessagePassing._lift(self, src, edge_index, dim)
 269     raise IndexError(
 270         f"Encountered an index error. Please ensure that all "
 271         f"indices in 'edge_index' point to valid indices in "
 272         f"the interval [0, {src.size(self.node_dim) - 1}] "
 273         f"(got interval "
 274         f"[{int(index.min())}, {int(index.max())}])")
 275 else:
--> 276     raise e
 278 if index.numel() > 0 and index.min() < 0:
 279     raise ValueError(
 280         f"Found negative indices in 'edge_index' (got "
 281         f"{index.min().item()}). Please ensure that all "
 282         f"indices in 'edge_index' point to valid indices "
 283         f"in the interval [0, {src.size(self.node_dim)}) in "
 284         f"your node feature matrix and try again.")

File ~/run/miniconda3/envs/CEFCON/lib/python3.10/site-packages/torch_geometric/nn/conv/message_passing.py:266, in MessagePassing._lift(self, src, edge_index, dim)
 264 try:
 265     index = edge_index[dim]
--> 266     return src.index_select(self.node_dim, index)
 267 except (IndexError, RuntimeError) as e:
 268     if index.min() < 0 or index.max() >= src.size(self.node_dim):

OutOfMemoryError: CUDA out of memory. Tried to allocate 6.77 GiB (GPU 0; 23.65 GiB total capacity; 20.62 GiB already allocated; 2.49 GiB free; 20.64 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
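
The last line of the message suggests setting max_split_size_mb via PYTORCH_CUDA_ALLOC_CONF. A minimal sketch of that allocator hint (the 128 MiB split size below is only an illustrative value, not a setting taken from this thread):

import os

# Apply the allocator hint from the error message before torch makes its first CUDA allocation.
# 128 is an example split size in MiB; tune it for your own GPU.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # import torch (or at least touch CUDA) only after the variable is set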
@WPZgithub
Owner

Hi! Please make sure you have sufficient GPU memory.
You can use nvidia-smi to check the current memory usage.
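
For reference, a minimal in-Python check of free GPU memory (plain PyTorch, not a CEFCON API):

import torch

# torch.cuda.mem_get_info returns (free, total) memory in bytes for the given device.
free_bytes, total_bytes = torch.cuda.mem_get_info(0)
print(f"GPU 0: {free_bytes / 1024**3:.1f} GiB free of {total_bytes / 1024**3:.1f} GiB")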

@TdzBAS

TdzBAS commented Aug 13, 2024

I have the same error. Any updates?

@TdzBAS mentioned this issue Aug 14, 2024