why do we manually loop through each batch? #22

linminhtoo opened this issue Apr 28, 2022 · 3 comments
linminhtoo commented Apr 28, 2022

Hello authors,

I am in the process of modifying dMaSIF for the downstream task of protein-ligand binding affinity prediction. While reading and modifying your code, I noticed that in data_iteration.iterate (https://github.com/FreyrS/dMaSIF/blob/master/data_iteration.py#L290) we extract the individual proteins/protein-pairs from a batch, and then run the forward pass on each of them separately.

Effectively, doesn't this equate to a batch_size of 1? Even though the --batch_size argument is set to 64 in the benchmark_scripts, it is not actually used, and the effective batch size is hardcoded to 1: https://github.com/FreyrS/dMaSIF/blob/master/main_training.py#L51

Is there a reason for doing this, rather than just doing a forward pass on the entire batch?
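To make the two options concrete, here is a minimal NumPy sketch (a hypothetical stand-in, not the dMaSIF code) contrasting a per-protein loop with a single segmented pass over a collated batch, where a `batch` vector maps each point to its protein index (as in PyTorch Geometric collation):

```python
import numpy as np

# Hypothetical sketch: all points of a batch concatenated along dim 0,
# with `batch[i]` giving the protein index of point i.
points = np.arange(12, dtype=np.float32).reshape(6, 2)  # 6 points, 2 features
batch = np.array([0, 0, 0, 1, 1, 1])                    # 2 proteins, 3 points each
n_proteins = int(batch.max()) + 1

def net(x):
    # stand-in for the network's forward pass: mean-pool the point features
    return x.mean(axis=0)

# option 1: loop over proteins, one forward pass each (effective batch_size=1)
per_item = np.stack([net(points[batch == i]) for i in range(n_proteins)])

# option 2: one batched pass, with pooling segmented by the batch index
sums = np.zeros((n_proteins, points.shape[1]), dtype=np.float32)
np.add.at(sums, batch, points)
batched = sums / np.bincount(batch)[:, None]

assert np.allclose(per_item, batched)
```

For a pointwise/pooling architecture the two give the same result per protein; the batched variant simply amortizes the kernel launches across the whole batch.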

As a side note, this line (https://github.com/FreyrS/dMaSIF/blob/master/data_iteration.py#L299) also indicates that the code is hardcoded to a batch_size of 1. My understanding is that it should be
P1["rand_rot"] = protein_pair.rand_rot1.view(-1, 3, 3)[protein_it]
instead of
P1["rand_rot"] = protein_pair.rand_rot1.view(-1, 3, 3)[0]
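A small NumPy illustration of the indexing point above (a hypothetical stand-in, not the dMaSIF tensors): if the flattened buffer holds one 3x3 matrix per protein, indexing with [0] always returns the first protein's matrix, while [protein_it] returns the matrix for the current protein in the loop.

```python
import numpy as np

batch_size = 2
# two distinct stand-in matrices, one per protein in the batch
rand_rot1 = np.stack([np.eye(3), 2 * np.eye(3)])
flat = rand_rot1.reshape(-1)  # flattened, as stored on the collated pair

for protein_it in range(batch_size):
    rot_hardcoded = flat.reshape(-1, 3, 3)[0]          # always protein 0
    rot_indexed = flat.reshape(-1, 3, 3)[protein_it]   # the current protein
    assert np.allclose(rot_indexed, rand_rot1[protein_it])

# with the hardcoded [0], protein 1 would silently get protein 0's matrix:
assert not np.allclose(flat.reshape(-1, 3, 3)[0], rand_rot1[1])
```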

Thank you and appreciate your help.

@Wendysigh

Hi @linminhtoo, I have the same question as you. The scripts set batch_size=64, while the actual batch size is hardcoded to 1.

I noticed that this line (https://github.com/FreyrS/dMaSIF/blob/master/data_iteration.py#L353) also assumes batch_size=1 when optimizing the model.

FreyrS (Owner) commented Jun 10, 2022

Hi @linminhtoo,

You're absolutely right: we generate the surfaces of a whole batch, but then iterate through its proteins individually.
The reason for this is that I found that a larger batch size causes instability in the training process.
In a follow-up work that I'm currently working on, we were able to solve these issues, and training with larger batch sizes is no longer a problem. I'll try to update this code accordingly once we finish the experiments for the follow-up.


camel2000 commented Sep 15, 2023

@FreyrS @linminhtoo @Wendysigh @jeanfeydy
@FreyrS @linminhtoo @Wendysigh @jeanfeydy
I modified the code to run the forward pass on the whole batch and tried to test it, but I found that the output embeddings are not consistent across different values of batch_size. Is this normal?
[screenshots: output embeddings for batch_size=1 vs. batch_size=2 on the first example of the MaSIF-site test set]
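One benign source of such differences, sketched below with plain NumPy (a hypothetical illustration, independent of the dMaSIF code): floating-point addition is not associative, so a different batch composition can reorder the reductions inside pooling or convolution kernels and shift float32 outputs slightly. Differences on that scale are expected; large discrepancies would instead suggest a real batching bug, e.g. features leaking between the proteins in a batch.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(10_000).astype(np.float32)

# one summation order: strictly sequential accumulation in float32
s_sequential = np.float32(0.0)
for v in x:
    s_sequential = np.float32(s_sequential + v)

# a different order: NumPy's sum uses pairwise summation internally
s_pairwise = x.sum()

diff = abs(float(s_sequential) - float(s_pairwise))
print(diff)  # typically far smaller than the magnitude of the sum itself
```

If the embedding differences are only around float32 precision, the batched version is likely fine; checking with a relative tolerance (e.g. np.allclose with rtol=1e-4) rather than exact equality is the usual way to compare.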
