
pt: support list format batch size #3614

Merged 3 commits on Mar 28, 2024

Changes from all commits
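The "list format batch size" in the title refers to letting the training configuration supply one batch size per data system, alongside the existing string ("auto" / "auto:N") and integer forms. A minimal sketch of the three forms, assuming deepmd-kit's training JSON layout (the system paths here are invented for illustration):

```python
import json

# Hypothetical config fragments illustrating the three batch_size forms
# handled by this PR; the system paths are invented.
config_list = {
    "training_data": {
        "systems": ["water/sys1", "water/sys2", "water/sys3"],
        "batch_size": [1, 2, 4],  # list form: one entry per system
    }
}
config_auto = {
    "training_data": {
        "systems": ["water/sys1"],
        "batch_size": "auto:64",  # target roughly 64 atoms per batch
    }
}
config_int = {
    "training_data": {
        "systems": ["water/sys1", "water/sys2"],
        "batch_size": 8,  # single int: same batch size for every system
    }
}

# The list form must supply exactly one batch size per system.
td = config_list["training_data"]
assert len(td["batch_size"]) == len(td["systems"])
print(json.dumps(config_list, indent=2))
```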
38 changes: 22 additions & 16 deletions deepmd/pt/utils/dataloader.py
@@ -14,6 +14,7 @@
 )

 import h5py
+import numpy as np
 import torch
 import torch.distributed as dist
 import torch.multiprocessing
@@ -106,29 +107,34 @@

         self.dataloaders = []
         self.batch_sizes = []
-        for system in self.systems:
+        if isinstance(batch_size, str):
+            if batch_size == "auto":
+                rule = 32
+            elif batch_size.startswith("auto:"):
+                rule = int(batch_size.split(":")[1])
+            else:
+                rule = None
+                log.error("Unsupported batch size type")
+            for ii in self.systems:
+                ni = ii._natoms
+                bsi = rule // ni
+                if bsi * ni < rule:
+                    bsi += 1
+                self.batch_sizes.append(bsi)
+        elif isinstance(batch_size, list):
+            self.batch_sizes = batch_size
+        else:
+            self.batch_sizes = batch_size * np.ones(len(systems), dtype=int)
+        assert len(self.systems) == len(self.batch_sizes)
+        for system, batch_size in zip(self.systems, self.batch_sizes):
             if dist.is_initialized():
                 system_sampler = DistributedSampler(system)
                 self.sampler_list.append(system_sampler)
             else:
                 system_sampler = None
-            if isinstance(batch_size, str):
-                if batch_size == "auto":
-                    rule = 32
-                elif batch_size.startswith("auto:"):
-                    rule = int(batch_size.split(":")[1])
-                else:
-                    rule = None
-                    log.error("Unsupported batch size type")
-                self.batch_size = rule // system._natoms
-                if self.batch_size * system._natoms < rule:
-                    self.batch_size += 1
-            else:
-                self.batch_size = batch_size
-            self.batch_sizes.append(self.batch_size)
             system_dataloader = DataLoader(
                 dataset=system,
-                batch_size=self.batch_size,
+                batch_size=int(batch_size),
                 num_workers=0,  # Should be 0 to avoid too many threads forked
                 sampler=system_sampler,
                 collate_fn=collate_batch,

Codecov (codecov/patch) flagged added lines deepmd/pt/utils/dataloader.py#L113-L114, #L116-L117, and #L125 as not covered by tests.
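To summarize the behavior the diff adds, here is a standalone, hypothetical re-implementation of the batch-size normalization (the function and argument names are illustrative, not part of the deepmd codebase):

```python
# Hypothetical sketch of the batch-size normalization introduced by this PR:
# resolve a str / list / int `batch_size` into one batch size per system.
# `natoms_per_system` stands in for `system._natoms`; names are illustrative.
import math


def resolve_batch_sizes(batch_size, natoms_per_system):
    """Return a list with one batch size per data system."""
    n_systems = len(natoms_per_system)
    if isinstance(batch_size, str):
        # "auto" targets ~32 atoms per batch; "auto:N" targets N atoms.
        if batch_size == "auto":
            rule = 32
        elif batch_size.startswith("auto:"):
            rule = int(batch_size.split(":")[1])
        else:
            raise ValueError("Unsupported batch size type")
        # Smallest batch size whose atom count reaches `rule`.
        return [math.ceil(rule / ni) for ni in natoms_per_system]
    elif isinstance(batch_size, list):
        if len(batch_size) != n_systems:
            raise ValueError("Need exactly one batch size per system")
        return list(batch_size)
    else:
        # A single int applies to every system.
        return [int(batch_size)] * n_systems


print(resolve_batch_sizes("auto", [10, 64]))  # [4, 1]
print(resolve_batch_sizes([1, 2], [10, 64]))  # [1, 2]
print(resolve_batch_sizes(8, [10, 64]))       # [8, 8]
```

Note that the PR's round-up loop (`bsi += 1` when `bsi * ni < rule`) is equivalent to the ceiling division used here via `math.ceil`.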