[BE] Set weights and biases as default logger #647

Merged
merged 1 commit on Apr 9, 2024
6 changes: 3 additions & 3 deletions TRAIN.md

@@ -106,8 +106,8 @@ To train a SchNet model for the IS2RE task on the 10k split, run:
 python main.py --mode train --config-yml configs/is2re/10k/schnet/schnet.yml
 ```
 
-Training logs are stored in `logs/tensorboard/[TIMESTAMP]` where `[TIMESTAMP]` is
-the starting time-stamp of the run. You can monitor the training process by running:
+Training logs are stored in `logs/wandb/[TIMESTAMP]` or `logs/tensorboard/[TIMESTAMP]` where `[TIMESTAMP]` is
+the starting time-stamp of the run. For tensorboard, you can monitor the training process by running:
 ```bash
 tensorboard --logdir logs/tensorboard/[TIMESTAMP]
 ```
@@ -187,7 +187,7 @@ To train a SchNet model for the S2EF task on the 2M split using 2 GPUs, run:
 python -u -m torch.distributed.launch --nproc_per_node=2 main.py \
   --mode train --config-yml configs/s2ef/2M/schnet/schnet.yml --num-gpus 2 --distributed
 ```
-Similar to the IS2RE task, tensorboard logs are stored in `logs/tensorboard/[TIMESTAMP]` and the
+Similar to the IS2RE task, logs are stored in `logs/wandb/[TIMESTAMP]` or `logs/tensorboard/[TIMESTAMP]` and the
 checkpoint is stored in `checkpoints/[TIMESTAMP]/checkpoint.pt`.
 
 Next, run this model on the test data:
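The `logs/<backend>/[TIMESTAMP]` layout described in TRAIN.md can be sketched as a small helper. This is purely illustrative: `resolve_log_dir` is a hypothetical function, not part of this repository.

```python
# Sketch: resolving a run's log directory from the configured logger
# backend, mirroring the logs/<backend>/[TIMESTAMP] layout in TRAIN.md.
# `resolve_log_dir` is a hypothetical helper, not repo code.
import os
import time


def resolve_log_dir(config: dict, base: str = "logs") -> str:
    # wandb is the new default backend after this PR
    backend = config.get("logger", "wandb")
    timestamp = config.get("timestamp") or time.strftime("%Y-%m-%d-%H-%M-%S")
    return os.path.join(base, backend, timestamp)


print(resolve_log_dir({"logger": "tensorboard", "timestamp": "2024-04-09-12-00-00"}))
# logs/tensorboard/2024-04-09-12-00-00
```

With this layout, the `tensorboard --logdir` command above simply points at the resolved directory.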
2 changes: 1 addition & 1 deletion configs/is2re/100k/base.yml

@@ -7,7 +7,7 @@ dataset:
     target_std: 2.279365062713623
   - src: data/is2re/all/val_id/data.lmdb
 
-logger: tensorboard
+logger: wandb
 
 task:
   dataset: single_point_lmdb
2 changes: 1 addition & 1 deletion configs/is2re/10k/base.yml

@@ -7,7 +7,7 @@ dataset:
     target_std: 2.279365062713623
   - src: data/is2re/all/val_id/data.lmdb
 
-logger: tensorboard
+logger: wandb
 
 task:
   dataset: single_point_lmdb
2 changes: 1 addition & 1 deletion configs/is2re/all/base.yml

@@ -7,7 +7,7 @@ dataset:
     target_std: 2.279365062713623
  - src: data/is2re/all/val_id/data.lmdb
 
-logger: tensorboard
+logger: wandb
 
 task:
   dataset: single_point_lmdb
2 changes: 1 addition & 1 deletion configs/ocp_example.yml

@@ -112,7 +112,7 @@ task:
   # since this can be significantly slower.
   set_deterministic_scatter: False # True or False
 
-logger: tensorboard # 'wandb' or 'tensorboard'
+logger: wandb # 'wandb' or 'tensorboard'
 
 loss_functions:
   # Specify the different terms in the loss function. For each term, the target property must
2 changes: 1 addition & 1 deletion configs/s2ef/200k/base.yml

@@ -9,7 +9,7 @@ dataset:
     grad_target_std: 2.887317180633545
   - src: data/s2ef/all/val_id/
 
-logger: tensorboard
+logger: wandb
 
 task:
   dataset: trajectory_lmdb
2 changes: 1 addition & 1 deletion configs/s2ef/20M/base.yml

@@ -9,7 +9,7 @@ dataset:
     grad_target_std: 2.887317180633545
   - src: data/s2ef/all/val_id/
 
-logger: tensorboard
+logger: wandb
 
 task:
   dataset: trajectory_lmdb
2 changes: 1 addition & 1 deletion configs/s2ef/2M/base.yml

@@ -9,7 +9,7 @@ dataset:
     grad_target_std: 2.887317180633545
   - src: data/s2ef/all/val_id/
 
-logger: tensorboard
+logger: wandb
 
 task:
   dataset: trajectory_lmdb
2 changes: 1 addition & 1 deletion configs/s2ef/2M/dimenet_plus_plus/dpp_relax.yml

@@ -9,7 +9,7 @@ dataset:
     grad_target_std: 2.887317180633545
   - src: data/s2ef/all/val_id/
 
-logger: tensorboard
+logger: wandb
 
 task:
   dataset: trajectory_lmdb
2 changes: 1 addition & 1 deletion configs/s2ef/all/base.yml

@@ -9,7 +9,7 @@ dataset:
     grad_target_std: 2.887317180633545
   - src: data/s2ef/all/val_id/
 
-logger: tensorboard
+logger: wandb
 
 task:
   dataset: trajectory_lmdb
2 changes: 1 addition & 1 deletion configs/s2ef/all/dimenet_plus_plus/dpp10.7M_forceonly.yml

@@ -9,7 +9,7 @@ dataset:
     grad_target_std: 2.887317180633545
   - src: data/s2ef/all/val_id/
 
-logger: tensorboard
+logger: wandb
 
 task:
   dataset: trajectory_lmdb
2 changes: 1 addition & 1 deletion configs/s2ef/all/dimenet_plus_plus/dpp_energyonly.yml

@@ -9,7 +9,7 @@ dataset:
     grad_target_std: 2.887317180633545
   - src: data/s2ef/all/val_id/
 
-logger: tensorboard
+logger: wandb
 
 task:
   dataset: trajectory_lmdb
2 changes: 1 addition & 1 deletion configs/s2ef/all/dimenet_plus_plus/dpp_forceonly.yml

@@ -9,7 +9,7 @@ dataset:
     grad_target_std: 2.887317180633545
   - src: data/s2ef/all/val_id/
 
-logger: tensorboard
+logger: wandb
 
 task:
   dataset: trajectory_lmdb
4 changes: 2 additions & 2 deletions ocpmodels/common/registry.py

@@ -140,8 +140,8 @@ def register_logger(cls, name: str):
 
         from ocpmodels.common.registry import registry
 
-        @registry.register_logger("tensorboard")
-        class WandB():
+        @registry.register_logger("wandb")
+        class WandBLogger():
             ...
     """
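The decorator usage corrected in the docstring above follows a standard registry pattern. A minimal, self-contained sketch of that pattern (an illustrative stand-in, not the actual `ocpmodels.common.registry` implementation):

```python
# Minimal sketch of a class-registration decorator, mirroring the
# registry usage shown in the corrected docstring. Illustrative only.
class Registry:
    mapping = {"logger_name_mapping": {}}

    @classmethod
    def register_logger(cls, name: str):
        """Class decorator that records a logger class under `name`."""
        def wrap(logger_cls):
            cls.mapping["logger_name_mapping"][name] = logger_cls
            return logger_cls
        return wrap


registry = Registry()


# Usage mirroring the docstring example:
@registry.register_logger("wandb")
class WandBLogger:
    ...


assert Registry.mapping["logger_name_mapping"]["wandb"] is WandBLogger
```

Registering the class under its backend name is what lets a config's `logger: wandb` key be resolved to a concrete logger class at runtime.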
2 changes: 1 addition & 1 deletion ocpmodels/common/utils.py

@@ -1028,7 +1028,7 @@ class _TrainingContext:
     is_debug=config.get("is_debug", False),
     print_every=config.get("print_every", 10),
     seed=config.get("seed", 0),
-    logger=config.get("logger", "tensorboard"),
+    logger=config.get("logger", "wandb"),
     local_rank=config["local_rank"],
     amp=config.get("amp", False),
     cpu=config.get("cpu", False),
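Note that this `config.get` change only affects runs whose YAML omits the `logger:` key; an explicit setting still wins. A tiny illustration of that fallback semantics (illustrative, not the repo's code path):

```python
# dict.get's second argument is only used when the key is absent,
# so explicit `logger:` settings are unaffected by the new default.
def pick_logger(config: dict) -> str:
    return config.get("logger", "wandb")


assert pick_logger({}) == "wandb"                               # key absent: new default
assert pick_logger({"logger": "tensorboard"}) == "tensorboard"  # explicit wins
```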
2 changes: 1 addition & 1 deletion ocpmodels/modules/scaling/fit.py

@@ -46,7 +46,7 @@ def main(*, num_batches: int = 16) -> None:
     parser = flags.get_parser()
     args, override_args = parser.parse_known_args()
     _config = build_config(args, override_args)
-    _config["logger"] = "tensorboard"
+    _config["logger"] = "wandb"
     # endregion
 
     assert not args.distributed, "This doesn't work with DDP"
2 changes: 1 addition & 1 deletion ocpmodels/trainers/base_trainer.py

@@ -62,7 +62,7 @@ def __init__(
     is_debug: bool = False,
     print_every: int = 100,
     seed: Optional[int] = None,
-    logger: str = "tensorboard",
+    logger: str = "wandb",
     local_rank: int = 0,
     amp: bool = False,
     cpu: bool = False,
4 changes: 2 additions & 2 deletions ocpmodels/trainers/ocp_trainer.py

@@ -56,7 +56,7 @@ class OCPTrainer(BaseTrainer):
         seed (int, optional): Random number seed.
             (default: :obj:`None`)
         logger (str, optional): Type of logger to be used.
-            (default: :obj:`tensorboard`)
+            (default: :obj:`wandb`)
         local_rank (int, optional): Local rank of the process, only applicable for distributed training.
             (default: :obj:`0`)
         amp (bool, optional): Run using automatic mixed precision.
@@ -81,7 +81,7 @@ def __init__(
     is_debug=False,
     print_every=100,
     seed=None,
-    logger="tensorboard",
+    logger="wandb",
     local_rank=0,
     amp=False,
     cpu=False,