
total_train_steps too high #8

Open
snimu opened this issue Apr 5, 2023 · 0 comments

snimu commented Apr 5, 2023

Hi,

total_train_steps is currently set to 200_000. That seems far too high: I get a val_loss of ~3.8 and a perplexity of around 40 after only ~1000 steps.

Edit: When using torch 2.0, setting total_train_steps to 1000 leads to an exception:

File "main.py", line 522, in main
    scheduler = torch.optim.lr_scheduler.OneCycleLR(optimizer=opt, max_lr=hyp['opt']['lr'], total_steps=hyp['opt']['total_train_steps'], pct_start=hyp['opt']['warmup_percent'], anneal_strategy='linear', cycle_momentum=False, div_factor=1e2, final_div_factor=.02)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/optim/lr_scheduler.py", line 1676, in __init__
    super().__init__(optimizer, last_epoch, verbose)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/optim/lr_scheduler.py", line 79, in __init__
    self._initial_step()
  File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/optim/lr_scheduler.py", line 85, in _initial_step
    self.step()
  File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/optim/lr_scheduler.py", line 150, in step
    values = self.get_lr()
  File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/optim/lr_scheduler.py", line 1714, in get_lr
    pct = (step_num - start_step) / (end_step - start_step)
ZeroDivisionError: float division by zero

(I'm using a slightly modified version of this package, but apart from total_train_steps I didn't touch main.py or any of the building blocks.)
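For what it's worth, the division by zero looks like it comes from OneCycleLR constructing a zero-length warmup phase: in torch 2.0 the scheduler's first phase ends at float(pct_start * total_steps) - 1, so if pct_start * total_steps works out to 1, the very first scheduler step computes 0 / 0. Below is a minimal sketch of that failure mode; the pct_start=0.001 value is my assumption, chosen only so that the product equals 1 (I haven't checked this repo's actual warmup_percent):

```python
import torch

# Minimal repro sketch of the ZeroDivisionError (assumed cause):
# OneCycleLR's warmup phase ends at float(pct_start * total_steps) - 1.
# When pct_start * total_steps == 1, that phase has zero length, and the
# scheduler's first step computes (0 - 0) / (0 - 0).
model = torch.nn.Linear(2, 2)
opt = torch.optim.SGD(model.parameters(), lr=1e-3)

# Raises ZeroDivisionError already in the constructor (via _initial_step),
# matching the traceback above.
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer=opt,
    max_lr=1e-3,
    total_steps=1000,
    pct_start=0.001,  # assumption: 0.001 * 1000 == 1 -> zero-length phase
    anneal_strategy='linear',
    cycle_momentum=False,
)
```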

Setting total_train_steps = 2_000 seems to work fine for me, so I would cautiously suggest doing that :)
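Concretely, that would be the following change (key path taken from the OneCycleLR call in the traceback above):

```python
# In main.py's hyperparameter dict (key path from the traceback above):
hyp['opt']['total_train_steps'] = 2_000  # instead of 200_000
```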
