val_wer #7450
@stevehuang52, could you take a look?
The bugfix PR is here: #7505
This PR #7505 is under review; it needs a few more updates before merging.
This PR #7505 was closed. I will create another PR to fix the logging issue with PTL 2.0.
@leilei183, the current NeMo main branch is moving to PTL 2.0, where a lot of APIs need to be updated, and we're still working on that. Could you please use the stable NeMo r1.20 release for now?
Hi @stevehuang52, thank you very much for your answer. I will use the stable version first.
This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.
This issue was closed because it has been inactive for 7 days since being marked as stale. |
When I change the strategy from ddp to auto, training starts, but then I hit the following new error. What do you suggest?
error executing job with overrides: []
Traceback (most recent call last):
File "/home/zhengbeida/code/NeMo-main/NeMo-main/NeMo-main/examples/slu/speech_intent_slot/run_speech_intent_slot_train.py", line 119, in main
trainer.fit(model)
File "/home/zhengbeida/anaconda3/envs/nemo/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 520, in fit
call._call_and_handle_interrupt(
File "/home/zhengbeida/anaconda3/envs/nemo/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 44, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "/home/zhengbeida/anaconda3/envs/nemo/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 559, in _fit_impl
self._run(model, ckpt_path=ckpt_path)
File "/home/zhengbeida/anaconda3/envs/nemo/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 935, in _run
results = self._run_stage()
File "/home/zhengbeida/anaconda3/envs/nemo/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 978, in _run_stage
self.fit_loop.run()
File "/home/zhengbeida/anaconda3/envs/nemo/lib/python3.10/site-packages/pytorch_lightning/loops/fit_loop.py", line 201, in run
self.advance()
File "/home/zhengbeida/anaconda3/envs/nemo/lib/python3.10/site-packages/pytorch_lightning/loops/fit_loop.py", line 354, in advance
self.epoch_loop.run(self._data_fetcher)
File "/home/zhengbeida/anaconda3/envs/nemo/lib/python3.10/site-packages/pytorch_lightning/loops/training_epoch_loop.py", line 134, in run
self.on_advance_end()
File "/home/zhengbeida/anaconda3/envs/nemo/lib/python3.10/site-packages/pytorch_lightning/loops/training_epoch_loop.py", line 248, in on_advance_end
self.val_loop.run()
File "/home/zhengbeida/anaconda3/envs/nemo/lib/python3.10/site-packages/pytorch_lightning/loops/utilities.py", line 174, in _decorator
return loop_run(self, *args, **kwargs)
File "/home/zhengbeida/anaconda3/envs/nemo/lib/python3.10/site-packages/pytorch_lightning/loops/evaluation_loop.py", line 122, in run
return self.on_run_end()
File "/home/zhengbeida/anaconda3/envs/nemo/lib/python3.10/site-packages/pytorch_lightning/loops/evaluation_loop.py", line 258, in on_run_end
self._on_evaluation_end()
File "/home/zhengbeida/anaconda3/envs/nemo/lib/python3.10/site-packages/pytorch_lightning/loops/evaluation_loop.py", line 303, in _on_evaluation_end
call._call_callback_hooks(trainer, hook_name, *args, **kwargs)
File "/home/zhengbeida/anaconda3/envs/nemo/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 190, in _call_callback_hooks
fn(trainer, trainer.lightning_module, *args, **kwargs)
File "/home/zhengbeida/anaconda3/envs/nemo/lib/python3.10/site-packages/pytorch_lightning/callbacks/model_checkpoint.py", line 311, in on_validation_end
self._save_topk_checkpoint(trainer, monitor_candidates)
File "/home/zhengbeida/anaconda3/envs/nemo/lib/python3.10/site-packages/pytorch_lightning/callbacks/model_checkpoint.py", line 358, in _save_topk_checkpoint
raise MisconfigurationException(m)
lightning_fabric.utilities.exceptions.MisconfigurationException:
ModelCheckpoint(monitor='val_wer')
could not find the monitored key in the returned metrics: ['train_loss', 'learning_rate_g0', 'learning_rate_g1', 'train_backward_timing', 'train_step_timing', 'training_batch_wer', 'epoch', 'step']. HINT: Did you call `log('val_wer', value)` in the `LightningModule`?
Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
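The error itself points at the likely cause: `ModelCheckpoint(monitor='val_wer')` can only watch a key that the `LightningModule` actually logs during the validation loop, and the metrics list in the traceback contains only training-side keys. Below is a minimal sketch of the required logging pattern. The class and values are hypothetical stand-ins (a tiny fake logger is used in place of `pytorch_lightning.LightningModule` so the snippet runs on its own); in real code, `validation_step` would belong to your Lightning module and `self.log` would be Lightning's own method.

```python
# Sketch: ModelCheckpoint(monitor="val_wer") needs a matching self.log("val_wer", ...)
# call during validation. Hypothetical stand-in class so this runs without
# pytorch_lightning installed.

class FakeLightningModule:
    """Stand-in that records the keys passed to self.log(), mimicking
    what Lightning collects into its callback metrics."""

    def __init__(self):
        self.logged_metrics = {}

    def log(self, name, value):
        # In real PTL this registers the metric for callbacks like ModelCheckpoint.
        self.logged_metrics[name] = value

    def validation_step(self, batch, batch_idx):
        wer = 0.25  # placeholder word error rate for this batch
        # The key must match ModelCheckpoint's `monitor` argument exactly.
        # If this call never runs (or uses a different key), Lightning raises
        # the MisconfigurationException seen in the traceback above.
        self.log("val_wer", wer)


module = FakeLightningModule()
module.validation_step(batch=None, batch_idx=0)
print("val_wer" in module.logged_metrics)  # the monitored key is now present
```

If the metric is being logged under a different name (the list above shows `training_batch_wer`), either rename the logged key or point the checkpoint callback's `monitor` at the key that is actually produced during validation.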