Training loss #26

DevKretov · 2020-03-28T08:47:56Z

Hello,

I was wondering whether it would be possible to add some loss metrics to the training loop. The only thing I see while training an ELECTRA model is

1275000/3000000 = 42.5%, SPS: 3.1, ELAP: 9:24:02, ETA: 6 days, 11:55:19

which says nothing about how good the model is. I'm trying to add some code to the estimator, but it seems to me it would be much easier to show all the metrics, so that one can see how well the model is doing at this stage.

I'm training a non-English model, so I wanted better insight into how it is performing at the moment.

Thanks


008karan · 2020-03-30T12:51:19Z

You can always use tensorboard though!

chriskhanhtran · 2020-05-01T21:48:04Z

@008karan Hi Karan. I am not a TF user. Could you please show me how to use TensorBoard in this case? I see a tfevents file in the model directory, but it does not seem to be written for TensorBoard. The script uses tf.estimator.tpu.TPUEstimator to train the model, so I don't know how to extract the loss. Thank you very much in advance!

008karan · 2020-05-02T05:18:28Z

I trained my model on a GPU, and using TensorBoard works the same way here. You will find events.out files in your checkpoint folder. Just run TensorBoard on that folder.
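For readers who have not used TensorBoard before, here is a minimal sketch of the step described above: locate the event files under the checkpoint folder and build the command that points TensorBoard at it. The helper names `find_event_files` and `tensorboard_command` and the path `path/to/model_dir` are hypothetical, made up for illustration. The sketch assumes the event files follow the usual `events.out.tfevents.*` naming and relies only on the `--logdir` and `--port` flags of the `tensorboard` command.

```python
from pathlib import Path


def find_event_files(model_dir):
    """Return event files found anywhere under model_dir, sorted by path."""
    # Assumption: event files use the usual "events.out.tfevents.*" prefix.
    return sorted(Path(model_dir).rglob("events.out.tfevents.*"))


def tensorboard_command(model_dir, port=6006):
    """Build the command line that points TensorBoard at model_dir."""
    return ["tensorboard", "--logdir", str(model_dir), "--port", str(port)]


if __name__ == "__main__":
    model_dir = "path/to/model_dir"  # hypothetical; use your own checkpoint folder
    for event_file in find_event_files(model_dir):
        print("found", event_file)
    # Paste the printed command into a shell, then open the URL it reports.
    print("run:", " ".join(tensorboard_command(model_dir)))
```

If no event files are listed, TensorBoard will have nothing to plot, so it is worth checking the folder first.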

chriskhanhtran · 2020-05-02T19:56:30Z

@008karan Thank you Karan! It works for me now that I use tensorboard==1.15.0. Do you know how the author can continuously get the evaluation metrics as in this thread? I can only get the evaluation metrics at the end of training.

008karan · 2020-05-03T13:41:26Z

In TensorBoard you get the loss and the learning rate. I think you can add whatever you want to the logs to see it in TensorBoard!

mrm8488 · 2020-05-06T22:17:04Z

When I trained ELECTRA small on my Spanish corpus, the loss was shown when training on GPU. Now I have access to TFRC and trained it on a TPU pod, and the loss is not shown. Of course, I can get it from the TensorBoard events, but it would be great to log it by default when running on TPU.

nemani · 2020-05-30T12:05:00Z

You can set the TensorFlow log level to INFO, and the output will be much more verbose, including printing the loss.
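A minimal sketch of that suggestion, under one assumption: TensorFlow's Python-side messages go through the standard-library logger named `"tensorflow"` (the one `tf.get_logger()` returns), so its level can be raised with `logging` alone. The helper name `enable_tf_info_logging` is hypothetical. With TensorFlow imported, `tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.INFO)` should have the same effect.

```python
import logging


def enable_tf_info_logging():
    """Set the "tensorflow" logger to INFO and return it."""
    # Assumption: TensorFlow logs through the stdlib logger named "tensorflow".
    logger = logging.getLogger("tensorflow")
    logger.setLevel(logging.INFO)
    return logger


tf_logger = enable_tf_info_logging()
```

If the training script sets its own verbosity later on, that call will override this one, so the change may need to be made in the script itself.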
