Training loss #26

Open
DevKretov opened this issue Mar 28, 2020 · 7 comments

@DevKretov

DevKretov commented Mar 28, 2020

Hello,

Would it be possible to add some loss metrics to the training loop? The only thing I see while training an ELECTRA model is

1275000/3000000 = 42.5%, SPS: 3.1, ELAP: 9:24:02, ETA: 6 days, 11:55:19

which says nothing about how well the model is learning. I'm trying to add some code to the estimator myself, but it seems to me that it would be much easier if the script showed all the metrics, so that one can see how well the model is doing at this stage.

I'm training a non-English model, so I want better insight into how it is performing at the moment.

Thanks

@008karan

You can always use TensorBoard, though!

@chriskhanhtran

@008karan Hi Karan. I am not a TF user. Could you please explain how to use TensorBoard in this case? I see a tfevents file in the model directory, but it does not seem to be written for TensorBoard. The script uses tf.estimator.tpu.TPUEstimator to train the model, so I don't know how to extract the loss. Thank you very much in advance!

@008karan

008karan commented May 2, 2020

I trained my model on a GPU, and using TensorBoard works the same way here. You will find events.out files in your checkpoint folder. Just run TensorBoard on that folder.
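
For anyone who has not used TensorBoard before, here is a minimal sketch of that step. It only uses the Python standard library: it looks for event files (TensorBoard names them `events.out.tfevents.*`) under a folder and builds the command to launch TensorBoard on it. The helper names and the `models/my_electra` path are made up for illustration, not something from the ELECTRA scripts.

```python
from pathlib import Path


def find_event_files(model_dir):
    """Return every TensorBoard event file under model_dir, sorted by path."""
    return sorted(Path(model_dir).rglob("events.out.tfevents.*"))


def tensorboard_command(model_dir, port=6006):
    """Build the command line that points TensorBoard at model_dir."""
    return ["tensorboard", "--logdir", str(model_dir), "--port", str(port)]


# Hypothetical checkpoint folder; replace it with your own model directory.
model_dir = "models/my_electra"
print("event files found:", len(find_event_files(model_dir)))
print("run:", " ".join(tensorboard_command(model_dir)))
```

Run the printed command in a terminal and open the address TensorBoard reports. If no event files are found, the folder is probably not the one the estimator wrote its checkpoints to.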

@chriskhanhtran

@008karan Thank you Karan! It works for me now that I use tensorboard==1.15.0. Do you know how the author can continuously get the evaluation metrics as in this thread? I can only get the evaluation metrics at the end of training.

@008karan

008karan commented May 3, 2020

In TensorBoard you get the loss and the learning rate here. I think you can add whatever you want to the logs to see it in TensorBoard!

@mrm8488
Contributor

mrm8488 commented May 6, 2020

When I trained ELECTRA small on my Spanish corpus on a GPU, the loss was shown. Now I have access to TFRC and am training on its TPU pod, and the loss is not shown. Of course, I can get it from the TensorBoard events, but it would be great to log it by default when running on TPU.

@nemani

nemani commented May 30, 2020

You can set the TensorFlow log level to INFO, and the output will be much more verbose, including printing the loss.
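
A minimal sketch of one way to do that, assuming you can add a few lines near the top of the training script. TensorFlow's Python-side messages go through the standard-library logger named `tensorflow`, so raising that logger to INFO has the same effect as TensorFlow's own verbosity call. The function name below is made up for illustration, and whether the training script changes the verbosity again later is something I have not checked.

```python
import logging


def enable_tf_info_logging():
    """Set the standard-library logger used by TensorFlow to INFO."""
    logger = logging.getLogger("tensorflow")
    logger.setLevel(logging.INFO)
    return logger


enable_tf_info_logging()

# With TensorFlow imported as tf, the equivalent call is:
#   tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.INFO)
```

Call it before training starts, so the estimator's INFO messages are not filtered out.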
