Monitor Training with TensorBoard #105

I am trying to monitor training of the OpenChatKit-7B model while changing settings such as the number of iterations. I want to track the quality of the training with TensorBoard but have not managed to get it to work. So far I have added a SummaryWriter to the test_loop function in dist_clm_train.py:

        ...
        ...
        loss = torch.tensor(loss_list).mean()
        ppls = torch.exp(loss)
        metric = {"valid.perplexity": ppls.item(), "valid.loss": loss.item()}

        # ADDED: log scalars to TensorBoard
        metric = {"train.perplexity": ppls.item(), "train.loss": loss.item()}
        train_log(metric, step=pipe.global_step)
        tb.add_scalar('train/perplexity', ppls.item(), tmpStep)
        tb.add_scalar('train/loss', loss.item(), tmpStep)
        tmpStep += 1
        # END of ADDED

        ...
        ...
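For reference, here is a minimal standalone sketch of the logging pattern the snippet above is aiming for, separated from the training script. It is not taken from dist_clm_train.py: the helper names `log_eval_metrics` and `run_demo`, the `prefix` argument, the log directory, and the example loss values are all hypothetical. It assumes that `torch` and the `tensorboard` package are installed, and that the writer is a `torch.utils.tensorboard.SummaryWriter`.

```python
import math


def log_eval_metrics(writer, loss_list, step, prefix="valid"):
    """Log the mean loss and its perplexity for one evaluation pass.

    `writer` is assumed to be a torch.utils.tensorboard.SummaryWriter
    (anything with add_scalar() and flush() works). `loss_list` holds
    plain Python floats, one per evaluated batch.
    """
    loss = sum(loss_list) / len(loss_list)
    ppl = math.exp(loss)  # perplexity = exp(mean cross-entropy loss)
    writer.add_scalar(f"{prefix}/loss", loss, step)
    writer.add_scalar(f"{prefix}/perplexity", ppl, step)
    writer.flush()  # push buffered events to disk so TensorBoard can read them
    return {f"{prefix}.loss": loss, f"{prefix}.perplexity": ppl}


def run_demo(log_dir="runs/openchatkit-debug"):
    """Write a few fake data points; the directory name is only an example."""
    # Imported here so the helper above can be used without torch installed.
    from torch.utils.tensorboard import SummaryWriter

    writer = SummaryWriter(log_dir=log_dir)  # create once, not on every call
    for step, losses in enumerate([[2.9, 3.1], [2.4, 2.6], [2.0, 2.2]]):
        log_eval_metrics(writer, losses, step)
    writer.close()
```

Calling `run_demo()` and then running `tensorboard --logdir runs` should show two scalar charts. In a multi-process training run, it is probably simplest to create the writer in a single process only, so that several workers do not write overlapping event files.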

Please can you advise whether this is possible and, if so, how it can be done? Any help or guidance would be much appreciated.

Many thanks,
