Thanks for your excellent work and the open-sourced model!
While reading the technical report, I found this sentence very interesting:
> To utilize GitHub commits data for pretraining, we format each sample as a code change prediction task: given a commit message and its associated context, the model predicts the modified file paths and the corresponding code changes.
It seems you mask the loss on the commit message and context tokens. I just want to confirm my understanding.
If so, do you mix this commit data with other corpora? How do you compute the loss within a single batch when you can't tell which samples should be partially masked? 😃
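To make my question concrete, here is a minimal sketch of what I imagine (all names are hypothetical, not from the report): each sample carries a per-token loss mask, so commit-style samples zero out the message/context positions while ordinary corpus samples use an all-ones mask, and a single masked average then works for a mixed batch.

```python
# Hypothetical sketch: per-token loss masks let partially-masked commit
# samples and ordinary corpus samples share one batch. Positions with
# mask = 0 (e.g. the commit message and context) contribute no loss.

def masked_mean_loss(token_losses, loss_masks):
    """Average per-token losses over unmasked positions only."""
    total, count = 0.0, 0
    for losses, mask in zip(token_losses, loss_masks):
        for loss, m in zip(losses, mask):
            total += loss * m
            count += m
    return total / max(count, 1)

# Sample A: commit-style; message/context tokens are masked out.
loss_a = [2.0, 1.0, 0.5, 0.25]
mask_a = [0,   0,   1,   1]    # only the code-change tokens count

# Sample B: ordinary pretraining text; every token counts.
loss_b = [1.0, 1.0]
mask_b = [1, 1]

print(masked_mean_loss([loss_a, loss_b], [mask_a, mask_b]))  # → 0.6875
```

Is this roughly what you do, or is the masking handled some other way (e.g. by keeping commit samples in separate batches)?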