Thanks for your excellent work and the open-sourced model!
While reading the technical report, I found this sentence very interesting:
> To utilize GitHub commits data for pretraining, we format each sample as a code change prediction task: given a commit message and its associated context, the model predicts the modified file paths and the corresponding code changes.
It seems you mask the loss on the commit message and context tokens. I just want to confirm my understanding.
If so, do you mix this commit data with other corpora? How do you compute the loss within a single batch when you can't tell which samples should be partially masked? 😃
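To make my question concrete, here is a minimal sketch of what I imagine (all names are hypothetical, not from the report): each sample carries a per-token loss mask, so commit-style samples zero out the message/context positions while ordinary corpus samples use an all-ones mask, and a single masked average then works for a mixed batch.

```python
# Hypothetical sketch: per-token loss masks let partially-masked commit
# samples and ordinary corpus samples share one batch. Positions with
# mask = 0 (e.g. the commit message and context) contribute no loss.

def masked_mean_loss(token_losses, loss_masks):
    """Average per-token losses over unmasked positions only."""
    total, count = 0.0, 0
    for losses, mask in zip(token_losses, loss_masks):
        for loss, m in zip(losses, mask):
            total += loss * m
            count += m
    return total / max(count, 1)

# Sample A: commit-style; message/context tokens are masked out.
loss_a = [2.0, 1.0, 0.5, 0.25]
mask_a = [0,   0,   1,   1]    # only the code-change tokens count

# Sample B: ordinary pretraining text; every token counts.
loss_b = [1.0, 1.0]
mask_b = [1, 1]

print(masked_mean_loss([loss_a, loss_b], [mask_a, mask_b]))  # → 0.6875
```

Is this roughly what you do, or is the masking handled some other way (e.g. by keeping commit samples in separate batches)?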