How do you calculate loss for commit data in Pretrain phase? #17

@zpcalan

Thanks for your excellent work and for open-sourcing the model!
While reading the technical report, I found this sentence very interesting:
To utilize GitHub commits data for pretraining, we format each sample as a code change prediction task: given a commit message and its associated context, the model predicts the modified file paths and the corresponding code changes.
It seems that you mask the loss on the commit message and context, so only the predicted file paths and code changes contribute to it. I just want to confirm that this understanding is correct.
If so, do you mix the commit data with the rest of the corpus? And how do you calculate the loss within one batch if you cannot tell which samples should be partially masked? 😃
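
To make the question concrete, here is a minimal sketch of one way this could work. It is an assumption on my side, not something the technical report states: every sample carries its own per-token loss mask, so a mixed batch needs no per-sample branching. Ordinary corpus samples get an all-ones mask, while commit samples get zeros over the commit message and context. The names `token_nll`, `build_mask`, and `masked_batch_loss` are hypothetical, and the code is plain Python rather than any particular training framework.

```python
import math

def token_nll(logits, target):
    """Negative log-likelihood of `target` under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]

def build_mask(prompt_len, total_len):
    """0 for the first `prompt_len` tokens (no loss), 1 for the rest."""
    return [0] * prompt_len + [1] * (total_len - prompt_len)

def masked_batch_loss(batch_logits, batch_targets, batch_masks):
    """Mean NLL over all tokens in the batch whose mask is 1."""
    total, count = 0.0, 0
    for logits_seq, targets, mask in zip(batch_logits, batch_targets, batch_masks):
        for logits, target, keep in zip(logits_seq, targets, mask):
            if keep:
                total += token_nll(logits, target)
                count += 1
    return total / count if count else 0.0

# Toy batch: 2 samples, 4 tokens each, vocabulary of size 3.
# Assumption: sample 0 is ordinary text, sample 1 is a commit sample
# whose first 2 tokens are the commit message and context.
uniform = [0.0, 0.0, 0.0]
batch_logits = [[uniform] * 4, [uniform] * 4]
batch_targets = [[0, 1, 2, 0], [1, 1, 2, 0]]
batch_masks = [build_mask(0, 4), build_mask(2, 4)]

loss = masked_batch_loss(batch_logits, batch_targets, batch_masks)
print(loss)
```

With this layout the mask is decided once, when each sample is formatted, and the batch loss is just a masked average. Whether the actual pretraining pipeline does this, or instead computes the loss on every token, is exactly what I would like to confirm.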
