Add include_text parameter to SFT dataloaders #8198
Signed-off-by: Igor Gitman <igitman@nvidia.com>
jenkins
Can you rename include_text to "output_original_text" or something similar to make it clearer?
@@ -58,6 +58,7 @@ def __init__(
         truncation_method: str = 'right',
         special_tokens: Optional[Mapping[str, str]] = None,  # special tokens, a dictory of {token_type: token}
         is_test: bool = False,
+        include_text: bool = False,
Can you rename this to "output_original_text" or something similar to make it clearer?
sure!
Signed-off-by: Igor Gitman <igitman@nvidia.com>
jenkins
LGTM!
* Add include_text parameter to SFT dataloaders Signed-off-by: Igor Gitman <igitman@nvidia.com> * Rename include_text -> output_original_text Signed-off-by: Igor Gitman <igitman@nvidia.com> --------- Signed-off-by: Igor Gitman <igitman@nvidia.com> Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>
This reverts commit 2e61ecf.
* Add include_text parameter to SFT dataloaders Signed-off-by: Igor Gitman <igitman@nvidia.com> * Rename include_text -> output_original_text Signed-off-by: Igor Gitman <igitman@nvidia.com> --------- Signed-off-by: Igor Gitman <igitman@nvidia.com>
* Add include_text parameter to SFT dataloaders Signed-off-by: Igor Gitman <igitman@nvidia.com> * Rename include_text -> output_original_text Signed-off-by: Igor Gitman <igitman@nvidia.com> --------- Signed-off-by: Igor Gitman <igitman@nvidia.com> Signed-off-by: stevehuang52 <heh@nvidia.com>
* Add include_text parameter to SFT dataloaders Signed-off-by: Igor Gitman <igitman@nvidia.com> * Rename include_text -> output_original_text Signed-off-by: Igor Gitman <igitman@nvidia.com> --------- Signed-off-by: Igor Gitman <igitman@nvidia.com> Signed-off-by: Sasha Meister <ameister@nvidia.com>
* Add include_text parameter to SFT dataloaders Signed-off-by: Igor Gitman <igitman@nvidia.com> * Rename include_text -> output_original_text Signed-off-by: Igor Gitman <igitman@nvidia.com> --------- Signed-off-by: Igor Gitman <igitman@nvidia.com> Signed-off-by: Pablo Garay <pagaray@nvidia.com>
What does this PR do?
Adds a new parameter that includes the original text in the dictionary returned by the dataloader. This is needed for cleaner support of metrics in NeMo-Aligner.
Collection: [Note which collection this PR will affect]
Changelog
Usage
# Add a code snippet demonstrating how to use this
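The PR template leaves this section unfilled. As a rough illustration of the idea only, the sketch below uses a toy dataset class (ToySFTDataset, its placeholder tokenizer, and the "original_text" key are all hypothetical and are not the NeMo API) to show how a boolean flag named output_original_text could add the untokenized example to each returned dictionary so that downstream metric code can read it.

```python
from typing import Dict, List


class ToySFTDataset:
    """Hypothetical stand-in for an SFT dataset; this is NOT the NeMo class."""

    def __init__(self, examples: List[Dict[str, str]], output_original_text: bool = False):
        self.examples = examples
        # Flag named after the PR's renamed parameter.
        self.output_original_text = output_original_text

    def _tokenize(self, text: str) -> List[int]:
        # Placeholder tokenizer (assumption): one integer per whitespace-separated word.
        return [len(word) for word in text.split()]

    def __len__(self) -> int:
        return len(self.examples)

    def __getitem__(self, idx: int) -> Dict[str, object]:
        example = self.examples[idx]
        item: Dict[str, object] = {
            "input_ids": self._tokenize(example["input"]),
            "answer_ids": self._tokenize(example["output"]),
        }
        if self.output_original_text:
            # Hypothetical key name; the real dataloader may use a different layout.
            item["original_text"] = dict(example)
        return item


data = [{"input": "What is 2 + 2?", "output": "4"}]
plain = ToySFTDataset(data)[0]
with_text = ToySFTDataset(data, output_original_text=True)[0]
```

With the flag left at its default of False the returned dictionary is unchanged, so existing consumers would not be affected; only callers that opt in receive the extra field.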
Jenkins CI
To run Jenkins, a NeMo User with write access must comment "jenkins" on the PR.
Before your PR is "Ready for review"
Pre checks:
PR Type:
If you haven't finished some of the above items, you can still open a "Draft" PR.
Who can review?
Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs in various areas.
Additional Information