Ensure fine-tuning/prompt learning work for T5 #6385
Merged
Conversation
Signed-off-by: SeanNaren <snarenthiran@nvidia.com>
SeanNaren force-pushed the GPT_integrate_core_t5_finetune branch from b53d47b to 6f49c0f on April 6, 2023 at 20:32.
ericharper added a commit that referenced this pull request on Apr 13, 2023:
* import parallel_state and tensor_parallel from megatron.core (Signed-off-by: ericharper <complex451@gmail.com>)
* update column parallel async allreduce arg (Signed-off-by: ericharper <complex451@gmail.com>)
* typos (Signed-off-by: ericharper <complex451@gmail.com>)
* play stash + some changes (Signed-off-by: Abhinav Khattar <aklife97@gmail.com>)
* make grad scaler callable (Signed-off-by: ericharper <complex451@gmail.com>)
* Fixed formatting (Signed-off-by: SeanNaren <snarenthiran@nvidia.com>)
* Make sure RETRO integrates well with the core (#6207)
  * fix tests (Signed-off-by: Yi Dong <yidong@nvidia.com>)
  * [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
  (Signed-off-by: Yi Dong <yidong@nvidia.com>; Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>)
* [NLP] Support T5 with Megatron Core (#6222)
  * Support T5 with Megatron Core
  * Remove comment
  * Update prediction step
  * Further changes to fix fine-tuning
  * Bug fixes from runs
  * Revert changes to batch sampler, swap to pretrained sampler
  * Address feedback
  (each Signed-off-by: SeanNaren <snarenthiran@nvidia.com>)
* GPT P-tuning core (max_len pad -> slow) (Signed-off-by: Abhinav Khattar <aklife97@gmail.com>)
* add GPT p-tuning w/ global batch based passing (Signed-off-by: Abhinav Khattar <aklife97@gmail.com>)
* add T5 p-tuning support (Signed-off-by: Abhinav Khattar <aklife97@gmail.com>)
* add megatron core install to Jenkinsfile (Signed-off-by: ericharper <complex451@gmail.com>)
* fix command (Signed-off-by: ericharper <complex451@gmail.com>)
* add guard efault for arg (Signed-off-by: ericharper <complex451@gmail.com>)
* shift bert, retro, adapter + other namespace changes (Signed-off-by: Abhinav Khattar <aklife97@gmail.com>)
* build_model merge into one (Signed-off-by: Abhinav Khattar <aklife97@gmail.com>)
* Ensure fine-tuning/prompt learning work for T5 (#6385) (Signed-off-by: SeanNaren <snarenthiran@nvidia.com>)
* rm extra split impl
* fix for CI
* temp change for tests
* add bs=1 for log
* fix
* iter changes NMT
* NMT partial fix
* move on_train_batch_end to base_model
* rm on_train_batch_end
* temp remove NMT test
* add training_step logic for T5 derived dynamic len models
* add NMT test back
* style fix
* change no_async_tensor_model_parallel_allreduce
* sequence_parallel_enabled -> sequence_parallel
* fix T5 FT batch size
* seq enabled
* T5 sequence length fix
* NMT mp fork to spawn
* make function signatures consistent across models
* make print log
* rm unused import
* update Dockerfile to install core
* keep core path in workspace
  (each Signed-off-by: Abhinav Khattar <aklife97@gmail.com>)

Signed-off-by: ericharper <complex451@gmail.com>
Signed-off-by: Abhinav Khattar <aklife97@gmail.com>
Signed-off-by: SeanNaren <snarenthiran@nvidia.com>
Signed-off-by: Yi Dong <yidong@nvidia.com>
Co-authored-by: ericharper <complex451@gmail.com>
Co-authored-by: SeanNaren <snarenthiran@nvidia.com>
Co-authored-by: Yi Dong <43824965+yidong72@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
hsiehjackson pushed a commit to hsiehjackson/NeMo that referenced this pull request on Jun 2, 2023:
(Same squashed commit message as the Apr 13 commit above, with issue references rewritten as NVIDIA#6207, NVIDIA#6222, and NVIDIA#6385, plus an additional Signed-off-by: hsiehjackson <c2hsieh@ucsd.edu>.)
What does this PR do?
Adds support for T5 fine-tuning/prompt learning.
Changelog
Usage
# Add a code snippet demonstrating how to use this
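The template's snippet was left unfilled. As a rough sketch only: the script paths and Hydra overrides below are assumptions modeled on NeMo's examples layout around this release, not taken from this PR.

# Hypothetical launch commands; script names and config keys are assumptions.

# T5 fine-tuning on a downstream seq2seq task:
python examples/nlp/language_modeling/megatron_t5_seq2seq_finetune.py \
    model.restore_from_path=/path/to/t5.nemo \
    trainer.devices=8 \
    trainer.precision=16

# T5 prompt learning (p-tuning):
python examples/nlp/language_modeling/megatron_t5_prompt_learning.py \
    model.language_model_path=/path/to/t5.nemo \
    model.virtual_prompt_style='p-tuning' \
    trainer.devices=8

Both runs would start from a pretrained .nemo T5 checkpoint; the fine-tuning path trains all weights, while p-tuning trains only the prompt encoder.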
Before your PR is "Ready for review"
Pre checks:
PR Type:
If you haven't finished some of the above items, you can still open a "Draft" PR.
Who can review?
Anyone in the community is free to review the PR once the checks have passed.
The contributor guidelines list specific people who can review PRs in various areas.
Additional Information