07 Jan 23:06

ko3n1g

6d3dee5

NVIDIA NeMo-Aligner 0.6.0 Latest

New Features and Optimizations

Added context parallel (CP) support for SFT. CP requires you to prepare your dataset using NeMo's prepare_packed_ft_dataset.py script prior to training. Be sure to pass the context parallel size to this script, for example:
```
python scripts/nlp_language_modeling/prepare_packed_ft_dataset.py \
   model.data.train_ds.file_names=[/path/to/training.jsonl] \
   model.data.train_ds.max_seq_length=2048 \
   +tokenizer_path=/path/to/tokenizer \
   +output_dir=/path/to/output_folder \
   +pack_sizes=[2048,4096,8192] \
   model.context_parallel_size=2
```
CP can then be enabled in your training run by setting model.context_parallel_size in your config. Refer to the SFT documentation for more details on running prepare_packed_ft_dataset.py and on running SFT with a packed dataset.
Sequence packing is now supported when running DPO.
Added support for Knowledge Distillation with SFT. See the tutorial for details.
Added support for Megatron Core’s distributed optimizer, which can be configured using ++model.optim.name=mcore_distributed_optim.
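As a rough illustration of what sequence packing does for the packed datasets mentioned above, the sketch below places sequences into fixed-size packs with a first-fit-decreasing heuristic. This is not NeMo's packing algorithm; the function name pack_sequences and the sample lengths are hypothetical.

```python
from typing import List


def pack_sequences(lengths: List[int], pack_size: int) -> List[List[int]]:
    """Group sequence indices into packs whose total length fits in pack_size.

    Illustrative first-fit-decreasing heuristic, not NeMo's implementation.
    """
    if any(n > pack_size for n in lengths):
        raise ValueError("a sequence is longer than pack_size")

    packs: List[List[int]] = []  # indices assigned to each pack
    free: List[int] = []         # remaining capacity of each pack

    # Visit the longest sequences first so the short ones fill the gaps.
    for idx in sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True):
        for p, capacity in enumerate(free):
            if lengths[idx] <= capacity:
                packs[p].append(idx)
                free[p] -= lengths[idx]
                break
        else:
            # No existing pack has room, so open a new one.
            packs.append([idx])
            free.append(pack_size - lengths[idx])
    return packs


lengths = [1200, 700, 500, 1500, 300]  # hypothetical token counts
packs = pack_sequences(lengths, pack_size=2048)
print(packs)
```

Each inner list holds the indices of the sequences that share one pack, so fewer padding tokens are needed than with one sequence per row.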

Introduced ScopedTimer as a successor to SyncedTimer. SyncedTimer is marked for deprecation and will be removed in the next version.

```
from nemo_aligner.utils.distributed import ScopedTimer
timer = ScopedTimer()

# All durations are logged in the timer
with timer("step_time"):
    with timer("fwd"):
        model.fwd()
    with timer("bwd"):
        model.bwd()

# Consume all durations and reset internal store
durations = timer.consume_durations()
```
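To try the nesting pattern without NeMo-Aligner installed, the following is a minimal stand-in with the same call shape. SimpleScopedTimer is a hypothetical class written for this illustration; it is not the ScopedTimer implementation and it does no cross-rank synchronization.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator


class SimpleScopedTimer:
    """Hypothetical stand-in for illustration only, not nemo_aligner's ScopedTimer."""

    def __init__(self) -> None:
        self._durations: Dict[str, float] = {}

    @contextmanager
    def __call__(self, name: str) -> Iterator[None]:
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the elapsed time even if the timed body raises.
            self._durations[name] = time.perf_counter() - start

    def consume_durations(self) -> Dict[str, float]:
        # Hand back everything recorded so far and reset the internal store.
        durations, self._durations = self._durations, {}
        return durations


timer = SimpleScopedTimer()
with timer("step_time"):
    with timer("fwd"):
        sum(range(1000))  # placeholder for a forward pass
    with timer("bwd"):
        sum(range(1000))  # placeholder for a backward pass

durations = timer.consume_durations()
print(sorted(durations))
```

Because the scopes nest, the outer "step_time" entry covers both inner entries.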

Added code and instructions for replicating the Reward Modeling training in HelpSteer2 and HelpSteer2-Preference.
Implemented the REINFORCE algorithm.
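For readers unfamiliar with REINFORCE, the textbook form of its loss can be sketched as below. This is the generic estimator with a mean-reward baseline, which is an assumption made for the example; NeMo-Aligner's implementation may use a different baseline and additional terms. The function name reinforce_loss is hypothetical.

```python
from typing import List, Optional


def reinforce_loss(
    log_probs: List[float],
    rewards: List[float],
    baseline: Optional[float] = None,
) -> float:
    """Textbook REINFORCE loss: -mean((reward - baseline) * log_prob).

    log_probs holds the summed log-probability of each sampled response.
    The default mean-reward baseline is an assumption for this sketch.
    """
    if baseline is None:
        baseline = sum(rewards) / len(rewards)
    advantages = [r - baseline for r in rewards]
    weighted = [a * lp for a, lp in zip(advantages, log_probs)]
    return -sum(weighted) / len(weighted)


loss = reinforce_loss(log_probs=[-1.0, -2.0], rewards=[1.0, 3.0])
print(loss)
```

Minimizing this loss raises the log-probability of responses whose reward is above the baseline and lowers it for those below.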

Breaking Changes

Upgraded the TRT-LLM dependency from v0.10.0 to v0.12.0 and migrated from the GPTSession C++ runtime to the ModelRunner Python runtime. Please use the latest Dockerfile.
Using the latest TransformerEngine versions may require ++model.dist_ckpt_load_strictness=log_all when loading from an older, pre-existing checkpoint to avoid errors.
NeMo-Aligner now requires Megatron-LM==0.9.0 for the APIs that calculate the microbatch sizes (the API introduced is megatron.core.num_microbatches_calculator.reconfigure_num_microbatch_calculator).
NeMo-Aligner now requires a version of NeMo that includes this change to how the MoE spec is handled: NVIDIA/NeMo#9035.

Bug Fixes

For stability, it is now required to add export NCCL_ALGO=... to scripts that launch the PPO training loop. Please see the RLHF docs for more information.

Deprecation Notices

SyncedTimer is marked for deprecation and will be removed in 0.7.0. Please switch to ScopedTimer.
broadcast_2d_tensor and broadcast_2d_tensor_within_pp are marked for deprecation and will be removed in 0.7.0. Please switch to broadcast_tensor and broadcast_tensor_within_pp.


20 Dec 22:58

ko3n1g

v0.6.0rc1

6721e72

NVIDIA NeMo-Aligner 0.6.0rc1.dev0 Pre-release


Prerelease: NVIDIA NeMo-Aligner 0.6.0rc1.dev0 (2024-12-20)


14 Dec 00:27

pablo-garay

v0.6.0rc0

3791aad

v0.6.0rc0: fix: fix DPO sequence packing + pipeline parallel (#437) Pre-release


Signed-off-by: ashors1 <ashors@nvidia.com>


15 Nov 00:01

ko3n1g

v0.5.0

660a3ad

NVIDIA NeMo-Aligner 0.5.0

New Features and Optimizations

Implement Kahneman-Tversky Optimization (KTO).
Sequence packing is now supported when running SFT with SFTChatDataset.

Breaking Changes

Bug Fixes

Changed log_prob_forward_micro_batch_size in DPO to mean the same as micro_batch_size: the number of samples (chosen and rejected included) that are processed at once.


23 Sep 16:18

ko3n1g

v0.4.0

59f8d16

NVIDIA NeMo-Aligner 0.4.0

Implement reward-aware preference optimization.
Added TRT-LLM support in PPO. This can be enabled by setting trainer.ppo.trt_llm.enable=True. There is also a reshard option that reshards away pipeline parallelism during inference for a further speedup, enabled via trainer.ppo.trt_llm.reshard=True.
The PPO algorithm now detects whether each sampled sequence has ended, and zeroes out the gradient of samples that did not stop properly.
Added critic warmup to PPO via the flag trainer.ppo.critic_warmup_steps.
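The two PPO behaviors above can be pictured with the sketch below. It rests on two assumptions that the notes do not spell out: that unfinished samples are handled by zeroing their loss contribution (which zeroes their gradient), and that critic warmup means the actor is not updated for the first critic_warmup_steps steps. The helper names masked_mean_loss and should_update_actor are hypothetical.

```python
from typing import List


def masked_mean_loss(per_sample_loss: List[float], ended: List[bool]) -> float:
    """Average the loss while giving unfinished samples a zero contribution.

    A sample whose loss term is multiplied by zero also receives zero gradient.
    The normalization used here is an assumption for the sketch.
    """
    masked = [loss if done else 0.0 for loss, done in zip(per_sample_loss, ended)]
    return sum(masked) / len(masked)


def should_update_actor(step: int, critic_warmup_steps: int) -> bool:
    """Assumed warmup rule: only the critic trains for the first N steps."""
    return step >= critic_warmup_steps


# The second sample hit the length limit without producing an end token.
loss = masked_mean_loss([0.5, 1.5, 1.0], ended=[True, False, True])
actor_schedule = [should_update_actor(s, critic_warmup_steps=2) for s in range(4)]
print(loss, actor_schedule)
```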

New Features and Optimizations

The critic and reward model servers have been refactored. The reward model now has a flag called model.forward_micro_batch_size, which determines the micro batch size used for inference. This can be higher than the training micro batch size because there is less memory pressure during inference.
In the critic and reward model server, it is now possible to specify inference_micro_batch_size as a list. This allows us to provide more information to PyTriton regarding the preferred batch sizes for inference.
It is no longer a requirement to specify num_rollout_samples to be a multiple of inference_micro_batch_size * dp size in PPO.

Breaking Changes

inference.micro_batch_size is now renamed to inference.inference_micro_batch_size when running reward model inference in inference_rm.yaml. This is to stay consistent with the naming scheme of the PPO critic.
It is no longer possible to specify add_EOS when running reward model or critic inference.
NeMo-Aligner now requires Megatron-LM>=0.8.0 for the APIs to calculate the microbatch sizes.

Bug Fixes

Make num_workers for dataloaders 0 by default. This prevents issues when using MPI (with TRT-LLM) or more sophisticated launchers.


03 Jun 20:20

gshennvm

v0.3.1

18cc0fa

NVIDIA NeMo-Aligner v0.3.1

SPIN: added the rollout_micro_batch_size parameter, which lets users set the batch size used for generation during SPIN training. Previously, the generation batch size was automatically set to the data parallel (DP) size of the model.
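Conceptually, a generation micro batch size means prompts are generated in chunks of that size. The sketch below shows the idea only; iter_micro_batches is a hypothetical helper, not part of the SPIN trainer.

```python
from typing import Iterator, List, Sequence


def iter_micro_batches(
    prompts: Sequence[str], rollout_micro_batch_size: int
) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most rollout_micro_batch_size prompts."""
    if rollout_micro_batch_size < 1:
        raise ValueError("rollout_micro_batch_size must be at least 1")
    for start in range(0, len(prompts), rollout_micro_batch_size):
        yield list(prompts[start:start + rollout_micro_batch_size])


prompts = ["p0", "p1", "p2", "p3", "p4"]  # hypothetical prompts
batches = list(iter_micro_batches(prompts, rollout_micro_batch_size=2))
print(batches)
```

A larger value trades memory for fewer generation calls.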

New features and optimizations

Add MoE Support for our reward models.
SFT/SteerLM: LoRA can now be enabled on all model layers
DPO: Enable LoRA on all model layers (In this case the actor will be reference model + LoRA weights, we can switch between actor/reference model by enabling/disabling LoRA)
PPO: Enable LoRA on all model layers (In this case the actor will be init policy + LoRA weights, we can switch between actor/init_policy model by enabling/disabling LoRA)
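The actor/reference switch described above works because a LoRA layer adds a low-rank update on top of frozen base weights, so disabling the update recovers the base model. The toy layer below illustrates this in plain Python; the class LoRALinear, its attributes, and the numbers are hypothetical and unrelated to NeMo's adapter classes.

```python
from typing import List

Matrix = List[List[float]]


def matvec(matrix: Matrix, vector: List[float]) -> List[float]:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Toy linear layer: y = W x + scale * B (A x) when LoRA is enabled."""

    def __init__(self, weight: Matrix, lora_a: Matrix, lora_b: Matrix, scale: float = 1.0):
        self.weight = weight  # frozen base weights, shape (out, in)
        self.lora_a = lora_a  # trainable down-projection, shape (rank, in)
        self.lora_b = lora_b  # trainable up-projection, shape (out, rank)
        self.scale = scale
        self.lora_enabled = True

    def forward(self, x: List[float]) -> List[float]:
        base = matvec(self.weight, x)
        if not self.lora_enabled:
            return base  # behaves like the reference / initial policy
        delta = matvec(self.lora_b, matvec(self.lora_a, x))
        return [b + self.scale * d for b, d in zip(base, delta)]


layer = LoRALinear(
    weight=[[1.0, 0.0], [0.0, 1.0]],
    lora_a=[[1.0, 1.0]],
    lora_b=[[0.5], [0.0]],
)
x = [2.0, 3.0]
actor_out = layer.forward(x)      # base weights plus LoRA update

layer.lora_enabled = False
reference_out = layer.forward(x)  # base weights only
print(actor_out, reference_out)
```

Only one copy of the base weights is kept in memory, which is the practical benefit of switching this way.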

Breaking changes

Bug Fixes

Fixed an issue where the random sampler kept its state when resetting for validation, leading to a different validation batch at each validation step. Fixed by using a deterministic sampler.
Fixed crash with float val check interval in DPOTrainer
Fixed crash with float val check interval when checking progress in DPOTrainer
Fixed potential crash in SPIN when prompts are longer than encoder_seq_len - generation.max_length
Fixed crash when calling the generate() method of an SFT model with pipeline parallelism greater than two
Fixed crash when calling the generate() method of an SFT model with compute_logprob=True and string inputs
Fixed crash when model.micro_batch_size > 1 in DPO
Fixed issue when model.encoder_seq_length is mismatched with model.data.train_ds.max_seq_length in SFT and SPIN.
Delete MegatronPretrainingRandomSampler from Aligner since it has been upstreamed into NeMo
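The validation sampler fix in this list can be pictured as follows: if the index order is derived from a fresh, fixed-seed generator on every validation pass, each pass sees the same batches. This is only one way to make a sampler deterministic and is an assumption for the example; the actual fix may simply iterate in a fixed order. The function validation_indices is hypothetical.

```python
import random
from typing import List


def validation_indices(num_samples: int, seed: int = 1234) -> List[int]:
    """Return the same index order on every call.

    A new Random instance is created per call, so no state leaks between
    validation passes (hypothetical sketch, not the Aligner sampler).
    """
    rng = random.Random(seed)
    indices = list(range(num_samples))
    rng.shuffle(indices)
    return indices


first_pass = validation_indices(8)
second_pass = validation_indices(8)
print(first_pass == second_pass)
```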

Container

docker pull nvcr.io/nvidia/nemo:24.05

To get access:

Sign up to get free and immediate access to NVIDIA NeMo Framework container. If you don’t have an NVIDIA NGC account, you will be prompted to sign up for an account before proceeding.
If you don’t have an NVIDIA NGC API key, sign into NVIDIA NGC, selecting organization/team: ea-bignlp/ga-participants and click Generate API key. Save this key for the next step. Else, skip this step.
On your machine, docker login to nvcr.io using

docker login nvcr.io
Username: $oauthtoken
Password: <Your Saved NGC API Key>

PyPi

https://pypi.org/project/nemo-aligner/0.3.1/


13 Mar 23:17

gshennvm

v0.2.0

3b8b70e

NVIDIA NeMo-Aligner v0.2.0

New features and optimizations

Added public-facing official Dockerfile for NeMo-Aligner.
PPO: memory optimization to help avoid OOM in the actor when sending training data to the critic.
PPO: it is now possible to use a custom end string in sampling_params.end_strings that is different from <extra_id_1>.
SFT: added support for custom validation metrics based on model generations.
Added the ability to do multi-epoch (cfg.max_epochs > 1) training for reward models, DPO, PPO, and SFT
SFT/SteerLM: added LoRA tuning as an option besides full fine-tuning, only attention_qkv layer is supported

Breaking changes

We have changed the shuffle logic in the data sampler to support multi-epoch training, so training runs using identical parameters will not give the same results anymore because the shuffle logic has changed (specifically the seed value is modified slightly per epoch). If you run CI/regression type tests, then be warned that the test may break due to this shuffle change.
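A per-epoch seed change of this kind can be sketched as below. Adding the epoch number to the base seed is an assumption made for illustration; the notes only say the seed is modified slightly per epoch. The function epoch_order is hypothetical.

```python
import random
from typing import List


def epoch_order(num_samples: int, base_seed: int, epoch: int) -> List[int]:
    """Shuffle sample indices with a seed that depends on the epoch.

    base_seed + epoch is an assumed rule, not necessarily the one the
    data sampler uses.
    """
    rng = random.Random(base_seed + epoch)
    order = list(range(num_samples))
    rng.shuffle(order)
    return order


epoch0 = epoch_order(8, base_seed=1234, epoch=0)
epoch1 = epoch_order(8, base_seed=1234, epoch=1)
print(epoch0, epoch1)
```

Runs remain reproducible for a given seed and epoch, but the order no longer matches what earlier versions produced, which is why regression baselines may need to be regenerated.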

Bug Fixes

Fixed a potential issue when the base model's model.data.data_prefix config is a list and is about to be overridden with a dictionary from the training configuration.
exp_manager.max_time_per_run is now respected; the trainers will save and run validation before exiting if the time limit has been reached.
Fixed crash in PPO when using a separate reward model server (i.e., with combine_rm_and_critic_server=False).
Fixed crash when LR scheduler is not specified

Container

docker pull nvcr.io/nvidia/nemo:24.01.framework

To get access:

Sign up to get free and immediate access to NVIDIA NeMo Framework container. If you don’t have an NVIDIA NGC account, you will be prompted to sign up for an account before proceeding.
If you don’t have an NVIDIA NGC API key, sign into NVIDIA NGC, selecting organization/team: ea-bignlp/ga-participants and click Generate API key. Save this key for the next step. Else, skip this step.
On your machine, docker login to nvcr.io using

docker login nvcr.io
Username: $oauthtoken
Password: <Your Saved NGC API Key>

PyPi

https://pypi.org/project/nemo-aligner/0.2.0/


06 Dec 17:59

gshennvm

v0.1.0

b6ce38e

NVIDIA NeMo-Aligner v0.1.0

Highlights

First open source release of NeMo-Aligner. Featuring:

Support for the full Reinforcement Learning from Human Feedback (RLHF) pipeline, including SFT, Reward Model Training and Reinforcement Learning
Support for the SteerLM technique
Support for Direct Preference Optimization
Support for all Megatron Core GPT models such as LLAMA2 70B

Container

docker pull nvcr.io/ea-bignlp/ga-participants/nemofw-training:23.11

To get access:

Sign up to get free and immediate access to NVIDIA NeMo Framework container. If you don’t have an NVIDIA NGC account, you will be prompted to sign up for an account before proceeding.
If you don’t have an NVIDIA NGC API key, sign into NVIDIA NGC, selecting organization/team: ea-bignlp/ga-participants and click Generate API key. Save this key for the next step. Else, skip this step.
On your machine, docker login to nvcr.io using

docker login nvcr.io
Username: $oauthtoken
Password: <Your Saved NGC API Key>

PyPi

https://pypi.org/project/nemo-aligner/0.1.0/

