Releases: microsoft/Tutel
Tutel v0.3.2
What's New in v0.3.2:
- Add --use_tensorcore option for benchmarking in tutel.examples.helloworld;
- Read TUTEL_GLOBAL_TIMEOUT_SEC from the environment variable to configure the NCCL timeout setting;
- Extend tutel.examples.helloworld_custom_expert to explain how to override MoE with customized expert layers (see the sketch below).
How to Setup:
python3 -m pip install -v -U --no-build-isolation https://github.com/microsoft/tutel/archive/refs/tags/v0.3.2.tar.gz
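For illustration, a minimal sketch of a customized expert layer in the spirit of tutel.examples.helloworld_custom_expert. Only the example's name comes from the notes above; the 'custom' expert type and its 'module' key are assumptions, and the bundled example remains the authoritative interface.

```python
# Hedged sketch of overriding MoE with a customized expert layer.
# The experts={'type': 'custom', 'module': ...} keys are assumed, not verified.
import torch
from tutel import moe as tutel_moe

class MyExpert(torch.nn.Module):
    """A plain two-layer FFN standing in for a user-defined expert."""
    def __init__(self, model_dim, hidden_size):
        super().__init__()
        self.fc1 = torch.nn.Linear(model_dim, hidden_size)
        self.fc2 = torch.nn.Linear(hidden_size, model_dim)

    def forward(self, x):
        return self.fc2(torch.nn.functional.relu(self.fc1(x)))

moe = tutel_moe.moe_layer(
    gate_type={'type': 'top', 'k': 2},
    model_dim=1024,
    experts={'type': 'custom', 'module': MyExpert(1024, 4096)},  # assumed keys
)
```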
Tutel v0.3.1
What's New in v0.3.1:
- Enable two additional collective communication primitives: net.batch_all_to_all_v() and net.batch_all_gather_v() (see the sketch below).
How to Setup:
python3 -m pip install -v -U --no-build-isolation https://github.com/microsoft/tutel/archive/refs/tags/v0.3.1.tar.gz
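A hedged usage sketch of the two primitives: only their names come from the notes above; the list-of-tensors calling convention and the size metadata are assumptions.

```python
# Sketch only: the batched variable-size collectives named above.
# Argument layout is an assumption; an initialized torch.distributed
# process group is required before calling either primitive.
from tutel import net

def exchange(local_tensors, send_counts):
    # Assumed convention: each call batches several variable-length tensors
    # into a single collective, avoiding one communication launch per tensor.
    received = net.batch_all_to_all_v(local_tensors, send_counts)
    gathered = net.batch_all_gather_v(local_tensors)
    return received, gathered
```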
Tutel v0.3.0
What's New in v0.3.0:
- Support Megablocks-style dMoE inference (see README.md for more information and the sketch below)
How to Setup:
python3 -m pip install -v -U --no-build-isolation https://github.com/microsoft/tutel/archive/refs/tags/v0.3.0.tar.gz
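A hedged sketch of what enabling dMoE inference might look like: the megablocks_size forward argument follows the project README's description of this feature, while the constructor values are illustrative.

```python
# Sketch of Megablocks-style dMoE inference; megablocks_size is taken from
# the README's description of this feature, other values are illustrative.
import torch
from tutel import moe as tutel_moe

moe = tutel_moe.moe_layer(
    gate_type={'type': 'top', 'k': 2},
    model_dim=1024,
    experts={'type': 'ffn', 'count_per_node': 2, 'hidden_size_per_expert': 4096},
)
x = torch.randn(8, 1024)
y = moe(x, megablocks_size=1)  # non-zero selects the dMoE execution path
```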
Tutel v0.2.1
What's New in v0.2.1:
- Support Switchable Parallelism with the example tutel.examples.helloworld_switch (see the sketch below).
How to Setup:
python3 -m pip install --user https://github.com/microsoft/tutel/archive/refs/tags/v0.2.1.tar.gz
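A hedged sketch of Switchable Parallelism: re-selecting the parallelism degree between iterations is the feature named above, but the adaptive_r keyword is an assumption; tutel.examples.helloworld_switch is authoritative.

```python
# Sketch only: switching the parallelism degree r between forward calls.
# The adaptive_r keyword is an assumption; see tutel.examples.helloworld_switch.
import torch
from tutel import moe as tutel_moe

moe = tutel_moe.moe_layer(
    gate_type={'type': 'top', 'k': 2},
    model_dim=1024,
    experts={'type': 'ffn', 'count_per_node': 2, 'hidden_size_per_expert': 4096},
)
x = torch.randn(8, 1024)
y_step1 = moe(x, adaptive_r=1)  # lean toward data parallelism this step
y_step2 = moe(x, adaptive_r=2)  # shard expert computation more aggressively
```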
Tutel v0.2.0
What's New in v0.2.0:
- Support Windows Python3 + Torch Installation;
- Add examples to enable Tutel MoE in Fairseq;
- Refactor the MoE layer implementation so that all features (e.g. top-X, overlap, parallel_type, capacity, ..) can be changed across forward iterations (see the sketch below);
- New features: load_importance_loss, cosine router, inequivalent_tokens;
- Extend capacity_factor to accept zero and negative values for smarter capacity estimation;
- Add tutel.checkpoint conversion tools to reformat checkpoint files, so existing checkpoints can be used to train/infer with a different world size.
How to Setup:
python3 -m pip install --user https://github.com/microsoft/tutel/archive/refs/tags/v0.2.0.tar.gz
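A hedged sketch of per-iteration reconfiguration: the tunables (top-X, capacity) are named in the notes above, but the forward-time keyword names used here are assumptions.

```python
# Sketch: re-tuning routing between forward iterations. The top_k and
# capacity_factor keyword names are assumptions for illustration.
import torch
from tutel import moe as tutel_moe

moe = tutel_moe.moe_layer(
    gate_type={'type': 'top', 'k': 2},
    model_dim=1024,
    experts={'type': 'ffn', 'count_per_node': 2, 'hidden_size_per_expert': 4096},
)
x = torch.randn(8, 1024)
y1 = moe(x, top_k=1)            # lighter routing for this iteration
y2 = moe(x, capacity_factor=0)  # assumed: non-positive triggers smart capacity estimation
```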
Tutel v0.1.5
What's New in v0.1.5:
- Add a 2D hierarchical a2a algorithm for extremely large-scale training;
- Support different parallel_type options for MoE computation: data, model, auto (see the sketch below);
- Combine different expert granularities (e.g. normal experts, sharded experts, Megatron dense FFN) into the same programming interface & style;
- New feature: is_postscore, to specify whether gating scores are applied during encoding or decoding;
- Enhance existing features: JIT compiler, a2a overlap with 2D.
How to Setup:
python3 -m pip install --user https://github.com/microsoft/tutel/archive/refs/tags/v0.1.5.tar.gz
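A hedged sketch of the parallel_type and is_postscore knobs: the option values come from the notes above, while their placement as constructor arguments is an assumption.

```python
# Sketch: choosing the MoE computation's parallelism and score placement.
# Passing these at construction time is assumed, not verified.
import torch
from tutel import moe as tutel_moe

moe = tutel_moe.moe_layer(
    gate_type={'type': 'top', 'k': 2},
    model_dim=1024,
    experts={'type': 'ffn', 'count_per_node': 2, 'hidden_size_per_expert': 4096},
    parallel_type='auto',  # one of 'data', 'model', 'auto' per the notes
    is_postscore=True,     # True: weigh tokens by gate scores at decode time
)
```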
Contributors: @abuccts, @yzygitzh, @ghostplant, @EricWangCN
Tutel v0.1.4
What's New in v0.1.4:
- Enhance communication features: a2a overlap with computation, support for different granularities of group creation, etc.;
- Add a single-thread CPU implementation for correctness checks & reference;
- Refine the JIT compiler interface for flexible usability: jit::inject_source and jit::jit_execute;
- Enhance examples: fp64 support, CUDA AMP, checkpointing, etc.;
- Support execution inside torch.distributed.pipeline.
How to Setup:
python3 -m pip install --user https://github.com/microsoft/tutel/archive/refs/tags/v0.1.4.tar.gz
Contributors: @yzygitzh, @ghostplant, @EricWangCN
Tutel v0.1.3
What's New in v0.1.3:
- Add Tutel Launcher Support based on Open MPI;
- Support Establishing Data Model Parallel in Initialization;
- Support Single Expert Evenly Sharded on Multiple GPUs;
- Support List of Gates and Forwarding MoE Layer with a Specified Gating Index (see the sketch below);
- Fix NVRTC Compatibility when Enabling USE_NVRTC=1;
- Other Implementation Enhancements & Correctness Checking.
How to Setup:
python3 -m pip install --user https://github.com/microsoft/tutel/archive/refs/tags/v0.1.3.tar.gz
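A hedged sketch of the multi-gate feature: supplying several gates and selecting one per forward call matches the wording above, but the list-valued gate_type and the gate_index keyword are assumptions.

```python
# Sketch: a list of gates with the active gate chosen per forward call.
# The list-valued gate_type and the gate_index keyword are assumptions.
import torch
from tutel import moe as tutel_moe

moe = tutel_moe.moe_layer(
    gate_type=[{'type': 'top', 'k': 1}, {'type': 'top', 'k': 2}],
    model_dim=1024,
    experts={'type': 'ffn', 'count_per_node': 2, 'hidden_size_per_expert': 4096},
)
x = torch.randn(8, 1024)
y = moe(x, gate_index=1)  # route this batch with the second (top-2) gate
```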
Contributors: @ghostplant, @EricWangCN, @guoshzhao.
Tutel v0.1.2
What's New in v0.1.2:
- General-purpose top-k gating with {'type': 'top', 'k': 2} (see the sketch below);
- Add Megatron-LM Tensor Parallel as a gating type;
- Add DeepSpeed-based & Megatron-based helloworld examples for fair comparison;
- Add torch.bfloat16 datatype support for single-GPU runs.
How to Setup:
python3 -m pip install --user https://github.com/microsoft/tutel/archive/refs/tags/v0.1.2.tar.gz
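A minimal sketch of the general-purpose top-k gate: the gate_type dict is quoted from the notes above, while model_dim and the 'ffn' expert settings are illustrative assumptions.

```python
# Minimal single-GPU sketch of general-purpose top-k gating; only the
# gate_type dict comes from the notes, the other values are illustrative.
import torch
from tutel import moe as tutel_moe

moe = tutel_moe.moe_layer(
    gate_type={'type': 'top', 'k': 2},  # top-2 routing; raise 'k' for top-3, top-4, ...
    model_dim=1024,
    experts={'type': 'ffn', 'count_per_node': 2, 'hidden_size_per_expert': 4096},
)
x = torch.randn(8, 1024)  # [tokens, model_dim]
y = moe(x)                # tokens recombined from the selected experts
```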
Contributors: @ghostplant, @EricWangCN, @foreveronehundred.
Tutel v0.1.1
What's New in v0.1.1:
- Enable fp16 support for AMD GPUs.
- Use NVRTC for JIT compilation when available.
- Add a new system_init interface for initializing NUMA settings on distributed GPUs.
- Add more gating types: Top3Gate & Top4Gate.
- Allow higher-level code to change the capacity value in the Tutel fast dispatcher.
- Add a custom AllToAll extension for older PyTorch versions without built-in AllToAll operator support.
How to Setup:
python3 -m pip install --user https://github.com/microsoft/tutel/archive/refs/tags/v0.1.1.tar.gz
Contributors: @jspark1105, @ngoyal2707, @guoshzhao, @ghostplant.