Releases: awslabs/slapo
v0.0.3
This release mainly includes the following improvements:
- Fix some fidelity issues.
- Refactor schedule primitives and add the .fork_rng(), .annotate(), and .replace_all() primitives.
- Other bug fixes.
If any of the following cases apply to your existing v0.0.2-based schedule, you need to update it to support v0.0.3.
- Tagging parameters so that the DeepSpeed pipeline runtime performs an additional all-reduce on the TP group. For example, you may have the following code snippet that tags LayerNorm parameters:
```python
def tag_layernorm(sch):
    for m in sch.mod.modules():
        if isinstance(m, nn.LayerNorm):
            for p in m.parameters(recurse=False):
                p.replicated_param = True
```
This can be changed to the following in v0.0.3:
```python
def annotate_layernorm_and_bias(sch):
    for sub_sch in sch.child.values():
        if isinstance(sub_sch.mod, nn.LayerNorm):
            for name, _ in sub_sch.mod.named_parameters(recurse=False):
                sub_sch.annotate(name, "replicated_param", True)
        if issubclass(sub_sch.mod.__class__, LinearWithSyncFunc):
            sub_sch.annotate("bias", "replicated_param", True)
        annotate_layernorm_and_bias(sub_sch)
```
Reference: https://github.com/awslabs/slapo/blob/main/slapo/model_schedule/gpt2.py#L529
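For completeness, here is a minimal, self-contained sketch of how such an annotation pass might be driven from the top-level schedule. The toy Block module is ours, and the slapo.create_schedule entry point and the slapo.op.LinearWithSyncFunc import path are assumptions; in practice you would run the pass on your real model's schedule as in the reference above.
```python
import torch.nn as nn
import slapo
from slapo.op import LinearWithSyncFunc  # assumed import path


class Block(nn.Module):
    """Toy module with a LayerNorm whose parameters should be tagged."""

    def __init__(self, hidden=64):
        super().__init__()
        self.ln = nn.LayerNorm(hidden)
        self.fc = nn.Linear(hidden, hidden)

    def forward(self, x):
        return self.fc(self.ln(x))


def annotate_layernorm_and_bias(sch):
    # Recursively walk child schedules and tag LayerNorm parameters (and the
    # bias of sharded linear ops) as replicated, so the DeepSpeed pipeline
    # runtime all-reduces them across the TP group.
    for sub_sch in sch.child.values():
        if isinstance(sub_sch.mod, nn.LayerNorm):
            for name, _ in sub_sch.mod.named_parameters(recurse=False):
                sub_sch.annotate(name, "replicated_param", True)
        if issubclass(sub_sch.mod.__class__, LinearWithSyncFunc):
            sub_sch.annotate("bias", "replicated_param", True)
        annotate_layernorm_and_bias(sub_sch)


sch = slapo.create_schedule(Block())
annotate_layernorm_and_bias(sch)
```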
- RNG control can now be done easily with the newly introduced schedule primitive .fork_rng(). Accordingly, the old slapo.op.AttentionOpWithRNG is removed. If you have the following code snippet:
```python
new_op = AttentionOpWithRNG(
    sub_sch["module"]["attn_op"].mod.attn_op_name,
    sub_sch["module"]["attn_op"].mod.apply_causal_mask,
    sub_sch["module"]["attn_op"].mod.scale,
)
sub_sch["module"]["attn_op"].replace(new_op)
```
It has to be changed to
```python
sub_sch["module"]["attn_op"].fork_rng()
```
Note that .fork_rng() keeps the existing attention op in place rather than replacing it (see #76).
- The primitive .trace_for_pipeline() has been renamed to .trace_until(). Since the arguments remain the same, you could simply replace all occurrences.
- If you use slapo.op.FusedMLP with sharding, you need to update your schedule to reflect the new FusedMLP implementation, which fuses bias and dropout (#73) and no longer has a separate activation submodule. For example:
```python
fc_names = ["fc_in", "act", "fc_out"]
sub_sch[fc_names[0]].shard("weight", axis=0)
sub_sch[fc_names[1]].shard("bias", axis=0)
sub_sch[fc_names[2]].shard("weight", axis=1)
sub_sch[fc_names[0]].sync(mode="bwd_post", sync_op_or_fn="all_reduce")
sub_sch[fc_names[2]].sync(mode="fwd_post", sync_op_or_fn="all_reduce")
```
changes to
```python
fc_names = ["fc_in", "fc_out"]
sub_sch[fc_names[0]].shard("weight", axis=0)
sub_sch[fc_names[0]].shard("bias", axis=0)
sub_sch[fc_names[1]].shard("weight", axis=1)
sub_sch[fc_names[0]].sync(mode="bwd_post", sync_op_or_fn="all_reduce")
sub_sch[fc_names[1]].sync(mode="fwd_post", sync_op_or_fn="all_reduce")
```
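If the same pattern appears in several places in your schedule, it can be wrapped in a small helper. The sketch below is only a repackaging of the snippet above; the helper name shard_fused_mlp is ours and not part of slapo's API.
```python
def shard_fused_mlp(sub_sch):
    # `sub_sch` is the schedule of a slapo.op.FusedMLP submodule (v0.0.3 layout:
    # only fc_in and fc_out, since bias and dropout are fused).
    sub_sch["fc_in"].shard("weight", axis=0)   # column-shard the input projection
    sub_sch["fc_in"].shard("bias", axis=0)
    sub_sch["fc_out"].shard("weight", axis=1)  # row-shard the output projection
    # All-reduce the partial results across the TP group: the input gradient of
    # fc_in in the backward pass, and the partial output of fc_out in the forward pass.
    sub_sch["fc_in"].sync(mode="bwd_post", sync_op_or_fn="all_reduce")
    sub_sch["fc_out"].sync(mode="fwd_post", sync_op_or_fn="all_reduce")
```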
What's Changed
- [Action] Fix release flow by @comaniac in #69
- [Refactor] Schedule primitives by @comaniac in #68
- [Primitive] .fork_rng() by @comaniac in #70
- [Primitive] .annotate() and .trace_until() by @comaniac in #71
- [CI] Update CI rules for docs by @chhzh123 in #72
- [Op] Fuse bias+dropout in FusedMLP by @comaniac in #73
- [Refactor] Modulize sharding methods by @comaniac in #74
- [CI] Quick fix by @chhzh123 in #75
- [Primitive][fork_rng] Do not replace module by @comaniac in #76
- [Bugfix] Include other custom LinearWithXX by @comaniac in #77
- [Primitive] Add fallback fusion by @chhzh123 in #78
- [examples] Refactor dataloader to support BERT by @chhzh123 in #79
- [Bugfix] Shard embedding hooks by @comaniac in #80
- [Version] Refactor version updating logic by @comaniac in #82
- [Op] Print by @comaniac in #81
- [Primitive] Add .replace_all() by @chhzh123 in #85
- [Version] Update version to v0.0.3 by @chhzh123 in #84
Full Changelog: v0.0.2...v0.0.3
v0.0.2
This release mainly includes the following improvements:
- More unit tests.
- Add the .fuse() and related primitives.
- Improve the overall training efficiency of GPT models by adding sequence parallelism, tied-weight support, etc.
- Documentation and tutorials.
- Bug fixes.
What's Changed
- [Release] Setup wheel and release scripts by @comaniac in #18
- [Pipeline] Drop last batch in DeepSpeed scripts by @comaniac in #19
- [Examples] Add disable_flash_attn by @chhzh123 in #22
- [Bugfix] Fix sequence parallelism by @szhengac in #20
- [Schedule][replace] Transfer hooks when replacing modules by @comaniac in #27
- [Bugfix] Fix GPT script by @szhengac in #26
- [Bugfix] Transfer hooks in pipeline modules by @comaniac in #28
- [Tracer] Add flatten argument to .trace() by @chhzh123 in #29
- [Benchmark] Fix ZeRO-3 step log by @comaniac in #31
- [Bugfix] Fix for sharding TP only by @zarzen in #32
- [Primitive][shard] Use autograd function for all sync ops by @comaniac in #33
- [Bugfix] Using None for mpu when PP > 1 by @zarzen in #34
- [Bugfix] Fix GPT script by @szhengac in #36
- [Schedule] Refactor subgraph matching by @chhzh123 in #35
- [Schedule] Add .fuse() primitive by @chhzh123 in #25
- [Setup] Fix dependency by @chhzh123 in #39
- [Random] Random state management by @comaniac in #38
- [GPT] Use flash-attention and enable dropout by @comaniac in #40
- [Op] Add attention and bias_gelu ops by @comaniac in #41
- [Tracer] Remove SelfAttention renaming by @chhzh123 in #44
- [Model] Add HuggingFace GPT-2 by @comaniac in #45
- [Op] Refactor qkv processing by @comaniac in #46
- Add num_workers to GPT dataloader by @szhengac in #48
- [Op] Add flash-attention CUDA kernel by @comaniac in #49
- [Bugfix] Fix tensor device by @szhengac in #50
- [Example] Use .fuse() primitive when possible by @chhzh123 in #42
- [Refactor] model_dialect -> framework_dialect by @comaniac in #51
- [Test] Add default initialization test by @chhzh123 in #54
- [Schedule] Create subschedule for subgraph replacement by @chhzh123 in #52
- [Schedule] Support partial checkpointing by @chhzh123 in #55
- [DeepSpeed] Support TP=nGPU and PP=DP=1 by @comaniac in #56
- [Examples] Move examples to slapo.model_schedule by @chhzh123 in #53
- [Bugfix] Support tree-like subgraph matching by @chhzh123 in #58
- [Bugfix] Consolidate params with orig size by @comaniac in #59
- [Bugfix] Fix a small device bug by @szhengac in #57
- [README] Temporary remove paper info by @comaniac in #60
- Add param_name to shard infer type and fix consolidate by @comaniac in #62
- [Feature] Layernorm Tag by @szhengac in #61
- [Docs] Add initial documentations by @chhzh123 in #63
- Enable launch training with torchrun by @zarzen in #64
- [Examples] Enable launch with torchrun by @comaniac in #65
New Contributors
- @zarzen made their first contribution in #32
Full Changelog: v0.0.1...v0.0.2
v0.0.1
First release of v0.0.1.
What's Changed
- [Lint] Fix almost all linting errors by @comaniac in #1
- [CI] Setup CI by @comaniac in #3
- [Lint] Fix rest linting errors by @comaniac in #2
- [Bugfix] Fix batch size in slapo-deepspeed by @chhzh123 in #7
- Fix transformers import order in megatron scripts by @szhengac in #5
- [Pipeline] Tie weight analysis by @comaniac in #8
- [Bugfix] fix initialization by @szhengac in #4
- [Bugfix] Reproduce experimental results in docker image by @chhzh123 in #9
- [Schedule] Support sequence parallelism by @comaniac in #6
- [Test] Add end-to-end tests by @chhzh123 in #14
- [Pipeline] Register tie weights by @comaniac in #15
- [Bugfix] Fix schedule and dockerfile by @comaniac in #17
- [Test] Add tracer unit tests by @chhzh123 in #16
New Contributors
- @comaniac made their first contribution in #1
- @chhzh123 made their first contribution in #7
- @szhengac made their first contribution in #5
Full Changelog: https://github.com/awslabs/slapo/commits/v0.0.1