
replacing apex.normalization.FusedLayerNorm with torch.nn.LayerNorm #9377

Closed
stas00 opened this issue Jan 2, 2021 · 2 comments · Fixed by #9386

Comments

@stas00 (Contributor) commented Jan 2, 2021

It seems that the time has arrived to drop apex.normalization.FusedLayerNorm in favor of torch.nn.LayerNorm:

  1. the latter was ported from apex into PyTorch more than a year ago ("Add fused layer norm impl on CUDA in PyTorch", pytorch/pytorch#27634, around pt-1.4)
  2. it's faster than the apex one according to my benchmarks ("Builtin FusedLayerNorm is slower than apex one", pytorch/pytorch#37713 (comment)): 33% faster on rtx-3090, 10% faster on gtx-1070

But note: the same benchmark run on a V100 at facebookresearch/fairseq#2012 (comment) reports the opposite - that the native one is slower (pt-1.5). So it would help to run this very quick benchmark on other cards and compare. In particular, if you have access to a V100, please report back your findings in this thread: pytorch/pytorch#37713

The main reason for this change is that apex.normalization.FusedLayerNorm is buggy (it corrupts memory) when it comes to switching devices, which is done a lot under Model Parallel (NVIDIA/apex#1022).
With apex.normalization.FusedLayerNorm things fail a lot under MP, and the workaround requires sticking torch.cuda.set_device(id) in many, many places :( Since this layer norm is chosen at the model's init time, it's not possible to avoid it under MP, as MP only gets activated after the model's init.

I will use that workaround if you find that apex is still faster on some hardware that is important to support. And, of course, in that case please report back to the pytorch team so that they can fix it. Otherwise, apex is pretty much unmaintained at this point, and it's just a matter of time before it becomes unusable.
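For illustration only, here is a minimal sketch of what that workaround looks like in practice (the device id and layer size are made up, and it assumes apex is installed with its CUDA extensions):

```python
# Hypothetical sketch of the workaround mentioned above: under model parallel,
# pin the current CUDA device before constructing the apex FusedLayerNorm so
# its kernels don't end up bound to the wrong device.
import torch
from apex.normalization import FusedLayerNorm

device_id = 1  # whichever GPU this model-parallel shard lives on (made-up value)
torch.cuda.set_device(device_id)
layer_norm = FusedLayerNorm(1024).to(f"cuda:{device_id}")
```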

The models that need that change are bart/fsmt/prophetnet.
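For context, a rough before/after sketch of the proposed swap (the try/except fallback is an approximation of how such models typically guard the apex import, not a copy of the actual source files):

```python
import torch
from torch import nn

# Before (approximate pattern, shown only for illustration):
# try:
#     from apex.normalization import FusedLayerNorm as LayerNorm
# except ImportError:
#     from torch.nn import LayerNorm

# After: always use the builtin, which has had a fused CUDA kernel since ~pt-1.4
LayerNorm = nn.LayerNorm

layer_norm = LayerNorm(768, eps=1e-5)
out = layer_norm(torch.randn(2, 16, 768))
```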

@patrickvonplaten, @LysandreJik

@patrickvonplaten (Contributor) commented

I'm good with changing to torch.nn.LayerNorm. @stas00 - do you know what the advantage of apex.normalization.FusedLayerNorm is supposed to be? Why did we add it in the first place?

@stas00 (Contributor, Author) commented Jan 2, 2021

Prior to about a year ago, apex.normalization.FusedLayerNorm was faster than torch.nn.LayerNorm, but then the former got ported into native torch.nn.LayerNorm, and now the native one appears to be faster - at least on the 2 cards I have experimented with. I checked with pt-1.4 through pt-1.8.dev.

If you have cards other than the gtx-1070/rtx-3090 I benchmarked with, please run that benchmark and see if the same holds: pytorch/pytorch#37713 (comment)
It only takes a few seconds if you already have apex installed. To install apex:

```
git clone https://github.com/NVIDIA/apex
cd apex
rm -rf build
pip install --global-option="--cpp_ext" --global-option="--cuda_ext" .
```

The benchmark measures and reports the total run time, so the smaller the number, the faster it is.
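If it's easier, here's a minimal timing sketch along the same lines (this is not the exact script from pytorch/pytorch#37713; the tensor shapes and step count are arbitrary, and it assumes apex and a CUDA device are available):

```python
import time
import torch
from torch.nn import LayerNorm
from apex.normalization import FusedLayerNorm

def bench(norm, x, steps=1000):
    # warm up, then time `steps` forward passes on the GPU
    for _ in range(10):
        norm(x)
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(steps):
        norm(x)
    torch.cuda.synchronize()
    return time.time() - start

hidden = 1024
# arbitrary shapes, not necessarily the ones used in the linked benchmark
x = torch.randn(64, 512, hidden, device="cuda")
print("torch.nn.LayerNorm:  ", bench(LayerNorm(hidden).cuda(), x))
print("apex FusedLayerNorm: ", bench(FusedLayerNorm(hidden).cuda(), x))
```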

If you do run the benchmarks, please post your results at pytorch/pytorch#37713, so that the decision to drop apex.normalization.FusedLayerNorm can be based on hard data rather than anecdotal info.

Thank you.

stas00 changed the title from "removing apex.normalization.FusedLayerNorm in favor of faster torch.nn.LayerNorm" to "replacing apex.normalization.FusedLayerNorm with torch.nn.LayerNorm" on Jan 2, 2021
fhieber added a commit to awslabs/sockeye that referenced this issue Nov 8, 2021