
[RFC] TensorCore 8bit implementations #1

Open · wants to merge 283 commits into base: master
Conversation

@XapaJIaMnu commented Oct 20, 2020

Hey @rhenry-nv ,

Thank you for your recent improvements to Marian. Do you have any particular development goals with regard to Marian?

I am asking since I have been working on an 8-bit GPU version, with and without tensor cores, using CUTLASS, and I would like to avoid duplicated effort where possible. In particular, the GPU code for the 8-bit GEMM is located here: https://github.com/XapaJIaMnu/marian-dev/blob/8bitgpu/src/tensors/gpu/integer_tools.cu
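For readers unfamiliar with what an 8-bit GEMM computes, here is a minimal CPU reference sketch of the technique: quantize float inputs to int8 with per-tensor scales, accumulate products in int32 (as the int8 tensor-core path does), and dequantize the result. This is an illustrative assumption, not the actual kernel in the linked `integer_tools.cu`; the function name `quantizedGemm` and the max-abs scaling scheme are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// CPU reference for an 8-bit quantized GEMM: C = A * B, where A (m x k) and
// B (k x n) are quantized to int8 with per-tensor scales, products are
// accumulated in int32, and the result is dequantized back to float.
// Assumes A and B are not all-zero (so the scales are nonzero).
std::vector<float> quantizedGemm(const std::vector<float>& A,
                                 const std::vector<float>& B,
                                 int m, int k, int n) {
  auto maxAbs = [](const std::vector<float>& v) {
    float mx = 0.f;
    for (float x : v) mx = std::max(mx, std::fabs(x));
    return mx;
  };
  // Per-tensor scales mapping the max absolute value onto the int8 range.
  float scaleA = maxAbs(A) / 127.f;
  float scaleB = maxAbs(B) / 127.f;

  auto quantize = [](const std::vector<float>& v, float s) {
    std::vector<int8_t> q(v.size());
    for (size_t i = 0; i < v.size(); ++i)
      q[i] = static_cast<int8_t>(std::lround(v[i] / s));
    return q;
  };
  std::vector<int8_t> qA = quantize(A, scaleA), qB = quantize(B, scaleB);

  std::vector<float> C(m * n, 0.f);
  for (int i = 0; i < m; ++i)
    for (int j = 0; j < n; ++j) {
      int32_t acc = 0;  // int32 accumulator, as in the int8 tensor-core path
      for (int p = 0; p < k; ++p)
        acc += static_cast<int32_t>(qA[i * k + p]) * qB[p * n + j];
      C[i * n + j] = acc * scaleA * scaleB;  // dequantize
    }
  return C;
}
```

The quantization error introduced by the int8 rounding is what the tensor-core kernels trade away for throughput; a library like CUTLASS runs the same int8-multiply/int32-accumulate contraction on tensor cores instead of this scalar loop.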

Do you have an opinion on the use of CUTLASS vs. cuBLAS for tensor-core operations? Any particular comments on the GPU code?

Part of this pull request is pending review against marian-dev master, and the rest will hopefully go in incrementally.

Cheers,

Nick

delong-coder and others added 30 commits November 4, 2020 14:29
…arian-nmt#749)

* Add Triton Marian backend running in AzureML Inference Environment
…ith it causes a misaligned read when the bias is not a multiple of 8 (this occurs during slicing). This case is handled specially. The bug will be fixed in a subsequent release of CUDA.
* This PR adds training of embedding spaces with better separation based on https://arxiv.org/abs/2007.01852
* We can now train with in-batch negative examples, or with a handful of hand-constructed negative examples provided in a TSV file.
…nmt#759)

* fix problem if the optimization step is set to 0
* set the first error residual to 0
- Updates SentencePiece to the newest version (removes the dependency on Protobuf)
- Enables SentencePiece compilation by default, since there is no dependency on Protobuf anymore.
…search

This PR changes the stopping criterion for mini-batch-fit binary search to better maximize batch size.
This updates the SentencePiece version in Marian to a much more recent revision. As a result, there is no longer a dependency on Protobuf.
* attempt to enable cutlass tensorcore with fp16: compilation OK

* FP16 support for NodeOp (partial implementation)

* switch to reinterpret_cast

* add cutlass FP16 support from Nick

* add cutlass FP16 support for DotNodeOp

* set quant NodeOp type

* more NodeOp changes for FP16 support

* remove debugging info

* some comments and aborts
This is done so that we can compile with newer CUDA versions. It also adds some extra templates that are not used in this branch of Marian, but they do not impede translation or compilation.
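One of the commits above changes the stopping criterion of the mini-batch-fit binary search. The underlying idea can be sketched as a standard monotone binary search: find the largest batch size whose memory footprint still fits. This is a hedged sketch under stated assumptions; `largestFittingBatch` and the `fits()` probe are hypothetical names, and Marian's actual criterion in that PR is more involved than this.

```cpp
#include <cassert>

// Sketch of a mini-batch-fit style search: find the largest batch size in
// [1, maxBatch] for which fits(batch) is true, assuming fits() is monotone
// (once a batch no longer fits in memory, no larger batch fits either).
// fits() stands in for an actual memory probe, e.g. a trial allocation.
template <class Fits>
int largestFittingBatch(int maxBatch, Fits fits) {
  int lo = 1, hi = maxBatch, best = 0;
  while (lo <= hi) {
    int mid = lo + (hi - lo) / 2;
    if (fits(mid)) {
      best = mid;     // mid fits: remember it and try a larger batch
      lo = mid + 1;
    } else {
      hi = mid - 1;   // mid is too big: shrink the search range
    }
  }
  return best;
}
```

The choice of stopping criterion matters because each `fits()` probe is expensive (it exercises the allocator), so the search should converge in O(log maxBatch) probes while still landing on the true maximum rather than a conservative underestimate.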

10 participants