Zhwang/debug tp8 #34

sfc-gh-zhwang · 2023-10-31T06:00:19Z

No description provided.

* Update beam_search_topk_kernels.cu fix: fix bug of beam search
* fix: change int of some kernels to int64_t to prevent overflow
* fix: gpt tensor shapes inconsistency (NVIDIA#505) Signed-off-by: AkiyamaYummy <842720660@qq.com>
* Update gpt_guide.md (NVIDIA#529)
* fix: fix bug of gpt buffer and gpt gemm overflow
* Update T5DecodingWeight.cc fix: fix loading bug of t5
* [Enhancement]add pytorch backend support for gptneox (NVIDIA#550)
* add pytorch backend support for gptneox Signed-off-by: AkiyamaYummy <842720660@qq.com>
* fix early stopping invalid
* 1) Some unused parameters and logic have been removed. 2) Revisions that would affect pipeline parallelism have been reverted. 3) The code has been made capable of direct validation on TabbyML/NeoX-1.3B. Signed-off-by: AkiyamaYummy <842720660@qq.com>
* Change the names of classes, removing 'parallel' from their names Signed-off-by: AkiyamaYummy <842720660@qq.com>
* Format the code. Signed-off-by: AkiyamaYummy <842720660@qq.com>
* Only print results when rank is 0. Signed-off-by: AkiyamaYummy <842720660@qq.com>
* Add dist.init_process_group(). Signed-off-by: AkiyamaYummy <842720660@qq.com>
* update docs Signed-off-by: AkiyamaYummy <842720660@qq.com>
--------- Signed-off-by: AkiyamaYummy <842720660@qq.com>
* Update cublasMMWrapper.cc Fix the CUBLAS_VERSION checking of cublasMMWrapper
* Update cublasMMWrapper.cc
* fix overflow in softmax_kernel when process long seqlen and big batch_size (NVIDIA#524)
* Update unfused_attention_kernels.cu fix bug of softmax kernel
* [Enhancement]create huggingface_gptneox_convert.py (NVIDIA#569)
* create huggingface_gptneox_convert.py Signed-off-by: AkiyamaYummy <842720660@qq.com>
* adjust HF's multi bin files Signed-off-by: AkiyamaYummy <842720660@qq.com>
* update gptneox_guide.md Signed-off-by: AkiyamaYummy <842720660@qq.com>
--------- Signed-off-by: AkiyamaYummy <842720660@qq.com>
* perf(bloom): improve performance of huggingface_bloom_convert.py, decrease the time cost and the mem using (NVIDIA#568) Co-authored-by: r.yang <r.yang@tianrang-inc.com>
* Fix/gpt early stop (NVIDIA#584)
* fix: fix bug of early stopping of gpt
* [bugfix] Fix 2-shot All Reduce correctness issue (indexing bug). (NVIDIA#672) FasterTransformer 2-shot all reduce is implemented as a reduce-scatter + all-gather. There is an indexing bug in the all-gather step. Prior to this change, 2-shot all reduce was only producing correct results on device 0. Now, all devices have the correct results.
* fix: swap tensor bug (NVIDIA#683)
* Support size_per_head=112 (NVIDIA#660)
* fix multi-gpu build
* add support for size_per_head=112 for gpt decoder
* remove mpi_cxx from multi-gpu build for now (NVIDIA#705)
---------
Signed-off-by: AkiyamaYummy <842720660@qq.com>
Co-authored-by: byshiue <bhsueh@nvidia.com>
Co-authored-by: _yummy_ <842720660@qq.com>
Co-authored-by: Ying Sheng <sqy1415@gmail.com>
Co-authored-by: zhangxin81 <115389973+zhangxin81@users.noreply.github.com>
Co-authored-by: 杨睿 <595403043@qq.com>
Co-authored-by: r.yang <r.yang@tianrang-inc.com>
Co-authored-by: Rahul Kindi <rkindi@users.noreply.github.com>
Co-authored-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
Co-authored-by: Daya Khudia <37562707+dskhudia@users.noreply.github.com>
Co-authored-by: Dean Wyatte <2512762+dwyatte@users.noreply.github.com>
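
The NVIDIA#672 entry above describes 2-shot all reduce as a reduce-scatter followed by an all-gather. The host-side Python sketch below illustrates that decomposition only; it is not FasterTransformer's CUDA kernel, and the function name `two_shot_all_reduce` and the list-of-lists "devices" are hypothetical stand-ins. It assumes a sum reduction and a buffer length divisible by the device count. The point to note is that in the all-gather step every destination device must read chunk `src` from its owner at offset `src * chunk`; the commit message does not say what the original indexing mistake was, so none is reproduced here.

```python
def two_shot_all_reduce(buffers):
    """Simulate a sum all-reduce as reduce-scatter + all-gather.

    buffers: list of equal-length lists, one per simulated device.
    Returns a new list with one fully reduced buffer per device.
    """
    world = len(buffers)
    n = len(buffers[0])
    assert n % world == 0, "sketch assumes length divisible by device count"
    chunk = n // world

    # Shot 1 (reduce-scatter): device r owns chunk r and sums it
    # across every device's input buffer.
    reduced = []
    for r in range(world):
        lo = r * chunk
        reduced.append(
            [sum(buf[lo + i] for buf in buffers) for i in range(chunk)]
        )

    # Shot 2 (all-gather): every device copies each reduced chunk from
    # its owner into the matching offset of its own output buffer.
    out = [[0] * n for _ in range(world)]
    for dst in range(world):
        for src in range(world):
            lo = src * chunk
            out[dst][lo:lo + chunk] = reduced[src]
    return out


if __name__ == "__main__":
    device_inputs = [[1, 2, 3, 4], [10, 20, 30, 40]]
    for rank, result in enumerate(two_shot_all_reduce(device_inputs)):
        print(rank, result)
```

A quick correctness check for any such implementation is the one the commit message implies: compare the result on every device, not only device 0, against a plain element-wise sum.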

…ug-tp8

sfc-gh-ashankar and others added 30 commits July 10, 2023 19:24

commit

da9ef99

commit

7f1e8bf

commit

bb6fce4

commit

49c94e8

commit

c510c26

commit

8933482

commit

98ab7df

commit

626287a

commit

787c1c5

commit

06e941b

commit

ce8272a

commit

c6f2543

commit

e1f2a76

commit

441c343

commit

3cf5490

commit

38919b6

commit

4c0dbba

commit

b6945af

commit

728f890

commit

8aeb13a

commit

ffd2f96

commit

18eb7b4

commit

28cba07

commit

165704c

commit

d792097

commit

f2534be

commit

67aa284

commit

b415055

commit

2ecae5e

sfc-gh-zhwang added 30 commits October 30, 2023 22:05

commit

4973c9b

commit

caf7c00

commit

f7af369

commit

4545f61

commit

376110d

commit

74df027

commit

eaa0a17

commit

580a796

commit

9255f7c

commit

debacbd

commit

62e4177

commit

4f14e32

commit

34b48e8

commit

596f6d9

commit

5772f09

commit

96ccec9

commit

04f5ab2

commit

c79afa9

commit

dbd5287

commit

599e8da

commit

59f2c93

commit

f330f2e

commit

3ef5d24

commit

8e57eb5

commit

3e50243

commit

87cfd58

commit

09b5f45

commit

407a868

Merge branch 'corvo', remote-tracking branch 'origin' into zhwang/deb…

6451b5f

…ug-tp8

commit

2d7be1a
