[FastTransformer v3.0/Pytorch] FasterTransformer v3.0 decoding doesn't work with small vocab_size #746

Closed
upczww opened this issue Nov 6, 2020 · 1 comment
Labels: bug (Something isn't working)

upczww commented Nov 6, 2020

Related to Model/Framework(s)
FasterTransformer/v3.0

Describe the bug
Ran FasterTransformer decoding under FP32 on PyTorch with:

./bin/decoding_gemm 8 4 8 64 3153 32 512 0
python pytorch/decoding_sample.py 8 6 32 8 64 4 3153 --time

where the vocab_size is 3153 instead of the original 31538. It raised the following error:

=============== Argument ===============
batch_size: 8
layer_num: 6
seq_len: 32
head_num: 8
head_size: 64
hidden_dim: 512
beam_size: 4
vocab_size: 3153
use_pretrained: False
use_fp16: False
TorchScript mode: False
test_time: True
========================================

tensor([[[1910, 1692, 1692,  ..., 1692, 1692, 1692],
         [2027, 1692, 1692,  ..., 1692, 1692, 1692],
         [1910, 1692, 1692,  ..., 1692, 1692, 2803],
         [1910, 1692, 1692,  ..., 1692, 2803, 2027]],

        [[2021,  154, 2021,  ..., 2794,  154, 2794],
         [2021,  154, 2021,  ..., 2794, 1892,  814],
         [2021,  154, 2021,  ..., 2794, 1892, 1892],
         [2021,  154, 2021,  ..., 2794,  814, 2794]],

        [[ 356, 2803, 2803,  ..., 2803, 2803, 2803],
         [2021, 2794, 2803,  ..., 2803, 2803, 2803],
         [2021, 2794, 2803,  ..., 2803, 2803, 2803],
         [2021, 2794, 2803,  ..., 2803, 2803, 2803]],

        ...,

        [[2696, 2027, 1782,  ..., 2696, 2027, 2696],
         [2696, 2027, 1782,  ..., 2696, 1779, 2696],
         [2696, 2027, 1782,  ..., 2696, 2696, 2027],
         [2696, 2027, 1782,  ..., 2696, 2027, 1146]],

        [[2794, 2794, 2794,  ..., 2794, 2794, 2794],
         [2794, 2794, 2794,  ..., 2794, 2794, 1146],
         [1910, 2794, 2794,  ..., 2794, 2794, 2794],
         [2794, 2794, 2794,  ..., 2794, 1146, 2794]],

        [[1910, 1910, 1910,  ..., 1910, 1910, 1910],
         [2803, 1910, 1910,  ..., 1910, 1910, 1910],
         [1910, 1910, 1910,  ..., 1910, 1910, 2027],
         [1910, 1910, 1910,  ..., 1910, 1910,  814]]], device='cuda:0',
       dtype=torch.int32)
tensor([[32, 32, 32, 32],
        [32, 32, 32, 32],
        [32, 32, 32, 32],
        [32, 32, 32, 32],
        [32, 32, 32, 32],
        [32, 32, 32, 32],
        [32, 32, 32, 32],
        [32, 32, 32, 32]], device='cuda:0')
tensor([[[1910, 1692, 1692,  ..., 1692, 1692, 1692],
         [2027, 1692, 1692,  ..., 1692, 1692, 1692],
         [1910, 1692, 1692,  ..., 1692, 1692, 2803],
         [1910, 1692, 1692,  ..., 1692, 2803, 2027]],

        [[2021,  154, 2021,  ..., 2794,  154, 2794],
         [2021,  154, 2021,  ..., 2794, 1892,  814],
         [2021,  154, 2021,  ..., 2794, 1892, 1892],
         [2021,  154, 2021,  ..., 2794,  814, 2794]],

        [[ 356, 2803, 2803,  ..., 2803, 2803, 2803],
         [2021, 2794, 2803,  ..., 2803, 2803, 2803],
         [2021, 2794, 2803,  ..., 2803, 2803, 2803],
         [2021, 2794, 2803,  ..., 2803, 2803, 2803]],

        ...,

        [[2696, 2027, 1782,  ..., 2696, 2027, 2696],
         [2696, 2027, 1782,  ..., 2696, 1779, 2696],
         [2696, 2027, 1782,  ..., 2696, 2696, 2027],
         [2696, 2027, 1782,  ..., 2696, 2027, 1146]],

        [[2794, 2794, 2794,  ..., 2794, 2794, 2794],
         [2794, 2794, 2794,  ..., 2794, 2794, 1146],
         [1910, 2794, 2794,  ..., 2794, 2794, 2794],
         [2794, 2794, 2794,  ..., 2794, 1146, 2794]],

        [[1910, 1910, 1910,  ..., 1910, 1910, 1910],
         [2803, 1910, 1910,  ..., 1910, 1910, 1910],
         [1910, 1910, 1910,  ..., 1910, 1910, 2027],
         [1910, 1910, 1910,  ..., 1910, 1910,  814]]], device='cuda:0',
       dtype=torch.int32)
tensor([[32, 32, 32, 32],
        [32, 32, 32, 32],
        [32, 32, 32, 32],
        [32, 32, 32, 32],
        [32, 32, 32, 32],
        [32, 32, 32, 32],
        [32, 32, 32, 32],
        [32, 32, 32, 32]], device='cuda:0')

Traceback (most recent call last):
  File "pytorch/decoding_sample.py", line 167, in <module>
    main()
  File "pytorch/decoding_sample.py", line 131, in main
    output2, lens2 = custom_decoding(args.batch_size, args.beam_size, args.seq_len, mem, mem_seq_lens)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/work/FasterTransformer/build/pytorch/utils/decoding.py", line 473, in forward
    output_ids, parent_ids, out_seq_lens = self.decoding.forward(batch_size, beam_size, max_seq_len, extended_memory, extended_memory_seq_lens)
RuntimeError: [FT][ERROR] CUDA runtime error: CUBLAS_STATUS_EXECUTION_FAILED /work/FasterTransformer/fastertransformer/cuda/open_decoder.cu:838
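
A side note on the failing call (my own reading, not anything stated in the log): each decoding step ends with a hidden-to-vocab logits GEMM, and open_decoder.cu:838 plausibly corresponds to that cuBLAS call. Below is a minimal PyTorch sketch of the same GEMM with the reported shapes; it succeeds for any vocab_size, which suggests the problem lies in the cuBLAS configuration recorded by decoding_gemm for this shape rather than in the shape itself:

import torch

# Shapes follow the printed arguments: batch_size=8, beam_size=4,
# hidden_dim = head_num * head_size = 8 * 64 = 512, vocab_size = 3153.
batch_size, beam_size, hidden_dim, vocab_size = 8, 4, 512, 3153

# Per-step decoder hidden states and the vocabulary projection weight.
hidden_states = torch.rand(batch_size * beam_size, hidden_dim, device="cuda")
weight = torch.rand(hidden_dim, vocab_size, device="cuda")

# The hidden-to-vocab GEMM: [32, 512] x [512, 3153] -> [32, 3153].
logits = hidden_states @ weight
print(logits.shape)  # torch.Size([32, 3153])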

To Reproduce
Steps to reproduce the behavior:
1. Build with the PyTorch image nvcr.io/nvidia/pytorch:20.03-py3:

mkdir build
cd build
cmake -DSM=60 -DCMAKE_BUILD_TYPE=Release -DBUILD_THE=ON -DBUILD_THS=ON -DBUILD_THSOP=ON -DCXX_STD=14 ..
make

2. Install OpenNMT-py:

pip install opennmt-py==1.1.1

3. Generate the GEMM config:

./bin/decoding_gemm 8 4 8 64 3153 32 512 0
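# Positional arguments above, as I read decoding_gemm's usage (the names are an assumption):
# batch_size=8, beam_width=4, head_num=8, size_per_head=64,
# vocab_size=3153, seq_len=32, memory_hidden_dim=512, is_fp16=0 (FP32)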

4. Run the decoding sample:

python pytorch/decoding_sample.py 8 6 32 8 64 4 3153 --time
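# Positional arguments above, matching the argument dump the sample prints:
# batch_size=8, layer_num=6, seq_len=32, head_num=8, head_size=64,
# beam_size=4, vocab_size=3153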

Expected behavior
The decoding sample works with vocab_size=31538, as in the decoding demos; I expected it to keep working when I decreased the vocab_size, but instead it raised the error above.

Environment
Please provide at least:

  • Container version (e.g. pytorch:19.05-py3): nvcr.io/nvidia/pytorch:20.03-py3
  • GPUs in the system: (e.g. 8x Tesla V100-SXM2-16GB): 4x Tesla P40 24G
  • CUDA driver version (e.g. 418.67): 440.64
  • HOST CUDA version: 10.2
@upczww upczww added the bug Something isn't working label Nov 6, 2020
@upczww upczww changed the title [Model/Framework] FasterTransformer v3.0 decoding doesn't work with small vocab_size [FastTransformer v3.0/Pytorch] FasterTransformer v3.0 decoding doesn't work with small vocab_size Nov 6, 2020
@byshiue byshiue self-assigned this Nov 6, 2020
byshiue (Collaborator) commented Nov 9, 2020

Thanks for your feedback. This bug is fixed in #747.

@byshiue byshiue closed this as completed Nov 9, 2020
changlan pushed a commit to changlan/DeepLearningExamples that referenced this issue Apr 5, 2021