nvcc Unsupported gpu architecture error #189

Open
rkoystart opened this issue Sep 27, 2021 · 4 comments

Comments

@rkoystart

Hi, I have installed lightseq, fairseq, and sacremoses using the following commands:

pip install lightseq==2.0.2
pip install fairseq
pip install sacremoses
conda install pytorch==1.7.1 torchvision==0.8.2 torchaudio==0.7.2 cudatoolkit=11.0 -c pytorch
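
A possible sanity check for this kind of setup, offered as a sketch rather than a confirmed diagnosis: the build log further down invokes `/usr/local/cuda/bin/nvcc` with `-gencode=arch=compute_86`, and as far as I know nvcc only accepts `compute_86` from CUDA 11.1 onward. The helper names below (`parse_nvcc_release`, `supports_sm86`, `check_nvcc`) are hypothetical and not part of lightseq, fairseq, or PyTorch; the 11.1 threshold is the stated assumption.

```python
import re
import subprocess
from typing import Optional, Tuple

# Assumption: nvcc accepts compute_86 / sm_86 starting with CUDA 11.1.
MIN_CUDA_FOR_SM86 = (11, 1)


def parse_nvcc_release(version_output: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from the text printed by `nvcc --version`."""
    match = re.search(r"release (\d+)\.(\d+)", version_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def supports_sm86(release: Tuple[int, int]) -> bool:
    """Return True if this toolkit release should be able to target compute_86."""
    return release >= MIN_CUDA_FOR_SM86


def check_nvcc(nvcc_path: str = "/usr/local/cuda/bin/nvcc") -> None:
    """Run nvcc and report whether its release meets the assumed threshold."""
    output = subprocess.run(
        [nvcc_path, "--version"], capture_output=True, text=True, check=True
    ).stdout
    release = parse_nvcc_release(output)
    if release is None:
        print("Could not parse an nvcc release from:", output)
    elif supports_sm86(release):
        print(f"nvcc {release[0]}.{release[1]} should accept compute_86")
    else:
        print(f"nvcc {release[0]}.{release[1]} predates compute_86 support")
```

Calling `check_nvcc()` on the training machine would show which toolkit the extension build actually uses. The `cudatoolkit=11.0` conda package supplies runtime libraries for PyTorch, so the nvcc under `/usr/local/cuda` may be a different version from it.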

After these installations, when I run

sh examples/training/fairseq/ls_fairseq_wmt14en2de.sh

I am getting the following error.

+ THIS_DIR=/data/rkoy/lightseq/lightseq/examples/training/fairseq
+ cd /data/rkoy/lightseq/lightseq/examples/training/fairseq/../../..                                                                                                                                  
+ lightseq-train wmt14_en_de/ --task translation --arch ls_transformer_wmt_en_de_big --optimizer ls_adam --adam-betas (0.9, 0.98) --clip-norm 0.0 --lr 5e-4 --lr-scheduler inverse_sqrt --warmup-updates 40
00 --weight-decay 0.0001 --criterion ls_label_smoothed_cross_entropy --label-smoothing 0.1 --max-tokens 8192 --eval-bleu --eval-bleu-args {"beam": 5, "max_len_a": 1.2, "max_len_b": 10} --eval-bleu-detok 
moses --eval-bleu-remove-bpe --eval-bleu-print-samples --best-checkpoint-metric bleu --maximize-best-checkpoint-metric                                                                                     
2021-09-27 22:26:20 | INFO | fairseq.distributed_utils | distributed init (rank 0): tcp://localhost:11236                                                                                                  
2021-09-27 22:26:20 | INFO | fairseq.distributed_utils | distributed init (rank 5): tcp://localhost:11236                                                                                                  
2021-09-27 22:26:20 | INFO | fairseq.distributed_utils | distributed init (rank 7): tcp://localhost:11236                                                                                                  
2021-09-27 22:26:21 | INFO | fairseq.distributed_utils | distributed init (rank 4): tcp://localhost:11236                                                                                                  
2021-09-27 22:26:23 | INFO | fairseq.distributed_utils | distributed init (rank 2): tcp://localhost:11236                                                                                                  
2021-09-27 22:26:24 | INFO | fairseq.distributed_utils | distributed init (rank 3): tcp://localhost:11236                                                                                                  
2021-09-27 22:26:24 | INFO | fairseq.distributed_utils | distributed init (rank 6): tcp://localhost:11236                                                                                                  
2021-09-27 22:26:24 | INFO | fairseq.distributed_utils | distributed init (rank 1): tcp://localhost:11236                                                                                                  
2021-09-27 22:26:29 | INFO | fairseq.distributed_utils | initialized host rkoy-gpu-machine as rank 6                                                                                                            
2021-09-27 22:26:29 | INFO | fairseq.distributed_utils | initialized host rkoy-gpu-machine as rank 7                                                                                                            
2021-09-27 22:26:29 | INFO | fairseq.distributed_utils | initialized host rkoy-gpu-machine as rank 0                                                                                                            
2021-09-27 22:26:29 | INFO | fairseq.distributed_utils | initialized host rkoy-gpu-machine as rank 2                                                                                                            
2021-09-27 22:26:29 | INFO | fairseq.distributed_utils | initialized host rkoy-gpu-machine as rank 1                                                                                                            
2021-09-27 22:26:29 | INFO | fairseq.distributed_utils | initialized host rkoy-gpu-machine as rank 3
2021-09-27 22:26:29 | INFO | fairseq.distributed_utils | initialized host rkoy-gpu-machine as rank 5
2021-09-27 22:26:29 | INFO | fairseq.distributed_utils | initialized host rkoy-gpu-machine as rank 4
2021-09-27 22:26:29 | INFO | fairseq_cli.train | Namespace(activation_dropout=0.0, activation_fn='relu', adam_betas='(0.9, 0.98)', adam_eps=1e-08, adaptive_input=False, adaptive_softmax_cutoff=None, adap
tive_softmax_dropout=0, all_gather_list_size=16384, arch='ls_transformer_wmt_en_de_big', attention_dropout=0.1, batch_size=None, batch_size_valid=None, best_checkpoint_metric='bleu', bf16=False, bpe=None
, broadcast_buffers=False, bucket_cap_mb=25, checkpoint_activations=False, checkpoint_shard_count=1, checkpoint_suffix='', clip_norm=0.0, cpu=False, criterion='ls_label_smoothed_cross_entropy', cross_sel
f_attention=False, curriculum=0, data='wmt14_en_de/', data_buffer_size=10, dataset_impl=None, ddp_backend='c10d', decoder_attention_heads=16, decoder_embed_dim=1024, decoder_embed_path=None, decoder_ffn_
embed_dim=4096, decoder_input_dim=1024, decoder_layerdrop=0, decoder_layers=6, decoder_layers_to_keep=None, decoder_learned_pos=False, decoder_normalize_before=False, decoder_output_dim=1024, device_id=0
, disable_validation=False, distributed_backend='nccl', distributed_init_method='tcp://localhost:11236', distributed_no_spawn=False, distributed_num_procs=8, distributed_port=-1, distributed_rank=0, dist
ributed_world_size=8, distributed_wrapper='DDP', dropout=0.3, empty_cache_freq=0, encoder_attention_heads=16, encoder_embed_dim=1024, encoder_embed_path=None, encoder_ffn_embed_dim=4096, encoder_layerdro
p=0, encoder_layers=6, encoder_layers_to_keep=None, encoder_learned_pos=False, encoder_normalize_before=False, eval_bleu=True, eval_bleu_args='{"beam": 5, "max_len_a": 1.2, "max_len_b": 10}', eval_bleu_d
etok='moses', eval_bleu_detok_args=None, eval_bleu_print_samples=True, eval_bleu_remove_bpe='@@ ', eval_tokenized_bleu=False, fast_stat_sync=False, find_unused_parameters=False, finetune_from_model=None,
 fix_batches_to_gpus=False, fixed_validation_seed=None, fp16=False, fp16_init_scale=128, fp16_no_flatten_grads=False, fp16_scale_tolerance=0.0, fp16_scale_window=None, gen_subset='test', ignore_prefix_si
ze=0, keep_best_checkpoints=-1, keep_interval_updates=-1, keep_last_epochs=-1, label_smoothing=0.1, layernorm_embedding=False, left_pad_source='True', left_pad_target='False', load_alignments=False, loca
lsgd_frequency=3, log_format=None, log_interval=100, lr=[0.0005], lr_scheduler='inverse_sqrt', max_epoch=0, max_source_positions=1024, max_target_positions=1024, max_tokens=8192, max_tokens_valid=8192, m
ax_update=0, maximize_best_checkpoint_metric=True, memory_efficient_bf16=False, memory_efficient_fp16=False, min_loss_scale=0.0001, min_lr=-1.0, min_params_to_wrap=100000000, model_parallel_size=1, no_cr
oss_attention=False, no_epoch_checkpoints=False, no_last_checkpoints=False, no_progress_bar=False, no_save=False, no_save_optimizer_state=False, no_scale_embedding=False, no_seed_provided=False, no_token
_positional_embeddings=False, nprocs_per_node=8, num_batch_buckets=0, num_shards=1, num_workers=1, offload_activations=False, optimizer='ls_adam', optimizer_overrides='{}', patience=-1, pipeline_balance=
None, pipeline_checkpoint='never', pipeline_chunks=0, pipeline_decoder_balance=None, pipeline_decoder_devices=None, pipeline_devices=None, pipeline_encoder_balance=None, pipeline_encoder_devices=None, pi
peline_model_parallel=False, profile=False, quant_noise_pq=0, quant_noise_pq_block_size=8, quant_noise_scalar=0, quantization_config_path=None, report_accuracy=False, required_batch_size_multiple=8, requ
ired_seq_len_multiple=1, reset_dataloader=False, reset_lr_scheduler=False, reset_meters=False, reset_optimizer=False, restore_file='checkpoint_last.pt', save_dir='checkpoints', save_interval=1, save_inte
rval_updates=0, scoring='bleu', seed=1, sentence_avg=False, shard_id=0, share_all_embeddings=False, share_decoder_input_output_embed=False, skip_invalid_size_inputs_valid_test=False, slowmo_algorithm='Lo
calSGD', slowmo_momentum=None, source_lang=None, stop_time_hours=0, target_lang=None, task='translation', tensorboard_logdir=None, threshold_loss_scale=None, tie_adaptive_weights=False, tokenizer=None, t
pu=False, train_subset='train', truncate_source=False, update_freq=[1], upsample_primary=1, use_bmuf=False, use_old_adam=False, user_dir='/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packag
es/examples/training/fairseq/fs_modules', valid_subset='valid', validate_after_updates=0, validate_interval=1, validate_interval_updates=0, warmup_init_lr=-1, warmup_updates=4000, weight_decay=0.0001, ze
ro_sharding='none')
2021-09-27 22:26:29 | INFO | fairseq.tasks.translation | [en] dictionary: 40480 types
2021-09-27 22:26:29 | INFO | fairseq.tasks.translation | [de] dictionary: 42720 types
2021-09-27 22:26:29 | INFO | fairseq.data.data_utils | loaded 39414 examples from: wmt14_en_de/valid.en-de.en
2021-09-27 22:26:29 | INFO | fairseq.data.data_utils | loaded 39414 examples from: wmt14_en_de/valid.en-de.de
2021-09-27 22:26:29 | INFO | fairseq.tasks.translation | wmt14_en_de/ valid en-de 39414 examples
Using /home/rkoy/.cache/torch_extensions as PyTorch extensions root...
Using /home/rkoy/.cache/torch_extensions as PyTorch extensions root...
Using /home/rkoy/.cache/torch_extensions as PyTorch extensions root...
Using /home/rkoy/.cache/torch_extensions as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/rkoy/.cache/torch_extensions/lightseq_layers/build.ninja...
Building extension module lightseq_layers...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
Using /home/rkoy/.cache/torch_extensions as PyTorch extensions root...
[1/15] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=lightseq_layers -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi101
1\" -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/includes -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/c
src/ops/includes -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/torc
h/csrc/api/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/TH -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include
/THC -isystem /usr/local/cuda/include -isystem /data/rkoy/anaconda3/envs/lightseq/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO
_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__C
UDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86 -c /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels
/cublas_wrappers.cu -o cublas_wrappers.cuda.o 
FAILED: cublas_wrappers.cuda.o 
/usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=lightseq_layers -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/
data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/includes -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/ops
/includes -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/torch/csrc/
api/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/TH -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/THC -i
system /usr/local/cuda/include -isystem /data/rkoy/anaconda3/envs/lightseq/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_
OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_
HALF2_OPERATORS__ -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86 -c /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/cublas
_wrappers.cu -o cublas_wrappers.cuda.o 
nvcc fatal   : Unsupported gpu architecture 'compute_86'
[2/15] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=lightseq_layers -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi101
1\" -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/includes -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/c
src/ops/includes -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/torc
h/csrc/api/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/TH -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include
/THC -isystem /usr/local/cuda/include -isystem /data/rkoy/anaconda3/envs/lightseq/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO
_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__C
UDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86 -c /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels
/transform_kernels.cu -o transform_kernels.cuda.o  
FAILED: transform_kernels.cuda.o 
/usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=lightseq_layers -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/
data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/includes -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/ops
/includes -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/torch/csrc/
api/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/TH -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/THC -i
system /usr/local/cuda/include -isystem /data/rkoy/anaconda3/envs/lightseq/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_
OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_
HALF2_OPERATORS__ -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86 -c /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/transf
orm_kernels.cu -o transform_kernels.cuda.o 
nvcc fatal   : Unsupported gpu architecture 'compute_86'
[3/15] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=lightseq_layers -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi101
1\" -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/includes -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/c
src/ops/includes -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/torc
h/csrc/api/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/TH -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include
/THC -isystem /usr/local/cuda/include -isystem /data/rkoy/anaconda3/envs/lightseq/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO
_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__C
UDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86 -c /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels
/dropout_kernels.cu -o dropout_kernels.cuda.o 
FAILED: dropout_kernels.cuda.o
/usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=lightseq_layers -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/
data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/includes -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/ops
/includes -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/torch/csrc/
api/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/TH -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/THC -i
system /usr/local/cuda/include -isystem /data/rkoy/anaconda3/envs/lightseq/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_
OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_
HALF2_OPERATORS__ -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86 -c /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/dropou
t_kernels.cu -o dropout_kernels.cuda.o 
nvcc fatal   : Unsupported gpu architecture 'compute_86'
[4/15] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=lightseq_layers -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi101
1\" -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/includes -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/c
src/ops/includes -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/torc
h/csrc/api/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/TH -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include
/THC -isystem /usr/local/cuda/include -isystem /data/rkoy/anaconda3/envs/lightseq/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO
_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__C
UDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86 -c /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels
/normalize_kernels.cu -o normalize_kernels.cuda.o  
FAILED: normalize_kernels.cuda.o 
/usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=lightseq_layers -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/
data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/includes -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/ops
/includes -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/torch/csrc/
api/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/TH -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/THC -i
system /usr/local/cuda/include -isystem /data/rkoy/anaconda3/envs/lightseq/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_
OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_
HALF2_OPERATORS__ -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86 -c /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/normal
ize_kernels.cu -o normalize_kernels.cuda.o 
nvcc fatal   : Unsupported gpu architecture 'compute_86'
[5/15] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=lightseq_layers -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi101
1\" -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/includes -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/c
src/ops/includes -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/torc
h/csrc/api/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/TH -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include
/THC -isystem /usr/local/cuda/include -isystem /data/rkoy/anaconda3/envs/lightseq/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO
_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__C
UDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86 -c /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels
/softmax_kernels.cu -o softmax_kernels.cuda.o 
FAILED: softmax_kernels.cuda.o 
/usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=lightseq_layers -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/
data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/includes -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/ops
/includes -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/torch/csrc/
api/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/TH -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/THC -i
system /usr/local/cuda/include -isystem /data/rkoy/anaconda3/envs/lightseq/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_
OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_
HALF2_OPERATORS__ -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86 -c /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/softma
x_kernels.cu -o softmax_kernels.cuda.o 
nvcc fatal   : Unsupported gpu architecture 'compute_86'
[6/15] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=lightseq_layers -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi101
1\" -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/includes -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/c
src/ops/includes -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/torc
h/csrc/api/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/TH -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include
/THC -isystem /usr/local/cuda/include -isystem /data/rkoy/anaconda3/envs/lightseq/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO
_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__C
UDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86 -c /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels
/cuda_util.cu -o cuda_util.cuda.o 
FAILED: cuda_util.cuda.o 
/usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=lightseq_layers -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/
data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/includes -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/ops
/includes -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/torch/csrc/
api/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/TH -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/THC -i
system /usr/local/cuda/include -isystem /data/rkoy/anaconda3/envs/lightseq/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_
OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_
HALF2_OPERATORS__ -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86 -c /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/cuda_u
til.cu -o cuda_util.cuda.o 
nvcc fatal   : Unsupported gpu architecture 'compute_86'
[7/15] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=lightseq_layers -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi101
1\" -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/includes -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/c
src/ops/includes -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/torc
h/csrc/api/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/TH -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include
/THC -isystem /usr/local/cuda/include -isystem /data/rkoy/anaconda3/envs/lightseq/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO
_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__C
UDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86 -c /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels
/general_kernels.cu -o general_kernels.cuda.o 
FAILED: general_kernels.cuda.o 
/usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=lightseq_layers -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/
data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/includes -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/ops
/includes -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/torch/csrc/
api/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/TH -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/THC -i
system /usr/local/cuda/include -isystem /data/rkoy/anaconda3/envs/lightseq/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_
OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_
HALF2_OPERATORS__ -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86 -c /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/genera
l_kernels.cu -o general_kernels.cuda.o 
nvcc fatal   : Unsupported gpu architecture 'compute_86'
[8/15] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=lightseq_layers -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi101
1\" -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/includes -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/c
src/ops/includes -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/torc
h/csrc/api/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/TH -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include
/THC -isystem /usr/local/cuda/include -isystem /data/rkoy/anaconda3/envs/lightseq/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO
_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__C
UDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86 -c /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels
/embedding_kernels.cu -o embedding_kernels.cuda.o  
FAILED: embedding_kernels.cuda.o 
/usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=lightseq_layers -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/
data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/includes -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/ops
/includes -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/torch/csrc/
api/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/TH -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/THC -i
system /usr/local/cuda/include -isystem /data/rkoy/anaconda3/envs/lightseq/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_
OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_
HALF2_OPERATORS__ -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86 -c /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/embedd
ing_kernels.cu -o embedding_kernels.cuda.o 
nvcc fatal   : Unsupported gpu architecture 'compute_86'
[9/15] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=lightseq_layers -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi101
1\" -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/includes -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/c
src/ops/includes -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/torc
h/csrc/api/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/TH -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include
/THC -isystem /usr/local/cuda/include -isystem /data/rkoy/anaconda3/envs/lightseq/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO
_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__C
UDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86 -c /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels
/cross_entropy.cu -o cross_entropy.cuda.o 
FAILED: cross_entropy.cuda.o 
/usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=lightseq_layers -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/
data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/includes -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/ops
/includes -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/torch/csrc/
api/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/TH -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/THC -i
system /usr/local/cuda/include -isystem /data/rkoy/anaconda3/envs/lightseq/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_
OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_
HALF2_OPERATORS__ -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86 -c /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/cross_
entropy.cu -o cross_entropy.cuda.o 
nvcc fatal   : Unsupported gpu architecture 'compute_86'
Using /home/rkoy/.cache/torch_extensions as PyTorch extensions root...
Using /home/rkoy/.cache/torch_extensions as PyTorch extensions root...
Using /home/rkoy/.cache/torch_extensions as PyTorch extensions root...
[10/15] c++ -MMD -MF cross_entropy_layer.o.d -DTORCH_EXTENSION_NAME=lightseq_layers -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI
=\"_cxxabi1011\" -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/includes -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lights
eq/training/csrc/ops/includes -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch
/include/torch/csrc/api/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/TH -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/
torch/include/THC -isystem /usr/local/cuda/include -isystem /data/rkoy/anaconda3/envs/lightseq/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -O3 -std=c++14 -g -Wno-reorder -c /data/r
avi-9151/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/ops/cross_entropy_layer.cpp -o cross_entropy_layer.o 
[11/15] c++ -MMD -MF transformer_embedding_layer.o.d -DTORCH_EXTENSION_NAME=lightseq_layers -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_B
UILD_ABI=\"_cxxabi1011\" -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/includes -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-package
s/lightseq/training/csrc/ops/includes -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packag
es/torch/include/torch/csrc/api/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/TH -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-p
ackages/torch/include/THC -isystem /usr/local/cuda/include -isystem /data/rkoy/anaconda3/envs/lightseq/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -O3 -std=c++14 -g -Wno-reorder -c
 /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/ops/transformer_embedding_layer.cpp -o transformer_embedding_layer.o 
[12/15] c++ -MMD -MF transformer_encoder_layer.o.d -DTORCH_EXTENSION_NAME=lightseq_layers -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUI
LD_ABI=\"_cxxabi1011\" -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/includes -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/
lightseq/training/csrc/ops/includes -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages
/torch/include/torch/csrc/api/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/TH -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-pac
kages/torch/include/THC -isystem /usr/local/cuda/include -isystem /data/rkoy/anaconda3/envs/lightseq/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -O3 -std=c++14 -g -Wno-reorder -c /
data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/ops/transformer_encoder_layer.cpp -o transformer_encoder_layer.o 
[13/15] c++ -MMD -MF transformer_decoder_layer.o.d -DTORCH_EXTENSION_NAME=lightseq_layers -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUI
LD_ABI=\"_cxxabi1011\" -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/includes -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/
lightseq/training/csrc/ops/includes -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages
/torch/include/torch/csrc/api/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/TH -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-pac
kages/torch/include/THC -isystem /usr/local/cuda/include -isystem /data/rkoy/anaconda3/envs/lightseq/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -O3 -std=c++14 -g -Wno-reorder -c /
data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/ops/transformer_decoder_layer.cpp -o transformer_decoder_layer.o 
[14/15] c++ -MMD -MF pybind_op.o.d -DTORCH_EXTENSION_NAME=lightseq_layers -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi
1011\" -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/includes -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/trainin
g/csrc/ops/includes -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/t
orch/csrc/api/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/TH -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/incl
ude/THC -isystem /usr/local/cuda/include -isystem /data/rkoy/anaconda3/envs/lightseq/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -O3 -std=c++14 -g -Wno-reorder -c /data/rkoy/a
naconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/torch/pybind_op.cpp -o pybind_op.o 
ninja: build stopped: subcommand failed.
Loading extension module lightseq_layers...
Loading extension module lightseq_layers...
Loading extension module lightseq_layers...
Loading extension module lightseq_layers...
Loading extension module lightseq_layers...
Loading extension module lightseq_layers...
Loading extension module lightseq_layers...
Traceback (most recent call last):
  File "/data/rkoy/anaconda3/envs/lightseq/bin/lightseq-train", line 8, in <module>
    sys.exit(ls_cli_main())
  File "/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/examples/training/fairseq/lightseq_fairseq_train_cli.py", line 10, in ls_cli_main
    cli_main(*args, **kwargs)
  File "/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/fairseq_cli/train.py", line 352, in cli_main
    distributed_utils.call_main(args, main)
  File "/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/fairseq/distributed_utils.py", line 283, in call_main
    torch.multiprocessing.spawn(
  File "/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 199, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 157, in start_processes
    while not context.join():
  File "/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 118, in join
    raise Exception(msg)
Exception: 

-- Process 1 terminated with the following error:
Traceback (most recent call last):
  File "/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1533, in _run_ninja_build
    subprocess.run(
  File "/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
    fn(i, *args)
  File "/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/fairseq/distributed_utils.py", line 270, in distributed_main
    main(args, **kwargs)
  File "/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/fairseq_cli/train.py", line 68, in main
    model = task.build_model(args)
  File "/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/fairseq/tasks/translation.py", line 327, in build_model
    model = super().build_model(args)
  File "/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/fairseq/tasks/fairseq_task.py", line 547, in build_model
    model = models.build_model(args, self)
  File "/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/fairseq/models/__init__.py", line 58, in build_model
    return ARCH_MODEL_REGISTRY[model_cfg.arch].build_model(model_cfg, task)
  File "/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/examples/training/fairseq/fs_modules/ls_transformer.py", line 136, in build_model
    encoder_embed_tokens = cls.build_embedding(
  File "/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/examples/training/fairseq/fs_modules/ls_transformer.py", line 159, in build_embedding
    emb = LSTransformerEmbeddingLayer(config)
  File "/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/ops/pytorch/transformer_embedding_layer.py", line 96, in __init__
    transformer_cuda_module = TransformerBuilder().load()
  File "/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/ops/pytorch/builder/builder.py", line 203, in load
    return self.jit_load(verbose)
  File "/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/ops/pytorch/builder/builder.py", line 231, in jit_load
    op_module = load(
  File "/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 986, in load
    return _jit_compile(
  File "/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1193, in _jit_compile
    _write_ninja_file_and_build_library(
  File "/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1297, in _write_ninja_file_and_build_library
    _run_ninja_build(
  File "/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1555, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension 'lightseq_layers'
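
The two errors in this log are linked rather than separate. The traceback shows `ninja -v` exiting with a non-zero status, which raises `subprocess.CalledProcessError`, and `_run_ninja_build` then re-raises it as `RuntimeError` with `raise ... from e`. The sketch below reproduces only that chaining pattern with a stand-in child process. `run_build` and `failing_build_cause` are hypothetical names for illustration and are not PyTorch or LightSeq functions.

```python
import subprocess
import sys


def run_build(cmd):
    """Run a build command and wrap a non-zero exit in a RuntimeError."""
    try:
        subprocess.run(cmd, check=True)
    except subprocess.CalledProcessError as e:
        # Same pattern as in the traceback: the RuntimeError is raised
        # "from" the CalledProcessError, which stays attached as __cause__.
        raise RuntimeError("Error building extension 'demo'") from e


def failing_build_cause():
    """Return the underlying cause of a failed stand-in build."""
    # Stand-in for `ninja -v` failing after nvcc rejected an architecture:
    # a child process that simply exits with status 1.
    cmd = [sys.executable, "-c", "import sys; sys.exit(1)"]
    try:
        run_build(cmd)
    except RuntimeError as err:
        return err.__cause__
    return None
```

Read this way, the `RuntimeError` at the bottom is a symptom, and the line to fix is the earlier `nvcc fatal` message.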

These are the details of my GPUs:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.91.03    Driver Version: 460.91.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  RTX A6000           Off  | 00000000:01:00.0 Off |                  Off |
| 30%   53C    P2    71W / 300W |    363MiB / 48685MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  RTX A6000           Off  | 00000000:25:00.0 Off |                  Off |
| 30%   36C    P8    26W / 300W |      2MiB / 48685MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
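
Note that the "CUDA Version: 11.2" shown by nvidia-smi is the highest CUDA version the driver supports, not the version of the toolkit that compiles the extension. The log shows the build calling /usr/local/cuda/bin/nvcc, and nvcc only accepts `compute_86` from CUDA 11.1 onward. A minimal sketch for checking this follows. The table lists only a few architectures, the capability (8, 6) for the RTX A6000 is taken from the `compute_86` flags in the log, and the helper names are hypothetical.

```python
import re
import shutil
import subprocess

# First CUDA toolkit release whose nvcc accepts each compute capability.
# Only a few entries are listed here.
MIN_TOOLKIT_FOR_CAPABILITY = {
    (7, 0): (9, 0),
    (7, 5): (10, 0),
    (8, 0): (11, 0),
    (8, 6): (11, 1),
}


def parse_nvcc_release(version_text):
    """Extract (major, minor) from the output of `nvcc --version`."""
    match = re.search(r"release (\d+)\.(\d+)", version_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def nvcc_accepts(release, capability):
    """True if an nvcc of this release can target the given capability."""
    return release >= MIN_TOOLKIT_FOR_CAPABILITY[capability]


def installed_nvcc_release(nvcc_path="/usr/local/cuda/bin/nvcc"):
    """Query the nvcc used in the log, falling back to the one on PATH."""
    nvcc = shutil.which(nvcc_path) or shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run([nvcc, "--version"], capture_output=True, text=True)
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    release = installed_nvcc_release()
    if release is None:
        print("nvcc not found")
    else:
        print("nvcc release:", release)
        print("accepts compute_86:", nvcc_accepts(release, (8, 6)))
```

If this reports a release older than 11.1, the `Unsupported gpu architecture 'compute_86'` message is expected regardless of the driver version.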

This is the content of examples/training/fairseq/ls_fairseq_wmt14en2de.sh:

#!/usr/bin/env bash
set -ex
THIS_DIR=$(dirname $(readlink -f $0))
cd $THIS_DIR/../../..

#if [ ! -d "/tmp/wmt14_en_de" ]; then
#    echo "Downloading dataset"
#    wget http://sf3-ttcdn-tos.pstatp.com/obj/nlp-opensource/lightseq/wmt_data/databin_wmt14_en_de.tar.gz -P /tmp
#    tar -zxvf /tmp/databin_wmt14_en_de.tar.gz -C /tmp && rm /tmp/databin_wmt14_en_de.tar.gz
#fi

lightseq-train wmt14_en_de/ \
    --task translation \
    --arch ls_transformer_wmt_en_de_big \
    --optimizer ls_adam --adam-betas '(0.9, 0.98)' \
    --clip-norm 0.0 \
    --lr 5e-4 --lr-scheduler inverse_sqrt --warmup-updates 4000 --weight-decay 0.0001 \
    --criterion ls_label_smoothed_cross_entropy --label-smoothing 0.1 \
    --max-tokens 8192 \
    --eval-bleu \
    --eval-bleu-args '{"beam": 5, "max_len_a": 1.2, "max_len_b": 10}' \
    --eval-bleu-detok moses \
    --eval-bleu-remove-bpe \
    --eval-bleu-print-samples \
    --best-checkpoint-metric bleu \
    --maximize-best-checkpoint-metric \

The only changes I made were to download the dataset manually, update the dataset path in the command accordingly, and change the architecture.

Please let me know what mistake I have made.
Thanks in advance!

@Taka152
Contributor

Taka152 commented Sep 28, 2021

It seems like a torch extension problem. You can try the solution in torch/torch7#1190 (comment).

@Taka152 Taka152 changed the title from "not able to train a transformer model using lightseq-train" to "nvcc Unsupported gpu architecture error" on Sep 28, 2021
@rkoystart
Author

I have 2 questions:

  1. Apart from nvcc fatal : Unsupported gpu architecture 'compute_86', it also says RuntimeError: Error building extension 'lightseq_layers'. Is that also because of the torch version?

  2. Regarding the suggestion to try the solution in torch/torch7#1190 (comment): the mentioned repo seems to still be in the development phase, and I always prefer installing PyTorch from https://pytorch.org/get-started/previous-versions/ . I would like to know which torch version or cudatoolkit version I need to install on the GPU machine, based on the GPU specs I have already mentioned.
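
On the version question, one option is to make sure the nvcc at /usr/local/cuda comes from CUDA 11.1 or newer, since that is the first release that accepts `compute_86`. A common fallback with an older nvcc is to build for compute capability 8.0, whose binaries also run on 8.6 devices. The sketch below only illustrates how an architecture list such as "8.0+PTX" corresponds to `-gencode` flags of the form seen in the log. `gencode_flags` is a hypothetical helper, not part of PyTorch or LightSeq, and this thread does not confirm whether LightSeq's builder honors PyTorch's `TORCH_CUDA_ARCH_LIST` environment variable.

```python
def gencode_flags(arch_list):
    """Turn an arch list like "7.5;8.0+PTX" into nvcc -gencode flags."""
    flags = []
    for entry in arch_list.replace(";", " ").split():
        with_ptx = entry.endswith("+PTX")
        if with_ptx:
            entry = entry[: -len("+PTX")]
        major, minor = entry.split(".")
        num = major + minor
        # Native binary (SASS) for exactly this architecture.
        flags.append(f"-gencode=arch=compute_{num},code=sm_{num}")
        if with_ptx:
            # Also embed PTX so newer GPUs can JIT-compile the kernels.
            flags.append(f"-gencode=arch=compute_{num},code=compute_{num}")
    return flags


if __name__ == "__main__":
    # Flags an nvcc from CUDA 11.0 would accept, unlike the 8.6 ones in the log.
    print(" ".join(gencode_flags("8.0+PTX")))
```

With "8.6+PTX" the helper yields the same final pair of flags that appears in the failing nvcc commands, which is why an nvcc older than 11.1 stops on them.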

@Taka152
Contributor

Taka152 commented Oct 9, 2021

@Andrewlesson

I have encountered the same problem. Has it been solved?
