Illegal memory error when training with multi-GPU #247
Comments
The error went away on reducing …
Hm. It might be worthwhile trying to debug that a bit, e.g. see if you can do …
I get the same error even after adding …
You can use the steps in #142 (comment).
Hi, any updates on this issue? I also get the same error on both single-GPU and multi-GPU setups unless I decrease --max-duration to 50. I've also tried K2_SYNC_KERNELS=1 and CUDA_LAUNCH_BLOCKING=1, but the problem continues.
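For anyone hitting the same wall, the two mitigations mentioned so far can be combined roughly as follows; this is only a sketch, the flag values are illustrative, and the exp-dir path is hypothetical:

export K2_SYNC_KERNELS=1
export CUDA_LAUNCH_BLOCKING=1
python3 pruned_transducer_stateless2/train.py \
  --exp-dir pruned_transducer_stateless2/exp \
  --world-size 1 \
  --max-duration 50

Lowering --max-duration shrinks the largest batches, while the two environment variables make CUDA errors surface closer to the kernel that actually caused them.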
How up-to-date is your code? We haven't seen this type of error for a while on our end.
Hi Dan, I cloned Icefall yesterday; my branch is up to date with 'origin/master', and the k2 details are below. By the way, I'm trying egs/librispeech/ASR/pruned_transducer_stateless2/train.py on Librispeech 100 hours.
Here is what I got: …
Try doing
export K2_SYNC_KERNELS=1
and rerunning
On Wed, Apr 13, 2022 at 7:51 PM ahazned wrote:
Hi Dan,
I cloned Icefall yesterday and my branch is up to date with 'origin/master' and k2 details are below. By the way I'm trying egs/librispeech/ASR/pruned_transducer_stateless2/train.py on Librispeech 100 hours.
/tmp/icefall$ git status
On branch master
Your branch is up to date with 'origin/master'.
python3 -m k2.version
Collecting environment information...
k2 version: 1.14
Build type: Release
Git SHA1: 6833270cb228aba7bf9681fccd41e2b52f7d984c
Git date: Wed Mar 16 03:16:05 2022
Cuda used to build k2: 11.1
cuDNN used to build k2: 8.0.4
Python version used to build k2: 3.8
OS used to build k2: Ubuntu 18.04.6 LTS
CMake version: 3.18.4
GCC version: 7.5.0
CMAKE_CUDA_FLAGS: --expt-extended-lambda -gencode arch=compute_35,code=sm_35 --expt-extended-lambda -gencode arch=compute_50,code=sm_50 --expt-extended-lambda -gencode arch=compute_60,code=sm_60 --expt-extended-lambda -gencode arch=compute_61,code=sm_61 --expt-extended-lambda -gencode arch=compute_70,code=sm_70 --expt-extended-lambda -gencode arch=compute_75,code=sm_75 --expt-extended-lambda -gencode arch=compute_80,code=sm_80 --expt-extended-lambda -gencode arch=compute_86,code=sm_86 -D_GLIBCXX_USE_CXX11_ABI=0 --compiler-options -Wall --compiler-options -Wno-unknown-pragmas --compiler-options -Wno-strict-overflow
CMAKE_CXX_FLAGS: -D_GLIBCXX_USE_CXX11_ABI=0 -Wno-strict-overflow
PyTorch version used to build k2: 1.8.1
PyTorch is using Cuda: 11.1
NVTX enabled: True
With CUDA: True
Disable debug: True
Sync kernels : False
Disable checks: False
Here is what I got:
python3 pruned_transducer_stateless2/train.py --exp-dir=pruned_transducer_stateless2/exp_100h_ws1 --world-size 2 --num-epochs 40 --full-libri 0 --max-duration 300
/tmp/miniconda3/envs/k2/lib/python3.8/site-packages/lhotse/dataset/sampling/bucketing.py:96: UserWarning: Lazy CutSet detected in BucketingSampler: we will read it into memory anyway. Please use lhotse.dataset.DynamicBucketingSampler instead.
  warnings.warn(
/tmp/miniconda3/envs/k2/lib/python3.8/site-packages/lhotse/dataset/sampling/bucketing.py:96: UserWarning: Lazy CutSet detected in BucketingSampler: we will read it into memory anyway. Please use lhotse.dataset.DynamicBucketingSampler instead.
  warnings.warn(
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: an illegal memory access was encountered
Exception raised from create_event_internal at /opt/conda/conda-bld/pytorch_1616554793803/work/c10/cuda/CUDACachingAllocator.cpp:733 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f0c4b9b82f2 in /tmp/miniconda3/envs/k2/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x5b (0x7f0c4b9b567b in /tmp/miniconda3/envs/k2/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #2: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0x809 (0x7f0c4bc11219 in /tmp/miniconda3/envs/k2/lib/python3.8/site-packages/torch/lib/libc10_cuda.so)
frame #3: c10::TensorImpl::release_resources() + 0x54 (0x7f0c4b9a03a4 in /tmp/miniconda3/envs/k2/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #4: + 0x6e0e5a (0x7f0ca2916e5a in /tmp/miniconda3/envs/k2/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #5: + 0x6e0ef1 (0x7f0ca2916ef1 in /tmp/miniconda3/envs/k2/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #6: + 0x1a974a (0x5568edb6a74a in /tmp/miniconda3/envs/k2/bin/python3)
frame #7: + 0x10f660 (0x5568edad0660 in /tmp/miniconda3/envs/k2/bin/python3)
frame #8: + 0x10f660 (0x5568edad0660 in /tmp/miniconda3/envs/k2/bin/python3)
frame #9: + 0x10faf5 (0x5568edad0af5 in /tmp/miniconda3/envs/k2/bin/python3)
frame #10: + 0x1a9727 (0x5568edb6a727 in /tmp/miniconda3/envs/k2/bin/python3)
frame #11: + 0x110632 (0x5568edad1632 in /tmp/miniconda3/envs/k2/bin/python3)
frame #12: + 0x110059 (0x5568edad1059 in /tmp/miniconda3/envs/k2/bin/python3)
frame #13: + 0x110043 (0x5568edad1043 in /tmp/miniconda3/envs/k2/bin/python3)
frame #14: + 0x112f68 (0x5568edad3f68 in /tmp/miniconda3/envs/k2/bin/python3)
frame #15: + 0x1104af (0x5568edad14af in /tmp/miniconda3/envs/k2/bin/python3)
frame #16: + 0x1fe1f3 (0x5568edbbf1f3 in /tmp/miniconda3/envs/k2/bin/python3)
frame #17: _PyEval_EvalFrameDefault + 0x2681 (0x5568edb9a021 in /tmp/miniconda3/envs/k2/bin/python3)
frame #18: _PyEval_EvalCodeWithName + 0x260 (0x5568edb8d600 in /tmp/miniconda3/envs/k2/bin/python3)
frame #19: _PyFunction_Vectorcall + 0x534 (0x5568edb8eb64 in /tmp/miniconda3/envs/k2/bin/python3)
frame #20: _PyEval_EvalFrameDefault + 0x4c0 (0x5568edb97e60 in /tmp/miniconda3/envs/k2/bin/python3)
frame #21: _PyFunction_Vectorcall + 0x1b7 (0x5568edb8e7e7 in /tmp/miniconda3/envs/k2/bin/python3)
frame #22: _PyEval_EvalFrameDefault + 0x71b (0x5568edb980bb in /tmp/miniconda3/envs/k2/bin/python3)
frame #23: _PyEval_EvalCodeWithName + 0x260 (0x5568edb8d600 in /tmp/miniconda3/envs/k2/bin/python3)
frame #24: _PyFunction_Vectorcall + 0x594 (0x5568edb8ebc4 in /tmp/miniconda3/envs/k2/bin/python3)
frame #25: _PyEval_EvalFrameDefault + 0x1510 (0x5568edb98eb0 in /tmp/miniconda3/envs/k2/bin/python3)
frame #26: _PyEval_EvalCodeWithName + 0x260 (0x5568edb8d600 in /tmp/miniconda3/envs/k2/bin/python3)
frame #27: PyEval_EvalCode + 0x23 (0x5568edb8eeb3 in /tmp/miniconda3/envs/k2/bin/python3)
frame #28: + 0x242622 (0x5568edc03622 in /tmp/miniconda3/envs/k2/bin/python3)
frame #29: + 0x2531d2 (0x5568edc141d2 in /tmp/miniconda3/envs/k2/bin/python3)
frame #30: PyRun_StringFlags + 0x7a (0x5568edc16e0a in /tmp/miniconda3/envs/k2/bin/python3)
frame #31: PyRun_SimpleStringFlags + 0x3c (0x5568edc16e6c in /tmp/miniconda3/envs/k2/bin/python3)
frame #32: Py_RunMain + 0x15b (0x5568edc177db in /tmp/miniconda3/envs/k2/bin/python3)
frame #33: Py_BytesMain + 0x39 (0x5568edc17c29 in /tmp/miniconda3/envs/k2/bin/python3)
frame #34: __libc_start_main + 0xe7 (0x7f0cd469fc87 in /lib/x86_64-linux-gnu/libc.so.6)
frame #35: + 0x1f9ad7 (0x5568edbbaad7 in /tmp/miniconda3/envs/k2/bin/python3)
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: an illegal memory access was encountered
Exception raised from create_event_internal at /opt/conda/conda-bld/pytorch_1616554793803/work/c10/cuda/CUDACachingAllocator.cpp:733 (most recent call first):
[The second training process prints the same 36-frame stack trace as above, differing only in the memory addresses.]
Traceback (most recent call last):
  File "pruned_transducer_stateless2/train.py", line 997, in <module>
    main()
  File "pruned_transducer_stateless2/train.py", line 988, in main
    mp.spawn(run, args=(world_size, args), nprocs=world_size, join=True)
  File "/tmp/miniconda3/envs/k2/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 230, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/tmp/miniconda3/envs/k2/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
    while not context.join():
  File "/tmp/miniconda3/envs/k2/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 150, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:
-- Process 1 terminated with the following error:
Traceback (most recent call last):
  File "/tmp/miniconda3/envs/k2/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
    fn(i, *args)
  File "/tmp/icefall/egs/librispeech/ASR/pruned_transducer_stateless2/train.py", line 878, in run
    scan_pessimistic_batches_for_oom(
  File "/tmp/icefall/egs/librispeech/ASR/pruned_transducer_stateless2/train.py", line 964, in scan_pessimistic_batches_for_oom
    loss.backward()
  File "/tmp/miniconda3/envs/k2/lib/python3.8/site-packages/torch/tensor.py", line 245, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/tmp/miniconda3/envs/k2/lib/python3.8/site-packages/torch/autograd/__init__.py", line 145, in backward
    Variable._execution_engine.run_backward(
RuntimeError: merge_sort: failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered
Thanks. I tried, but unfortunately it doesn't help.
It's supposed to make it print a more detailed error message, not fix the issue.
Anyway I think a version of k2 from March 14th is not recent enough to run the pruned_transducer_stateless2 recipe.
@ahazned …
@csukuangfj I have the most recent versions of k2 and icefall (all tests are passing), but I still get this error for larger batch sizes (>100 s when training with 4 GPUs with 12 GB memory each). I am trying to run a pruned_transducer_stateless2 model on SPGISpeech.
@desh2608 See if you can run the training inside cuda-gdb (but I'm not sure whether cuda-gdb is able to handle multiple training processes, and also whether it will be easy for you to install). If the problem can be reproduced with 1 job, that might make it easier.
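Concretely, the suggestion amounts to something like the following; this is only a sketch, using the recipe flags that appear elsewhere in this thread, and it assumes cuda-gdb is on the PATH:

cuda-gdb --args python3 ./pruned_transducer_stateless2/train.py --world-size 1 --max-duration 100
(cuda-gdb) run
(cuda-gdb) bt

Running with a single process sidesteps the question of whether cuda-gdb can follow workers spawned by torch.multiprocessing, and a backtrace taken after the crash should show whether the faulting kernel comes from k2 or from PyTorch itself.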
I can successfully run "pruned_transducer_stateless2/train.py" with "--max-duration=300" when I use a newer k2 (1.14, Git date: Wed Apr 13 00:46:49 2022). I use two GPUs with 24 GB memory each. But one interesting thing is that I get different WERs on "egs/yesno/ASR/tdnn/train.py" with different k2/PyTorch/CUDA combinations. Not sure if this is expected.
Different PyTorch versions may cause different random-number sequences, and there may be other reasons why they differ slightly. I think this is probably expected. The yesno dataset is super tiny, so random noise is a larger factor than normal.
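(If someone wants to rule out everything except the random state, a minimal sketch of pinning the seeds is below; this is not what the recipe does by default, and even identical seeds will not remove numerical differences between builds.)

import random
import numpy as np
import torch

def fix_seed(seed: int = 42) -> None:
    # Pin the RNGs that training code typically touches.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)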
Ok, thanks Dan. |
I think this is fixed now (although I don't know what fixed it). I just updated PyTorch from version 1.8.1 to 1.10.1 and pulled the latest k2 (v1.14), and compiled it from source in debug mode.
$ python -m k2.version
Collecting environment information...
k2 version: 1.14
Build type: Debug
Git SHA1: 1b29f0a946f50186aaa82df46a59f492ade9692b
Git date: Tue Apr 12 20:46:49 2022
Cuda used to build k2: 11.1
cuDNN used to build k2: 8.0.2
Python version used to build k2: 3.8
OS used to build k2: CentOS Linux release 7.5.1804 (Core)
CMake version: 3.22.1
GCC version: 7.2.0
CMAKE_CUDA_FLAGS: --compiler-options -rdynamic --compiler-options -lineinfo -Wno-deprecated-gpu-targets --expt-extended-lambda -gencode arch=compute_35,code=sm_35 --expt-extended-lambda -gencode arch=compute_50,code=sm_50 --expt-extended-lambda -gencode arch=compute_60,code=sm_60 --expt-extended-lambda -gencode arch=compute_61,code=sm_61 --expt-extended-lambda -gencode arch=compute_70,code=sm_70 --expt-extended-lambda -gencode arch=compute_75,code=sm_75 --expt-extended-lambda -gencode arch=compute_80,code=sm_80 --expt-extended-lambda -gencode arch=compute_86,code=sm_86 -D_GLIBCXX_USE_CXX11_ABI=0 --compiler-options -Wall --compiler-options -Wno-strict-overflow --compiler-options -Wno-unknown-pragmas
CMAKE_CXX_FLAGS: -D_GLIBCXX_USE_CXX11_ABI=0 -Wno-unused-variable -Wno-strict-overflow
PyTorch version used to build k2: 1.10.1+cu111
PyTorch is using Cuda: 11.1
NVTX enabled: True
With CUDA: True
Disable debug: False
Sync kernels : True
Disable checks: False
After this upgrade, I am able to train with a batch size of 250s, where earlier I was getting the weird memory issues even with a batch size of 100 (using 8 V100 GPUs). Perhaps there was an issue with PyTorch 1.8.1? It's hard to say. I still get a CUDA error when I try to use batch size 300, but from the PyTorch discussion forums it seems to be related to OOM, although I was hoping it would be caught by …
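For anyone who wants to reproduce that setup, a debug build of k2 from source looks roughly like the following. This is a sketch: the K2_CMAKE_ARGS environment variable should be checked against the current k2 installation documentation before relying on it.

git clone https://github.com/k2-fsa/k2.git
cd k2
export K2_CMAKE_ARGS="-DCMAKE_BUILD_TYPE=Debug"
python3 setup.py install

A debug build is what produces the 'Build type: Debug' and 'Disable debug: False' fields shown in the version output above.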
@csukuangfj I am thinking we should just make it the default that it prints out some details of the batch (e.g. dimensions and sentence lengths at least, or perhaps the entire object) when we get an OOM error. This will make things like this easier to debug. HOWEVER, desh, I'm not convinced that this actually is an OOM error. Try doing …
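Something along these lines is what is being proposed; this is only a sketch, and it assumes the lhotse-style batch layout used by these recipes (an "inputs" feature tensor plus a "supervisions" dict), which should be checked against the actual dataset class:

import logging

def log_batch_details(batch: dict) -> None:
    # Log the quantities that matter for an OOM: the feature-tensor shape
    # and the per-utterance frame counts, so the offending batch is identifiable.
    features = batch["inputs"]
    supervisions = batch["supervisions"]
    logging.info(f"features shape: {tuple(features.shape)}")
    logging.info(f"num_frames: {supervisions['num_frames'].tolist()}")

Calling such a helper from the except-branch that wraps the forward/backward pass would print the batch geometry right before the exception is re-raised.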
Yeah, I already have the following variables set:
export K2_DISABLE_CHECKS=0
export K2_SYNC_KERNELS=1
export CUDA_LAUNCH_BLOCKING=1
but I didn't see any more details in the stack trace. I also printed out the batch when the error happened, but it looked similar to all other batches. I'll try to get it again when my current training ends and share the batch details here.
OK, thanks. It would be appreciated if you could help us debug this.
Post the log when I set …
And the …
The attached batch works perfectly for me. Here is the change I made to train.py to run it.
diff --git a/egs/gigaspeech/ASR/pruned_transducer_stateless2/train.py b/egs/gigaspeech/ASR/pruned_transducer_stateless2/train.py
index 83ae255..b69b6fc 100755
--- a/egs/gigaspeech/ASR/pruned_transducer_stateless2/train.py
+++ b/egs/gigaspeech/ASR/pruned_transducer_stateless2/train.py
@@ -833,6 +833,24 @@ def run(rank, world_size, args):
if params.print_diagnostics:
diagnostic = diagnostics.attach_diagnostics(model)
+ pt_file = "./batch-f2d3f761-0ba3-6279-14cb-056407437c3b.pt"
+ batch = torch.load(pt_file)
+ with torch.cuda.amp.autocast(enabled=params.use_fp16):
+ loss, _ = compute_loss(
+ params=params,
+ model=model,
+ sp=sp,
+ batch=batch,
+ is_training=True,
+ warmup=0.0,
+ )
+ loss.backward()
+ optimizer.step()
+ optimizer.zero_grad()
+ logging.info(f"loss: {loss}")
+
+ return
+
gigaspeech = GigaSpeechAsrDataModule(args)
train_cuts = gigaspeech.train_cuts()
The command for training is ./pruned_transducer_stateless2/train.py …
The output is …
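If someone else wants to inspect that same file, loading it outside the training loop is enough to see its dimensions. A small sketch, assuming the batch was saved with torch.save and uses the "inputs"/"supervisions" layout mentioned above:

import torch

batch = torch.load("batch-f2d3f761-0ba3-6279-14cb-056407437c3b.pt", map_location="cpu")
print(batch["inputs"].shape)                 # feature tensor, roughly (N, T, C)
print(batch["supervisions"]["num_frames"])   # per-utterance lengths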
@wgb14 I think you can first try running with …
I notice that you are using …
while I am using torch + CUDA 10.2 + Python 3.8. I will try to switch to CUDA 11.1 + Python 3.7 and run it again.
@csukuangfj I think we can print out the exception error message here, even if it is not an OOM error.
icefall/egs/librispeech/ASR/pruned_transducer_stateless2/train.py, lines 1000 to 1011 in 2900ed8
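The shape of that change would be roughly the following; a sketch only, since the actual code at those lines is not reproduced here, and run_batch stands in for the forward/backward call on one pessimistic batch:

import logging

def run_batch_with_error_reporting(run_batch, batch) -> None:
    # Report non-OOM errors too, instead of staying silent until the
    # exception propagates out of the spawned worker.
    try:
        run_batch(batch)
    except RuntimeError as e:
        if "CUDA out of memory" in str(e):
            logging.error("OOM while scanning pessimistic batches; reduce --max-duration and retry.")
        else:
            logging.error(f"Non-OOM error on this batch: {e}")
        raise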
BTW, he used mixed-precision training.
The error message in …
And in my previous experiments,
export CUDA_LAUNCH_BLOCKING=1
export K2_SYNC_KERNELS=1
didn't give me any additional information.
Does it print out by … And what is your memory size? I think fangjun has a 32 GB V100.
No, this is from logging, after commenting out the line …
I'm also using a Tesla V100-32GB.
It should be printed by Python, i.e., the one at the end of the logs:
…
@wgb14 I can reproduce your error with torch 1.10.0 + CUDA 11.1. Here is the log: …
Note both …
I would suggest you switch to torch 1.10.0 + CUDA 10.2.
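For what it's worth, switching should only require reinstalling the PyTorch wheel and rebuilding k2 against it. A sketch of the install command, assuming the CUDA 10.2 build of that release is still published on the official wheel index (worth double-checking the exact version tag):

pip install torch==1.10.0+cu102 -f https://download.pytorch.org/whl/torch_stable.html

k2 then needs to be reinstalled or rebuilt against the new torch/CUDA combination.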
@csukuangfj since you can repro the issue, perhaps you could try running in cuda-gdb?
Yes, I am trying it.
Output of the following command: cuda-gdb --args python3 ./pruned_transducer_stateless2/train.py …
Looks like the error is from PyTorch, not k2.
I encountered the same issue as Desh with torch 1.7, k2 1.15, and CUDA 11.6. Updating to torch 1.11 and installing the latest k2 from source fixed it.
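When chasing version-dependent crashes like this, it helps to record both sides of the stack before and after an upgrade. The two commands below are standard; the first is already used throughout this thread, and the second is PyTorch's own environment collector:

python3 -m k2.version
python3 -m torch.utils.collect_env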
I am facing the following error when training with multiple GPUs (on the same node). I am not sure if this is icefall related, but I thought maybe someone has seen it before? (I also tried running with
CUDA_LAUNCH_BLOCKING=1
but got the same error message.) When I train on a single GPU, it seems to be working fine: …
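(For context, the multi-GPU and single-GPU cases differ only in the --world-size flag; a sketch of the two invocations, with an illustrative exp-dir:

python3 pruned_transducer_stateless2/train.py --exp-dir pruned_transducer_stateless2/exp --world-size 4 --max-duration 300
python3 pruned_transducer_stateless2/train.py --exp-dir pruned_transducer_stateless2/exp --world-size 1 --max-duration 300

The first is the failing case; the second runs fine.)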