Illegal memory error when training with multi-GPU #247
Comments
The error went away on reducing …
Hm. It might be worthwhile trying to debug that a bit, e.g. see if you can do …
I get the same error even after adding …
You can use the steps in #142 (comment).
Hi, any updates on this issue? I also get the same error on both single-GPU and multi-GPU setups unless I decrease --max-duration to 50. I've also tried K2_SYNC_KERNELS=1 and CUDA_LAUNCH_BLOCKING=1, but the problem continues.
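For anyone hitting the same wall, the two mitigations mentioned so far can be combined roughly as follows; this is only a sketch, the flag values are illustrative, and the exp-dir path is hypothetical:

export K2_SYNC_KERNELS=1
export CUDA_LAUNCH_BLOCKING=1
python3 pruned_transducer_stateless2/train.py \
  --exp-dir pruned_transducer_stateless2/exp \
  --world-size 1 \
  --max-duration 50

Lowering --max-duration shrinks the largest batches, while the two environment variables make CUDA errors surface closer to the kernel that actually caused them.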
How up-to-date is your code? We haven't seen this type of error for a while on our end.
Hi Dan, I cloned Icefall yesterday; my branch is up to date with 'origin/master', and the k2 details are below. By the way, I'm trying egs/librispeech/ASR/pruned_transducer_stateless2/train.py on Librispeech 100 hours.
Here is what I got: …
Try doing
export K2_SYNC_KERNELS=1
and rerunning
On Wed, Apr 13, 2022 at 7:51 PM ahazned wrote:
Hi Dan,
I cloned Icefall yesterday and my branch is up to date with 'origin/master' and k2 details are below. By the way I'm trying egs/librispeech/ASR/pruned_transducer_stateless2/train.py on Librispeech 100 hours.
/tmp/icefall$ git status
On branch master
Your branch is up to date with 'origin/master'.
python3 -m k2.version
Collecting environment information...
k2 version: 1.14
Build type: Release
Git SHA1: 6833270cb228aba7bf9681fccd41e2b52f7d984c
Git date: Wed Mar 16 03:16:05 2022
Cuda used to build k2: 11.1
cuDNN used to build k2: 8.0.4
Python version used to build k2: 3.8
OS used to build k2: Ubuntu 18.04.6 LTS
CMake version: 3.18.4
GCC version: 7.5.0
CMAKE_CUDA_FLAGS: --expt-extended-lambda -gencode arch=compute_35,code=sm_35 --expt-extended-lambda -gencode arch=compute_50,code=sm_50 --expt-extended-lambda -gencode arch=compute_60,code=sm_60 --expt-extended-lambda -gencode arch=compute_61,code=sm_61 --expt-extended-lambda -gencode arch=compute_70,code=sm_70 --expt-extended-lambda -gencode arch=compute_75,code=sm_75 --expt-extended-lambda -gencode arch=compute_80,code=sm_80 --expt-extended-lambda -gencode arch=compute_86,code=sm_86 -D_GLIBCXX_USE_CXX11_ABI=0 --compiler-options -Wall --compiler-options -Wno-unknown-pragmas --compiler-options -Wno-strict-overflow
CMAKE_CXX_FLAGS: -D_GLIBCXX_USE_CXX11_ABI=0 -Wno-strict-overflow
PyTorch version used to build k2: 1.8.1
PyTorch is using Cuda: 11.1
NVTX enabled: True
With CUDA: True
Disable debug: True
Sync kernels : False
Disable checks: False
Here is what I got:
python3 pruned_transducer_stateless2/train.py --exp-dir=pruned_transducer_stateless2/exp_100h_ws1 --world-size 2 --num-epochs 40 --full-libri 0 --max-duration 300
/tmp/miniconda3/envs/k2/lib/python3.8/site-packages/lhotse/dataset/sampling/bucketing.py:96: UserWarning: Lazy CutSet detected in BucketingSampler: we will read it into memory anyway. Please use lhotse.dataset.DynamicBucketingSampler instead.
  warnings.warn(
/tmp/miniconda3/envs/k2/lib/python3.8/site-packages/lhotse/dataset/sampling/bucketing.py:96: UserWarning: Lazy CutSet detected in BucketingSampler: we will read it into memory anyway. Please use lhotse.dataset.DynamicBucketingSampler instead.
  warnings.warn(
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: an illegal memory access was encountered
Exception raised from create_event_internal at /opt/conda/conda-bld/pytorch_1616554793803/work/c10/cuda/CUDACachingAllocator.cpp:733 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f0c4b9b82f2 in /tmp/miniconda3/envs/k2/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x5b (0x7f0c4b9b567b in /tmp/miniconda3/envs/k2/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #2: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0x809 (0x7f0c4bc11219 in /tmp/miniconda3/envs/k2/lib/python3.8/site-packages/torch/lib/libc10_cuda.so)
frame #3: c10::TensorImpl::release_resources() + 0x54 (0x7f0c4b9a03a4 in /tmp/miniconda3/envs/k2/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #4: + 0x6e0e5a (0x7f0ca2916e5a in /tmp/miniconda3/envs/k2/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #5: + 0x6e0ef1 (0x7f0ca2916ef1 in /tmp/miniconda3/envs/k2/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #6: + 0x1a974a (0x5568edb6a74a in /tmp/miniconda3/envs/k2/bin/python3)
frame #7: + 0x10f660 (0x5568edad0660 in /tmp/miniconda3/envs/k2/bin/python3)
frame #8: + 0x10f660 (0x5568edad0660 in /tmp/miniconda3/envs/k2/bin/python3)
frame #9: + 0x10faf5 (0x5568edad0af5 in /tmp/miniconda3/envs/k2/bin/python3)
frame #10: + 0x1a9727 (0x5568edb6a727 in /tmp/miniconda3/envs/k2/bin/python3)
frame #11: + 0x110632 (0x5568edad1632 in /tmp/miniconda3/envs/k2/bin/python3)
frame #12: + 0x110059 (0x5568edad1059 in /tmp/miniconda3/envs/k2/bin/python3)
frame #13: + 0x110043 (0x5568edad1043 in /tmp/miniconda3/envs/k2/bin/python3)
frame #14: + 0x112f68 (0x5568edad3f68 in /tmp/miniconda3/envs/k2/bin/python3)
frame #15: + 0x1104af (0x5568edad14af in /tmp/miniconda3/envs/k2/bin/python3)
frame #16: + 0x1fe1f3 (0x5568edbbf1f3 in /tmp/miniconda3/envs/k2/bin/python3)
frame #17: _PyEval_EvalFrameDefault + 0x2681 (0x5568edb9a021 in /tmp/miniconda3/envs/k2/bin/python3)
frame #18: _PyEval_EvalCodeWithName + 0x260 (0x5568edb8d600 in /tmp/miniconda3/envs/k2/bin/python3)
frame #19: _PyFunction_Vectorcall + 0x534 (0x5568edb8eb64 in /tmp/miniconda3/envs/k2/bin/python3)
frame #20: _PyEval_EvalFrameDefault + 0x4c0 (0x5568edb97e60 in /tmp/miniconda3/envs/k2/bin/python3)
frame #21: _PyFunction_Vectorcall + 0x1b7 (0x5568edb8e7e7 in /tmp/miniconda3/envs/k2/bin/python3)
frame #22: _PyEval_EvalFrameDefault + 0x71b (0x5568edb980bb in /tmp/miniconda3/envs/k2/bin/python3)
frame #23: _PyEval_EvalCodeWithName + 0x260 (0x5568edb8d600 in /tmp/miniconda3/envs/k2/bin/python3)
frame #24: _PyFunction_Vectorcall + 0x594 (0x5568edb8ebc4 in /tmp/miniconda3/envs/k2/bin/python3)
frame #25: _PyEval_EvalFrameDefault + 0x1510 (0x5568edb98eb0 in /tmp/miniconda3/envs/k2/bin/python3)
frame #26: _PyEval_EvalCodeWithName + 0x260 (0x5568edb8d600 in /tmp/miniconda3/envs/k2/bin/python3)
frame #27: PyEval_EvalCode + 0x23 (0x5568edb8eeb3 in /tmp/miniconda3/envs/k2/bin/python3)
frame #28: + 0x242622 (0x5568edc03622 in /tmp/miniconda3/envs/k2/bin/python3)
frame #29: + 0x2531d2 (0x5568edc141d2 in /tmp/miniconda3/envs/k2/bin/python3)
frame #30: PyRun_StringFlags + 0x7a (0x5568edc16e0a in /tmp/miniconda3/envs/k2/bin/python3)
frame #31: PyRun_SimpleStringFlags + 0x3c (0x5568edc16e6c in /tmp/miniconda3/envs/k2/bin/python3)
frame #32: Py_RunMain + 0x15b (0x5568edc177db in /tmp/miniconda3/envs/k2/bin/python3)
frame #33: Py_BytesMain + 0x39 (0x5568edc17c29 in /tmp/miniconda3/envs/k2/bin/python3)
frame #34: __libc_start_main + 0xe7 (0x7f0cd469fc87 in /lib/x86_64-linux-gnu/libc.so.6)
frame #35: + 0x1f9ad7 (0x5568edbbaad7 in /tmp/miniconda3/envs/k2/bin/python3)
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: an illegal memory access was encountered
Exception raised from create_event_internal at /opt/conda/conda-bld/pytorch_1616554793803/work/c10/cuda/CUDACachingAllocator.cpp:733 (most recent call first):
[The second training process prints the same 36-frame stack trace as above, differing only in the memory addresses.]
Traceback (most recent call last):
  File "pruned_transducer_stateless2/train.py", line 997, in <module>
    main()
  File "pruned_transducer_stateless2/train.py", line 988, in main
    mp.spawn(run, args=(world_size, args), nprocs=world_size, join=True)
  File "/tmp/miniconda3/envs/k2/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 230, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/tmp/miniconda3/envs/k2/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
    while not context.join():
  File "/tmp/miniconda3/envs/k2/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 150, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:
-- Process 1 terminated with the following error:
Traceback (most recent call last):
  File "/tmp/miniconda3/envs/k2/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
    fn(i, *args)
  File "/tmp/icefall/egs/librispeech/ASR/pruned_transducer_stateless2/train.py", line 878, in run
    scan_pessimistic_batches_for_oom(
  File "/tmp/icefall/egs/librispeech/ASR/pruned_transducer_stateless2/train.py", line 964, in scan_pessimistic_batches_for_oom
    loss.backward()
  File "/tmp/miniconda3/envs/k2/lib/python3.8/site-packages/torch/tensor.py", line 245, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/tmp/miniconda3/envs/k2/lib/python3.8/site-packages/torch/autograd/__init__.py", line 145, in backward
    Variable._execution_engine.run_backward(
RuntimeError: merge_sort: failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered
Thanks. I tried, but unfortunately it doesn't help.
It's supposed to make it print a more detailed error message, not fix the issue.
Anyway I think a version of k2 from March 14th is not recent enough to run the pruned_transducer_stateless2 recipe.
@ahazned …
@csukuangfj I have the most recent versions of k2 and icefall (all tests are passing), but I still get this error for larger batch sizes (>100 s when training with 4 GPUs with 12 GB memory each). I am trying to run a pruned_transducer_stateless2 model on SPGISpeech.
@desh2608 See if you can run the training inside cuda-gdb (but I'm not sure whether cuda-gdb is able to handle multiple training processes, and also whether it will be easy for you to install). If the problem can be reproduced with 1 job, that might make it easier.
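Concretely, the suggestion amounts to something like the following; this is only a sketch, using the recipe flags that appear elsewhere in this thread, and it assumes cuda-gdb is on the PATH:

cuda-gdb --args python3 ./pruned_transducer_stateless2/train.py --world-size 1 --max-duration 100
(cuda-gdb) run
(cuda-gdb) bt

Running with a single process sidesteps the question of whether cuda-gdb can follow workers spawned by torch.multiprocessing, and a backtrace taken after the crash should show whether the faulting kernel comes from k2 or from PyTorch itself.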
I can successfully run "pruned_transducer_stateless2/train.py" with "--max-duration=300" when I use a newer k2 (1.14, Git date: Wed Apr 13 00:46:49 2022). I use two GPUs with 24 GB memory each. But one interesting thing is that I get different WERs on "egs/yesno/ASR/tdnn/train.py" with different k2/PyTorch/CUDA combinations. Not sure if this is expected.
Different PyTorch versions may cause different random-number sequences, and there may be other reasons why they differ slightly. I think this is probably expected. The yesno dataset is super tiny, so random noise is a larger factor than normal.
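(If someone wants to rule out everything except the random state, a minimal sketch of pinning the seeds is below; this is not what the recipe does by default, and even identical seeds will not remove numerical differences between builds.)

import random
import numpy as np
import torch

def fix_seed(seed: int = 42) -> None:
    # Pin the RNGs that training code typically touches.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)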
Ok, thanks Dan. |
I think this is fixed now (although I don't know what fixed it). I just updated PyTorch from version 1.8.1 to 1.10.1 and pulled the latest k2 (v1.14), and compiled it from source in debug mode.
$ python -m k2.version
Collecting environment information...
k2 version: 1.14
Build type: Debug
Git SHA1: 1b29f0a946f50186aaa82df46a59f492ade9692b
Git date: Tue Apr 12 20:46:49 2022
Cuda used to build k2: 11.1
cuDNN used to build k2: 8.0.2
Python version used to build k2: 3.8
OS used to build k2: CentOS Linux release 7.5.1804 (Core)
CMake version: 3.22.1
GCC version: 7.2.0
CMAKE_CUDA_FLAGS: --compiler-options -rdynamic --compiler-options -lineinfo -Wno-deprecated-gpu-targets --expt-extended-lambda -gencode arch=compute_35,code=sm_35 --expt-extended-lambda -gencode arch=compute_50,code=sm_50 --expt-extended-lambda -gencode arch=compute_60,code=sm_60 --expt-extended-lambda -gencode arch=compute_61,code=sm_61 --expt-extended-lambda -gencode arch=compute_70,code=sm_70 --expt-extended-lambda -gencode arch=compute_75,code=sm_75 --expt-extended-lambda -gencode arch=compute_80,code=sm_80 --expt-extended-lambda -gencode arch=compute_86,code=sm_86 -D_GLIBCXX_USE_CXX11_ABI=0 --compiler-options -Wall --compiler-options -Wno-strict-overflow --compiler-options -Wno-unknown-pragmas
CMAKE_CXX_FLAGS: -D_GLIBCXX_USE_CXX11_ABI=0 -Wno-unused-variable -Wno-strict-overflow
PyTorch version used to build k2: 1.10.1+cu111
PyTorch is using Cuda: 11.1
NVTX enabled: True
With CUDA: True
Disable debug: False
Sync kernels : True
Disable checks: False
After this upgrade, I am able to train with a batch size of 250s, where earlier I was getting the weird memory issues even with a batch size of 100 (using 8 V100 GPUs). Perhaps there was an issue with PyTorch 1.8.1? It's hard to say. I still get a CUDA error when I try to use batch size 300, but from the PyTorch discussion forums it seems to be related to OOM, although I was hoping it would be caught by …
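For anyone who wants to reproduce that setup, a debug build of k2 from source looks roughly like the following. This is a sketch: the K2_CMAKE_ARGS environment variable should be checked against the current k2 installation documentation before relying on it.

git clone https://github.com/k2-fsa/k2.git
cd k2
export K2_CMAKE_ARGS="-DCMAKE_BUILD_TYPE=Debug"
python3 setup.py install

A debug build is what produces the 'Build type: Debug' and 'Disable debug: False' fields shown in the version output above.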
@csukuangfj I am thinking we should just make it the default that it prints out some details of the batch (e.g. dimensions and sentence lengths at least, or perhaps the entire object) when we get an OOM error. This will make things like this easier to debug. HOWEVER, desh, I'm not convinced that this actually is an OOM error. Try doing …
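Something along these lines is what is being proposed; this is only a sketch, and it assumes the lhotse-style batch layout used by these recipes (an "inputs" feature tensor plus a "supervisions" dict), which should be checked against the actual dataset class:

import logging

def log_batch_details(batch: dict) -> None:
    # Log the quantities that matter for an OOM: the feature-tensor shape
    # and the per-utterance frame counts, so the offending batch is identifiable.
    features = batch["inputs"]
    supervisions = batch["supervisions"]
    logging.info(f"features shape: {tuple(features.shape)}")
    logging.info(f"num_frames: {supervisions['num_frames'].tolist()}")

Calling such a helper from the except-branch that wraps the forward/backward pass would print the batch geometry right before the exception is re-raised.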
Yeah, I already have the following variables set:
export K2_DISABLE_CHECKS=0
export K2_SYNC_KERNELS=1
export CUDA_LAUNCH_BLOCKING=1
but I didn't see any more details in the stack trace. I also printed out the batch when the error happened, but it looked similar to all other batches. I'll try to get it again when my current training ends and share the batch details here.
OK, thanks. It would be appreciated if you could help us debug this.
Post the log when I set …
And the …
The attached batch works perfectly for me. Here is the change I made to train.py to run it.
diff --git a/egs/gigaspeech/ASR/pruned_transducer_stateless2/train.py b/egs/gigaspeech/ASR/pruned_transducer_stateless2/train.py
index 83ae255..b69b6fc 100755
--- a/egs/gigaspeech/ASR/pruned_transducer_stateless2/train.py
+++ b/egs/gigaspeech/ASR/pruned_transducer_stateless2/train.py
@@ -833,6 +833,24 @@ def run(rank, world_size, args):
if params.print_diagnostics:
diagnostic = diagnostics.attach_diagnostics(model)
+ pt_file = "./batch-f2d3f761-0ba3-6279-14cb-056407437c3b.pt"
+ batch = torch.load(pt_file)
+ with torch.cuda.amp.autocast(enabled=params.use_fp16):
+ loss, _ = compute_loss(
+ params=params,
+ model=model,
+ sp=sp,
+ batch=batch,
+ is_training=True,
+ warmup=0.0,
+ )
+ loss.backward()
+ optimizer.step()
+ optimizer.zero_grad()
+ logging.info(f"loss: {loss}")
+
+ return
+
gigaspeech = GigaSpeechAsrDataModule(args)
train_cuts = gigaspeech.train_cuts()
The command for training is ./pruned_transducer_stateless2/train.py …
The output is …
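If someone else wants to inspect that same file, loading it outside the training loop is enough to see its dimensions. A small sketch, assuming the batch was saved with torch.save and uses the "inputs"/"supervisions" layout mentioned above:

import torch

batch = torch.load("batch-f2d3f761-0ba3-6279-14cb-056407437c3b.pt", map_location="cpu")
print(batch["inputs"].shape)                 # feature tensor, roughly (N, T, C)
print(batch["supervisions"]["num_frames"])   # per-utterance lengths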
@wgb14 I think you can first try running with …
I notice that you are using …
while I am using torch + CUDA 10.2 + Python 3.8. I will try to switch to CUDA 11.1 + Python 3.7 and run it again.
@csukuangfj I think we can print out the exception error message here, even if it is not an OOM error.
icefall/egs/librispeech/ASR/pruned_transducer_stateless2/train.py, lines 1000 to 1011 in 2900ed8
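The shape of that change would be roughly the following; a sketch only, since the actual code at those lines is not reproduced here, and run_batch stands in for the forward/backward call on one pessimistic batch:

import logging

def run_batch_with_error_reporting(run_batch, batch) -> None:
    # Report non-OOM errors too, instead of staying silent until the
    # exception propagates out of the spawned worker.
    try:
        run_batch(batch)
    except RuntimeError as e:
        if "CUDA out of memory" in str(e):
            logging.error("OOM while scanning pessimistic batches; reduce --max-duration and retry.")
        else:
            logging.error(f"Non-OOM error on this batch: {e}")
        raise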
BTW, he used mixed-precision training.
The error message in …
And in my previous experiments,
export CUDA_LAUNCH_BLOCKING=1
export K2_SYNC_KERNELS=1
didn't give me any additional information.
Does it print out by … And what is your memory size? I think fangjun has a 32 GB V100.
No, this is from logging, after commenting out the line …
I'm also using a Tesla V100-32GB.
It should be printed by Python, i.e., the one at the end of the logs:
…
@wgb14 I can reproduce your error with torch 1.10.0 + CUDA 11.1. Here is the log: …
Note both …
I would suggest you switch to torch 1.10.0 + CUDA 10.2.
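For what it's worth, switching should only require reinstalling the PyTorch wheel and rebuilding k2 against it. A sketch of the install command, assuming the CUDA 10.2 build of that release is still published on the official wheel index (worth double-checking the exact version tag):

pip install torch==1.10.0+cu102 -f https://download.pytorch.org/whl/torch_stable.html

k2 then needs to be reinstalled or rebuilt against the new torch/CUDA combination.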
@csukuangfj since you can repro the issue, perhaps you could try running in cuda-gdb?
Yes, I am trying it.
Output of the following command: cuda-gdb --args python3 ./pruned_transducer_stateless2/train.py …
Looks like the error is from PyTorch, not k2.
I encountered the same issue as Desh with torch 1.7, k2 1.15, and CUDA 11.6. Updating to torch 1.11 and installing the latest k2 from source fixed it.
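When chasing version-dependent crashes like this, it helps to record both sides of the stack before and after an upgrade. The two commands below are standard; the first is already used throughout this thread, and the second is PyTorch's own environment collector:

python3 -m k2.version
python3 -m torch.utils.collect_env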
I am facing the following error when training with multiple GPUs (on the same node). I am not sure if this is icefall related, but I thought maybe someone has seen it before? (I also tried running with
CUDA_LAUNCH_BLOCKING=1
but got the same error message.) When I train on a single GPU, it seems to be working fine: …
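(For context, the multi-GPU and single-GPU cases differ only in the --world-size flag; a sketch of the two invocations, with an illustrative exp-dir:

python3 pruned_transducer_stateless2/train.py --exp-dir pruned_transducer_stateless2/exp --world-size 4 --max-duration 300
python3 pruned_transducer_stateless2/train.py --exp-dir pruned_transducer_stateless2/exp --world-size 1 --max-duration 300

The first is the failing case; the second runs fine.)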