
[DLRM/Pytorch] Cuda error: illegal memory access after changing embedding size to 64 #830

@junshi15

Description


Related to DLRM/Pytorch

Describe the bug
Changed the embedding size to 64 (default: 128).
Changed the last layer of the bottom MLP to 64 (default: 128).
This caused the crash shown below (a minimal sketch of the expected shape relationship follows the traceback).

Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/opt/conda/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/workspace/dlrm/dlrm/scripts/main.py", line 519, in <module>
    app.run(main)
  File "/opt/conda/lib/python3.6/site-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/opt/conda/lib/python3.6/site-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "/workspace/dlrm/dlrm/scripts/main.py", line 264, in main
    train(model, loss_fn, optimizer, data_loader_train, data_loader_test, scaled_lr)
  File "/workspace/dlrm/dlrm/scripts/main.py", line 361, in train
    loss.backward()
  File "/opt/conda/lib/python3.6/site-packages/torch/tensor.py", line 184, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/opt/conda/lib/python3.6/site-packages/torch/autograd/__init__.py", line 123, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling `cublasCreate(handle)`
Exception raised from createCublasHandle at ../aten/src/ATen/cuda/CublasHandlePool.cpp:8 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x6b (0x7ff5f440a82b in /opt/conda/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x327d0c2 (0x7ff4bbe1c0c2 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cuda.so)
frame #2: at::cuda::getCurrentCUDABlasHandle() + 0xb82 (0x7ff4bbe1d9d2 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cuda.so)
frame #3: <unknown function> + 0x326945f (0x7ff4bbe0845f in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cuda.so)
frame #4: at::native::addmm_out_cuda_impl(at::Tensor&, at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar, c10::Scalar) + 0x78e (0x7ff4bacef5ee in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cuda.so)
frame #5: at::native::mm_cuda(at::Tensor const&, at::Tensor const&) + 0x15b (0x7ff4bacf04bb in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cuda.so)
frame #6: <unknown function> + 0x3293808 (0x7ff4bbe32808 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cuda.so)
frame #7: <unknown function> + 0x330f734 (0x7ff4bbeae734 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cuda.so)
frame #8: <unknown function> + 0x2ba029b (0x7ff537b0b29b in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #9: <unknown function> + 0x7a8224 (0x7ff535713224 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #10: at::Tensor c10::Dispatcher::call<at::Tensor, at::Tensor const&, at::Tensor const&>(c10::OperatorHandle const&, at::Tensor const&, at::Tensor const&) const + 0xc5 (0x7ff5c6f346e5 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #11: <unknown function> + 0x28fe447 (0x7ff537869447 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #12: torch::autograd::generated::AddmmBackward::apply(std::vector<at::Tensor, std::allocator<at::Tensor> >&&) + 0x155 (0x7ff5378aeca5 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #13: <unknown function> + 0x2ee2f75 (0x7ff537e4df75 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #14: torch::autograd::Engine::evaluate_function(std::shared_ptr<torch::autograd::GraphTask>&, torch::autograd::Node*, torch::autograd::InputBuffer&, std::shared_ptr<torch::autograd::ReadyQueue> const&) + 0x1808 (0x7ff537e48f68 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #15: torch::autograd::Engine::thread_main(std::shared_ptr<torch::autograd::GraphTask> const&, bool) + 0x551 (0x7ff537e49e01 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #16: torch::autograd::Engine::thread_init(int, std::shared_ptr<torch::autograd::ReadyQueue> const&) + 0xa3 (0x7ff537e3f863 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #17: torch::autograd::python::PythonEngine::thread_init(int, std::shared_ptr<torch::autograd::ReadyQueue> const&) + 0x50 (0x7ff5c7236b20 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #18: <unknown function> + 0xbd6df (0x7ff5f4af76df in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
frame #19: <unknown function> + 0x76db (0x7ff5fffcf6db in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #20: clone + 0x3f (0x7ff5ffcf888f in /lib/x86_64-linux-gnu/libc.so.6)

terminate called after throwing an instance of 'c10::Error'
  what():  CUDA error: an illegal memory access was encountered
Exception raised from create_event_internal at ../c10/cuda/CUDACachingAllocator.cpp:687 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x6b (0x7ff5f440a82b in /opt/conda/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0xc10 (0x7ff5f41a5500 in /opt/conda/lib/python3.6/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10::TensorImpl::release_resources() + 0x4d (0x7ff5f43f2c9d in /opt/conda/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #3: <unknown function> + 0x59f1e2 (0x7ff5c724b1e2 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
<omitting python frames>
frame #16: __libc_start_main + 0xe7 (0x7ff5ffbf8b97 in /lib/x86_64-linux-gnu/libc.so.6)

Fatal Python error: Aborted

Thread 0x00007ff59fda0700 (most recent call first):

Thread 0x00007ff56b58b700 (most recent call first):

Current thread 0x00007ff6003fc740 (most recent call first):
Aborted
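
For reference, here is a minimal sketch in plain PyTorch (not the repository's code; the batch size and number of embedding tables are made-up values) of why the last bottom-MLP width is kept equal to --embedding_dim: the dot interaction stacks the bottom-MLP output together with every embedding vector, so all of them need the same width. If the repository's interaction op works the same way, a matching 64/64 configuration is shape-consistent and should not be a user-side mismatch.

# Minimal sketch, not the repository's code: batch and num_tables are
# illustrative assumptions, dim corresponds to --embedding_dim.
import torch

batch, num_tables, dim = 2048, 26, 64

bottom_mlp_out = torch.randn(batch, dim)        # last layer of --bottom_mlp_sizes 512,256,64
emb_out = torch.randn(batch, num_tables, dim)   # one 64-d vector per embedding table

# The dot interaction concatenates the bottom-MLP output with the embeddings,
# so every vector must have the same width (64/64 here is consistent).
features = torch.cat([bottom_mlp_out.unsqueeze(1), emb_out], dim=1)  # (batch, 1 + num_tables, dim)
pairwise = torch.bmm(features, features.transpose(1, 2))             # (batch, 27, 27)
print(pairwise.shape)  # torch.Size([2048, 27, 27])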

To Reproduce
Use the following command-line flags (a debugging sketch for localizing the faulting kernel follows):
--embedding_dim 64 --bottom_mlp_sizes 512,256,64
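
Because CUDA errors are reported asynchronously, the cuBLAS failure inside loss.backward() may not be the kernel that actually faulted. One way to narrow it down (the module path is inferred from the traceback; the remaining flags are whatever was used above):

# Force synchronous kernel launches so the error surfaces at the faulting
# call rather than at a later cublasCreate. Must be set before the first
# CUDA call (or exported in the shell before launching).
import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# Shell equivalent:
#   CUDA_LAUNCH_BLOCKING=1 python -m dlrm.scripts.main \
#       --embedding_dim 64 --bottom_mlp_sizes 512,256,64 <other flags as above>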

Expected behavior
It should not crash.

Environment
Please provide at least:

  • Container version (e.g. pytorch:20.06-py3):
  • GPUs in the system: (e.g. 1x Tesla V100 32GB):
  • CUDA driver version (e.g. 418.67):
