Tensor.to("cuda:1") core dumped #7095

Closed
Flowingsun007 opened this issue Dec 23, 2021 · 2 comments · Fixed by #7159

Flowingsun007 commented Dec 23, 2021

Summary

When I try to move a tensor from device 0 to device 1 and then print it, the process crashes.
tensor.to("cuda") works as expected, but tensor.to("cuda:1") (or any "cuda:x") crashes,
and the crash only happens in print(tensor); print(tensor.numpy()) works fine.

Code to reproduce bug

>>> import torch
>>> x = torch.tensor([[1., 2.], [3., 4.]])
>>> x
tensor([[1., 2.],
        [3., 4.]])
>>> x.to("cuda:1")
tensor([[1., 2.],
        [3., 4.]], device='cuda:1')
>>> import oneflow as flow
>>> x = flow.tensor([[1., 2.], [3., 4.]])
>>> x.to("cuda")
tensor([[1., 2.],
        [3., 4.]], device='cuda:0', dtype=oneflow.float32)
>>> x.to("cuda:1")
F1226 16:15:23.055996 876333 memcpy.cpp:38] Check failed: cudaMemcpyAsync(dst, src, count, cudaMemcpyDefault, cuda_stream->cuda_stream()) : an illegal memory access was encountered (700) 
*** Check failure stack trace: ***
    @     0x7f263d8929a0  google::LogMessage::Fail()
    @     0x7f263d8928db  google::LogMessage::SendToLog()
    @     0x7f263d89220c  google::LogMessage::Flush()
    @     0x7f263d89581a  google::LogMessageFatal::~LogMessageFatal()
    @     0x7f263463cb14  oneflow::ep::primitive::(anonymous namespace)::MemcpyImpl::Launch()
    @     0x7f263579c855  oneflow::AutoMemcpy()
    @     0x7f263579ce29  oneflow::SyncAutoMemcpy()
    @     0x7f26412c6644  oneflow::OfBlob::AutoMemCopyTo<>()
    @     0x7f26412ae29e  oneflow::BlobBufferCopyUtil<>::To()
    @     0x7f26412aa4df  oneflow::BlobNumpyCopyUtil<>::To()
    @     0x7f26412b9298  _ZZZN7oneflow3one33CopyBetweenMirroredTensorAndNumpyIiEENS_5MaybeIvvEERKSt10shared_ptrINS0_6TensorEEP7_objectPFS3_mRKNS_13NumPyArrayPtrEERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEbENKUlmE_clEmENKUlPKcE_clESQ_
    @     0x7f26412b91ff  _ZZN7oneflow3one33CopyBetweenMirroredTensorAndNumpyIiEENS_5MaybeIvvEERKSt10shared_ptrINS0_6TensorEEP7_objectPFS3_mRKNS_13NumPyArrayPtrEERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEbENKUlmE_clEm
    @     0x7f26412f1dfd  _ZNSt17_Function_handlerIFvmEZN7oneflow3one33CopyBetweenMirroredTensorAndNumpyIiEENS1_5MaybeIvvEERKSt10shared_ptrINS2_6TensorEEP7_objectPFS5_mRKNS1_13NumPyArrayPtrEERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEbEUlmE_E9_M_invokeERKSt9_Any_dataOm
    @     0x7f26332b2019  std::function<>::operator()()
    @     0x7f263493eae9  _ZZN7oneflow19InstructionsBuilder24SyncAccessBlobByCallbackISt10shared_ptrINS_3one14MirroredTensorEEEENS_5MaybeIvvEET_RKS2_INS_11SpinCounterEES2_ISt8functionIFvmEEERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEENKUlmE_clEm
    @     0x7f263495ae04  _ZNSt17_Function_handlerIFvmEZN7oneflow19InstructionsBuilder24SyncAccessBlobByCallbackISt10shared_ptrINS1_3one14MirroredTensorEEEENS1_5MaybeIvvEET_RKS4_INS1_11SpinCounterEES4_ISt8functionIS0_EERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEUlmE_E9_M_invokeERKSt9_Any_dataOm
    @     0x7f26332b2019  std::function<>::operator()()
    @     0x7f26332ad04d  oneflow::vm::AccessBlobByCallbackInstructionType::Compute()
    @     0x7f2636a715d3  oneflow::vm::CudaStreamType::Compute()
    @     0x7f26332adf08  oneflow::vm::StreamType::Compute()
    @     0x7f2636a9e3ff  oneflow::vm::StreamType::Run()
    @     0x7f2636ac7bad  oneflow::vm::VirtualMachineEngine::DispatchInstruction()
    @     0x7f2636ac7801  oneflow::vm::VirtualMachineEngine::DispatchAndPrescheduleInstructions()
    @     0x7f2636ac9818  oneflow::vm::VirtualMachineEngine::Schedule()
    @     0x7f2636ab3fa4  oneflow::VirtualMachine::Loop()
    @     0x7f2636ac253f  _ZSt13__invoke_implIvMN7oneflow14VirtualMachineEFvRKSt8functionIFvvEEEPS1_JS4_EET_St21__invoke_memfun_derefOT0_OT1_DpOT2_
    @     0x7f2636ac22f5  _ZSt8__invokeIMN7oneflow14VirtualMachineEFvRKSt8functionIFvvEEEJPS1_S4_EENSt15__invoke_resultIT_JDpT0_EE4typeEOSB_DpOSC_
    @     0x7f2636ac2101  _ZNSt6thread8_InvokerISt5tupleIJMN7oneflow14VirtualMachineEFvRKSt8functionIFvvEEEPS3_S6_EEE9_M_invokeIJLm0ELm1ELm2EEEEvSt12_Index_tupleIJXspT_EEE
    @     0x7f2636ac2019  _ZNSt6thread8_InvokerISt5tupleIJMN7oneflow14VirtualMachineEFvRKSt8functionIFvvEEEPS3_S6_EEEclEv
    @     0x7f2636ac1fa8  _ZNSt6thread11_State_implINS_8_InvokerISt5tupleIJMN7oneflow14VirtualMachineEFvRKSt8functionIFvvEEEPS4_S7_EEEEE6_M_runEv
    @     0x7f261abb7de4  (unknown)
    @     0x7f2642dad609  start_thread
Aborted (core dumped)

After trying this commit: #5783, the bug most likely lies in the tensor print function.

The following code also reproduces it:

import oneflow as flow

x1 = flow.tensor([[1., 2.], [3., 4.]], device="cpu")
print("x1 >>>>>>>>>>>> \n", x1)
x2 = x1.to("cuda:1")
print("x2.numpy() >>>>>>>> \n", x2.numpy())
print("x2 >>>>>>>>>>> \n", x2)

Output

x1 >>>>>>>>>>>> 
 tensor([[1., 2.],
        [3., 4.]], dtype=oneflow.float32)
x2.numpy() >>>>>>>> 
 [[1. 2.]
 [3. 4.]]
x2 >>>>>>>>>>>
F1226 15:53:55.720588 2838954 new_kernel_util.cu:24] Check failed: cudaMemcpyAsync(dst, src, sz, cudaMemcpyDefault, ctx->cuda_stream()) : an illegal memory access was encountered (700) 
*** Check failure stack trace: ***
    @     0x7fd092f00c76  google::LogMessage::Fail()
    @     0x7fd092f00bb1  google::LogMessage::SendToLog()
    @     0x7fd092f004e2  google::LogMessage::Flush()
    @     0x7fd092f03ad0  google::LogMessageFatal::~LogMessageFatal()
    @     0x7fd08b3b3344  oneflow::Memcpy<>()
    @     0x7fd08d526127  oneflow::AutoMemcpy()
    @     0x7fd08d526184  oneflow::SyncAutoMemcpy()
    @     0x7fd08b18aa82  oneflow::OfBlob::AutoMemCopyTo<>()
    @     0x7fd08b1842ed  _ZZN7oneflow19OfBlob_CopyToBufferIfEEvmN8pybind117array_tIT_Li16EEEENKUlvE_clEv
    @     0x7fd08b194442  _ZNSt17_Function_handlerIFvvEZN7oneflow19OfBlob_CopyToBufferIfEEvmN8pybind117array_tIT_Li16EEEEUlvE_E9_M_invokeERKSt9_Any_data
    @     0x7fd08afc34e8  std::function<>::operator()()
    @     0x7fd08b22c7dc  oneflow::GILForeignLockHelper::WithScopedAcquire()
    @     0x7fd08b1843b0  oneflow::OfBlob_CopyToBuffer<>()
    @     0x7fd08b16e595  _ZZZN7oneflow3one12_GLOBAL__N_133CopyBetweenMirroredTensorAndNumpyIfEENS_5MaybeIvvEERKSt10shared_ptrINS0_6TensorEEN8pybind117array_tIT_Li16EEEPFvmSD_ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEENKUlPNS_19InstructionsBuilderEE_clESP_ENKUlmE_clEm
    @     0x7fd08b17999b  _ZNSt17_Function_handlerIFvmEZZN7oneflow3one12_GLOBAL__N_133CopyBetweenMirroredTensorAndNumpyIfEENS1_5MaybeIvvEERKSt10shared_ptrINS2_6TensorEEN8pybind117array_tIT_Li16EEEPFvmSF_ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEENKUlPNS1_19InstructionsBuilderEE_clESR_EUlmE_E9_M_invokeERKSt9_Any_dataOm
    @     0x7fd08c7fdfb1  std::function<>::operator()()
    @     0x7fd08c7f9d8a  oneflow::vm::AccessBlobByCallbackInstructionType::Compute()
    @     0x7fd08d8756ad  oneflow::vm::CudaStreamType::Compute()
    @     0x7fd08c7faee0  oneflow::vm::StreamType::Compute()
    @     0x7fd08d8b4f6d  oneflow::vm::StreamType::Run()
    @     0x7fd08d8c9d55  oneflow::vm::VirtualMachine::DispatchAndPrescheduleInstructions()
    @     0x7fd08d8caee9  oneflow::vm::VirtualMachine::Schedule()
    @     0x7fd08d89b615  oneflow::OneflowVM::Loop()
    @     0x7fd08d8a75ba  _ZSt13__invoke_implIvMN7oneflow9OneflowVMEFvvEPS1_JEET_St21__invoke_memfun_derefOT0_OT1_DpOT2_
    @     0x7fd08d8a7407  _ZSt8__invokeIMN7oneflow9OneflowVMEFvvEJPS1_EENSt15__invoke_resultIT_JDpT0_EE4typeEOS6_DpOS7_
    @     0x7fd08d8a7293  _ZNSt6thread8_InvokerISt5tupleIJMN7oneflow9OneflowVMEFvvEPS3_EEE9_M_invokeIJLm0ELm1EEEEvSt12_Index_tupleIJXspT_EEE
    @     0x7fd08d8a71e5  _ZNSt6thread8_InvokerISt5tupleIJMN7oneflow9OneflowVMEFvvEPS3_EEEclEv
    @     0x7fd08d8a7174  _ZNSt6thread11_State_implINS_8_InvokerISt5tupleIJMN7oneflow9OneflowVMEFvvEPS4_EEEEE6_M_runEv
    @     0x7fd02490fde4  (unknown)
    @     0x7fd09a38b609  start_thread
    @     0x7fd09a2b2293  clone
    @              (nil)  (unknown)
Aborted (core dumped)

System Information

  • What is your OneFlow installation (pip, source, dockerhub):
  • OS: ubuntu
  • OneFlow version (run python3 -m oneflow --doctor):
version: 0.6.0+cu112.git.b4da856819
git_commit: b4da856819
cmake_build_type: Debug
rdma: False
mlir: False
  • Python version: 3.8.8
  • CUDA driver version: Driver Version: 495.44
liufengwei0103 commented Dec 27, 2021

The slice method fails after a tensor is moved to "cuda:1" by the to method. tensor_str needs the slice method, so the failure appears to come from tensor_str.
It can be reproduced with the code below.

import oneflow as flow
x = flow.tensor([1, 2], dtype=flow.int32)
y = x.to("cuda:1")
y[0]
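
A minimal sketch that ties the observations together (based only on what is reported above; whether y.numpy() succeeds for an int32 tensor on cuda:1 is my assumption by analogy with the float case in the original report):

import oneflow as flow

x = flow.tensor([1, 2], dtype=flow.int32)
y = x.to("cuda:1")

# Copying the whole tensor back to host memory works, as in the original report.
print(y.numpy())

# Indexing/slicing, which tensor_str relies on for printing, crashes.
y[0]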


liufengwei0103 commented Dec 29, 2021

import oneflow as flow
x = flow.tensor([1, 2], dtype=flow.int64)
y = x.to(device="cuda:1", dtype=flow.int32, copy=False)

Running the code above, the to method itself fails. It looks like the to method has a bug.
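
A small diagnostic sketch (my assumption, untested) that splits the combined call into separate steps, to narrow down whether the dtype cast, the device move, or only their combination triggers the failure:

import oneflow as flow

x = flow.tensor([1, 2], dtype=flow.int64)

a = x.to(dtype=flow.int32)   # dtype cast only, stays on CPU
b = x.to("cuda:1")           # device move only, keeps int64; does not crash by itself per the original report
c = x.to(device="cuda:1", dtype=flow.int32, copy=False)   # combined call, fails as reported above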
