Error flow on Rocky 8 #5520

Open
bludvigsen opened this issue Aug 9, 2024 · 5 comments


@bludvigsen

bludvigsen commented Aug 9, 2024

Hi, I am getting the following error, any ideas? I am using Rocky 8 Linux...

(base) [bjolud@hpcopm01 IVAR_AASEN]$ uname -a
Linux hpcopm01 4.18.0-477.10.1.el8_8.x86_64 #1 SMP Tue May 16 11:38:37 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
================ Starting main simulation loop ===============


Report step  0/819 at day 0/6947, date = 24-Dec-2016
Using Newton nonlinear solver.
Restart file written for report step   0/819, date = 24-Dec-2016 00:00:00

Starting time step 0, stepsize 0.0416667 days, at day 0/0.0416667, date = 24-Dec-2016
/opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/shared_ptr_base.h:1349: std::__shared_ptr_access<_Tp, _Lp, <anonymous>, <anonymous> >::element_type& std::__shared_ptr_access<_Tp, _Lp, <anonymous>, <anonymous> >::operator*() const [with _Tp = Dune::Communication<int>; __gnu_cxx::_Lock_policy _Lp = __gnu_cxx::_S_atomic; bool <anonymous> = false; bool <anonymous> = false; element_type = Dune::Communication<int>]: Assertion '_M_get() != nullptr' failed.
Aborted (core dumped)
@blattms
Member

blattms commented Aug 9, 2024

This looks like a serious bug (dereferencing a shared_ptr<Dune::Communication> that holds a nullptr). Can you give a little more detail:

  • Which version of flow are you using?
  • What command line parameters are you using?
  • Is this a parallel run (mpirun -np 3 flow or similar)?
  • Are you using a GPU?

To find the problem we will need to be able to replicate this somehow.
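
For reference, the failure mode behind the assertion can be sketched in isolation. This is a minimal illustration, not code from flow: `CommStub` and `commSize` are hypothetical names standing in for Dune::Communication<int> and for whatever call site dereferences the pointer. libstdc++ only performs the `_M_get() != nullptr` check when its assertions are enabled (for example via `_GLIBCXX_ASSERTIONS`); without them, dereferencing an empty shared_ptr is undefined behaviour.

```cpp
#include <memory>
#include <stdexcept>

// Hypothetical stand-in for Dune::Communication<int>; not the real class.
struct CommStub {
    int size() const { return 1; }
};

// Checked access: report an unset communicator with an exception
// instead of dereferencing an empty shared_ptr.
int commSize(const std::shared_ptr<CommStub>& comm)
{
    if (!comm) {
        throw std::logic_error("communication object was never set");
    }
    return (*comm).size();
}
```

A guard like this would not fix the underlying bug, but it would turn the abort into an error message that names the unset object.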

Note to other developers and myself: There seems to be no reason to put Dune::Communication into a shared_ptr. It is a light-weight object that can easily be copied.
Here it even seems to be a serial run because of Dune::Communication<int>.
Places where this might happen:

$ grep -r -n "shared_ptr" opm | grep -i Comm
opm/simulators/linalg/ExtractParallelGridInformationToISTL.cpp:24:#include <dune/common/shared_ptr.hh>
opm/simulators/linalg/ExtractParallelGridInformationToISTL.cpp:40:        anyComm=std::any(Opm::ParallelISTLInformation(Dune::stackobject_to_shared_ptr(idx),
opm/simulators/linalg/ISTLSolver.hpp:627:        std::shared_ptr< CommunicationType > comm_;
opm/simulators/linalg/OwningBlockPreconditioner.hpp:84:std::shared_ptr<OwningBlockPreconditioner<OriginalPreconditioner, Comm>>
opm/simulators/linalg/PressureBhpTransferPolicy.hpp:38:                                     std::shared_ptr<Communication>& commRW,
opm/simulators/linalg/PressureBhpTransferPolicy.hpp:270:    std::shared_ptr<Communication> coarseLevelCommunication_;
opm/simulators/linalg/PressureTransferPolicy.hpp:183:    std::shared_ptr<Communication> coarseLevelCommunication_;
opm/simulators/linalg/WellOperators.hpp:32:#include <dune/common/shared_ptr.hh>
opm/simulators/linalg/WellOperators.hpp:165:                            const std::shared_ptr<communication_type>& comm = {})
opm/simulators/linalg/WellOperators.hpp:224:    std::shared_ptr<communication_type> comm_;
opm/simulators/linalg/WellOperators.hpp:358:        : A_( Dune::stackobject_to_shared_ptr(A) ), comm_(comm)
opm/simulators/linalg/bda/opencl/openclBISAI.cpp:55:          std::shared_ptr<cl::CommandQueue>& queue_)
opm/simulators/linalg/bda/opencl/openclCPR.cpp:26:#include <dune/common/shared_ptr.hh>
opm/simulators/linalg/bda/opencl/openclCPR.cpp:53:setOpencl(std::shared_ptr<cl::Context>& context_, std::shared_ptr<cl::CommandQueue>& queue_) {
opm/simulators/linalg/bda/opencl/openclPreconditioner.cpp:58:          std::shared_ptr<cl::CommandQueue>& queue_)
opm/simulators/linalg/bda/opencl/openclPreconditioner.hpp:36:    std::shared_ptr<cl::CommandQueue> queue;
opm/simulators/linalg/bda/opencl/openclPreconditioner.hpp:50:    virtual void setOpencl(std::shared_ptr<cl::Context>& context, std::shared_ptr<cl::CommandQueue>& queue);
opm/simulators/linalg/bda/opencl/openclSolverBackend.hpp:111:    std::shared_ptr<cl::CommandQueue> queue{};
opm/simulators/linalg/bda/opencl/openclSolverBackend.hpp:156:                   std::shared_ptr<cl::CommandQueue>& queue);
opm/simulators/linalg/bda/opencl/openclBISAI.hpp:114:                   std::shared_ptr<cl::CommandQueue>& queue) override;
opm/simulators/linalg/bda/opencl/openclCPR.hpp:93:                   std::shared_ptr<cl::CommandQueue>& queue) override;
opm/simulators/linalg/bda/opencl/openclSolverBackend.cpp:242:          std::shared_ptr<cl::CommandQueue>& queue_)
opm/simulators/linalg/bda/CprCreation.cpp:26:#include <dune/common/shared_ptr.hh>
opm/simulators/linalg/bda/rocm/rocsparseCPR.cpp:26:#include <dune/common/shared_ptr.hh>
opm/simulators/linalg/cuistl/CuBlockPreconditioner.hpp:22:#include <dune/common/shared_ptr.hh>
opm/simulators/linalg/cuistl/CuBlockPreconditioner.hpp:50:    CuBlockPreconditioner(const std::shared_ptr<P>& p, const std::shared_ptr<const communication_type>& c)
opm/simulators/linalg/cuistl/CuBlockPreconditioner.hpp:56:    CuBlockPreconditioner(const std::shared_ptr<P>& p, const communication_type& c)
opm/simulators/linalg/cuistl/CuBlockPreconditioner.hpp:58:        , m_communication(Dune::stackobject_to_shared_ptr(c))
opm/simulators/linalg/cuistl/CuBlockPreconditioner.hpp:122:    std::shared_ptr<const communication_type> m_communication;
opm/simulators/linalg/cuistl/CuOwnerOverlapCopy.hpp:388:    CuOwnerOverlapCopy(std::shared_ptr<GPUSender<field_type, OwnerOverlapCopyCommunicationType>> sender) : m_sender(sender){}
opm/simulators/linalg/cuistl/CuOwnerOverlapCopy.hpp:410:    std::shared_ptr<GPUSender<field_type, OwnerOverlapCopyCommunicationType>> m_sender;
opm/simulators/linalg/cuistl/SolverAdapter.hpp:186:            std::shared_ptr<Opm::cuistl::GPUSender<real_type, typename Operator::communication_type>> gpuComm;
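
The note above about storing the communication object by value rather than in a shared_ptr can be sketched as follows. This is only an illustration under stated assumptions: `SerialCommStub`, `PtrHolder` and `ValueHolder` are hypothetical names and do not correspond to any class in the files listed by grep.

```cpp
#include <memory>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for a light-weight communicator such as
// Dune::Communication<int>; not the real class.
struct SerialCommStub {
    int size() const { return 1; }
};

// Pointer-based member: a default-constructed holder has an empty
// shared_ptr, so every later dereference needs a null check.
class PtrHolder {
public:
    PtrHolder() = default;
    explicit PtrHolder(std::shared_ptr<SerialCommStub> comm)
        : comm_(std::move(comm)) {}
    bool hasComm() const { return static_cast<bool>(comm_); }
private:
    std::shared_ptr<SerialCommStub> comm_;
};

// Value-based member: the holder always owns a valid copy, so there
// is no null state to forget about.
class ValueHolder {
public:
    ValueHolder() = default;
    explicit ValueHolder(const SerialCommStub& comm) : comm_(comm) {}
    int size() const { return comm_.size(); }
private:
    SerialCommStub comm_{};
};

// Copying is cheap for this stub; whether the same holds for the real
// communicator type is an assumption taken from the note above.
static_assert(std::is_trivially_copyable<SerialCommStub>::value,
              "stub communicator should be cheap to copy");
```

With the value-based member, a default-constructed holder is still usable, whereas the pointer-based one silently carries an empty pointer until something dereferences it.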

@bludvigsen
Author

Hi, below is some more info.

(base) [bjolud@hpcopm01 ~]$ flow --version
flow 2024.04

No MPI, no GPU.

These are the files generated. Note that the EGRID file was generated by OPM; I had to supply GRDECL files as input because GDFILE did not work. I tried to attach the PRT and DBG files to this message, but it did not allow me to.

-rw-r--r-- 1 bjolud ecl 45208808 Aug 9 15:02 IVAR_AASEN_2021_REF_3DEC2021.INIT
-rw-r--r-- 1 bjolud ecl 50836016 Aug 9 15:02 IVAR_AASEN_2021_REF_3DEC2021.EGRID
-rw-r--r-- 1 bjolud ecl 9808 Aug 9 15:02 IVAR_AASEN_2021_REF_3DEC2021.RFT
-rw-r--r-- 1 bjolud ecl 906883 Aug 9 15:02 IVAR_AASEN_2021_REF_3DEC2021.PRT
-rw-r--r-- 1 bjolud ecl 882759 Aug 9 15:02 IVAR_AASEN_2021_REF_3DEC2021.DBG
(base) [bjolud@hpcopm01 IVAR_AASEN]$

What would be the best course of action for me to proceed with the model? At the moment I do not need all the options in the model. Is there a workaround to avoid the error, such as removing wells, building a coarser grid, or using fewer faults?

It is quite a complicated model that uses a lot of Eclipse options. The grid has more than 100 faults and the number of NNCs is very large. There are many long horizontal wells using COMPLUMP and other well options. I did get around all the error messages by editing the input file, but there are still warnings; I will remove those as well and see if it helps.

Regards,
Bjørn Egil

@bludvigsen
Author

Some additional info: I have looked at the grid and properties generated by OPM in ResInsight, and they look fine.

@bska
Member

bska commented Aug 13, 2024

> Some additional info: I have looked at the grid and properties generated by OPM in ResInsight, and they look fine.

Thanks a lot for the additional information, this is good to know. I do believe you've come across a programming error within the simulator and I would really like to understand the underlying problem. That said, we may have to take the discussion off-line, especially if the model is not fully public. Please feel free to reach out to me by e-mail (Bard.Skaflestad@sintef.no) if you would like to discuss further.

@bludvigsen
Author

OK, I have sent an email to your SINTEF address.


3 participants