issues with exceptions thrown by Ipopt #9

Closed
mlubin opened this issue Jun 17, 2014 · 29 comments

Comments

@mlubin
Member

mlubin commented Jun 17, 2014

This came out of debugging https://groups.google.com/forum/#!topic/julia-opt/J77CzzhNP4g, where JuMP was returning inf or NaN when evaluating a function for numerical reasons. Ipopt already has error handling for this case: https://projects.coin-or.org/Ipopt/browser/trunk/Ipopt/src/Algorithm/IpOrigIpoptNLP.cpp#L496 and throws an exception. (I confirmed that the macro is properly expanded and an exception is indeed thrown.) The C wrapper does seem to have code to catch and display exceptions, but it's never reached. Instead, I've seen two different behaviors:

  • On the julianightlies PPA, a core dump occurs. Here's the backtrace:
Program received signal SIGABRT, Aborted.
0x00007ffff6892f79 in __GI_raise (sig=sig@entry=6)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56  ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  0x00007ffff6892f79 in __GI_raise (sig=sig@entry=6)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007ffff6896388 in __GI_abort () at abort.c:89
#2  0x00007ffff5a25521 in _Unwind_Resume () from /lib/x86_64-linux-gnu/libgcc_s.so.1
#3  0x00007ffff1ec75b8 in Ipopt::OrigIpoptNLP::f (this=0x45c7dc0, x=...)
    at IpOrigIpoptNLP.cpp:499
#4  0x00007ffff1e7dbc8 in Ipopt::IpoptCalculatedQuantities::trial_f (this=0x53701e0)
    at IpIpoptCalculatedQuantities.cpp:612
#5  0x00007ffff1e7f153 in Ipopt::IpoptCalculatedQuantities::trial_barrier_obj (
    this=0x53701e0) at IpIpoptCalculatedQuantities.cpp:811
#6  0x00007ffff1e65274 in Ipopt::FilterLSAcceptor::CheckAcceptabilityOfTrialPoint (
    this=0x5371990, alpha_primal_test=1) at IpFilterLSAcceptor.cpp:307
#7  0x00007ffff1e51fb0 in Ipopt::BacktrackingLineSearch::DoBacktrackingLineSearch (
    this=0x5371ac0, skip_first_trial_point=false, alpha_primal=@0x7fffffffc568: 1, 
    corr_taken=@0x7fffffffc333: false, soc_taken=@0x7fffffffc334: false, 
    n_steps=@0x7fffffffc33c: 0, evaluation_error=@0x7fffffffc550: false, 
    actual_delta=...) at IpBacktrackingLineSearch.cpp:726
#8  0x00007ffff1e5074f in Ipopt::BacktrackingLineSearch::FindAcceptableTrialPoint (
    this=0x5371ac0) at IpBacktrackingLineSearch.cpp:462
#9  0x00007ffff1e72886 in Ipopt::IpoptAlgorithm::ComputeAcceptableTrialPoint (
    this=0x5371cf0) at IpIpoptAlg.cpp:542
#10 0x00007ffff1e71863 in Ipopt::IpoptAlgorithm::Optimize (this=0x5371cf0, 
    isResto=false) at IpIpoptAlg.cpp:346
#11 0x00007ffff1ddf759 in Ipopt::IpoptApplication::call_optimize (this=0x6042030)
    at IpIpoptApplication.cpp:882
#12 0x00007ffff1dde893 in Ipopt::IpoptApplication::OptimizeNLP (this=0x6042030, 
    nlp=..., alg_builder=...) at IpIpoptApplication.cpp:769
#13 0x00007ffff1dde591 in Ipopt::IpoptApplication::OptimizeNLP (this=0x6042030, 
    nlp=...) at IpIpoptApplication.cpp:732
#14 0x00007ffff1dde178 in Ipopt::IpoptApplication::OptimizeTNLP (this=0x6042030, 
    tnlp=...) at IpIpoptApplication.cpp:711
#15 0x00007ffff1de8cb8 in IpoptSolve (ipopt_problem=0x6041da0, x=0x4ece2d8, 
    g=0xc80b610, obj_val=0x53d3d00, mult_g=0x0, mult_x_L=0x0, mult_x_U=0x0, 
    user_data=0xb087df0) at IpStdCInterface.cpp:272
#16 0x00007ffff7e379a0 in ?? ()
#17 0x0000000000000000 in ?? ()
  • On julia built from source on the same machine, the exception seems to be just ignored and Ipopt continues to iterate.

Neither of these is the right behavior.
@tkelman, have you seen this before?
@Keno @vtjnash, is there any reason why C++ exceptions thrown inside a C wrapper called from Julia wouldn't work correctly?
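
For readers unfamiliar with the pattern being discussed, here is a minimal sketch of a C entry point that checks for non-finite values, throws internally, and catches everything before returning to the (non-C++) caller. It is not Ipopt's actual code: `checked_objective`, `solve_step`, and the status codes are hypothetical names chosen for illustration.

```cpp
#include <cmath>
#include <cstdio>
#include <stdexcept>

// Hypothetical stand-in for a solver-side evaluation check: reject
// non-finite objective values by throwing.
static double checked_objective(double x) {
    double f = std::log(x);  // log(-1) is NaN, log(0) is -inf
    if (!std::isfinite(f)) {
        throw std::domain_error("objective evaluated to inf or NaN");
    }
    return f;
}

// Hypothetical C entry point. No exception should cross this boundary,
// so everything is caught here and reported through the return value.
extern "C" int solve_step(double x, double* f_out) {
    try {
        *f_out = checked_objective(x);
        return 0;  // success
    } catch (const std::exception& e) {
        std::fprintf(stderr, "evaluation error: %s\n", e.what());
        return 1;  // evaluation error
    } catch (...) {
        return 2;  // unknown exception
    }
}
```

When unwinding works, a caller such as Julia's `ccall` only ever sees the integer status. The crash reported above is the case where the exception never reaches the `catch` clauses.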

@tkelman
Contributor

tkelman commented Jun 17, 2014

That does look eerily familiar, but I can't quite place where from. I would occasionally have some strange things happen with exceptions when I was trying to statically link the various gcc runtime libs, but that's not what's happening here AFAICT. Would want to make sure Ipopt and Julia are all linked against the same libgcc.

I'd usually advise reformulating, rescaling, and/or adding constraints to avoid the inf/nan in the first place, but yeah core dumps shouldn't happen.

@mlubin
Member Author

mlubin commented Jun 17, 2014

Both Ipopt and PPA julia seem to be linked against the same version of libgcc (libgcc_s.so.1) and libstdc++ (libstdc++.so.6).

CC @staticfloat

@staticfloat
Contributor

That's pretty bizarre. Looks like _Unwind_Resume() is deciding that there's nothing to catch this exception, and so gives up and throws in the towel. I suppose it would be cool if you could step into things with gdb and try to inspect the error handling that has or hasn't been set up, but I don't know how to do that.

@mlubin
Member Author

mlubin commented Jun 17, 2014

After some more digging, I think the behavior of the source-compiled Julia is correct: Ipopt has its own mechanism for handling exceptions and doesn't immediately abort. It's just the PPA julia with the weird behavior. Some more debugging notes:

  • Exceptions are properly caught when the handler is in the same function, e.g.:
try{
    throw 10;
} catch (int e) {
    printf("Got it\n");
}

works fine.

  • Exceptions are not caught (instead, core dump) when the handler is in a function one level above.
  • I couldn't reproduce with a simple standalone test case.
  • gdb reports the wrong line number in the backtrace (a few lines after the exception was thrown, past code that wasn't executed)
  • Valgrind reports some interesting errors:
==14358== Invalid read of size 8
==14358==    at 0x67C4D67: _Ux86_64_setcontext (in /usr/lib/x86_64-linux-gnu/libunwind.so.8.0.1)
==14358==    by 0xFFEE0D5: Ipopt::IpoptCalculatedQuantities::curr_f() (IpIpoptCalculatedQuantities.cpp:569)
==14358==    by 0xFFEE248: Ipopt::IpoptCalculatedQuantities::unscaled_curr_f() (IpIpoptCalculatedQuantities.cpp:581)
==14358==    by 0x1002F41D: Ipopt::OptimalityErrorConvergenceCheck::CheckConvergence(bool) (IpOptErrorConvCheck.cpp:191)
==14358==    by 0xFFE1E2C: Ipopt::IpoptAlgorithm::Optimize(bool) (IpIpoptAlg.cpp:293)
==14358==    by 0xFF4F878: Ipopt::IpoptApplication::call_optimize() (IpIpoptApplication.cpp:882)
==14358==    by 0xFF4E972: Ipopt::IpoptApplication::OptimizeNLP(Ipopt::SmartPtr<Ipopt::NLP> const&, Ipopt::SmartPtr<Ipopt::AlgorithmBuilder>&) (IpIpoptApplication.cpp:769)
==14358==    by 0xFF4E670: Ipopt::IpoptApplication::OptimizeNLP(Ipopt::SmartPtr<Ipopt::NLP> const&) (IpIpoptApplication.cpp:732)
==14358==    by 0xFF4E217: Ipopt::IpoptApplication::OptimizeTNLP(Ipopt::SmartPtr<Ipopt::TNLP> const&) (IpIpoptApplication.cpp:711)
==14358==    by 0xFF58DF7: IpoptSolve (IpStdCInterface.cpp:272)

Is there any difference in the libunwind version used by the PPA and the one used in source builds?
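
The two cases from the notes above (handler in the same function versus handler in a caller) can be sketched as follows. This is only an illustration of the shape of the test: as noted, a standalone case did not reproduce the crash, so on a healthy toolchain both functions are expected to return true. All names here (`Guard`, `thrower`, `middle`, and so on) are made up for the example.

```cpp
#include <stdexcept>

// Counts destructor calls made while an exception unwinds the stack.
static int cleanups_run = 0;

struct Guard {
    ~Guard() { ++cleanups_run; }
};

// Case 1: thrown and caught inside one function.
bool caught_in_same_frame() {
    try {
        throw 10;
    } catch (int) {
        return true;
    }
    return false;
}

void thrower() { throw std::runtime_error("boom"); }

// Intermediate frame that owns a local object. While the exception passes
// through, the destructor of g runs as a cleanup; on GCC-style (Itanium
// ABI) toolchains that cleanup typically ends by calling _Unwind_Resume.
void middle() {
    Guard g;
    thrower();
}

// Case 2: the handler lives in a caller, so the unwinder has to walk
// through other frames before it reaches the catch clause.
bool caught_in_caller() {
    try {
        middle();
    } catch (const std::runtime_error&) {
        return true;
    }
    return false;
}
```

Case 2 is the one that depends on the stack unwinder stepping correctly from frame to frame, which is why a misbehaving unwind library would affect it and not case 1.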

@staticfloat
Contributor

The libunwind used by the PPA is libunwind8 amd64 1.1-2ubuntu3; you can check a build log to see what gets installed.

@Keno

Keno commented Jun 17, 2014

We have a local patch to libunwind. Maybe that's the issue?

@mlubin
Member Author

mlubin commented Jun 17, 2014

From the discussion in JuliaLang/julia#3469, it didn't seem like that patch is related to exception handling.

@Keno

Keno commented Jun 17, 2014

Depends. If it unwinds incorrectly and can't find the exception handler, you're in trouble.

@mlubin
Member Author

mlubin commented Jun 17, 2014

How can I refresh an existing julia build to use the system unwind? USE_SYSTEM_LIBUNWIND=1 in Make.user doesn't seem to be enough.

@Keno

Keno commented Jun 17, 2014

Maybe distclean libunwind and/or delete usr/lib.

@mlubin
Member Author

mlubin commented Jun 17, 2014

    LINK usr/lib/libjulia.so
/usr/bin/ld: cannot find -lunwind-generic
/usr/bin/ld: cannot find -lunwind
collect2: error: ld returned 1 exit status

I have /usr/lib/x86_64-linux-gnu/libunwind.so.8.

@mlubin
Member Author

mlubin commented Jun 17, 2014

Scratch that, I didn't have libunwind8-dev installed.

@mlubin
Member Author

mlubin commented Jun 18, 2014

I can reproduce the issue when using the system libunwind and a julia source build. Our patch to libunwind was accepted upstream: http://git.savannah.gnu.org/gitweb/?p=libunwind.git;a=commitdiff;h=4509adb85303afb471fbd10733f044535da5b1cc, so maybe we can hope for the fix to be distributed in a year or so....
Given that ubuntu 14.04 will be around for quite a while, could we include this patch in the PPA?

@staticfloat
Contributor

Can you try the libunwind I just uploaded to the PPA? It should just auto-overwrite yours on an apt-get update && apt-get upgrade, if you're on trusty.

@mlubin
Member Author

mlubin commented Jun 22, 2014

Unfortunately doesn't seem to fix the issue. Backtrace is the same.

@staticfloat
Contributor

Just for completeness, what version do you have installed? "apt-cache show libunwind8" should give you a version number somewhere.

@mlubin
Member Author

mlubin commented Jun 23, 2014

I have libunwind8 1.1-2.2ubuntu4 installed.

@mlubin
Member Author

mlubin commented Dec 8, 2014

Bump. This still seems to be broken: https://groups.google.com/forum/#!topic/julia-opt/YhuEzKdfpWw

@ibell

ibell commented Feb 23, 2015

Bump. Still an issue with the PPA, while packaged 0.3.6 seems to bubble/catch exceptions properly (see JuliaLang/julia#10273). Any way to fix the PPA?

@staticfloat
Contributor

I will try to take a look at this later this week. This is a tough one because last I looked I couldn't figure out what was different between the packaged libunwind and our from-source built libunwind. Hopefully with fresh eyes, this will be easier for me to spot.

@ibell

ibell commented Feb 24, 2015

Thanks!


@staticfloat
Contributor

Can someone give me a simple testcase I can run to make sure I've reproduced the failure locally?

@JonWel

JonWel commented Feb 24, 2015

Here is a simple test for the CoolProp case (not linked to Ipopt, apart from being the same error with the Julia PPA):

Installation of Julia with the ppa:

sudo add-apt-repository ppa:staticfloat/julianightlies
sudo add-apt-repository ppa:staticfloat/julia-deps
sudo apt-get update
sudo apt install julia

Download the CoolProp library:
http://sourceforge.net/projects/coolprop/files/CoolProp/nightly/shared_library/Linux/64bit/libCoolProp.so/download

Then, in Julia started in the same folder as this library:

julia> push!(DL_LOAD_PATH,".")
1-element Array{Union(ASCIIString,UTF8String),1}:
"."

julia> ccall( (:PropsSI, "libCoolProp"), Cdouble, (Ptr{Uint8},Ptr{Uint8},Cdouble,Ptr{Uint8},Cdouble,Ptr{Uint8}), "T","P",101325.,"Q",0.,"Water")
    373.12429584768836

julia> ccall( (:PropsSI, "libCoolProp"), Cdouble, (Ptr{Uint8},Ptr{Uint8},Cdouble,Ptr{Uint8},Cdouble,Ptr{Uint8}), "T","P",-101325.,"Q",0.,"Water")

signal (6): Aborted
gsignal at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
abort at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
_Unwind_Resume at /lib/x86_64-linux-gnu/libgcc_s.so.1 (unknown line)
_ZN8CoolProp16_PropsSI_outputsERNSt3tr110shared_ptrINS_13AbstractStateEEESt6vectorINS_16output_parameterESaIS6_EENS_11input_pairsERKS5_IdSaIdEESD_RS5_ISB_SaISB_EE at ./libCoolProp.so (unknown line)
_ZN8CoolProp13_PropsSImultiERKSt6vectorISsSaISsEERKSsRKS0_IdSaIdEES6_SA_S6_S4_SA_RS0_IS8_SaIS8_EE at ./libCoolProp.so (unknown line)
_ZN8CoolProp7PropsSIERKSsS1_dS1_dS1_ at ./libCoolProp.so (unknown line)
PropsSI at ./libCoolProp.so (unknown line)
anonymous at no file:0
unknown function (ip: -998904628)
jl_f_top_eval at /usr/bin/../lib/x86_64-linux-gnu/julia/libjulia.so (unknown line)
eval_user_input at REPL.jl:53
jlcall_eval_user_input_20160 at  (unknown line)
jl_apply_generic at /usr/bin/../lib/x86_64-linux-gnu/julia/libjulia.so (unknown line)
anonymous at task.jl:95
jl_handle_stack_switch at /usr/bin/../lib/x86_64-linux-gnu/julia/libjulia.so (unknown line)
julia_trampoline at /usr/bin/../lib/x86_64-linux-gnu/julia/libjulia.so (unknown line)
unknown function (ip: 4199613)
__libc_start_main at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 4199667)
unknown function (ip: 0)
Aborted (core dumped)

The first call checks that the library works; the second checks whether the problem is still there (it should give Inf instead of aborting).

I hope this is simple enough.

@staticfloat
Contributor

Alright, I've narrowed this down to the fact that if you turn on --enable-cxx-exceptions and --enable-shared at the same time, and then link against the dynamic library, things go south. Linking against the static library or setting --disable-cxx-exceptions fixes things.

For now, the solution I'll use is to link Julia against libunwind statically by building it on the buildd servers alongside Julia herself. We should hopefully see a new PPA build in the next few days; right now I'm running into an issue in how the docs are being generated that's stopping the builds from publishing correctly.
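
Since the difference here comes down to whether libunwind is loaded as a shared library, one way to check what is actually mapped into a running process is to walk the loaded objects with glibc's `dl_iterate_phdr`. The sketch below is a Linux/glibc-only diagnostic, not something taken from the Julia or Ipopt sources; `collect` and `loaded_unwinders` are hypothetical helper names, and the substring match on "unwind" and "libgcc_s" is an assumption about how the libraries are named on the system.

```cpp
#include <link.h>    // dl_iterate_phdr, struct dl_phdr_info (glibc)
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Callback for dl_iterate_phdr: record the path of every loaded object
// whose name mentions "unwind" or "libgcc_s".
static int collect(struct dl_phdr_info* info, std::size_t, void* data) {
    auto* out = static_cast<std::vector<std::string>*>(data);
    const char* name = info->dlpi_name;
    if (name && (std::strstr(name, "unwind") || std::strstr(name, "libgcc_s"))) {
        out->push_back(name);
    }
    return 0;  // returning 0 tells dl_iterate_phdr to keep iterating
}

// Returns the paths of unwinder-related shared objects in this process.
std::vector<std::string> loaded_unwinders() {
    std::vector<std::string> found;
    dl_iterate_phdr(collect, &found);
    return found;
}
```

A statically linked libunwind will not appear in this list, because only shared objects have their own entry. Running `ldd` on the binary gives similar information from the outside.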

@mlubin
Member Author

mlubin commented Mar 14, 2015

@staticfloat, does the latest PPA incorporate this fix? We just got another report of the issue: https://groups.google.com/forum/#!topic/julia-opt/meobh1XWnj0.

@staticfloat
Contributor

@mlubin Ah, the nightlies had it, but looks like the stable builds still had the same error. This has been rectified.

@JonWel

JonWel commented Mar 14, 2015

This is now fixed for me on both PPAs as of:
0.3.6-depsfix9-utopic
0.4.0-1953~ubuntu14.10.1

Neither of them crashes any more.

@staticfloat
Contributor

Julia 0.3.6-depsfix9-utopic should be available; just do an apt-get update and apt-get upgrade.

On Sat, Mar 14, 2015 at 3:55 AM, JonWel notifications@github.com wrote:

As of Julia 0.3.6-depsfix7-utopic ppa, I still have the crash (I guess the
next one will be fine).
Julia 0.4.0-1955~ubuntu14.10.1 ppa do not crash.



@mlubin
Member Author

mlubin commented Mar 16, 2015

Seems to be fixed now, thanks @staticfloat!

@mlubin mlubin closed this as completed Mar 16, 2015