Segfault when running on_connection_complete from a sysimage #13

Closed
Octogonapus opened this issue Jul 28, 2022 · 12 comments
@Octogonapus
Owner

Octogonapus commented Jul 28, 2022

When the client attempts to connect to an endpoint for which it is not authorized, libawscrt internally invokes on_connection_complete, which can go one of two ways.

  1. It ultimately prints an error to the user (this is the expected behavior):
      nested task error: Connection failed. AWS Error 5134: libaws-c-mqtt: AWS_ERROR_MQTT_UNEXPECTED_HANGUP, The connection was closed unexpectedly.
      Stacktrace:
       [1] macro expansion
         @ ~/.julia/packages/AWSCRT/XV7k2/src/AWSMQTT.jl:440 [inlined]
       [2] (::AWSCRT.var"#42#46"{AWSCRT.MQTTConnection, Base.RefValue{LibAWSCRT.aws_mqtt_connection_options}, Base.RefValue{ForeignCallbacks.ForeignToken}, ForeignCallbacks.ForeignCallback{AWSCRT.OnConnectionCompleteMsg}, Channel{Any}})()
         @ AWSCRT ./task.jl:429
  2. It segfaults when running from a sysimage in which a particular method has been precompiled. This is the precompile statement causing the problem.

I have created an MWE here (though it requires some real cloud resources): https://github.com/Octogonapus/awscrt_segfault_mwe

Octogonapus self-assigned this Jul 28, 2022
@Octogonapus
Owner Author

@vchuravy I would really appreciate it if you could take a short look at that MWE and reply with anything you notice or think of. This one has @IanButterworth and me stumped. Also, if you have any recommendations as to where I should direct my debugging efforts, or other things to try, that would also be helpful. Thank you!

@vchuravy

Might be similar to JuliaGPU/CUDA.jl#1314

@IanButterworth

IanButterworth commented Jul 28, 2022

Specifically this reproducer JuliaGPU/CUDA.jl#1314 (comment)

Look for jl_capi. They're meant to be called from C

@IanButterworth

Alternative strategy, only call the cfunctions during __init__, don't bake them into the sysimage
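
For readers unfamiliar with the pattern being suggested, here is a minimal sketch of creating a C-callable pointer inside `__init__` and storing it in a `Ref`, so the pointer value is produced when the module loads rather than when it is compiled. The module name `CallbackDemo`, the callback `on_event`, and the helper `call_through_pointer` are hypothetical and not part of AWSCRT.jl. Whether this avoids the segfault reported here is not established in this thread.

```julia
module CallbackDemo

# Hypothetical callback with a C-compatible signature (not part of AWSCRT.jl).
on_event(code::Cint)::Cint = code + Cint(1)

# Placeholder that is filled in when the module is loaded.
const ON_EVENT_PTR = Ref{Ptr{Cvoid}}(C_NULL)

function __init__()
    # Create the C-callable pointer at load time and store it.
    ON_EVENT_PTR[] = @cfunction(on_event, Cint, (Cint,))
    return nothing
end

# Stand-in for a C library invoking the callback through its pointer.
function call_through_pointer(code::Integer)
    fptr = ON_EVENT_PTR[]
    return ccall(fptr, Cint, (Cint,), Cint(code))
end

end # module

CallbackDemo.call_through_pointer(41)
```

This only works as written when the callback is a top-level function with no captured variables.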

@Octogonapus
Owner Author

I rebuilt the base sysimage and the Foo sysimage on 1.9.0-DEV commit e2a8a4e6b3bb333fdab5a5c9a023fe96e2f39c92. I get the same segfault.

Thread 3 received signal SIGSEGV, Segmentation fault.
[Switching to Thread 53278.53368]
0x00007fd305fa4a15 in jlcapi_on_connection_complete_34435.clone_1 () from /home/salmon/Documents/leuko/awscrt_segfault_mwe/sysimage/sysimage.so
(rr) bt 5
#0  0x00007fd305fa4a15 in jlcapi_on_connection_complete_34435.clone_1 () from /home/salmon/Documents/leuko/awscrt_segfault_mwe/sysimage/sysimage.so
#1  0x00007fd2f70f91bf in s_mqtt_client_shutdown () from /home/salmon/.julia/artifacts/c8c44de5d9660b058d222a5cb7023b97d928d825/lib/libawscrt.so
#2  0x00007fd2f70dc058 in s_on_client_channel_on_shutdown () from /home/salmon/.julia/artifacts/c8c44de5d9660b058d222a5cb7023b97d928d825/lib/libawscrt.so
#3  0x00007fd2f709b4de in s_run_all () from /home/salmon/.julia/artifacts/c8c44de5d9660b058d222a5cb7023b97d928d825/lib/libawscrt.so
#4  0x00007fd2f70e4338 in s_main_loop () from /home/salmon/.julia/artifacts/c8c44de5d9660b058d222a5cb7023b97d928d825/lib/libawscrt.so
(More stack frames follow...)

@Octogonapus
Owner Author

Octogonapus commented Jul 29, 2022

> Alternative strategy, only call the cfunctions during __init__, don't bake them into the sysimage

__init__ is just a regular function so I don't understand what you mean. Calling something inside __init__ doesn't prevent it from being cached in the sysimage.

@IanButterworth

__init__() isn't called during sysimage generation AFAIK

@Octogonapus
Owner Author

> __init__() isn't called during sysimage generation AFAIK

It is called and I have seen values in it get cached in the past. I will do something else to lift the cfunctions out of the sysimage.

@Octogonapus
Owner Author

Octogonapus commented Jul 29, 2022

Actually, I don't think I can get the cfunctions out of the sysimage because they depend on values created at runtime. E.g.

function connect(
    ...
    on_connection_interrupted::Union{OnConnectionInterrupted,Nothing} = nothing,
    ...
)
    ...
        on_connection_interrupted_cb = @cfunction(on_connection_interrupted, Cvoid, (Ptr{aws_mqtt_client_connection}, Cint, Ptr{Cvoid}))

It seems that we simply can't put AWSCRT.jl in the sysimage.
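
A general Julia pattern for callbacks that depend on runtime values is to keep the C entry point a top-level function and hand the runtime value to it through a user-data pointer. The sketch below illustrates that pattern only; it is not AWSCRT.jl's implementation. The names `trampoline` and `call_with_handler` are hypothetical, the `ccall` stands in for a C library that passes the user-data pointer back to the callback, and nothing in this thread shows that the pattern avoids the segfault.

```julia
# Top-level function with no captured variables, so plain @cfunction applies.
function trampoline(code::Cint, userdata::Ptr{Cvoid})::Cvoid
    # Recover the Julia handler that was passed through the user-data pointer.
    handler = unsafe_pointer_to_objref(userdata)::Base.RefValue{Any}
    handler[](code)
    return nothing
end

# Hypothetical helper: the ccall stands in for a C library invoking the callback.
function call_with_handler(user_handler, code::Integer)
    handler = Ref{Any}(user_handler)
    fptr = @cfunction(trampoline, Cvoid, (Cint, Ptr{Cvoid}))
    # Keep the Ref alive while C code may still hold the raw pointer.
    GC.@preserve handler begin
        userdata = pointer_from_objref(handler)
        ccall(fptr, Cvoid, (Cint, Ptr{Cvoid}), Cint(code), userdata)
    end
    return nothing
end

seen = Cint[]
call_with_handler(c -> push!(seen, c), 5134)
```

In a real binding the handler object would have to stay rooted for as long as the C library may invoke the callback, not just for the duration of one call.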

@Octogonapus
Owner Author

There is no segfault when using JuliaLang/julia#45447

@Octogonapus
Owner Author

I should have closed this earlier, but this is fixed in Julia 1.9.
