Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IJulia sometimes hangs on startup/kernal restarts #311

Closed
dhoegh opened this issue May 20, 2015 · 11 comments
Closed

IJulia sometimes hangs on startup/kernal restarts #311

dhoegh opened this issue May 20, 2015 · 11 comments

Comments

@dhoegh
Copy link
Contributor

dhoegh commented May 20, 2015

I am using IPython 3, Julia 0.3.8. Sometimes the IJulia kernal just hangs and using 12% cpu. The end of the output look like the following when capture_stderr = false and verbose = true:

SENDING IPython Msg [ idents status ] {
  header = ["msg_id"=>"84574c20-3d90-4424-b6aa-aece75adf3ab","msg_type"=>"status","username"=>"jlkernel","session"=>"09CDB13962AC4
84589AD562249BB9436"],
  metadata = Dict{String,Any}(),
  content = ["execution_state"=>"idle"]
}
got msg part 09CDB13962AC484589AD562249BB9436
got msg part <IDS|MSG>
RECEIVED IPython Msg [ idents 09CDB13962AC484589AD562249BB9436 ] {
  header = ["msg_id"=>"F1C7B50DFF184AE08E99D889820E7BC0","msg_type"=>"comm_msg","username"=>"username","version"=>"5.0","session"=
>"09CDB13962AC484589AD562249BB9436"],
  metadata = Dict{String,Any}(),
  content = ["comm_id"=>"d9ae84c8-be1d-4709-bf77-91f3bab59af3","data"=>["mime"=>"image/png","action"=>"subscribe_mime"]]
}
SENDING IPython Msg [ idents status ] {
  header = ["msg_id"=>"c4754db4-3eb7-4342-b1ee-639154b57856","msg_type"=>"status","username"=>"jlkernel","session"=>"09CDB13962AC4
84589AD562249BB9436"],
  metadata = Dict{String,Any}(),
  content = ["execution_state"=>"busy"]
}
SENDING IPython Msg [ idents status ] {
  header = ["msg_id"=>"118c854f-cda5-4cea-a699-793174dacbf9","msg_type"=>"status","username"=>"jlkernel","session"=>"09CDB13962AC4
84589AD562249BB9436"],
  metadata = Dict{String,Any}(),
  content = ["execution_state"=>"idle"]
}
got msg part ???J?
got msg part <IDS|MSG>
RECEIVED IPython Msg [ idents ???J? ] {
  header = ["msg_id"=>"ba55bcff-571c-4ff2-b06b-bf5237340f83","msg_type"=>"shutdown_request","username"=>"username","date"=>"2015-0
5-20T20:14:39.449122","version"=>"5.0","session"=>"26a443b7-cf83-416b-a894-69cbddda18c1"],
  metadata = Dict{String,Any}(),
  content = ["restart"=>true]
}
SENDING IPython Msg [ idents status ] {
  header = ["msg_id"=>"948ea887-1442-4d19-9728-6b205c207137","msg_type"=>"status","username"=>"jlkernel","session"=>"09CDB13962AC4
84589AD562249BB9436"],
  metadata = Dict{String,Any}(),
  content = ["execution_state"=>"busy"]
}
SENDING IPython Msg [ idents ???J? ] {
  header = ["msg_id"=>"caba56ca-9833-4be2-8bf0-e609c400790c","msg_type"=>"shutdown_reply","username"=>"username","session"=>"26a44
3b7-cf83-416b-a894-69cbddda18c1"],
  metadata = Dict{String,Any}(),
  content = ["restart"=>true]
}
error in running finalizer: Base.ArgumentError(msg="ref of NULL PyObject")
[I 20:14:39.756 NotebookApp] Kernel restarted: 82eb40fd-c045-4d31-ab33-9726d88ad8c8
[I 20:14:45.151 NotebookApp] Saving file at /3dmodel/Dynamics/Dynamic crane on 180deg.ipynb
[I 20:14:50.341 NotebookApp] Saving file at /3dmodel/Dynamics/Dynamic crane on 180deg.ipynb

It looks like some kind of error on the Julia side due to Base.ArgumentError(msg="ref of NULL PyObject")

@dhoegh
Copy link
Contributor Author

dhoegh commented May 20, 2015

Bonus info, the above is produced with precompiled PyCall and Color.

I have now tried to recompile the sysimg without PyCall and Color. The following error sometimes occur when restarting the kernel.

SENDING IPython Msg [ idents ???J? ] {
  header = ["msg_id"=>"71cb0641-a0e8-4d50-9f96-50f608715a40","msg_type"=>"shutdown_reply","username"=>"username","session"=>"8c019
e8b-7731-4fb6-a469-7d72e62f8542"],
  metadata = Dict{String,Any}(),
  content = ["restart"=>true]
}

Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
Exception: EXCEPTION_ACCESS_VIOLATION at 0x6b3c6c27 -- PyOS_URandom at C:\Anaconda3\python34.dll (unknown line)
PyOS_URandom at C:\Anaconda3\python34.dll (unknown line)
PyOS_URandom at C:\Anaconda3\python34.dll (unknown line)
PyOS_URandom at C:\Anaconda3\python34.dll (unknown line)
pycall at C:\Users\Hoegh\.julia\v0.3\PyCall\src\PyCall.jl:73
close at C:\Users\Hoegh\.julia\v0.3\PyPlot\src\PyPlot.jl:377
jl_apply_generic at C:\Users\Hoegh\Julia-0.3.8\bin\libjulia.dll (unknown line)
jl_profile_is_running at C:\Users\Hoegh\Julia-0.3.8\bin\libjulia.dll (unknown line)
jl_gc_run_all_finalizers at C:\Users\Hoegh\Julia-0.3.8\bin\libjulia.dll (unknown line)
uv_atexit_hook at C:\Users\Hoegh\Julia-0.3.8\bin\libjulia.dll (unknown line)
jl_exit at C:\Users\Hoegh\Julia-0.3.8\bin\libjulia.dll (unknown line)
shutdown_request at C:\Users\Hoegh\.julia\v0.3\IJulia\src\handlers.jl:44
jlcall_shutdown_request_4043 at  (unknown line)
jl_apply_generic at C:\Users\Hoegh\Julia-0.3.8\bin\libjulia.dll (unknown line)
eventloop at C:\Users\Hoegh\.julia\v0.3\IJulia\src\IJulia.jl:123
jl_apply_generic at C:\Users\Hoegh\Julia-0.3.8\bin\libjulia.dll (unknown line)
anonymous at task.jl:340
jl_handle_stack_switch at C:\Users\Hoegh\Julia-0.3.8\bin\libjulia.dll (unknown line)
julia_trampoline at C:\Users\Hoegh\Julia-0.3.8\bin\libjulia.dll (unknown line)
unknown function (ip: 4202784)
unknown function (ip: 4199370)
unknown function (ip: 4199672)
BaseThreadInitThunk at C:\Windows\system32\kernel32.dll (unknown line)
RtlUserThreadStart at C:\Windows\SYSTEM32\ntdll.dll (unknown line)
RtlUserThreadStart at C:\Windows\SYSTEM32\ntdll.dll (unknown line)

Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
Exception: EXCEPTION_ACCESS_VIOLATION at 0x6bfa13d8 -- jl_profile_is_running at C:\Users\Hoegh\Julia-0.3.8\bin\libjulia.dll (unkno
wn line)
jl_profile_is_running at C:\Users\Hoegh\Julia-0.3.8\bin\libjulia.dll (unknown line)
jl_gc_run_all_finalizers at C:\Users\Hoegh\Julia-0.3.8\bin\libjulia.dll (unknown line)
uv_atexit_hook at C:\Users\Hoegh\Julia-0.3.8\bin\libjulia.dll (unknown line)
jl_exit at C:\Users\Hoegh\Julia-0.3.8\bin\libjulia.dll (unknown line)
jl_throw_in_ctx at C:\Users\Hoegh\Julia-0.3.8\bin\libjulia.dll (unknown line)
RtlDecodePointer at C:\Windows\SYSTEM32\ntdll.dll (unknown line)
RtlUnwindEx at C:\Windows\SYSTEM32\ntdll.dll (unknown line)
KiUserExceptionDispatcher at C:\Windows\SYSTEM32\ntdll.dll (unknown line)
PyOS_URandom at C:\Anaconda3\python34.dll (unknown line)
PyOS_URandom at C:\Anaconda3\python34.dll (unknown line)
PyOS_URandom at C:\Anaconda3\python34.dll (unknown line)
pycall at C:\Users\Hoegh\.julia\v0.3\PyCall\src\PyCall.jl:73
close at C:\Users\Hoegh\.julia\v0.3\PyPlot\src\PyPlot.jl:377
jl_apply_generic at C:\Users\Hoegh\Julia-0.3.8\bin\libjulia.dll (unknown line)
jl_profile_is_running at C:\Users\Hoegh\Julia-0.3.8\bin\libjulia.dll (unknown line)
jl_gc_run_all_finalizers at C:\Users\Hoegh\Julia-0.3.8\bin\libjulia.dll (unknown line)
uv_atexit_hook at C:\Users\Hoegh\Julia-0.3.8\bin\libjulia.dll (unknown line)
jl_exit at C:\Users\Hoegh\Julia-0.3.8\bin\libjulia.dll (unknown line)
shutdown_request at C:\Users\Hoegh\.julia\v0.3\IJulia\src\handlers.jl:44
jlcall_shutdown_request_4043 at  (unknown line)
jl_apply_generic at C:\Users\Hoegh\Julia-0.3.8\bin\libjulia.dll (unknown line)
eventloop at C:\Users\Hoegh\.julia\v0.3\IJulia\src\IJulia.jl:123
jl_apply_generic at C:\Users\Hoegh\Julia-0.3.8\bin\libjulia.dll (unknown line)
anonymous at task.jl:340
jl_handle_stack_switch at C:\Users\Hoegh\Julia-0.3.8\bin\libjulia.dll (unknown line)
julia_trampoline at C:\Users\Hoegh\Julia-0.3.8\bin\libjulia.dll (unknown line)
unknown function (ip: 4202784)
unknown function (ip: 4199370)
unknown function (ip: 4199672)
BaseThreadInitThunk at C:\Windows\system32\kernel32.dll (unknown line)
RtlUserThreadStart at C:\Windows\SYSTEM32\ntdll.dll (unknown line)
RtlUserThreadStart at C:\Windows\SYSTEM32\ntdll.dll (unknown line)
[I 20:34:12.735 NotebookApp] Kernel restarted: d0741be8-8e65-4511-b732-b0b9557947e1
PROFILE = ["key"=>"0a556107-de33-47bb-b20e-43d8d3a13bca","transport"=>"tcp","signature_scheme"=>"hmac-sha256","shell_port"=>58219,
"hb_port"=>58223,"control_port"=>58222,"ip"=>"127.0.0.1","stdin_port"=>58221,"iopub_port"=>58220]
SENDING IPython Msg [ idents status ] {
  header = ["msg_id"=>"831267af-b2a0-43f1-b6c0-0d29c09e135f","msg_type"=>"status","username"=>"jlkernel","session"=>"????"],
  metadata = Dict{String,Any}(),
  content = ["execution_state"=>"starting"]
}
Starting kernel event loops.
got msg part C6C9A878108D4B41877AF2A0C89623FF
got msg part <IDS|MSG>
RECEIVED IPython Msg [ idents C6C9A878108D4B41877AF2A0C89623FF ] {
  header = ["msg_id"=>"3141216C4A224222B040C867C1224D60","msg_type"=>"kernel_info_request","username"=>"username","version"=>"5.0"
,"session"=>"C6C9A878108D4B41877AF2A0C89623FF"],
  metadata = Dict{String,Any}(),
  content = Dict{String,Any}()
}

@stevengj
Copy link
Member

@dhoegh, it looks like you are still using PyCall, since the above error mentions PyCall? (IJulia itself does not use PyCall.)

Probably this is not an IJulia-specific problem, but rather a problem with precompiling PyCall?

@dhoegh
Copy link
Contributor Author

dhoegh commented May 21, 2015

Maybe, but sometimes do the IJulia kernel just hang while starting before running anything using PyCall. No one of the above are traces where the kernel hangs at startup but rather at restart. Can it somehow be related to, I run the IPython notebook server from a python subprocess that starts the server at windows startup?

@dhoegh
Copy link
Contributor Author

dhoegh commented May 21, 2015

Ok, I now caught one where the kernel just hanged at start-up. unfortunately the message is not very helpful.

[I 19:05:04.922 NotebookApp] Kernel started: ca75846f-cd86-47fa-859b-b0f5b2e75991
[W 19:05:04.927 NotebookApp] 404 GET /kernelspecs/julia-0.3/logo-64x64.png (::1) 2.00ms referer=http://localhost:8888/notebooks/3d
model/Dynamics/Dynamic%20crane%20on%20180deg.ipynb
[W 19:05:15.015 NotebookApp] Timeout waiting for kernel_info reply from ca75846f-cd86-47fa-859b-b0f5b2e75991

@dhoegh
Copy link
Contributor Author

dhoegh commented May 21, 2015

Just for the completeness, it could maybe be JuliaLang/julia/issues/9242 even though I have not observed it in the REPL for many months.

@stevengj
Copy link
Member

See also the debugging IJulia instructions on how to potentially get more information.

@stevengj
Copy link
Member

Can you try writing a Windows batch script or something like that that just launches the REPL a few thousand times, to see if you can replicate this in the REPL by itself?

@dhoegh
Copy link
Contributor Author

dhoegh commented May 21, 2015

I had already changed capture_stderr = false and `verbose = true.
I have tried to run the following batch file with no resulting hangs. I have not tried it on 0.3.3 to see if it results in a hang there

FOR /L %%i IN (1,1,100) DO (
  REM ECHO %%i
  C:\Users\Hoegh\Julia-0.3.8\bin\julia.exe -e "println(%%i)"
)
pause

@maximerischard
Copy link

I think ipython/ipython#7776 is related. The timeout seems to be causing issues, although in theory ipython should be able to recover. What was happening to me is that because of the initial timeout, when the page is reloaded or the kernel is restarted, ipython fails to adapt the protocol from version 5.0 to version 4.0. The timeout is simply caused by julia taking more than 10 seconds to warm up (which sadly is normal). Adding

c.NotebookApp.tornado_settings.update( {'kernel_info_timeout': 60} )

to ~/.ipython/profile_julia/ipython_config.py therefore fixed this and related issues for me. Maybe this should be part of the default IJulia configuration, since slow startup times are the norm.

I still think there's a bug in ipython or ijulia (more likely ipython), but I stopped digging at this point.

@dhoegh
Copy link
Contributor Author

dhoegh commented Jun 17, 2015

@maximerischard, sounds good, it makes sense because I have observed it occurred more often when I had IJulia to print debug information. I will give your advice a test and return with a pull request to set it as default option if it works.

@stevengj
Copy link
Member

Closing as it seems this hasn't been replicated for a while. Perhaps when we switched over to protocol version 5 the problem went away?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants