Handle deserialization errors in RemoteExceptions #20231

Closed
amitmurthy opened this issue Jan 25, 2017 · 0 comments · Fixed by #20276
Labels
parallelism Parallel or distributed computation

Comments

@amitmurthy
Contributor

If an exception type thrown on a worker (or the type of any data field in the exception) is not defined on the destination node, deserializing the exception fails, and both the original error and its stack trace (if the original exception is wrapped in a CapturedException) are lost.

The root cause is shown below:

julia> remotecall_fetch(()->eval(quote
         type Foo <: Exception
           x
         end
         throw(Foo(1))
       end), 2)
ERROR: UndefVarError: Foo not defined
 in deserialize_datatype(::Base.ClusterSerializer{TCPSocket}) at ./serialize.jl:841
 in handle_deserialize(::Base.ClusterSerializer{TCPSocket}, ::Int32) at ./serialize.jl:580
 in deserialize(::Base.ClusterSerializer{TCPSocket}, ::DataType) at ./serialize.jl:909
 in deserialize_datatype(::Base.ClusterSerializer{TCPSocket}) at ./serialize.jl:856
 in handle_deserialize(::Base.ClusterSerializer{TCPSocket}, ::Int32) at ./serialize.jl:580
 in deserialize(::Base.ClusterSerializer{TCPSocket}, ::DataType) at ./serialize.jl:909
 in deserialize_datatype(::Base.ClusterSerializer{TCPSocket}) at ./serialize.jl:856
 in handle_deserialize(::Base.ClusterSerializer{TCPSocket}, ::Int32) at ./serialize.jl:580
 in deserialize_msg(::Base.ClusterSerializer{TCPSocket}, ::Type{Base.ResultMsg}) at ./multi.jl:120
 in deserialize_msg(::Base.ClusterSerializer{TCPSocket}) at ./multi.jl:130
 in message_handler_loop(::TCPSocket, ::TCPSocket, ::Bool) at ./multi.jl:1371
 in process_tcp_streams(::TCPSocket, ::TCPSocket, ::Bool) at ./multi.jl:1328
 in (::Base.##505#506{TCPSocket,TCPSocket,Bool})() at ./event.jl:73
Stacktrace:
 [1] #remotecall_fetch#493(::Array{Any,1}, ::Function, ::Function, ::Base.Worker) at ./multi.jl:1093
 [2] remotecall_fetch(::Function, ::Base.Worker) at ./multi.jl:1085
 [3] #remotecall_fetch#496(::Array{Any,1}, ::Function, ::Function, ::Int64) at ./multi.jl:1106
 [4] remotecall_fetch(::Function, ::Int64) at ./multi.jl:1106

While it is difficult to recover a valid object after a deserialization error, we could at least capture the type of the original error (say, a BoundsError) as part of the deserialization error stack.

Since remotecalls always wrap the remote exception in a RemoteException, we can capture a bit more information about the original error. We could:

  • Add a short stringified description of the original error type, plus the first line of the original call stack, as the first field to be deserialized in a specific deserialize(s::AbstractSerializer, t::Type{T}) where T <: RemoteException implementation.
  • Have this implementation catch any further deserialization errors and rethrow them with the short description of the original error attached.

This way some information of the original error is retained even in the event of deserialization errors.
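
The proposal above could be sketched roughly as follows. This is only an illustration, not the change that fixed the issue: it assumes the Julia 1.x Serialization standard library (the session above predates it), and WrappedRemoteError and describe are hypothetical names standing in for RemoteException and whatever helper builds the short description. A truncated byte stream stands in for the "type not defined on this node" failure, since that is hard to reproduce in a single process.

```julia
using Serialization

# Hypothetical stand-in for RemoteException (NOT the Distributed type).
struct WrappedRemoteError <: Exception
    desc::String   # short description of the original error, always a plain String
    captured::Any  # the original exception; may fail to deserialize remotely
end

# Hypothetical helper: error type plus the first frame of the call stack.
function describe(err, bt)
    frames = stacktrace(bt)
    where_ = isempty(frames) ? "unknown location" : string(first(frames))
    return string(typeof(err), " at ", where_)
end

# Write the description first, so it can be read back before anything risky.
function Serialization.serialize(s::AbstractSerializer, e::WrappedRemoteError)
    Serialization.serialize_type(s, WrappedRemoteError)
    serialize(s, e.desc)
    serialize(s, e.captured)
end

# Read the description, then try the payload; on failure, rethrow with the
# description attached so the original error is not lost entirely.
function Serialization.deserialize(s::AbstractSerializer, ::Type{WrappedRemoteError})
    desc = deserialize(s)
    try
        return WrappedRemoteError(desc, deserialize(s))
    catch err
        throw(ErrorException(string(
            "could not deserialize remote exception (",
            sprint(showerror, err), "); original error: ", desc)))
    end
end

# Usage: capture an error, serialize it, then simulate a failed
# deserialization by dropping the last byte of the payload.
wrapped = try
    [1, 2, 3][5]
catch err
    WrappedRemoteError(describe(err, catch_backtrace()), err)
end

buf = IOBuffer()
serialize(buf, wrapped)
bytes = take!(buf)

try
    deserialize(IOBuffer(bytes[1:end-1]))
catch err
    showerror(stdout, err)  # message still names the original BoundsError
    println()
end
```

Note that after a failed deserialization the stream may be left mid-message, so a real implementation would also need to decide how the message loop resynchronizes; the sketch does not address that.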

Ref : #20027 (comment)
