
Error loading module on remote SSH workers #26975

Closed
jebej opened this issue May 4, 2018 · 8 comments
Labels
parallelism Parallel or distributed computation

Comments

@jebej
Contributor

jebej commented May 4, 2018

On Julia 0.6.2, running on a Windows machine with SSH remote workers on CentOS, I cannot load modules on the remote workers. The error refers to a missing anonymous function, which makes the issue hard to track down. I have asked about this on Discourse, but no solution has been found.

From what I understand, code loading from the master node should still work.

@vchuravy
Member

vchuravy commented May 4, 2018

Sounds like a dup of #26805, which will be fixed by #26813 (I just haven't gotten around to adding tests).

@jebej
Contributor Author

jebej commented May 4, 2018

Is there any way I can verify that? Note that loading the module locally first does not help:

julia> using OrdinaryDiffEq

julia> addprocs([("admin@some.server",2)],exename="julia",dir="/home/admin",tunnel=true)
2-element Array{Int64,1}:
 2
 3

julia> @everywhere using OrdinaryDiffEq
ERROR: On worker 2:
UndefVarError: ##695#697 not defined
deserialize_datatype at .\serialize.jl:973
handle_deserialize at .\serialize.jl:677
deserialize at .\serialize.jl:637
handle_deserialize at .\serialize.jl:684
deserialize_msg at .\distributed\messages.jl:98
message_handler_loop at .\distributed\process_messages.jl:161
process_tcp_streams at .\distributed\process_messages.jl:118
#99 at .\event.jl:73
#remotecall_fetch#141(::Array{Any,1}, ::Function, ::Function, ::Base.Distributed.Worker, ::Expr, ::Vararg{Expr,N} where N) at .\distributed\remotecall.jl:354
remotecall_fetch(::Function, ::Base.Distributed.Worker, ::Expr, ::Vararg{Expr,N} where N) at .\distributed\remotecall.jl:346
#remotecall_fetch#144(::Array{Any,1}, ::Function, ::Function, ::Int64, ::Expr, ::Vararg{Expr,N} where N) at .\distributed\remotecall.jl:367
remotecall_fetch(::Function, ::Int64, ::Expr, ::Vararg{Expr,N} where N) at .\distributed\remotecall.jl:367
(::##11#13)() at .\distributed\macros.jl:102

...and 1 more exception(s).

Stacktrace:
 [1] sync_end() at .\task.jl:287
 [2] macro expansion at .\distributed\macros.jl:112 [inlined]
 [3] anonymous at .\<missing>:?

@vchuravy
Member

vchuravy commented May 4, 2018 via email

@vchuravy
Member

vchuravy commented May 4, 2018 via email

@jebej
Contributor Author

jebej commented May 4, 2018

> Can you post a minimal working example? Something I can run myself?

Is what I just posted not sufficient? You would need access to remote SSH workers, since it all works fine with local workers.

@jebej
Contributor Author

jebej commented May 4, 2018

Is there any way I could try manually loading the code on the workers to see where/why it fails?

@jebej
Contributor Author

jebej commented May 7, 2018

> The other thing you can try to do is to check if the PR I mentioned fixes your issue.

I can try this fix as soon as it makes its way into a nightly build, but it does seem to be a different problem.

@kshyatt kshyatt added the parallelism Parallel or distributed computation label May 28, 2018
@simonbyrne
Contributor

Code loading has changed significantly since 0.6. Please open a new issue if this is still a problem.
