Cross-platform issues with Remote Workers / SSH Cluster Manager / Native Dependencies #22
Comments
I believe it is possible to run all nodes, including node 1, remotely, and connect your local REPL to the remote node, thereby avoiding mixed-platform issues. @Keno, are there instructions on how to do this?
Hi, thanks -- I actually saw the thread where the "REPL into a remote" stuff was first done (very cool!). I believe the link is: JuliaLang/julia#3655. There is also just starting node 1 via ssh/tmux, or running a remote IJulia...

For my particular workflow, I want to use the remote boxes to condense and summarize a large amount of distributed data and then deliver it to my local box for more detailed analysis, keeping the entire flow as interactive as possible, going back and forth many times. Ultimately, the point is to flexibly get interesting subsets of data from one site to another, which makes a purely remote solution not so good.

I managed to hack things enough to get it to work for now. I don't endorse what follows, but I'm documenting it in case others come across this issue:
(The exact details will depend on what version of HDF5 you have installed, and so forth.)
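The hack amounts to editing the BinDeps-generated deps.jl for HDF5 on the Mac so that it resolves a library path valid for whichever OS is actually running. A minimal sketch of that idea, with placeholder paths (the real generated file differs by package version and machine):

```julia
# Sketch of a cross-platform deps.jl hack (placeholder paths; the
# BinDeps-generated original has more machinery). Because workers slurp
# this file from node 1, it must pick a library path that is valid for
# the OS it is *currently* running on, not the OS of node 1.
if OS_NAME == :Darwin
    const libhdf5 = "/usr/local/lib/libhdf5.dylib"          # Mac (node 1)
else
    const libhdf5 = "/usr/lib/x86_64-linux-gnu/libhdf5.so"  # Linux workers
end
```

The actual edit is messier than this; the point is just to branch on the runtime OS rather than bake in one platform's path.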
So that is quite horrible and will likely crumble with every little change. On the flip side, I was able to drive 60 workers on 7 boxes from my Mac, and everything seemed to work amazingly well in terms of connection times, throughput, and so forth -- so, for what it's worth, I'm a happy customer! In case anyone is interested, it turns out remote workers slurp .juliarc.jl from node 1, which has the potential for many odd issues... thanks!
I tried to work around this in JuliaPackaging/BinDeps.jl#130 but did not completely follow through revising for the comments (yet). I'd also be interested in making the cross-platform experience as seamless as possible -- I'll try to continue with that PR as soon as I can.
Cc @amitmurthy |
I've been driving Linux boxes from a Mac for a few weeks now, and despite my ridiculous hacks it's been extremely useful. My two cents: rather than adjusting BinDeps and other packages to work around these issues, it might be better/cleaner to be able to launch workers with a command-line switch that has them simply load Julia code from their local drive rather than slurping it from node 1 -- I'm already rsync'ing datasets and non-Julia code to the various nodes, so there is not much convenience gained on my end by the current behavior.
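For concreteness, the rsync-then-attach workflow described here might look roughly like this (a sketch: hosts, paths, and the `exename` are placeholders, and `addprocs` keyword support varies across Julia versions):

```julia
# Sketch of the rsync-then-attach workflow (hypothetical hosts/paths).
# Code and data are pushed out-of-band, so in principle the workers
# could read everything locally instead of pulling source from node 1.
hosts = ["user@linux-box1", "user@linux-box2"]
for h in hosts
    run(`rsync -az src/ data/ $h:work/`)   # sync code + datasets ourselves
end
addprocs(hosts;                            # SSH cluster manager launch
         dir = "/home/user/work",          # remote working directory
         exename = "/usr/local/bin/julia") # remote Julia binary (assumed path)
```

Even with everything synced locally, `using` and `include` on the workers still pull source through node 1, which is exactly the behavior the proposed switch would disable.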
Is this still a pain to get working? |
Yes -- it comes up on Discourse every few weeks.
Hi -- using a head node (i.e. `procid == 1`) that is a Mac on v0.3.6, I am trying to drive Linux-based workers via SSHClusterManager.

I experience problems with e.g. `using HDF5` -- the basic cause seems to be that `include` delegates to node 1 (`include` == `include_from_node1`), so my Linux boxes complain when they cannot locate the Mac dylib.

I've thought a bit about how to resolve this, but nothing obvious and elegant pops to mind. (For now I've just hacked my deps.jl on my Mac to support both OS X and Linux.)

Have others seen this kind of issue? Is there some simple way to have the workers not pull code from node 1 but simply rely on the locally installed packages?

I was thinking of hacking `include_from_node1` in the `.juliarc.jl` on the Linux boxes to simply not pull code from node 1, but that seems a bit drastic -- any thoughts about whether this would work?
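A sketch of the override being floated, as it might look in a v0.3-era `.juliarc.jl` on each worker (untested, leans on Base internals of that era, and assumes the two-argument `include_string` is available):

```julia
# Sketch only: shadow Base.include_from_node1 on the workers so files
# are read from the local disk rather than fetched from node 1.
# This skips the bookkeeping the real method does (e.g. tracking
# SOURCE_PATH), so treat it as an illustration, not a drop-in fix.
function Base.include_from_node1(path::String)
    Base.include_string(readall(path), path)  # evaluate local file contents
end
```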
As an aside, while I can understand the motivation for `include`, `using`, etc. to work by delegating to node 1 (e.g., to simplify code distribution), it does seem a bit difficult to do robustly, or in a way that will scale nicely to dozens or hundreds of workers...

Thanks