Cluster workflow feature to allow shell commands or script to run before remote server setup (e.g. slurm) (wrap install script) #1722
I managed to modify the CTRL+F -> I've confirmed that this correctly starts the VS Code Remote SSH server on a compute node. Now I am running into a port-forwarding issue, possibly related to issue #92. Our compute nodes have the ports used by VS Code Remote SSH disabled, so there isn't an easy way around this issue. Thanks for the hard work on this so far! This extension has extraordinary potential. Being able to run and modify a Jupyter notebook remotely on our cluster, while using intellisense and gitlens, AND conda environment detection and dynamic swapping, all in a single application for FREE is incredible. |
Do you mean that port forwarding for ssh is disabled on that server? Or are you able to forward some other port over an ssh connection to that server? |
Port forwarding for SSH is not disabled on any part of our cluster. I am not intentionally attempting to forward any other ports to the server. I was using To address that issue, and to clarify our workflow some, we are using Slurm. It is highly preferred to have tasks running within a job context so that login node resources aren't being consumed. To do that, we create a job using As a side note, it is also possible to provide the argument |
What code do you mean by "that code"? I don't think the issue you point to is related. We run the installer script essentially like It sounds more like you need a way to wrap the full installer script in a custom command, like |
Yes to your last question, ideally with the ability to customize the wrapping command. |
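To make "wrap the full installer script in a custom command" concrete, here is a minimal bash sketch of the requested behaviour: prefix a command with a user-configurable wrapper such as Slurm's `srun --pty` or LSF's `bsub -Is`. The names `WRAPPER` and `wrap_command` are hypothetical and only illustrate the idea; Remote SSH has no such setting, which is what this issue asks for.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: print a command line that runs the given command
# under a configurable wrapper. WRAPPER and wrap_command are illustrative
# names, not a real Remote SSH setting.
wrap_command() {
    # Default to an interactive Slurm step; override via the environment.
    local wrapper="${WRAPPER:-srun --pty}"
    # The wrapper is emitted unquoted because it may contain several words;
    # each wrapped argument is quoted with %q so the line can be re-parsed
    # safely by a shell (e.g. with eval).
    printf '%s' "$wrapper"
    printf ' %q' "$@"
    printf '\n'
}

# Example: the line that would launch the installer inside a job.
#   WRAPPER='srun --pty' wrap_command bash install.sh
#   -> srun --pty bash install.sh
```

Keeping the wrapper as a single user-supplied string is what would let each site plug in its own job manager without the extension knowing anything about Slurm or LSF.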
This would be an important feature for vscode-remote. I am currently trying to use VS Code to run some interactive Python code on a shared cluster, and the only way of doing it is by using Slurm's srun command. I'll try to find a workaround, but I think there really is a use case for this feature request. |
I've got the same issue, but with LSF instead of SLURM. Basically, the only way this can work is if all subprocesses for servers and user terminals are strictly forked children of the original seed shell acquired from LSF/SLURM/whatever job manager you are using. A hacky workaround may be to use something like Paramiko to start a mini SSH server from the seed shell and then log in to this mini server directly from VS Code (assuming there isn't a firewall blocking you, though reverse SSH tunnels can be used to get around that). |
Another possible resolution to this issue is by enabling a direct connection to the remote server.
That way, no ssh is required at all and it can work on login-restricted hosts. |
A slight variant on this: I would like to be able to get the target address for SSH from a script (think cat'ing a file that is semi-frequently updated with the address of a dynamic resource). Currently I am using a |
@wwarriner Is the issue you are referring to the same as the one in this Stack Overflow question? It sounds like we are having a similar problem: when I spin up an interactive job and try to run my debugger, I can't, because it goes back to the head node and tries to run things there. |
The problem is more serious than I thought. I can't run the debugger in the interactive session, and I can't even "Run Without Debugging" without it switching to the Python Debug Console on its own. So that means I have to run things manually with What I am doing is switching my terminal to the |
Am I reading this right that currently the only way to have the language server run on a compute node rather than the head/login node is to modify (I'm also using slurm and the python language server eating up 300GB on the head node disrupts the whole department). |
I'm curious if this is on the roadmap for the near future. With my university going entirely remote for the foreseeable future, being able to use this extension to work on the cluster would be absolutely amazing. |
Yes, I would also really like this feature, with universities going remote due to COVID-19. |
how do you do that? Have you tried it? |
No capacity to address this in the near future but I am interested to hear how the cluster setup works for other users - if anyone is not using |
I guess this is related. I would like VS code clients (e.g., julia client) to have an option to start in the Slurm job I am currently in and not in the login node. |
Another option I've seen used at some high-performance computing centers is to use coder/code-server (as mentioned above), but rather than SSH tunnel to a code-server instance running in a (Slurm) job on a compute node, instead launch it as an Open OnDemand interactive app (customizing the example [OSC/bc_osc_codeserver](https://github.com/OSC/bc_osc_codeserver) app for your HPC cluster's environment) and use OOD's reverse-proxy capabilities (described in the architecture page) to route HTTP traffic to the code-server process on the compute node. This approach has limitations (e.g., requires a sysadmin to update the code-server version, and it seems subsequent code-server versions manage to periodically break |
My use case has been resolved with the advent of Remote Tunnels. Here is my process, assuming code CLI is installed and on the path.
Looks like this "just works" now. I was able to connect to a new node on our cluster using a previous tunnel session stored locally. |
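The Remote Tunnels process itself is not reproduced above, so here is a minimal sketch of what such a batch job might look like, assuming the standalone `code` CLI is installed and on `PATH`. The job name, time limit, and the `TUNNEL_NAME` variable with its `hpc-dev` default are placeholders, and you would add whatever partition/CPU/memory options your site requires.

```shell
#!/usr/bin/env bash
#SBATCH --job-name=code-tunnel
#SBATCH --time=12:00:00
# Sketch only: job name and time limit are placeholders; add the
# partition/CPU/memory options your site requires.

start_tunnel() {
    # TUNNEL_NAME is a hypothetical knob; reusing a fixed name may let the
    # client reconnect to the same tunnel entry from job to job.
    local name="${TUNNEL_NAME:-hpc-dev}"
    # The standalone `code` CLI must be installed and on PATH.
    if ! command -v code >/dev/null 2>&1; then
        echo "code CLI not found on PATH" >&2
        return 1
    fi
    # Runs in the foreground, so the tunnel lives as long as the job does.
    code tunnel --accept-server-license-terms --name "$name"
}

# Only start the tunnel when running inside a Slurm allocation.
if [[ -n "${SLURM_JOB_ID:-}" ]]; then
    start_tunnel
fi
```

Submit it with `sbatch` and connect to the tunnel name from VS Code. On the first run the CLI asks you to sign in with a GitHub or Microsoft account. Because the tunnel runs in the foreground of the job, it stops when the job ends; nothing here resubmits the job.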
I don't understand. What happens if the sbatch job finishes (e.g., it gets cancelled)? Does the tunnel magically restart the job? |
This looks great! If it works reliably, this should be in VS Code Remote Extensions... |
@tzom I'll admit this isn't exactly the solution I had in mind when I created this issue 3.5 years ago, but it certainly eases the barrier to entry and greatly simplifies the workflow I use. Yes, a job needs to be created manually. On our system we can create sbatch jobs that run for 12 hours, which is plenty for a working day. I don't run long-running batch tasks in the same jobs I do development work, so I don't need the job for longer than that. @roblourens I do have one request that would be what I see as a natural extension of Tunnels. I see that Tunnels sends data through an Azure service. For our institution, this would be challenging to get approved as part of working with PHI/HIPAA and even what we classify as sensitive data, due to the "unknown" nature of that Azure intermediate. I am aware that the code is Open Source, but we can't see what is actually processing our requests, it is all taken on trust. To work with PHI, that's not enough. Is there a way we can have the simplicity of remote tunnels, but entirely contained within a service we control? |
Not currently, the only solution for that is SSH |
Thanks for the additional info @wwarriner!
This sounds like it would be tracked by this feature request: microsoft/vscode#168492. And if you'd like further info on how tunnels are secured, we have a section in our docs: https://code.visualstudio.com/docs/remote/tunnels#_how-are-tunnels-secured. |
@bamurtaugh Thank you for the link; I also found #7527 via the issue you linked, which appears to be requesting a feature like I'll keep an eye out for both! I appreciate the link for tunnel security. Personally it sounds reasonable, though I am not an expert. Perhaps the following will sound familiar. At our institution, any applications touching PHI and HIPAA data must be approved. Part of the approval is a security review, where the entire proposed network sequence is inspected. With a tunnel, routing information is no longer open to inspection by our firewall (which would raise eyebrows for PHI/HIPAA data), and traffic is routed through a third-party domain. The simple solution is to not use a tunnel for PHI/HIPAA data. But I can see a future where development work could be adjacent enough to PHI/HIPAA data that this might come under scrutiny. Having the option to start our own SSH server gives us a more palatable routing configuration. I would also hazard a guess that the folks at https://github.com/OSC/Open-OnDemand would be pleased at the idea of making the proposed VS Code SSH server an interactive app on their service. I know that re-hosting might violate the current ToS, but a carve-out is something to think about for the future in terms of VS Code use within academic research computing development. |
I tried the suggestion of appealing people inside VS Code, but I got the following error message. How should I solve the problem? The terminal process failed to launch: A native exception occurred during launch (forkpty(3) failed.). |
@George-du can you please share some more info of which suggestion / specific steps you tried? |
I've done it! I'm back with another ridiculous solution, but this one is hopefully better than my last one ... it's not finished yet but it at least allows commands to be run before the remote server is set up 🎉🥳 Given that you can define the path to your ssh binary in vscode's remote ssh settings, I decided to wrap it in a bash script. The script:
The code is in a repo here and is still very much a prototype. It may leave jobs running, etc. I just wanted to share that I got the basics working. I understand this was a solved problem for many (running VS Code connected to a compute node), but for particular Slurm configurations (with no pam_slurm_adopt or use_interactive_step setting), it would mean that when someone sshs into a compute node, it wouldn't attach them to their job and cgroups wouldn't apply (I think that's the case ...). This behaviour makes it difficult to get the VS Code server running in the right place with the right cgroup restrictions. I'm planning on wrapping this up into an extension where you could interactively build the ssh_config host entry (including RemoteCommand) in a panel on the left-hand side of VS Code, click connect, and it would do the steps I described above. I have no idea if this is possible though. A note to Microsoft: it would have really helped to have extension.js and localServer.js open sourced/developed in this GitHub repo. Instead, I had to find the files myself and reformat the JavaScript so I could study the connection process. I was saddened to see it come out at >50000 lines in one file. I'm not sure if single extension files are a requirement of VS Code extensions, but it still made me sad 😣😢 |
@wwarriner I followed your steps and it works really nicely! But one issue is that every time I restart the compute node and the remote version of VS Code, it asks me to authenticate my GitHub or Microsoft account, as shown in the screenshot below. Is there a way to avoid this? |
Thanks for this @simonbyrne! This is still the best solution for me so far. Only problem is the SLURM* environment variables are not passed to the proxy ssh session. Anyone solved this? |
@eugeneteoh We have implemented a hackish solution to get the `SLURM_*` variables into the SSH session:

```bash
#!/bin/bash
#SBATCH --time=01:00:00
#SBATCH --job-name="code-tunnel"
# Tell the controller to send SIGTERM to the job 60 secs before its
# time ends, to give it a chance for better cleanup.
#SBATCH --signal=B:TERM@60

cleanup() {
    echo "Caught signal - removing SLURM env file"
    rm -f ~/.code-tunnel-env.bash
}

# Trap the timeout signal (SIGTERM) and call the cleanup function
trap 'cleanup' SIGTERM

# Store SLURM variables to file
env | awk -F= '$1~/^SLURM_/{print "export "$0}' > ~/.code-tunnel-env.bash

/usr/sbin/sshd -D -p 2222 -f /dev/null -h "${HOME}/.ssh/id_ecdsa" &
wait
```

Then have users add the following lines to their …:

```bash
# Source the SLURM environment if we're connecting through code-tunnel
[ -f ~/.code-tunnel-env.bash ] && source ~/.code-tunnel-env.bash
```
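One possible refinement, offered as a sketch rather than a tested replacement: the awk line writes `export NAME=value` without quoting, so a value containing spaces (for example a job name such as `code tunnel`) could be split when the file is sourced. The hypothetical `dump_slurm_env` function below uses bash's `${!SLURM_@}` name expansion and `printf %q` to quote each value.

```shell
#!/usr/bin/env bash
# Hypothetical alternative to the awk line: write every SLURM_* variable as
# an `export NAME=value` line, with values quoted by %q so that spaces and
# shell metacharacters survive being sourced later.
dump_slurm_env() {
    local var
    # ${!SLURM_@} expands to the names of all variables starting with SLURM_.
    for var in "${!SLURM_@}"; do
        # ${!var} is indirect expansion: the value of the variable named $var.
        printf 'export %s=%q\n' "$var" "${!var}"
    done
}

# In the job script this would replace the awk line:
#   dump_slurm_env > ~/.code-tunnel-env.bash
```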
^ I just tested and it works. Thanks :) |
Hi all, feel free to ask me to be quiet about this in this thread, but I've managed to get my vscode slurm ssh wrapper script working really well now! It is a wrapper around
Finally this script is where I want it to be! I even battled through the Powershell to make a Windows version. Next I just need to make it into an extension that probes your cluster's slurm config and generates the appropriate cluster-specific ssh configs for different slurm resource combos on-the-fly in a GUI :-D |
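As a rough illustration of generating cluster-specific ssh configs for different Slurm resource combinations, the hypothetical helper below prints an `ssh_config` `Host` entry whose `RemoteCommand` starts an interactive `srun` step. The function name, host alias, and login host are placeholders, and the helper is not taken from the wrapper script described above. Whether Remote SSH honours `RemoteCommand` depends on your extension version and settings.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print an ssh_config Host entry whose RemoteCommand
# starts an interactive Slurm step with the given srun options.
# Usage: make_host_entry ALIAS LOGIN_HOST [SRUN_OPTION...]
make_host_entry() {
    local host_alias="$1" login="$2"
    shift 2
    printf 'Host %s\n' "$host_alias"
    printf '    HostName %s\n' "$login"
    # srun --pty needs a terminal, so ask ssh to allocate one.
    printf '    RequestTTY yes\n'
    printf '    RemoteCommand srun'
    # Append the resource options only if any were given.
    if (( $# > 0 )); then
        printf ' %s' "$@"
    fi
    printf ' --pty bash -l\n'
}

# Example: one entry per resource combination (placeholder host name).
#   make_host_entry cluster-gpu login.example.edu --gres=gpu:1 --time=04:00:00 >> ~/.ssh/config
```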
I tried to run this script on a Windows 10 laptop that connects via WSL2 to a Linux server, and I just get "Could not establish connection to "server_name": spawn UNKNOWN." The "installation" was as described in the README, with the appropriate path changes in Remote-SSH: Settings -> Remote.SSH: Path and the changes to the ssh config. Using either the .sh or the .ps1 script does not work; the error remains the same. I have not really worked with a cluster before, so I don't really know where to even start to resolve these issues. |
Would you mind expanding on the full procedure? I'm not sure I understand. So the SBATCH part reserves resources for the VS Code server to run on and saves some string that identifies the node we got, which is where we later want the VS Code server to start running. What do we do with the part?
What does something like that mean? Exactly this or should the real host be in there somewhere. Is Is a change in VS-code required (mentioned in the thread but not specifically in this solution)? |
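On the question of which real host to connect to: one option is to look up the node running the job from the login node. The sketch below assumes a single-node job named `code-tunnel` (matching the job script earlier in this thread) and a Slurm version whose `squeue` supports `--me`; `job_node` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: print the compute node running the current user's job with the
# given name (default "code-tunnel"). Assumes a single-node job and an
# squeue that supports --me.
job_node() {
    local name="${1:-code-tunnel}" node
    # %N is the node list; take the first matching running job only.
    node="$(squeue --me --name="$name" --states=RUNNING --noheader --format=%N | head -n 1)"
    # Strip any whitespace squeue may add around the field.
    node="${node//[[:space:]]/}"
    if [[ -z "$node" ]]; then
        echo "no running job named '$name'" >&2
        return 1
    fi
    printf '%s\n' "$node"
}
```

On the login node, `ssh -p 2222 "$(job_node)"` would then reach the sshd started by that job script. From your own machine, the same node name would go in the `HostName` of an ssh_config entry that uses `ProxyJump` through the login node and `Port 2222`.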
Hi, sorry I haven't replied to this already, and feel free to open an issue in my repo to not clog up the thread. I did the windows version as a little extra/bonus (as not many people need it where I work) so it isn't as mature/robust as the linux solution. I have however pushed some fixes + some guidance for windows. It should be: edit |
This is great. Made me wonder whether this is possible and it seemingly is? I haven't tested it yet. |
On some HPC clusters, a password is needed to be able to proxyjump to the compute nodes. Is there not a way around this? Like somehow running the vscode server upon allocation? |
I'd hope it pops up a little window or you can type the password into the terminal? Without agent forwarding I have to type my password a bunch of times to get it to connect to the compute node. |
I want to be able to connect to our institution's cluster using VS Code Remote SSH with the server running on a compute node instead of the login node. The preferred workflow is to SSH into the login node, then use a command to allocate a job and spin up an interactive shell on a compute node, and then run any further tasks from there. VS Code Remote SSH doesn't appear to have a feature that facilitates this workflow. I want to be able to inject the spin-up command immediately after SSH'ing into the cluster, but before the VS Code server is set up on the cluster and before any other tasks are run.