Docs currently suggest to use VSCode in the bastion node #62

pierreglaser · 2024-07-09T08:55:56Z

Hey, and thanks for putting this documentation together! It may be worth bookmarking the link on SWC's #computing channel to make it easily discoverable.

In the "Remote development" section of the docs, it is more or less suggested to use VSCode in the bastion node:

Then, when you click on the “Open a Remote Window” button in the bottom left corner of the VS Code window, you will see a list of the SSH hosts you have configured in your ~/.ssh/config file. You can then select the host you want to connect to - e.g. swc-gateway.

But if many users start doing this, the bastion node could run into memory errors due to many memory-hungry VSCode apps being open at the same time. Could the docs be updated to instead recommend, and explain how to set up, VSCode on a compute node? A guide to do exactly this was actually put together by Cristofer Holobetz (to find it, search "cristofer holobetz pdf" on Slack). Thanks!

niksirbi · 2024-07-09T10:09:53Z

Thanks @pierreglaser, we've also noticed the issue with our remote development guide, and we're discussing changes to that here.

We are considering updating or even removing that section, because it's indeed misleading.

Regarding Cristofer's guide, I have some reservations. It's indeed possible to ssh into a compute node and do remote development via VSCode in this way. However, this way of running jobs is not really controlled/limited via SLURM and could lead to consuming all resources in a node. At least that's what I've understood from my discussions with @adamltyson on this topic.

It would be great to come up with a remote development solution that doesn't burden the bastion node and also respects SLURM.

niksirbi · 2024-07-09T10:19:39Z

Other potential solutions for remote development:

Some version of what you describe here with JupyterLab. I've tried these instructions, and they work. They should also respect SLURM resource allocation because you are starting the JupyterLab session with srun.
Using a Jupyter notebook/lab app running via Open OnDemand. @lauraporta in our team has had success with this and it could be the more user-friendly way to go, provided that we work out the kinks and document the workflow.

pierreglaser · 2024-07-09T10:36:13Z

Thanks for the quick answer!

However, this way of running jobs is not really controlled/limited via SLURM and could lead to consuming all resources in a node. At least that's what I've understood from my discussions with @adamltyson on this topic.

I'm not sure about this: Cristofer's tutorial clearly states that the node in which VSCode will be started should be obtained through SLURM, using srun for instance. Did you have anything else in mind? I think that this part of the docs is very useful and I recommend keeping it, unless there are clear drawbacks (which right now I don't see once this bastion node issue is addressed).

adamltyson · 2024-07-09T10:45:23Z

Very possible that I'm wrong, but I don't understand how just by SSH-ing to a node, somehow that workload is therefore monitored by the SLURM job scheduler. It may be possible if you request the entire node, but then I'm not sure if SLURM will be able to kill the job etc.

niksirbi · 2024-07-09T10:51:58Z

I think if you start an interactive job via srun, and then ssh into it using the node name, you are indeed using the node you requested, but I don't know if the constraints on memory, cores etc are respected in that case. For example if you do srun --mem 8G and then ssh into that node, what guarantees that you won't exceed the 8G?

Anyhow, I'll message Cristofer so he can also participate in the discussion. I think he may have asked the scientific computing team about this.

lauraporta · 2024-07-09T11:07:44Z

In case @niksirbi is right, I've found another possible solution: start a job that runs sshd and connect VSCode to it. This way the resources used by VSCode will effectively be controlled by SLURM. I haven't tested this solution yet.
Also, Open OnDemand offers code-server: VSCode accessed via the browser, running within a SLURM job. I was interested in installing it some time ago.

pierreglaser · 2024-07-09T11:11:50Z

but I don't understand how just by SSH-ing to a node, somehow that workload is therefore monitored by the SLURM job scheduler

The VSCode app has to be run on a compute node obtained through SLURM, as stated in Cristofer's tutorial. However, to port forward VSCode back to your local machine, you have to start an ssh process from your machine to the compute node. This ssh process won't run anything; it just allows ports to be forwarded, which would require a much more complex solution if done via SLURM.

For example if you do srun --mem 8G and then ssh into that node, what guarantees that you won't exceed the 8G?

As SLURM is currently configured on the SWC cluster, there is a memory limit enforcement mechanism through the cgroup plugin, so yes, you are guaranteed not to exceed 8G.

niksirbi · 2024-07-10T18:52:21Z

You may very well be right @pierreglaser, but I don't sufficiently understand the internals of SLURM, cgroup, VScode's remote ssh plugin and their interactions to be confident about it. We may have to do some tests to confirm and consult Alex and John about it. Assuming we can confirm this, I'm happy to update the VSCode instructions according to @cristofer-holobetz 's guide.
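
A minimal sketch of one such test, assuming a Linux compute node where SLURM's cgroup plugin places job processes under a cgroup path containing `job_<id>` (that layout is an assumption and may differ between cgroup versions and site configurations). The helper name `in_slurm_job_cgroup` is hypothetical. The idea is to run it from the shell that VSCode's SSH connection opens on the compute node: if it reports no job cgroup, that session is probably not subject to the job's limits.

```shell
#!/usr/bin/env bash
# Report whether a process belongs to a SLURM job cgroup.
# Assumption: job cgroup paths contain "job_<numeric id>".
in_slurm_job_cgroup() {
    # Optional argument: cgroup file to inspect (defaults to /proc/self/cgroup).
    local cgroup_file="${1:-/proc/self/cgroup}"
    grep -Eq 'job_[0-9]+' "$cgroup_file" 2>/dev/null
}

if in_slurm_job_cgroup; then
    echo "inside a SLURM job cgroup"
else
    echo "not inside a SLURM job cgroup"
fi
```

Comparing the output from a shell started with srun against the output from a plain ssh session into the same node would show whether the two are treated differently.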

pierreglaser · 2024-07-11T10:32:29Z

Mmmh. Note that regardless of whether I'm correct, Cristofer's solution is an improvement over what is currently officially suggested (use VSCode on the login node). So not sure why we should delay moving forward with this solution.

adamltyson · 2024-07-11T10:36:32Z

I think it's important we only document things we know to be correct. It's unlikely that users will regularly consult this documentation to change their workflows.

@niksirbi for now, shall we just remove this section?

niksirbi · 2024-07-11T10:55:50Z

For now I suggest the following:

Remove the "remote development" section for now, to minimise the damage from additional people reading it and applying it as is
Apply the other small fixes suggested in Updates to ssh howto guide #61 that could help with reducing the workload on the login nodes
Open a new issue for coming up with a proper long-form "Remote development with VSCode" guide, with input from Pierre and Cristofer

I can get this done this week if you agree.

adamltyson · 2024-07-11T11:04:52Z

Sounds good to me. Thanks!

pierreglaser · 2024-07-12T20:15:29Z

I looked more into this issue: it turns out VSCode's remote SSH mode does not use SLURM. Cristofer's tutorial requires you to ask for a compute node via SLURM prior to connecting to that node using VSCode, which led me to think otherwise. But this step is not required, as VSCode just starts its own ssh connection.

As the link @lauraporta referenced shows, this seems to be a well-documented issue on the VSCode side, with only partial fixes existing. One option uses sshd within a SLURM-allocated compute node (the one Laura mentioned), but the SLURM environment variables are not inherited by the new connections, and additional hacks are required to make it fully functional, so it's not ideal.

Another option is to use code-server (a program which serves VSCode as a web app) on compute nodes, and use VSCode in your local machine's browser. Unlike other alternatives, the steps to get set up are very simple:

install code-server by downloading the binaries (alternatively we could ask IT to set it up globally on all nodes)

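# code-server release to install
export VERSION=4.91.0
# install under ~/.local so no admin rights are needed
mkdir -p ~/.local/lib ~/.local/bin
# download the release tarball and unpack it into ~/.local/lib
curl -fL https://github.com/coder/code-server/releases/download/v$VERSION/code-server-$VERSION-linux-amd64.tar.gz \
  | tar -C ~/.local/lib -xz
mv ~/.local/lib/code-server-$VERSION-linux-amd64 ~/.local/lib/code-server-$VERSION
# expose the binary through ~/.local/bin
ln -s ~/.local/lib/code-server-$VERSION/bin/code-server ~/.local/bin/code-server
# use $HOME here because ~ is not expanded inside double quotes
export PATH="$HOME/.local/bin:$PATH"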

ask SLURM for an interactive node via srun (srun --pty /bin/bash -l) and find its hostname
start code-server on some port (code-server --bind-addr=localhost:8081)
forward that port between the compute node and your machine: ssh pierreg@<compute-node-hostname> -J hpc-gw1 -N -L 8081:localhost:8081
open localhost:8081 in Google Chrome (the code-server UI was flawless as far as I tried; I tried other browsers before this one and the code-server UI was buggy).
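
The port-forwarding step can be wrapped in a small helper that prints the command to run on your local machine. This is only a sketch: the function name `tunnel_cmd` is hypothetical, and the gateway `hpc-gw1` and port 8081 are taken from the steps above and will differ for other setups.

```shell
#!/usr/bin/env bash
# Print the ssh command that forwards a code-server port from a compute
# node to the local machine through a jump host (the gateway).
# Arguments: user, compute node hostname, gateway, port (default 8081).
tunnel_cmd() {
    local user="$1" node="$2" gateway="$3" port="${4:-8081}"
    echo "ssh ${user}@${node} -J ${gateway} -N -L ${port}:localhost:${port}"
}

# Example: run this inside the srun session on the compute node, then
# copy the printed command into a terminal on your own machine.
tunnel_cmd "${USER:-user}" "$(hostname)" hpc-gw1 8081
```

Printing the command rather than executing it keeps the helper usable on the compute node, where the tunnel itself cannot be opened; the tunnel has to be started from the local machine.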

The in-browser experience is pretty much feature-complete since VSCode is run under the hood (you can install extensions, start a terminal, etc.).

code-server seems widespread, and the solution is both robust and respects SLURM. WDYT?

niksirbi · 2024-07-15T07:42:23Z

Thanks a lot for investigating this @pierreglaser!

I gave it a shot and it indeed seems to work just fine (incl. from Firefox). This is definitely an improvement on the previous guide, so I'll write up something and have it tested by a few more people.

If all seems well we can ask IT to centrally install code-server, which will make the instructions even simpler.

adamltyson · 2024-07-15T10:19:32Z

If we're asking IT to install stuff centrally, is it worth just going straight for VSCode via OOD?

niksirbi · 2024-07-15T14:36:26Z

Well, the two things are complementary. If people would like to use a VSCode app via OOD, IT would have to centrally install code-server anyway and then link it to OOD.
Pierre's instructions provide a way to use code-server directly, without making it an OOD app. In a way, OOD is an abstraction layer that will make this procedure more user-friendly, and can additionally serve as an entry point to other apps like JupyterLab.

So installing code-server (+ the how to guide that comes with it) is a stepping stone towards full OOD functionality, not opposed to it.

adamltyson · 2024-07-15T14:40:09Z

Cool, I assumed that the existing VSCode OOD app worked some other way, so it would be duplicating effort for IT.

niksirbi · 2024-07-15T14:55:48Z

From what I can find browsing online, using code-server seems to be the most popular choice for creating a VSCode app for OOD, see https://discourse.openondemand.org/t/vscode-showcase/2256

pierreglaser added the enhancement New feature or request label Jul 9, 2024

niksirbi mentioned this issue Jul 9, 2024

Updates to ssh howto guide #61

Closed

niksirbi mentioned this issue Jul 15, 2024

Update ssh setup guide #64

Merged

3 tasks

lauraporta mentioned this issue Nov 12, 2024

Add guide for vscode with slurm #67

Merged

7 tasks
