This document serves as a guide for server configuration in NUS School of Computing. This is mostly used by Prof. Bressan's research group.
Any connection to the servers listed below can be done as following:
- Directly when you are in the School of Computing network. Not possible outside SoC (even non-SoC NUS network)
- Access via an intermediary server (suna for staff or sunfire for student). You need to have an SoC account for this (for example: ssh username@sunfire.comp.nus.edu.sg, where username is your SoC account). Not possible outside Singapore (?)
- Access through SoC VPN (https://webvpn.comp.nus.edu.sg/). You need to have NUSNET account for this. Available anywhere in the world (?). Reference: https://dochub.comp.nus.edu.sg/cf/guides/network/vpn (tag: forticlient)
- Suyeong (suyeong.d2.comp.nus.edu.sg) - 2 GeForce GTX1080 GPUs, 64 GB RAM, 12 processors
- Yuna (yuna.d2.comp.nus.edu.sg) - 1 GeForce GTX1080Ti GPU, 64 GB RAM, 16 processors
- Fajrian (fajrian-e2s2-1.d1.comp.nus.edu.sg) - 1 GPU GeForce GTX1080Ti, 32 GB RAM, 8 processors
- taeyeon (taeyeon.d2.comp.nus.edu.sg) - 1 A100 PCIe Tensor Core GPU, 64 processors
-
Generate ssh key
Windows: Step 1 of https://system.cs.kuleuven.be/cs/system/security/ssh/setupkeys/putty-with-key.html#s1 Then in puttygen, select the
Conversions
tab and convert the format of the key. This is done byimport
-ing your current.ppk
file, thenExport
asOpenSSH key
(again inConversions
tab).OR Windows: in powershell, use the command
ssh-keygen
. Navigate to the folder where your ssh keys have been generated. You should see two files. The first isid_rsa
and the second isid_rsa.pub
. The latter is your public key.Linux and Mac: https://www.ssh.com/ssh/keygen/
-
Give public key (e.g. id_rsa.pub) to the system administrator.
-
Wait for system administrator to add you to the server.
-
SSH to the server.
Windows: Step 3 of https://system.cs.kuleuven.be/cs/system/security/ssh/setupkeys/putty-with-key.html#s3
Linux and Mac: ssh username@suyeong.d2.comp.nus.edu.sg in terminal.
-
You won't have sudo access. Contact system administrator if there is a need to install something on the server with sudo access.
-
Check the server environment setup below.
Users may use this vpn to access the SOC network: https://webvpn.comp.nus.edu.sg/
Try to use vim or gedit for the editor. You can also use Visual Studio code with remote setup to the server.
For Windows or Mac, you can use Filezilla (https://filezilla-project.org/) for file transferring between server and your local computer.
Try not to use the system's Python. We recommend to use pyenv.
Follow the "Basic GitHub checkout" installation here: https://github.com/pyenv/pyenv#installation and install the Python versions that you want.
After that you can use pip command without sudo access.
Do not forget to use pip install tensorflow-gpu to use GPU with Tensorflow.
If you want to use Jupyter notebook on the server, you need to do a port forwarding.
-
On the server, start the IPython notebooks server:
jupyter notebook --no-browser --port=8889
-
On your computer, start an SSH tunnel:
ssh -N -f -L localhost:8888:localhost:8889 remote_user@remote_host
-
Now open your browser on the computer and type in the address bar:
localhost:8888
Check the GPU usage by using the command nvidia-smi. If you are using Tensorflow or PyTorch, try to not to fully use the GPU so we can share with other users (use CUDA visible devices or dynamic GPU allocation in Tensorflow).
If there are multiple GPUs, consider setting memory growth on one of the GPUs following:
physical_devices = tf.config.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(physical_devices[0], True)
https://stackoverflow.com/questions/60048292/how-to-set-dynamic-memory-growth-on-tf-2-1
If there is only one GPU, consider setting memory growth on that GPU, to use only a fraction of the GPU, following:
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.333)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
https://stackoverflow.com/questions/34199233/how-to-prevent-tensorflow-from-allocating-the-totality-of-a-gpu-memory
to check that the memory growth is limited:
tf.config.experimental.get_memory_usage('GPU:0')
https://www.tensorflow.org/api_docs/python/tf/config/experimental/get_memory_usage
- Create user: sudo adduser username by adding a new file for the user in root
- Copy ssh public key from user to /home/username/.ssh/authorized_keys (ensure public key is formatted
ssh_rsa <public key, in one line> <username>
, then usenano authorized_keys
) - Add username to the end of the line of /etc/ssh/sshd_config
- Restart sshd: sudo systemctl restart ssh.service
- mkdir /home/user/.ssh
- chmod 700 /home/user/.ssh
- cd /home/user/.ssh
- nano authorized_keys (this creates the authorized_keys file)
- chmod 600 authorized_keys
- chown -R user:user /home/user/.ssh
- sudo systemctl restart ssh.service
- Create a virtual environment
python3 -m venv <name_of_venv>
exec "$SHELL"
source <name_of_venv>/bin/activate
pip install jupyter
(and any other libraries)ipython kernel install --user --name=<name_of_venv>
jupyter notebook --no-browser --port xxxx
Source: https://dochub.comp.nus.edu.sg/cf/services/compute-cluster
Prerequisite: Have SoC account (automatically created for you if you are an SoC student/staff, you need to request to the Prof if you are a guest)
- Open https://mysoc.nus.edu.sg/app/myacct/
- Go to "My SoC Resources" menu.
- Select "Services".
- Enable "The SoC Compute Cluster" service.
The server that you can login into can be seen here: https://dochub.comp.nus.edu.sg/cf/guides/compute-cluster/hardware
If you are outside SoC network, access options are via an intermediary server (such as suna or sunfire) OR use SoC VPN to make the initial virtual jump into SoC first.
Guide for running MPI program: https://dochub.comp.nus.edu.sg/cf/guides/compute-cluster/tembusu/mpi
- Go to root, and check usage via
df -lh
. - Create and move files to
/mnt/data
for the user a. Make directory for the usermkdir /mnt/data/<user>
b. Move files (for example, moving all files for one usermv <user> /mnt/data
) c. Change the folder's permission to the userchown <user>:<user> /mnt/data/<user>
- Use
du -sh *
to check the server memory usage of individual users
du -a /home |sort -n -r |head -n 10