Using SILNLP on the ORU Titan Server
To use ORU's Titan Server when running ClearML experiments, set the queue name for the task to oru. This sends the task to the ClearML agent on the ORU server, which allocates a compute node and submits the task as an sbatch job. You can still view the running experiment in the ClearML Web UI and abort it if needed, as usual.
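As a rough sketch of what setting the queue could look like from the command line: the snippet below builds the launch command and prints it for inspection rather than running it. The --clearml-queue flag is an assumption about the silnlp.nmt.experiment entry point (check it with python -m silnlp.nmt.experiment --help), and the experiment name is a hypothetical placeholder.

```shell
# Assumption: silnlp.nmt.experiment accepts the queue via --clearml-queue.
QUEUE="oru"
EXPERIMENT="MyProject/my_experiment"   # hypothetical experiment name

# Build the command as a string so it can be inspected before running.
CMD="python -m silnlp.nmt.experiment --clearml-queue $QUEUE $EXPERIMENT"
echo "$CMD"
```

Once the printed command looks right, it can be run from the silnlp conda environment.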
By default, the time limit for each task is 18 hours. To change it, edit the task in the ClearML Web UI: in the task's CONFIGURATION > User Properties > Properties section, add a new property called time_limit and set it to the new time limit in the format hrs:min:sec (e.g. 01:00:00).
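A malformed value is easy to enter by hand, so it may help to check the string before pasting it into the time_limit property. The following is a minimal sketch; the helper name is_valid_time_limit is hypothetical, and the pattern assumes at least two digits for hours and 00-59 for minutes and seconds, which may be stricter or looser than what the server actually accepts.

```shell
# Succeed only if the argument looks like hrs:min:sec, e.g. 01:00:00.
# Assumption: hours use at least two digits; minutes and seconds are 00-59.
is_valid_time_limit() {
    printf '%s\n' "$1" | grep -Eq '^[0-9]{2,}:[0-5][0-9]:[0-5][0-9]$'
}

# Try a few candidate values, including two that are not in the expected format.
for candidate in 01:00:00 18:00:00 1h 90:00; do
    if is_valid_time_limit "$candidate"; then
        echo "$candidate: ok"
    else
        echo "$candidate: not in hrs:min:sec format"
    fi
done
```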
Log in at https://ood.orca.oru.edu/pun/sys/dashboard and start a Jupyter Lab session in the Interactive apps > Jupyter Notebook tab using account "sil", partition "gpu", some number of hours, and 1 node. Once in the session, open a terminal to complete the rest of the setup.
mkdir -p /home/user/.cache/silnlp/experiments
mkdir /home/user/.cache/silnlp/projects
Add the following environment variables to ~/.bashrc, replacing each xxxxx with your ClearML and AWS credentials.
echo 'export SIL_NLP_CACHE_EXPERIMENT_DIR="/home/user/.cache/silnlp/experiments"' >> ~/.bashrc
echo 'export SIL_NLP_CACHE_PROJECT_DIR="/home/user/.cache/silnlp/projects"' >> ~/.bashrc
echo 'export SIL_NLP_DATA_PATH="/aqua-ml-data"' >> ~/.bashrc
echo 'export CLEARML_API_HOST="https://api.sil.hosted.allegro.ai"' >> ~/.bashrc
echo 'export CLEARML_API_ACCESS_KEY="xxxxx"' >> ~/.bashrc
echo 'export CLEARML_API_SECRET_KEY="xxxxx"' >> ~/.bashrc
echo 'export AWS_ACCESS_KEY_ID="xxxxx"' >> ~/.bashrc
echo 'export AWS_SECRET_ACCESS_KEY="xxxxx"' >> ~/.bashrc
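After the shell has been reloaded, a quick check along the following lines can confirm that every variable above is exported and that no xxxxx placeholder was left in place. This is only a sketch; the function name check_silnlp_env is hypothetical.

```shell
# Report any expected variable that is unset, empty, or still the xxxxx placeholder.
check_silnlp_env() {
    missing=0
    for name in SIL_NLP_CACHE_EXPERIMENT_DIR SIL_NLP_CACHE_PROJECT_DIR \
                SIL_NLP_DATA_PATH CLEARML_API_HOST CLEARML_API_ACCESS_KEY \
                CLEARML_API_SECRET_KEY AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY; do
        # printenv only sees exported variables, which is what child processes inherit.
        value=$(printenv "$name" || true)
        if [ -z "$value" ] || [ "$value" = "xxxxx" ]; then
            echo "missing or placeholder: $name"
            missing=1
        fi
    done
    return "$missing"
}

check_silnlp_env && echo "all variables set"
```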
Install Miniconda, following the quick command-line instructions from https://docs.anaconda.com/free/miniconda/#quick-command-line-install.
mkdir -p ~/miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-py38_23.11.0-2-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm -rf ~/miniconda3/miniconda.sh
~/miniconda3/bin/conda init bash
Restart the terminal with exec "$SHELL" so the environment variables and conda setup take effect. If the terminal launches in the base conda environment (the command prompt is preceded by (base)), exit it with conda deactivate before creating the silnlp conda environment.
conda create -n silnlp python=3.8.10
conda activate silnlp
echo 'export PYTHONPATH=' >> ~/.bashrc
curl -sSL https://install.python-poetry.org | python3 - --version 1.7.1
echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.bashrc
conda install git
git clone https://github.com/sillsdev/silnlp.git
cd silnlp
poetry install
After completing the setup steps, restart the terminal again (exec "$SHELL"). Each time you open a new terminal or start a new session, you will automatically be put into the base conda environment. To switch to the silnlp environment, run conda deactivate followed by conda activate silnlp.
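Because every new terminal starts in the base environment, it is easy to launch something from the wrong one. The sketch below checks the active environment through CONDA_DEFAULT_ENV, which conda sets to the name of the active environment; the helper name in_silnlp_env is hypothetical.

```shell
# Return success only when the active conda environment is silnlp.
in_silnlp_env() {
    [ "${CONDA_DEFAULT_ENV:-}" = "silnlp" ]
}

if in_silnlp_env; then
    echo "silnlp environment is active"
else
    echo "active environment: ${CONDA_DEFAULT_ENV:-none}; run: conda deactivate && conda activate silnlp"
fi
```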
At the time of writing, gradient checkpointing must be disabled for experiments to run on the Titan server. This restriction may no longer apply by the time you read this.