Converts container.sh into Python utilities (#539)
This PR replicates the capabilities of `container.sh` in Python and
splits them across several files under a new directory,
`/docker/isaaclab_container_utils`, with an improved interface in
`container.py`. The intent is to make our container-mediating code
easier to read, debug, and modify. It should also be easier to
distribute, since users have expressed a desire to [compose and modify our
setup](#455). An additional benefit is that fewer sudo installs are
needed, because the YAML handling is now done in Python.

The central class, `IsaacLabContainerInterface`, contains much of the
original script's functionality, and several of the current
`container.py` options simply configure it and call a method.
@pascal-roth, `apptainer_utils.py` and the `./container.py job/push`
logic are separated out; I'm curious what you think of this delineation.
I haven't been able to fully test that side of things because I don't
have a cluster to use, though I did verify that it works to the extent
I could.

I will update the docs once I have received approval.

- Breaking change (fix or feature that would cause existing
functionality to not work as expected)
- This change requires a documentation update

- [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with
`./isaaclab.sh --format`
- [x] I have made corresponding changes to the documentation
- [x] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have run all the tests with `./isaaclab.sh --test` and they pass
- [ ] I have updated the changelog and the corresponding version in the
extension's `config/extension.toml` file
- [x] I have added my name to the `CONTRIBUTORS.md` or my name already
exists there

---------

Signed-off-by: Hunter Hansen <50837800+hhansen-bdai@users.noreply.github.com>
Signed-off-by: James Smith <142246516+jsmith-bdai@users.noreply.github.com>
Co-authored-by: James Smith <142246516+jsmith-bdai@users.noreply.github.com>
Co-authored-by: David Hoeller <dhoeller@nvidia.com>
3 people authored and Mayankm96 committed Aug 14, 2024
1 parent 1b8e2c0 commit f565c33
Showing 12 changed files with 796 additions and 479 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -26,6 +26,7 @@
**/*.sif
docker/exports/
docker/.container.yaml
docker/.container.cfg

# IDE
**/.idea/
21 changes: 0 additions & 21 deletions docker/.env.base
@@ -12,24 +12,3 @@ DOCKER_ISAACSIM_ROOT_PATH=/isaac-sim
DOCKER_ISAACLAB_PATH=/workspace/isaaclab
# Docker user directory - by default this is the root user's home directory
DOCKER_USER_HOME=/root

###
# Cluster specific settings
###

# Job scheduler used by cluster.
# Currently supports PBS and SLURM
CLUSTER_JOB_SCHEDULER=SLURM
# Docker cache dir for Isaac Sim (has to end on docker-isaac-sim)
# e.g. /cluster/scratch/$USER/docker-isaac-sim
CLUSTER_ISAAC_SIM_CACHE_DIR=/some/path/on/cluster/docker-isaac-sim
# Isaac Lab directory on the cluster (has to end on isaaclab)
# e.g. /cluster/home/$USER/isaaclab
CLUSTER_ISAACLAB_DIR=/some/path/on/cluster/isaaclab
# Cluster login
CLUSTER_LOGIN=username@cluster_ip
# Cluster scratch directory to store the SIF file
# e.g. /cluster/scratch/$USER
CLUSTER_SIF_PATH=/some/path/on/cluster/
# Python executable within Isaac Lab directory to run with the submitted job
CLUSTER_PYTHON_EXECUTABLE=source/standalone/workflows/rsl_rl/train.py
17 changes: 17 additions & 0 deletions docker/cluster/.env.cluster
@@ -0,0 +1,17 @@
###
# Cluster specific settings
###

# Docker cache dir for Isaac Sim (has to end on docker-isaac-sim)
# e.g. /cluster/scratch/$USER/docker-isaac-sim
CLUSTER_ISAAC_SIM_CACHE_DIR=/some/path/on/cluster/docker-isaac-sim
# Isaac Lab directory on the cluster (has to end on isaaclab)
# e.g. /cluster/home/$USER/isaaclab
CLUSTER_ISAACLAB_DIR=/some/path/on/cluster/isaaclab
# Cluster login
CLUSTER_LOGIN=username@cluster_ip
# Cluster scratch directory to store the SIF file
# e.g. /cluster/scratch/$USER
CLUSTER_SIF_PATH=/some/path/on/cluster/
# Python executable within Isaac Lab directory to run with the submitted job
CLUSTER_PYTHON_EXECUTABLE=source/standalone/workflows/rsl_rl/train.py
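The cluster settings above are plain `KEY=VALUE` lines with `#` comments, which the shell scripts load with `source`. As a rough illustration of how a Python utility could read the same file, here is a minimal sketch. The `parse_env_text` and `parse_env_file` helpers are hypothetical names that do not come from this commit, and unlike `source` they do not expand shell variables or strip quotes.

```python
from pathlib import Path


def parse_env_text(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments (hypothetical helper)."""
    settings: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not an assignment, ignore it
        settings[key.strip()] = value.strip()
    return settings


def parse_env_file(path: Path) -> dict[str, str]:
    """Read an env file from disk and parse it (hypothetical helper)."""
    return parse_env_text(Path(path).read_text())


# Illustrative input copied from the placeholder values in .env.cluster
sample = """
# Cluster login
CLUSTER_LOGIN=username@cluster_ip
CLUSTER_SIF_PATH=/some/path/on/cluster/
"""
settings = parse_env_text(sample)
print(settings["CLUSTER_LOGIN"])
```

Returning a plain dictionary keeps the values easy to merge into an environment mapping before launching a subprocess.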
188 changes: 188 additions & 0 deletions docker/cluster/cluster_interface.sh
@@ -0,0 +1,188 @@
#!/usr/bin/env bash

#==
# Configurations
#==

# Exits if error occurs
set -e

# Set tab-spaces
tabs 4

# get script directory
SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"

#==
# Functions
#==
# Function to check docker versions
# If the Docker major version is 25 or higher, the script errors out.
check_docker_version() {
    # check if docker is installed
    if ! command -v docker &> /dev/null; then
        echo "[Error] Docker is not installed! Please check the 'Docker Guide' for instruction." >&2;
        exit 1
    fi
    # Retrieve Docker version
    docker_version=$(docker --version | awk '{ print $3 }')
    apptainer_version=$(apptainer --version | awk '{ print $3 }')

    # Check if the major version is 25 or above
    if [ "$(echo "${docker_version}" | cut -d '.' -f 1)" -ge 25 ]; then
        echo "[ERROR]: Docker version ${docker_version} is not compatible with Apptainer version ${apptainer_version}. Exiting."
        exit 1
    else
        echo "[INFO]: Building singularity with docker version: ${docker_version} and Apptainer version: ${apptainer_version}."
    fi
}

# Checks if a docker image exists, otherwise prints warning and exits
check_image_exists() {
    image_name="$1"
    if ! docker image inspect $image_name &> /dev/null; then
        echo "[Error] The '$image_name' image does not exist!" >&2;
        echo "[Error] You might be able to build it with /IsaacLab/docker/container.py." >&2;
        exit 1
    fi
}

# Check if the singularity image exists on the remote host, otherwise print warning and exit
check_singularity_image_exists() {
    image_name="$1"
    if ! ssh "$CLUSTER_LOGIN" "[ -f $CLUSTER_SIF_PATH/$image_name.tar ]"; then
        echo "[Error] The '$image_name' image does not exist on the remote host $CLUSTER_LOGIN!" >&2;
        exit 1
    fi
}

submit_job() {

    echo "[INFO] Arguments passed to job script ${@}"

    case $CLUSTER_JOB_SCHEDULER in
        "SLURM")
            CMD=sbatch
            job_script_file=submit_job_slurm.sh
            ;;
        "PBS")
            CMD=bash
            job_script_file=submit_job_pbs.sh
            ;;
        *)
            echo "[ERROR] Unsupported job scheduler specified: '$CLUSTER_JOB_SCHEDULER'. Supported options are: ['SLURM', 'PBS']"
            exit 1
            ;;
    esac

    ssh $CLUSTER_LOGIN "cd $CLUSTER_ISAACLAB_DIR && $CMD $CLUSTER_ISAACLAB_DIR/docker/cluster/$job_script_file \"$CLUSTER_ISAACLAB_DIR\" \"isaac-lab-$profile\" ${@}"
}

#==
# Main
#==

#!/bin/bash

help() {
    echo -e "\nusage: $(basename "$0") [-h] <command> [<profile>] [<job_args>...] -- Utility for interfacing between IsaacLab and compute clusters."
    echo -e "\noptions:"
    echo -e " -h Display this help message."
    echo -e "\ncommands:"
    echo -e " push [<profile>] Push the docker image to the cluster."
    echo -e " job [<profile>] [<job_args>] Submit a job to the cluster."
    echo -e "\nwhere:"
    echo -e " <profile> is the optional container profile specification. Defaults to 'base'."
    echo -e " <job_args> are optional arguments specific to the job command."
    echo -e "\n" >&2
}

# Parse options
while getopts ":h" opt; do
    case ${opt} in
        h )
            help
            exit 0
            ;;
        \? )
            echo "Invalid option: -$OPTARG" >&2
            help
            exit 1
            ;;
    esac
done
shift $((OPTIND -1))

# Check for command
if [ $# -lt 1 ]; then
    echo "Error: Command is required." >&2
    help
    exit 1
fi

command=$1
shift
profile="base"

case $command in
    push)
        if [ $# -gt 1 ]; then
            echo "Error: Too many arguments for push command." >&2
            help
            exit 1
        fi
        [ $# -eq 1 ] && profile=$1
        echo "Executing push command"
        [ -n "$profile" ] && echo "Using profile: $profile"
        if ! command -v apptainer &> /dev/null; then
            echo "[INFO] Exiting because apptainer was not installed"
            echo "[INFO] You may follow the installation procedure from here: https://apptainer.org/docs/admin/main/installation.html#install-ubuntu-packages"
            exit
        fi
        # Check if Docker image exists
        check_image_exists isaac-lab-$profile:latest
        # Check if the Docker major version is 25 or greater
        check_docker_version
        # source env file to get cluster login and path information
        source $SCRIPT_DIR/.env.cluster
        # make sure exports directory exists
        mkdir -p /$SCRIPT_DIR/exports
        # clear old exports for selected profile
        rm -rf /$SCRIPT_DIR/exports/isaac-lab-$profile*
        # create singularity image
        # NOTE: we create the singularity image as non-root user to allow for more flexibility. If this causes
        # issues, remove the --fakeroot flag and open an issue on the IsaacLab repository.
        cd /$SCRIPT_DIR/exports
        APPTAINER_NOHTTPS=1 apptainer build --sandbox --fakeroot isaac-lab-$profile.sif docker-daemon://isaac-lab-$profile:latest
        # tar image (faster to send single file as opposed to directory with many files)
        tar -cvf /$SCRIPT_DIR/exports/isaac-lab-$profile.tar isaac-lab-$profile.sif
        # make sure target directory exists
        ssh $CLUSTER_LOGIN "mkdir -p $CLUSTER_SIF_PATH"
        # send image to cluster
        scp $SCRIPT_DIR/exports/isaac-lab-$profile.tar $CLUSTER_LOGIN:$CLUSTER_SIF_PATH/isaac-lab-$profile.tar
        ;;
    job)
        [ $# -ge 1 ] && profile=$1 && shift
        job_args="$@"
        echo "Executing job command"
        [ -n "$profile" ] && echo "Using profile: $profile"
        [ -n "$job_args" ] && echo "Job arguments: $job_args"
        source $SCRIPT_DIR/.env.cluster
        # Check if singularity image exists on the remote host
        check_singularity_image_exists isaac-lab-$profile
        # make sure target directory exists
        ssh $CLUSTER_LOGIN "mkdir -p $CLUSTER_ISAACLAB_DIR"
        # Sync Isaac Lab code
        echo "[INFO] Syncing Isaac Lab code..."
        rsync -rh --exclude="*.git*" --filter=':- .dockerignore' /$SCRIPT_DIR/.. $CLUSTER_LOGIN:$CLUSTER_ISAACLAB_DIR
        # execute job script
        echo "[INFO] Executing job script..."
        # check whether the second argument is a profile or a job argument
        submit_job $job_args
        ;;
    *)
        echo "Error: Invalid command: $command" >&2
        help
        exit 1
        ;;
esac
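The `check_docker_version` function in the script above takes the third word of `docker --version`, keeps the part before the first dot, and rejects a major version of 25 or higher. The same check could be expressed in Python roughly as follows. This is a minimal sketch, not code from this commit: the function names are hypothetical, and the sample strings only illustrate the usual `Docker version X.Y.Z, build <hash>` output shape.

```python
import re

# Mirrors the `-ge 25` test in the shell script: major versions at or above this are rejected.
FIRST_REJECTED_MAJOR = 25


def docker_major_version(version_output: str) -> int:
    """Extract the major version from text such as 'Docker version 24.0.7, build abc1234'."""
    match = re.search(r"(\d+)\.\d+", version_output)
    if match is None:
        raise ValueError(f"Could not find a version number in: {version_output!r}")
    return int(match.group(1))


def passes_version_check(version_output: str) -> bool:
    """Return True when the major version is below the rejected threshold."""
    return docker_major_version(version_output) < FIRST_REJECTED_MAJOR


# Illustrative strings, not captured from a real installation
print(passes_version_check("Docker version 24.0.7, build abc1234"))
print(passes_version_check("Docker version 25.0.3, build abc1234"))
```

Parsing with a regular expression avoids depending on the trailing comma that `awk '{ print $3 }'` leaves on the version token.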
1 change: 1 addition & 0 deletions docker/cluster/run_singularity.sh
@@ -34,6 +34,7 @@ setup_directories() {
SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"

# load variables to set the Isaac Lab path on the cluster
source $SCRIPT_DIR/.env.cluster
source $SCRIPT_DIR/../.env.base

# make sure that all directories exists in cache directory
69 changes: 69 additions & 0 deletions docker/container.py
@@ -0,0 +1,69 @@
#!/usr/bin/env python3

# Copyright (c) 2022-2024, The Isaac Lab Project Developers.
# All rights reserved.
#
# SPDX-License-Identifier: BSD-3-Clause

import argparse
import shutil
from pathlib import Path

from utils import x11_utils
from utils.isaaclab_container_interface import IsaacLabContainerInterface


def main():
    parser = argparse.ArgumentParser(description="Utility for using Docker with Isaac Lab.")
    subparsers = parser.add_subparsers(dest="command", required=True)

    # We have to create a separate parent parser for common options to our subparsers
    parent_parser = argparse.ArgumentParser(add_help=False)
    parent_parser.add_argument("profile", nargs="?", default="base", help="Optional container profile specification.")

    subparsers.add_parser(
        "start", help="Build the docker image and create the container in detached mode.", parents=[parent_parser]
    )
    subparsers.add_parser(
        "enter", help="Begin a new bash process within an existing Isaac Lab container.", parents=[parent_parser]
    )
    subparsers.add_parser(
        "copy", help="Copy build and logs artifacts from the container to the host machine.", parents=[parent_parser]
    )
    subparsers.add_parser("stop", help="Stop the docker container and remove it.", parents=[parent_parser])

    args = parser.parse_args()

    if not shutil.which("docker"):
        raise RuntimeError("Docker is not installed! Please check the 'Docker Guide' for instruction.")

    # Creating container interface
    ci = IsaacLabContainerInterface(context_dir=Path(__file__).resolve().parent, profile=args.profile)

    print(f"[INFO] Using container profile: {ci.profile}")
    if args.command == "start":
        print(f"[INFO] Building the docker image and starting the container {ci.container_name} in the background...")
        x11_outputs = x11_utils.x11_check(ci.statefile)
        if x11_outputs is not None:
            (x11_yaml, x11_envar) = x11_outputs
            ci.add_yamls += x11_yaml
            ci.environ.update(x11_envar)
        ci.start()
    elif args.command == "enter":
        print(f"[INFO] Entering the existing {ci.container_name} container in a bash session...")
        x11_utils.x11_refresh(ci.statefile)
        ci.enter()
    elif args.command == "copy":
        print(f"[INFO] Copying artifacts from the 'isaac-lab-{ci.container_name}' container...")
        ci.copy()
        print("\n[INFO] Finished copying the artifacts from the container.")
    elif args.command == "stop":
        print(f"[INFO] Stopping the launched docker container {ci.container_name}...")
        ci.stop()
        x11_utils.x11_cleanup(ci.statefile)
    else:
        raise RuntimeError(f"Invalid command provided: {args.command}")


if __name__ == "__main__":
    main()
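The parent-parser pattern used in `container.py` above can be tried in isolation. The following is a minimal sketch with only the argument parsing and no Docker calls. `build_parser` is a hypothetical helper and `my-profile` is an illustrative profile name; neither comes from this commit.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser whose subcommands all share one optional positional argument."""
    parser = argparse.ArgumentParser(description="Demo of shared subcommand arguments.")
    subparsers = parser.add_subparsers(dest="command", required=True)

    # Shared arguments live on a parent created with add_help=False,
    # so each subcommand inherits them without a duplicate -h option.
    parent = argparse.ArgumentParser(add_help=False)
    parent.add_argument("profile", nargs="?", default="base")

    for name in ("start", "enter", "copy", "stop"):
        subparsers.add_parser(name, parents=[parent])
    return parser


# Omitting the profile falls back to the default
default_args = build_parser().parse_args(["start"])
print(default_args.command, default_args.profile)

# Passing a profile after the subcommand overrides the default
custom_args = build_parser().parse_args(["enter", "my-profile"])
print(custom_args.command, custom_args.profile)
```

Because `profile` is declared with `nargs="?"`, each subcommand accepts zero or one positional value, which matches the `[<profile>]` usage shown in the shell help text.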