Converts container.sh into Python utilities (#539)
This PR replicates the capabilities of `container.sh` in Python. It splits them across several files under a new directory, `/docker/isaaclab_container_utils`, and exposes them through an improved interface in `container.py`. The goal is to make our container-mediating code easier to read, debug, and modify. We also hope it will be easier to distribute, since users have expressed a desire to [compose and modify our setup](#455). An additional benefit is that fewer packages need to be installed with sudo, because YAML handling now happens natively in Python.

The central class, `IsaacLabContainerInterface`, contains much of the original script's functionality. Several of the `container.py` options simply configure it and call one of its methods.

@pascal-roth `apptainer_utils.py` and the `./container.py job/push` logic are separated out; I'm curious what you think of this delineation. I haven't been able to fully test that end of things because I don't have a cluster to use, though I verified that it works to the extent I could. I will update the docs once I have received approval.

- Breaking change (fix or feature that would cause existing functionality to not work as expected)
- This change requires a documentation update

- [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with `./isaaclab.sh --format`
- [x] I have made corresponding changes to the documentation
- [x] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my feature works
- [ ] I have run all the tests with `./isaaclab.sh --test` and they pass
- [ ] I have updated the changelog and the corresponding version in the extension's `config/extension.toml` file
- [x] I have added my name to the `CONTRIBUTORS.md` or my name already exists there

---------

Signed-off-by: Hunter Hansen <50837800+hhansen-bdai@users.noreply.github.com>
Signed-off-by: James Smith <142246516+jsmith-bdai@users.noreply.github.com>
Co-authored-by: James Smith <142246516+jsmith-bdai@users.noreply.github.com>
Co-authored-by: David Hoeller <dhoeller@nvidia.com>
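The PR message says that several `container.py` options configure `IsaacLabContainerInterface` and then call one of its methods. The following is a minimal sketch of that pattern, not the PR's actual class. Several details are assumptions chosen for illustration:

- the class name `ContainerInterfaceSketch`;
- the `docker-compose.yaml` file name;
- the `.env.<profile>` env-file naming;
- the method names.

Commands are returned as argument lists, so they can be inspected before anything runs.

```python
import os
import subprocess
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class ContainerInterfaceSketch:
    """Hypothetical, trimmed-down stand-in for IsaacLabContainerInterface.

    Assumes a compose file named 'docker-compose.yaml' and per-profile env
    files named '.env.<profile>' inside context_dir (illustrative names).
    """

    context_dir: Path
    profile: str = "base"
    add_yamls: list[str] = field(default_factory=list)  # extra compose files, e.g. for X11
    environ: dict[str, str] = field(default_factory=dict)  # extra env vars for docker compose

    @property
    def container_name(self) -> str:
        return f"isaac-lab-{self.profile}"

    def _compose(self, *action: str) -> list[str]:
        # Base 'docker compose' invocation shared by every subcommand.
        cmd = ["docker", "compose", "--file", "docker-compose.yaml"]
        for extra in self.add_yamls:
            cmd += ["--file", extra]
        cmd += ["--env-file", f".env.{self.profile}", "--profile", self.profile]
        return cmd + list(action)

    def start_cmd(self) -> list[str]:
        return self._compose("up", "--detach", "--build", "--remove-orphans")

    def stop_cmd(self) -> list[str]:
        return self._compose("down")

    def enter_cmd(self) -> list[str]:
        return ["docker", "exec", "--interactive", "--tty", self.container_name, "bash"]

    def run(self, cmd: list[str]) -> None:
        # Merge the extra environment variables over the current environment.
        env = {**os.environ, **self.environ}
        subprocess.run(cmd, check=True, cwd=self.context_dir, env=env)
```

A CLI entry point can then map each subcommand to a method, for example `ci.run({"start": ci.start_cmd, "stop": ci.stop_cmd}[command]())`. This keeps the argument parsing separate from the Docker logic.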
Showing 12 changed files with 796 additions and 479 deletions.
`@@ -26,6 +26,7 @@`

```
**/*.sif
docker/exports/
docker/.container.yaml
docker/.container.cfg

# IDE
**/.idea/
```
`@@ -0,0 +1,17 @@`

```bash
###
# Cluster specific settings
###

# Docker cache dir for Isaac Sim (has to end on docker-isaac-sim)
# e.g. /cluster/scratch/$USER/docker-isaac-sim
CLUSTER_ISAAC_SIM_CACHE_DIR=/some/path/on/cluster/docker-isaac-sim
# Isaac Lab directory on the cluster (has to end on isaaclab)
# e.g. /cluster/home/$USER/isaaclab
CLUSTER_ISAACLAB_DIR=/some/path/on/cluster/isaaclab
# Cluster login
CLUSTER_LOGIN=username@cluster_ip
# Cluster scratch directory to store the SIF file
# e.g. /cluster/scratch/$USER
CLUSTER_SIF_PATH=/some/path/on/cluster/
# Python executable within Isaac Lab directory to run with the submitted job
CLUSTER_PYTHON_EXECUTABLE=source/standalone/workflows/rsl_rl/train.py
```
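The cluster script reads this file with `source`. If that logic also moved to Python, a minimal loader could look like the following sketch.

- `parse_env`, `load_env_file` and `check_cluster_env` are hypothetical names.
- The suffix rules are taken from the comments in the file above.
- Unlike `source`, this parser does not expand variables or execute commands. That is enough for the plain `KEY=VALUE` assignments shown here.

```python
from pathlib import Path

# Suffix rules stated in the comments of the cluster env file.
REQUIRED_SUFFIXES = {
    "CLUSTER_ISAAC_SIM_CACHE_DIR": "docker-isaac-sim",
    "CLUSTER_ISAACLAB_DIR": "isaaclab",
}


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and '#' comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"Malformed line in env file: {raw!r}")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def load_env_file(path: Path) -> dict[str, str]:
    return parse_env(path.read_text())


def check_cluster_env(env: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the suffix rules hold."""
    problems = []
    for key, suffix in REQUIRED_SUFFIXES.items():
        value = env.get(key)
        if value is None:
            problems.append(f"{key} is not set")
        elif not value.rstrip("/").endswith(suffix):
            problems.append(f"{key} must end with '{suffix}', got {value!r}")
    return problems
```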
`@@ -0,0 +1,188 @@`

```bash
#!/usr/bin/env bash

#==
# Configurations
#==

# Exits if error occurs
set -e

# Set tab-spaces
tabs 4

# get script directory
SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"

#==
# Functions
#==
# Function to check docker versions
# If the docker version is 25 or newer, the script errors out.
check_docker_version() {
    # check if docker is installed
    if ! command -v docker &> /dev/null; then
        echo "[Error] Docker is not installed! Please check the 'Docker Guide' for instructions." >&2;
        exit 1
    fi
    # Retrieve Docker version
    docker_version=$(docker --version | awk '{ print $3 }')
    apptainer_version=$(apptainer --version | awk '{ print $3 }')

    # Check if the major version is 25 or newer
    if [ "$(echo "${docker_version}" | cut -d '.' -f 1)" -ge 25 ]; then
        echo "[ERROR]: Docker version ${docker_version} is not compatible with Apptainer version ${apptainer_version}. Exiting."
        exit 1
    else
        echo "[INFO]: Building singularity with docker version: ${docker_version} and Apptainer version: ${apptainer_version}."
    fi
}

# Checks if a docker image exists, otherwise prints warning and exits
check_image_exists() {
    image_name="$1"
    if ! docker image inspect "$image_name" &> /dev/null; then
        echo "[Error] The '$image_name' image does not exist!" >&2;
        echo "[Error] You might be able to build it with /IsaacLab/docker/container.py." >&2;
        exit 1
    fi
}

# Check if the singularity image exists on the remote host, otherwise print warning and exit
check_singularity_image_exists() {
    image_name="$1"
    if ! ssh "$CLUSTER_LOGIN" "[ -f $CLUSTER_SIF_PATH/$image_name.tar ]"; then
        echo "[Error] The '$image_name' image does not exist on the remote host $CLUSTER_LOGIN!" >&2;
        exit 1
    fi
}

submit_job() {

    echo "[INFO] Arguments passed to job script $*"

    case $CLUSTER_JOB_SCHEDULER in
        "SLURM")
            CMD=sbatch
            job_script_file=submit_job_slurm.sh
            ;;
        "PBS")
            CMD=bash
            job_script_file=submit_job_pbs.sh
            ;;
        *)
            echo "[ERROR] Unsupported job scheduler specified: '$CLUSTER_JOB_SCHEDULER'. Supported options are: ['SLURM', 'PBS']"
            exit 1
            ;;
    esac

    ssh "$CLUSTER_LOGIN" "cd $CLUSTER_ISAACLAB_DIR && $CMD $CLUSTER_ISAACLAB_DIR/docker/cluster/$job_script_file \"$CLUSTER_ISAACLAB_DIR\" \"isaac-lab-$profile\" ${@}"
}

#==
# Main
#==

help() {
    echo -e "\nusage: $(basename "$0") [-h] <command> [<profile>] [<job_args>...] -- Utility for interfacing between IsaacLab and compute clusters."
    echo -e "\noptions:"
    echo -e "  -h              Display this help message."
    echo -e "\ncommands:"
    echo -e "  push [<profile>]              Push the docker image to the cluster."
    echo -e "  job [<profile>] [<job_args>]  Submit a job to the cluster."
    echo -e "\nwhere:"
    echo -e "  <profile>  is the optional container profile specification. Defaults to 'base'."
    echo -e "  <job_args> are optional arguments specific to the job command."
    echo -e "\n" >&2
}

# Parse options
while getopts ":h" opt; do
    case ${opt} in
        h )
            help
            exit 0
            ;;
        \? )
            echo "Invalid option: -$OPTARG" >&2
            help
            exit 1
            ;;
    esac
done
shift $((OPTIND -1))

# Check for command
if [ $# -lt 1 ]; then
    echo "Error: Command is required." >&2
    help
    exit 1
fi

command=$1
shift
profile="base"

case $command in
    push)
        if [ $# -gt 1 ]; then
            echo "Error: Too many arguments for push command." >&2
            help
            exit 1
        fi
        [ $# -eq 1 ] && profile=$1
        echo "Executing push command"
        [ -n "$profile" ] && echo "Using profile: $profile"
        if ! command -v apptainer &> /dev/null; then
            echo "[INFO] Exiting because apptainer was not installed"
            echo "[INFO] You may follow the installation procedure from here: https://apptainer.org/docs/admin/main/installation.html#install-ubuntu-packages"
            exit 1
        fi
        # Check if Docker image exists
        check_image_exists "isaac-lab-$profile:latest"
        # Check that the Docker version is older than 25
        check_docker_version
        # source env file to get cluster login and path information
        source "$SCRIPT_DIR/.env.cluster"
        # make sure exports directory exists
        mkdir -p "$SCRIPT_DIR/exports"
        # clear old exports for selected profile
        rm -rf "$SCRIPT_DIR/exports/isaac-lab-$profile"*
        # create singularity image
        # NOTE: we create the singularity image as non-root user to allow for more flexibility. If this causes
        # issues, remove the --fakeroot flag and open an issue on the IsaacLab repository.
        cd "$SCRIPT_DIR/exports"
        APPTAINER_NOHTTPS=1 apptainer build --sandbox --fakeroot "isaac-lab-$profile.sif" "docker-daemon://isaac-lab-$profile:latest"
        # tar image (faster to send single file as opposed to directory with many files)
        tar -cvf "$SCRIPT_DIR/exports/isaac-lab-$profile.tar" "isaac-lab-$profile.sif"
        # make sure target directory exists
        ssh "$CLUSTER_LOGIN" "mkdir -p $CLUSTER_SIF_PATH"
        # send image to cluster
        scp "$SCRIPT_DIR/exports/isaac-lab-$profile.tar" "$CLUSTER_LOGIN:$CLUSTER_SIF_PATH/isaac-lab-$profile.tar"
        ;;
    job)
        # The first argument, if given, is the profile; the remaining ones are passed to the job script
        [ $# -ge 1 ] && profile=$1 && shift
        job_args="$*"
        echo "Executing job command"
        [ -n "$profile" ] && echo "Using profile: $profile"
        [ -n "$job_args" ] && echo "Job arguments: $job_args"
        source "$SCRIPT_DIR/.env.cluster"
        # Check if singularity image exists on the remote host
        check_singularity_image_exists "isaac-lab-$profile"
        # make sure target directory exists
        ssh "$CLUSTER_LOGIN" "mkdir -p $CLUSTER_ISAACLAB_DIR"
        # Sync Isaac Lab code
        echo "[INFO] Syncing Isaac Lab code..."
        rsync -rh --exclude="*.git*" --filter=':- .dockerignore' "$SCRIPT_DIR/.." "$CLUSTER_LOGIN:$CLUSTER_ISAACLAB_DIR"
        # execute job script
        echo "[INFO] Executing job script..."
        submit_job $job_args
        ;;
    *)
        echo "Error: Invalid command: $command" >&2
        help
        exit 1
        ;;
esac
```
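The cluster script is still written in bash. The following sketch shows what two of its helpers might look like in Python:

- `docker_major_version` and `docker_is_compatible` cover the version check.
- `build_submit_command` covers the job submission.

All three names are hypothetical. The version cutoff and the scheduler table are copied from `check_docker_version()` and `submit_job()`.

```python
import shlex

# Scheduler command and job script, mirroring the case statement in submit_job().
SCHEDULERS = {
    "SLURM": ("sbatch", "submit_job_slurm.sh"),
    "PBS": ("bash", "submit_job_pbs.sh"),
}


def docker_major_version(version_output: str) -> int:
    """Extract the major version from `docker --version` output.

    Mirrors `awk '{ print $3 }' | cut -d '.' -f 1`:
    'Docker version 24.0.7, build afdd53b' -> 24.
    """
    return int(version_output.split()[2].split(".")[0])


def docker_is_compatible(version_output: str, first_incompatible_major: int = 25) -> bool:
    # The bash script rejects Docker 25 and newer when building Apptainer images.
    return docker_major_version(version_output) < first_incompatible_major


def build_submit_command(
    scheduler: str, cluster_login: str, isaaclab_dir: str, profile: str, job_args: list[str]
) -> list[str]:
    """Build the ssh invocation that submit_job() runs against the cluster."""
    if scheduler not in SCHEDULERS:
        raise ValueError(
            f"Unsupported job scheduler specified: '{scheduler}'. Supported options are: {sorted(SCHEDULERS)}"
        )
    cmd, job_script = SCHEDULERS[scheduler]
    remote = " ".join(
        [
            "cd", shlex.quote(isaaclab_dir), "&&", cmd,
            shlex.quote(f"{isaaclab_dir}/docker/cluster/{job_script}"),
            shlex.quote(isaaclab_dir),
            shlex.quote(f"isaac-lab-{profile}"),
            # Quote each job argument so values containing spaces survive the remote shell.
            *(shlex.quote(arg) for arg in job_args),
        ]
    )
    return ["ssh", cluster_login, remote]
```

One deliberate difference from the bash version: each job argument goes through `shlex.quote`. The bash version interpolates `${@}` directly into the remote command string, so arguments containing spaces get split again on the cluster side.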
`@@ -0,0 +1,69 @@`

```python
#!/usr/bin/env python3

# Copyright (c) 2022-2024, The Isaac Lab Project Developers.
# All rights reserved.
#
# SPDX-License-Identifier: BSD-3-Clause

import argparse
import shutil
from pathlib import Path

from utils import x11_utils
from utils.isaaclab_container_interface import IsaacLabContainerInterface


def main():
    parser = argparse.ArgumentParser(description="Utility for using Docker with Isaac Lab.")
    subparsers = parser.add_subparsers(dest="command", required=True)

    # We have to create a separate parent parser for common options to our subparsers
    parent_parser = argparse.ArgumentParser(add_help=False)
    parent_parser.add_argument("profile", nargs="?", default="base", help="Optional container profile specification.")

    subparsers.add_parser(
        "start", help="Build the docker image and create the container in detached mode.", parents=[parent_parser]
    )
    subparsers.add_parser(
        "enter", help="Begin a new bash process within an existing Isaac Lab container.", parents=[parent_parser]
    )
    subparsers.add_parser(
        "copy", help="Copy build and logs artifacts from the container to the host machine.", parents=[parent_parser]
    )
    subparsers.add_parser("stop", help="Stop the docker container and remove it.", parents=[parent_parser])

    args = parser.parse_args()

    if not shutil.which("docker"):
        raise RuntimeError("Docker is not installed! Please check the 'Docker Guide' for instructions.")

    # Creating container interface
    ci = IsaacLabContainerInterface(context_dir=Path(__file__).resolve().parent, profile=args.profile)

    print(f"[INFO] Using container profile: {ci.profile}")
    if args.command == "start":
        print(f"[INFO] Building the docker image and starting the container {ci.container_name} in the background...")
        x11_outputs = x11_utils.x11_check(ci.statefile)
        if x11_outputs is not None:
            (x11_yaml, x11_envar) = x11_outputs
            ci.add_yamls += x11_yaml
            ci.environ.update(x11_envar)
        ci.start()
    elif args.command == "enter":
        print(f"[INFO] Entering the existing {ci.container_name} container in a bash session...")
        x11_utils.x11_refresh(ci.statefile)
        ci.enter()
    elif args.command == "copy":
        print(f"[INFO] Copying artifacts from the '{ci.container_name}' container...")
        ci.copy()
        print("\n[INFO] Finished copying the artifacts from the container.")
    elif args.command == "stop":
        print(f"[INFO] Stopping the launched docker container {ci.container_name}...")
        ci.stop()
        x11_utils.x11_cleanup(ci.statefile)
    else:
        raise RuntimeError(f"Invalid command provided: {args.command}")


if __name__ == "__main__":
    main()
```
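All four subcommands inherit the optional `profile` positional from `parent_parser`, so the defaulting behavior can be checked without Docker installed. The sketch below rebuilds the same parser in a hypothetical `build_parser()` helper, with the help strings omitted. Splitting parser construction out of `main()` this way is only a suggestion; the PR does not do it.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Utility for using Docker with Isaac Lab.")
    subparsers = parser.add_subparsers(dest="command", required=True)
    # Shared optional positional 'profile', inherited by every subcommand.
    parent_parser = argparse.ArgumentParser(add_help=False)
    parent_parser.add_argument("profile", nargs="?", default="base")
    for name in ("start", "enter", "copy", "stop"):
        subparsers.add_parser(name, parents=[parent_parser])
    return parser


if __name__ == "__main__":
    # e.g. `./container.py start` uses the 'base' profile.
    print(build_parser().parse_args(["start"]))
```

Because `nargs="?"` is set with `default="base"`, `start` alone and `start base` yield the same namespace. A different profile is selected by passing it positionally, as in `stop custom`.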