Improved Docker & NGC #2557

4 changes: 2 additions & 2 deletions etc/picongpu/bash/mpiexec.tpl
@@ -46,11 +46,11 @@ cd simOutput

# test if cuda_memtest binary is available
if [ -f !TBG_dstPath/input/bin/cuda_memtest ] ; then
mpiexec --prefix $MPI_ROOT -tag-output --display-map -x LIBRARY_PATH -x LD_LIBRARY_PATH -am !TBG_dstPath/tbg/openib.conf --mca mpi_leave_pinned 0 -npernode !TBG_gpusPerNode -n !TBG_tasks !TBG_dstPath/input/bin/cuda_memtest.sh
mpiexec -am !TBG_dstPath/tbg/openib.conf --mca mpi_leave_pinned 0 -npernode !TBG_gpusPerNode -n !TBG_tasks !TBG_dstPath/input/bin/cuda_memtest.sh
else
echo "no binary 'cuda_memtest' available, skip GPU memory test" >&2
fi

if [ $? -eq 0 ] ; then
mpiexec --prefix $MPI_ROOT -x LIBRARY_PATH -x LD_LIBRARY_PATH -tag-output --display-map -am !TBG_dstPath/tbg/openib.conf --mca mpi_leave_pinned 0 -npernode !TBG_gpusPerNode -n !TBG_tasks !TBG_dstPath/input/bin/picongpu !TBG_author !TBG_programParams | tee output
mpiexec -am !TBG_dstPath/tbg/openib.conf --mca mpi_leave_pinned 0 -npernode !TBG_gpusPerNode -n !TBG_tasks !TBG_dstPath/input/bin/picongpu !TBG_author !TBG_programParams | tee output
fi
4 changes: 2 additions & 2 deletions etc/picongpu/bash/mpirun.tpl
@@ -46,11 +46,11 @@ cd simOutput

# test if cuda_memtest binary is available
if [ -f !TBG_dstPath/input/bin/cuda_memtest ] ; then
mpirun --display-map -am !TBG_dstPath/tbg/openib.conf --mca mpi_leave_pinned 0 -x LD_LIBRARY_PATH -npernode !TBG_gpusPerNode -n !TBG_tasks !TBG_dstPath/input/bin/cuda_memtest.sh
mpirun -am !TBG_dstPath/tbg/openib.conf --mca mpi_leave_pinned 0 -npernode !TBG_gpusPerNode -n !TBG_tasks !TBG_dstPath/input/bin/cuda_memtest.sh
else
echo "no binary 'cuda_memtest' available, skip GPU memory test" >&2
fi

if [ $? -eq 0 ] ; then
mpirun -tag-output --display-map -am !TBG_dstPath/tbg/openib.conf --mca mpi_leave_pinned 0 -x LD_LIBRARY_PATH -npernode !TBG_gpusPerNode -n !TBG_tasks !TBG_dstPath/input/bin/picongpu !TBG_author !TBG_programParams | tee output
mpirun -am !TBG_dstPath/tbg/openib.conf --mca mpi_leave_pinned 0 -npernode !TBG_gpusPerNode -n !TBG_tasks !TBG_dstPath/input/bin/picongpu !TBG_author !TBG_programParams | tee output
fi
12 changes: 10 additions & 2 deletions share/picongpu/dockerfiles/README.rst
@@ -25,15 +25,23 @@ This exposes the ISAAC port to connect via the webclient to.
.. code:: bash

docker pull ax3l/picongpu
docker run --runtime=nvidia -p 2459:2459 -t ax3l/picongpu:0.3.0 lwfa
docker run --runtime=nvidia -p 2459:2459 -t ax3l/picongpu:0.4.0 lwfa_live
# open firefox and isaac client

or

.. code:: bash

singularity pull shub://ax3l/picongpu
singularity exec --nv shub://ax3l/picongpu lwfa
singularity exec --nv shub://ax3l/picongpu lwfa_live

.. note::

PIConGPU is multi-GPU capable and scales up to thousands of GPUs on the largest GPU clusters available.
To share data between ranks, the communication layer we use (MPI) requires shared system memory for IPC and pinned (page-locked) system memory.
The default Docker limits on these resources are very small (a few dozen MB) and need to be increased in order to run on multiple GPUs.

For the ``docker run`` commands above, append ``--shm-size=1g --ulimit memlock=-1`` to increase the defaults.
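
As an illustration of the note above, the following sketch assembles a multi-GPU ``docker run`` line with the raised limits and prints it instead of executing it. The helper name ``picongpu_docker_cmd`` is hypothetical; the image tag and the ``lwfa_live4`` start script are taken from this pull request, and the limit values are the ones the note suggests, which may need tuning on a given host.

```shell
# Hypothetical helper (not part of PIConGPU): print the docker command
# for a multi-GPU run so the limits can be reviewed before starting it.
# $1: image (assumed default: the 0.4.0 tag from the README)
# $2: start script inside the image (assumed default: lwfa_live4)
picongpu_docker_cmd() {
    image="${1:-ax3l/picongpu:0.4.0}"
    start_script="${2:-lwfa_live4}"
    # --shm-size raises the shared memory available for MPI IPC,
    # --ulimit memlock=-1 lifts the cap on pinned (page-locked) memory
    echo "docker run --runtime=nvidia -p 2459:2459" \
         "--shm-size=1g --ulimit memlock=-1" \
         "-t $image $start_script"
}

picongpu_docker_cmd
```

Printing the command first makes it easy to check the limits; running the printed line starts the container.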

Maintainer / Developer
----------------------
28 changes: 26 additions & 2 deletions share/picongpu/dockerfiles/ubuntu-1604/Dockerfile
@@ -6,7 +6,7 @@ ENV DEBIAN_FRONTEND=noninteractive \
FORCE_UNSAFE_CONFIGURE=1 \
SPACK_ROOT=/usr/local \
SPACK_EXTRA_REPO=/usr/local/share/spack-repo \
PIC_PACKAGE='picongpu@develop+isaac backend=cuda'
PIC_PACKAGE='picongpu@0.4.0-rc4+isaac backend=cuda'

# install minimal spack dependencies
# - adds gfortran for spack's openmpi package
@@ -38,6 +38,7 @@ RUN apt-get update && \
pkg-config \
python \
rsync \
time \
unzip \
vim && \
rm -rf /var/lib/apt/lists/*
@@ -73,8 +74,31 @@ RUN /bin/echo -e "source $SPACK_ROOT/share/spack/setup-env.sh\n" \
RUN /bin/bash -l -c ' \
pic-create $PICSRC/share/picongpu/examples/LaserWakefield /opt/picInputs/lwfa && \
cd /opt/picInputs/lwfa && \
pic-build -b "cuda:30;35;37;50;60" -c'-DCUDAMEMTEST_ENABLE=OFF' && \
pic-build -b "cuda:30;35;37;50;60;70" -c'-DCUDAMEMTEST_ENABLE=OFF' && \
rm -rf .build'
# KHI (Benchmark)
RUN /bin/bash -l -c ' \
pic-create $PICSRC/share/picongpu/examples/KelvinHelmholtz /opt/picInputs/khi && \
cd /opt/picInputs/khi && \
pic-build -b "cuda:30;35;37;50;60;70" -c'-DCUDAMEMTEST_ENABLE=OFF' && \
rm -rf .build'
# Laser-Ion Acceleration
RUN /bin/bash -l -c ' \
pic-create $PICSRC/share/picongpu/examples/FoilLCT /opt/picInputs/foil && \
cd /opt/picInputs/foil && \
pic-build -b "cuda:30;35;37;50;60;70" -c'-DCUDAMEMTEST_ENABLE=OFF' && \
rm -rf .build'


COPY start_lwfa.sh /usr/bin/lwfa
COPY start_lwfa_4.sh /usr/bin/lwfa4
COPY start_lwfa_8.sh /usr/bin/lwfa8
COPY start_lwfa_live.sh /usr/bin/lwfa_live
COPY start_lwfa_live_4.sh /usr/bin/lwfa_live4
COPY start_lwfa_live_8.sh /usr/bin/lwfa_live8
COPY start_khi_1.sh /usr/bin/bench1
COPY start_khi_4.sh /usr/bin/bench4

COPY start_khi_8.sh /usr/bin/bench8
COPY start_foil_4.sh /usr/bin/foil4
COPY start_foil_8.sh /usr/bin/foil8
CMD /bin/bash -l
14 changes: 14 additions & 0 deletions share/picongpu/dockerfiles/ubuntu-1604/modules.yaml
@@ -2,10 +2,24 @@ modules:
enable::
- tcl
tcl:
# Note on OpenMPI in Docker
# We should be able to use the latest MPI with
# `OMPI_MCA_btl_vader_single_copy_mechanism=none`
# to avoid disabling vader altogether:
# https://github.com/open-mpi/ompi/issues/4948#issuecomment-377341406
openmpi:
environment:
set:
OMPI_MCA_mpi_leave_pinned: '0'
OMPI_MCA_btl: '^vader'
# This anonymous spec selects any package that
# depends on openmpi. The double colon at the
# end clears the set of rules that matched so far.
^openmpi::
environment:
set:
OMPI_MCA_mpi_leave_pinned: '0'
OMPI_MCA_btl: '^vader'

icet:
environment:
prepend_path:
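
The comment in this hunk mentions an alternative to excluding the vader BTL. Open MPI reads MCA parameters from environment variables prefixed with `OMPI_MCA_`, so both variants can be sketched as plain exports. The function names below are hypothetical, and whether the second variant works with the Open MPI version pinned in this image is not verified here.

```shell
# Variant set by modules.yaml in this PR: exclude the vader BTL entirely.
ompi_env_disable_vader() {
    export OMPI_MCA_mpi_leave_pinned=0
    export OMPI_MCA_btl='^vader'
    unset OMPI_MCA_btl_vader_single_copy_mechanism
}

# Alternative named in the comment: keep vader, but switch off its
# single-copy mechanism.
ompi_env_vader_no_single_copy() {
    export OMPI_MCA_mpi_leave_pinned=0
    unset OMPI_MCA_btl
    export OMPI_MCA_btl_vader_single_copy_mechanism=none
}

ompi_env_vader_no_single_copy
# list the MCA settings that an mpirun started from this shell would inherit
env | grep '^OMPI_MCA_' | sort
```

Setting the variables per shell like this is convenient for trying the alternative before changing the module configuration baked into the image.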
5 changes: 5 additions & 0 deletions share/picongpu/dockerfiles/ubuntu-1604/packages.yaml
@@ -11,3 +11,8 @@ packages:
paths:
python@2.7.12%gcc@5.4.0 arch=linux-ubuntu16-x86_64: /usr
buildable: False
openmpi:
version: [2.1.2]
all:
providers:
mpi: [openmpi]
42 changes: 42 additions & 0 deletions share/picongpu/dockerfiles/ubuntu-1604/start_foil_4.sh
@@ -0,0 +1,42 @@
#!/bin/bash -l
#

# output directory from startup arguments
output_dir=${1:-"/tmp/foil4_001/"}

if [ "$output_dir" = "-h" ] || [ "$output_dir" = "--help" ]
then
echo "Usage:"
echo " $0 [output_directory]"
exit 0
fi

#isaac &
#server_id=$!

echo ""
#echo "Let's watch a laser-plasma movie!"
#echo " http://laser.plasma.ninja/ngc/interface.htm"
echo "Let's create some openPMD HDF5 files from a novel"
echo "plasma ion accelerator driven by a short, intense"
echo "laser pulse!"
echo ""

# wait until server is up
sleep 5

# start PIConGPU
cd /opt/picInputs/foil
tbg \
-f \
-s "bash -l" \
-c etc/picongpu/4.cfg \
-t etc/picongpu/bash/mpirun.tpl \
$output_dir

# kill the isaac server after tbg returns
#kill $server_id

echo ""
echo "Simulation finished! See the created output in:"
echo " $output_dir"
echo ""
42 changes: 42 additions & 0 deletions share/picongpu/dockerfiles/ubuntu-1604/start_foil_8.sh
@@ -0,0 +1,42 @@
#!/bin/bash -l
#

# output directory from startup arguments
output_dir=${1:-"/tmp/foil8_001/"}

if [ "$output_dir" = "-h" ] || [ "$output_dir" = "--help" ]
then
echo "Usage:"
echo " $0 [output_directory]"
exit 0
fi

#isaac &
#server_id=$!

echo ""
#echo "Let's watch a laser-plasma movie!"
#echo " http://laser.plasma.ninja/ngc/interface.htm"
echo "Let's create some openPMD HDF5 files from a novel"
echo "plasma ion accelerator driven by a short, intense"
echo "laser pulse!"
echo ""

# wait until server is up
sleep 5

# start PIConGPU
cd /opt/picInputs/foil
tbg \
-f \
-s "bash -l" \
-c etc/picongpu/8.cfg \
-t etc/picongpu/bash/mpirun.tpl \
$output_dir

# kill the isaac server after tbg returns
#kill $server_id

echo ""
echo "Simulation finished! See the created output in:"
echo " $output_dir"
echo ""
30 changes: 30 additions & 0 deletions share/picongpu/dockerfiles/ubuntu-1604/start_khi_1.sh
@@ -0,0 +1,30 @@
#!/bin/bash -l
#

# output directory from startup arguments
output_dir=${1:-"/tmp/khi1_001/"}

if [ "$output_dir" = "-h" ] || [ "$output_dir" = "--help" ]
then
echo "Usage:"
echo " $0 [output_directory]"
exit 0
fi

echo ""
echo "Running KHI Benchmark on 1 GPU..."
echo ""


# start PIConGPU
cd /opt/picInputs/khi
/usr/bin/time -f "%e" tbg \
-f \
-s "bash -l" \
-c etc/picongpu/1_bench.cfg \
-t etc/picongpu/bash/mpirun.tpl \
$output_dir

echo ""
echo "Simulation finished! See the created output in:"
echo " $output_dir"
echo ""
30 changes: 30 additions & 0 deletions share/picongpu/dockerfiles/ubuntu-1604/start_khi_4.sh
@@ -0,0 +1,30 @@
#!/bin/bash -l
#

# output directory from startup arguments
output_dir=${1:-"/tmp/khi4_001/"}

if [ "$output_dir" = "-h" ] || [ "$output_dir" = "--help" ]
then
echo "Usage:"
echo " $0 [output_directory]"
exit 0
fi

echo ""
echo "Running KHI Benchmark on 4 GPUs..."
echo ""


# start PIConGPU
cd /opt/picInputs/khi
/usr/bin/time -f "%e" tbg \
-f \
-s "bash -l" \
-c etc/picongpu/4_bench.cfg \
-t etc/picongpu/bash/mpirun.tpl \
$output_dir

echo ""
echo "Simulation finished! See the created output in:"
echo " $output_dir"
echo ""
30 changes: 30 additions & 0 deletions share/picongpu/dockerfiles/ubuntu-1604/start_khi_8.sh
@@ -0,0 +1,30 @@
#!/bin/bash -l
#

# output directory from startup arguments
output_dir=${1:-"/tmp/khi8_001/"}

if [ "$output_dir" = "-h" ] || [ "$output_dir" = "--help" ]
then
echo "Usage:"
echo " $0 [output_directory]"
exit 0
fi

echo ""
echo "Running KHI Benchmark on 8 GPUs..."
echo ""


# start PIConGPU
cd /opt/picInputs/khi
/usr/bin/time -f "%e" tbg \
-f \
-s "bash -l" \
-c etc/picongpu/8_bench.cfg \
-t etc/picongpu/bash/mpirun.tpl \
$output_dir

echo ""
echo "Simulation finished! See the created output in:"
echo " $output_dir"
echo ""
32 changes: 25 additions & 7 deletions share/picongpu/dockerfiles/ubuntu-1604/start_lwfa.sh
@@ -1,12 +1,24 @@
#!/bin/bash -l
#

isaac &
server_id=$!
# output directory from startup arguments
output_dir=${1:-"/tmp/lwfa1_001/"}

if [ "$output_dir" = "-h" ] || [ "$output_dir" = "--help" ]
then
echo "Usage:"
echo " $0 [output_directory]"
exit 0
fi

#isaac &
#server_id=$!

echo ""
echo "Let's watch a laser-plasma movie!"
echo " http://laser.plasma.ninja/isaac_1_3_0/interface.htm"
#echo "Let's watch a laser-plasma movie!"
#echo " http://laser.plasma.ninja/ngc/interface.htm"
echo "Let's create some output files from a"
echo "laser wakefield (electron) accelerator (LWFA)"
echo "driven by a short, intense laser pulse!"
echo ""

# wait until server is up
@@ -15,10 +27,16 @@ sleep 5
# start PIConGPU
cd /opt/picInputs/lwfa
tbg \
-f \
-s "bash -l" \
-c etc/picongpu/1_isaac.cfg \
-c etc/picongpu/1.cfg \
-t etc/picongpu/bash/mpirun.tpl \
/tmp/lwfa_001
$output_dir

# kill the isaac server after tbg returns
kill $server_id
#kill $server_id

echo ""
echo "Simulation finished! See the created output in:"
echo " $output_dir"
echo ""
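
The start scripts in this PR all take an optional output directory with a per-script default. Below is a minimal sketch of that argument handling, written as a function so it can be tried without starting a simulation. The function name `resolve_output_dir` and its two-argument signature are illustrative only and do not appear in the scripts.

```shell
# Print the output directory a start script would use.
# $1: default directory, $2: optional user argument (illustrative signature)
resolve_output_dir() {
    default_dir="$1"
    user_arg="${2:-$default_dir}"   # fall back to the default when unset or empty
    case "$user_arg" in
        -h|--help)
            echo "Usage:" >&2
            echo "  $0 [output_directory]" >&2
            return 1                # tell the caller to stop before running tbg
            ;;
    esac
    echo "$user_arg"
}

resolve_output_dir "/tmp/lwfa1_001/"              # prints the default
resolve_output_dir "/tmp/lwfa1_001/" "/data/run"  # prints /data/run
```

A caller could use it as `output_dir=$(resolve_output_dir "/tmp/lwfa1_001/" "$1") || exit 0`, which keeps a help request from being passed to `tbg` as a directory name.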