Docker: More Examples, NGC & Benchmark
OpenMPI: Use 2.1.2

Multi-rank MPI support, and thus multi-GPU support, requires a BTL
for in-node communication such as "sm", which was replaced with
"vader" in OpenMPI 3.0.0.

"vader" requires CMA support from the kernel, which does not seem to
work inside a Docker container. We therefore switch back to an older
(pre-3.0) release of OpenMPI that still has the (slower) "sm"
transport.
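
The BTL selection described above can also be expressed through Open MPI's `OMPI_MCA_*` environment variables. The following is a minimal sketch that only reuses the values this commit sets in modules.yaml; the commented-out alternative is the suggestion quoted in that file's note and is not enabled by this commit.

```shell
# Values taken from the modules.yaml change in this commit:
# exclude the "vader" BTL and disable leave_pinned.
export OMPI_MCA_btl='^vader'
export OMPI_MCA_mpi_leave_pinned=0

# Alternative mentioned in the modules.yaml note (not enabled here):
# keep vader but turn off its single-copy mechanism.
# export OMPI_MCA_btl_vader_single_copy_mechanism=none

# Show what an MPI launcher started from this shell would inherit.
env | grep '^OMPI_MCA_' | sort
```

Setting the variables in the environment has the same effect as passing the corresponding `--mca` options to `mpirun`, which is why the module files can carry them.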

Build for SM_70 (V100)

Docker Readme: MPI BTL shared Mem

Overwrite existing output with tbg -f

Make the output directory configurable and print it to the user.

Use non-ISAAC examples for now.
Add _live LWFA examples for later testing of ISAAC.
ax3l committed Oct 17, 2018
1 parent d2dd489 commit 1f8d9a9
Showing 18 changed files with 623 additions and 35 deletions.
12 changes: 10 additions & 2 deletions share/picongpu/dockerfiles/README.rst
@@ -25,15 +25,23 @@ This exposes the ISAAC port to connect via the webclient to.
.. code:: bash
docker pull ax3l/picongpu
docker run --runtime=nvidia -p 2459:2459 -t ax3l/picongpu:0.3.0 lwfa
docker run --runtime=nvidia -p 2459:2459 -t ax3l/picongpu:0.4.0 lwfa_live
# open firefox and isaac client
or

.. code:: bash
singularity pull shub://ax3l/picongpu
singularity exec --nv shub://ax3l/picongpu lwfa
singularity exec --nv shub://ax3l/picongpu lwfa_live
.. note::

PIConGPU is perfectly multi-GPU capable and scales up to thousands of GPUs on the largest GPU clusters available.
In order to share data between ranks, the communication layer we use (MPI) requires shared system memory for IPC and pinned (page-locked) system memory.
The default Docker limits on these resources are very small (a few dozen MB) and need to be increased in order to run on multiple GPUs.

For the ``docker run`` commands above, add ``--shm-size=1g --ulimit memlock=-1`` before the image name to increase the defaults.
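
A minimal sketch of where these options go. `picongpu_docker_cmd` is a hypothetical helper (not part of the image or of Docker) that only prints the command, so it can be inspected without a GPU host; the image tag and the `lwfa4` launcher are taken from this commit.

```shell
# Hypothetical helper, not part of the image: prints the docker command
# for a multi-GPU launcher instead of running it.
picongpu_docker_cmd() {
    image="$1"
    shift
    # Options go before the image name; everything after the image is
    # passed to the container as the command (e.g. lwfa4, bench8).
    echo "docker run --runtime=nvidia --shm-size=1g --ulimit memlock=-1 -t $image $*"
}

picongpu_docker_cmd ax3l/picongpu:0.4.0 lwfa4
```

Anything placed after the image name is treated as part of the container command, so the resource options would be ignored by Docker if they were literally appended to the end of the line.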

Maintainer / Developer
----------------------
28 changes: 26 additions & 2 deletions share/picongpu/dockerfiles/ubuntu-1604/Dockerfile
@@ -6,7 +6,7 @@ ENV DEBIAN_FRONTEND=noninteractive \
FORCE_UNSAFE_CONFIGURE=1 \
SPACK_ROOT=/usr/local \
SPACK_EXTRA_REPO=/usr/local/share/spack-repo \
PIC_PACKAGE='picongpu@develop+isaac backend=cuda'
PIC_PACKAGE='picongpu@0.4.0-rc4+isaac backend=cuda'

# install minimal spack dependencies
# - adds gfortran for spack's openmpi package
@@ -38,6 +38,7 @@ RUN apt-get update && \
pkg-config \
python \
rsync \
time \
unzip \
vim && \
rm -rf /var/lib/apt/lists/*
@@ -73,8 +74,31 @@ RUN /bin/echo -e "source $SPACK_ROOT/share/spack/setup-env.sh\n" \
RUN /bin/bash -l -c ' \
pic-create $PICSRC/share/picongpu/examples/LaserWakefield /opt/picInputs/lwfa && \
cd /opt/picInputs/lwfa && \
pic-build -b "cuda:30;35;37;50;60" -c'-DCUDAMEMTEST_ENABLE=OFF' && \
pic-build -b "cuda:30;35;37;50;60;70" -c'-DCUDAMEMTEST_ENABLE=OFF' && \
rm -rf .build'
# KHI (Benchmark)
RUN /bin/bash -l -c ' \
pic-create $PICSRC/share/picongpu/examples/KelvinHelmholtz /opt/picInputs/khi && \
cd /opt/picInputs/khi && \
pic-build -b "cuda:30;35;37;50;60;70" -c'-DCUDAMEMTEST_ENABLE=OFF' && \
rm -rf .build'
# Laser-Ion Acceleration
RUN /bin/bash -l -c ' \
pic-create $PICSRC/share/picongpu/examples/FoilLCT /opt/picInputs/foil && \
cd /opt/picInputs/foil && \
pic-build -b "cuda:30;35;37;50;60;70" -c'-DCUDAMEMTEST_ENABLE=OFF' && \
rm -rf .build'


COPY start_lwfa.sh /usr/bin/lwfa
COPY start_lwfa_4.sh /usr/bin/lwfa4
COPY start_lwfa_8.sh /usr/bin/lwfa8
COPY start_lwfa_live.sh /usr/bin/lwfa_live
COPY start_lwfa_live_4.sh /usr/bin/lwfa_live4
COPY start_lwfa_live_8.sh /usr/bin/lwfa_live8
COPY start_khi_1.sh /usr/bin/bench1
COPY start_khi_4.sh /usr/bin/bench4
COPY start_khi_8.sh /usr/bin/bench8
COPY start_foil_4.sh /usr/bin/foil4
COPY start_foil_8.sh /usr/bin/foil8
CMD /bin/bash -l
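
The `RUN` lines above nest single quotes inside a single-quoted `bash -l -c` string. In shell, the inner quotes close and reopen the outer string rather than reaching the inner command, which works here because the quoted text contains no spaces. A small sketch, with `echo pic-build` standing in for the real `pic-build` call and a shortened architecture list:

```shell
# The quoting mirrors the RUN lines; "echo pic-build" is a stand-in so
# the sketch runs without PIConGPU installed.
inner='echo pic-build -b "cuda:60;70" -c'-DCUDAMEMTEST_ENABLE=OFF' && echo built'

# The command string as the inner shell receives it: the single quotes
# around -DCUDAMEMTEST_ENABLE=OFF are gone, the double quotes survive.
printf '%s\n' "$inner"

# Run it the way "bash -l -c" would.
sh -c "$inner"
```

The double quotes around the architecture list are the ones that matter: they keep the semicolons from being read as command separators by the inner shell.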
14 changes: 14 additions & 0 deletions share/picongpu/dockerfiles/ubuntu-1604/modules.yaml
@@ -2,10 +2,24 @@ modules:
enable::
- tcl
tcl:
# Note on OpenMPI in Docker
# We should be able to use the latest MPI with
# `OMPI_MCA_btl_vader_single_copy_mechanism=none`
# to avoid disabling vader altogether:
# https://github.com/open-mpi/ompi/issues/4948#issuecomment-377341406
openmpi:
environment:
set:
OMPI_MCA_mpi_leave_pinned: '0'
OMPI_MCA_btl: '^vader'
# This anonymous spec selects any package that
# depends on openmpi. The double colon at the
# end clears the set of rules that matched so far.
^openmpi::
environment:
set:
OMPI_MCA_mpi_leave_pinned: '0'
OMPI_MCA_btl: '^vader'
icet:
environment:
prepend_path:
5 changes: 5 additions & 0 deletions share/picongpu/dockerfiles/ubuntu-1604/packages.yaml
@@ -11,3 +11,8 @@ packages:
paths:
python@2.7.12%gcc@5.4.0 arch=linux-ubuntu16-x86_64: /usr
buildable: False
openmpi:
version: [2.1.2]
all:
providers:
mpi: [openmpi]
42 changes: 42 additions & 0 deletions share/picongpu/dockerfiles/ubuntu-1604/start_foil_4.sh
@@ -0,0 +1,42 @@
#!/bin/bash -l
#

# output directory from startup arguments
output_dir=${1:-"/tmp/foil4_001/"}

if [ "$output_dir" = "-h" ] || [ "$output_dir" = "--help" ]
then
echo "Usage:"
echo " $0 [output_directory]"
fi

#isaac &
#server_id=$!

echo ""
#echo "Let's watch a laser-plasma movie!"
#echo " http://laser.plasma.ninja/ngc/interface.htm"
echo "Let's create some openPMD HDF5 files from a novel"
echo "plasma ion accelerator driven by a short, intense"
echo "laser pulse!"
echo ""

# wait until server is up
sleep 5

# start PIConGPU
cd /opt/picInputs/foil
tbg \
-f \
-s "bash -l" \
-c etc/picongpu/4.cfg \
-t etc/picongpu/bash/mpirun.tpl \
$output_dir

# kill the isaac server after tbg returns
#kill $server_id

echo ""
echo "Simulation finished! See the created output in:"
echo " $output_dir"
echo ""
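
As written, the help branch in these start scripts prints the usage text and then continues, so `-h` ends up being used as the output directory. A minimal sketch of a variant that stops after the usage text; `resolve_output_dir` is a hypothetical helper and not part of this commit.

```shell
# Hypothetical helper: $1 is the default directory, $2 the optional
# first command-line argument of the start script.
resolve_output_dir() {
    case "${2:-}" in
        -h|--help)
            # Usage goes to stderr so it is not captured as a directory.
            echo "Usage:" >&2
            echo "  $0 [output_directory]" >&2
            return 2
            ;;
        "")
            echo "$1"
            ;;
        *)
            echo "$2"
            ;;
    esac
}

# In a start script this could be used as:
#   output_dir=$(resolve_output_dir "/tmp/foil4_001/" "${1:-}") || exit 0
resolve_output_dir "/tmp/foil4_001/"
resolve_output_dir "/tmp/foil4_001/" "/data/run_007"
```

Quoting `"$output_dir"` in the later `tbg` call would additionally keep directory names with spaces intact.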
42 changes: 42 additions & 0 deletions share/picongpu/dockerfiles/ubuntu-1604/start_foil_8.sh
@@ -0,0 +1,42 @@
#!/bin/bash -l
#

# output directory from startup arguments
output_dir=${1:-"/tmp/foil8_001/"}

if [ "$output_dir" = "-h" ] || [ "$output_dir" = "--help" ]
then
echo "Usage:"
echo " $0 [output_directory]"
fi

#isaac &
#server_id=$!

echo ""
#echo "Let's watch a laser-plasma movie!"
#echo " http://laser.plasma.ninja/ngc/interface.htm"
echo "Let's create some openPMD HDF5 files from a novel"
echo "plasma ion accelerator driven by a short, intense"
echo "laser pulse!"
echo ""

# wait until server is up
sleep 5

# start PIConGPU
cd /opt/picInputs/foil
tbg \
-f \
-s "bash -l" \
-c etc/picongpu/8.cfg \
-t etc/picongpu/bash/mpirun.tpl \
$output_dir

# kill the isaac server after tbg returns
#kill $server_id

echo ""
echo "Simulation finished! See the created output in:"
echo " $output_dir"
echo ""
30 changes: 30 additions & 0 deletions share/picongpu/dockerfiles/ubuntu-1604/start_khi_1.sh
@@ -0,0 +1,30 @@
#!/bin/bash -l
#

# output directory from startup arguments
output_dir=${1:-"/tmp/khi1_001/"}

if [ "$output_dir" = "-h" ] || [ "$output_dir" = "--help" ]
then
echo "Usage:"
echo " $0 [output_directory]"
fi

echo ""
echo "Running KHI Benchmark on 1 GPU..."
echo ""


# start PIConGPU
cd /opt/picInputs/khi
/usr/bin/time -f "%e" tbg \
-f \
-s "bash -l" \
-c etc/picongpu/1_bench.cfg \
-t etc/picongpu/bash/mpirun.tpl \
$output_dir

echo ""
echo "Simulation finished! See the created output in:"
echo " $output_dir"
echo ""
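
The benchmark scripts print the elapsed wall-clock seconds via `/usr/bin/time -f "%e"`, which writes to stderr together with any other diagnostics. The sketch below shows one way to record just that number, assuming GNU time's `-o` option; `sleep 1` stands in for the `tbg` call, and the fallback branch is only there for hosts without `/usr/bin/time`.

```shell
# Sketch: record the wall-clock seconds of a command in a file.
# "sleep 1" stands in for the tbg call of the benchmark scripts.
time_file=$(mktemp)
if [ -x /usr/bin/time ]; then
    # GNU time: -f "%e" prints elapsed real seconds, -o writes them to a file.
    /usr/bin/time -f "%e" -o "$time_file" sleep 1
else
    # Fallback with whole-second resolution if /usr/bin/time is missing.
    start=$(date +%s)
    sleep 1
    echo $(( $(date +%s) - start )) > "$time_file"
fi
elapsed=$(cat "$time_file")
rm -f "$time_file"
echo "elapsed seconds: $elapsed"
```

Writing the timing to its own file keeps it separate from the simulation's stdout and stderr, which makes runs on 1, 4 and 8 GPUs easier to compare.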
30 changes: 30 additions & 0 deletions share/picongpu/dockerfiles/ubuntu-1604/start_khi_4.sh
@@ -0,0 +1,30 @@
#!/bin/bash -l
#

# output directory from startup arguments
output_dir=${1:-"/tmp/khi4_001/"}

if [ "$output_dir" = "-h" ] || [ "$output_dir" = "--help" ]
then
echo "Usage:"
echo " $0 [output_directory]"
fi

echo ""
echo "Running KHI Benchmark on 4 GPUs..."
echo ""


# start PIConGPU
cd /opt/picInputs/khi
/usr/bin/time -f "%e" tbg \
-f \
-s "bash -l" \
-c etc/picongpu/4_bench.cfg \
-t etc/picongpu/bash/mpirun.tpl \
$output_dir

echo ""
echo "Simulation finished! See the created output in:"
echo " $output_dir"
echo ""
30 changes: 30 additions & 0 deletions share/picongpu/dockerfiles/ubuntu-1604/start_khi_8.sh
@@ -0,0 +1,30 @@
#!/bin/bash -l
#

# output directory from startup arguments
output_dir=${1:-"/tmp/khi8_001/"}

if [ "$output_dir" = "-h" ] || [ "$output_dir" = "--help" ]
then
echo "Usage:"
echo " $0 [output_directory]"
fi

echo ""
echo "Running KHI Benchmark on 8 GPUs..."
echo ""


# start PIConGPU
cd /opt/picInputs/khi
/usr/bin/time -f "%e" tbg \
-f \
-s "bash -l" \
-c etc/picongpu/8_bench.cfg \
-t etc/picongpu/bash/mpirun.tpl \
$output_dir

echo ""
echo "Simulation finished! See the created output in:"
echo " $output_dir"
echo ""
32 changes: 25 additions & 7 deletions share/picongpu/dockerfiles/ubuntu-1604/start_lwfa.sh
@@ -1,12 +1,24 @@
#!/bin/bash -l
#

isaac &
server_id=$!
# output directory from startup arguments
output_dir=${1:-"/tmp/lwfa1_001/"}

if [ "$output_dir" = "-h" ] || [ "$output_dir" = "--help" ]
then
echo "Usage:"
echo " $0 [output_directory]"
fi

#isaac &
#server_id=$!

echo ""
echo "Let's watch a laser-plasma movie!"
echo " http://laser.plasma.ninja/isaac_1_3_0/interface.htm"
#echo "Let's watch a laser-plasma movie!"
#echo " http://laser.plasma.ninja/ngc/interface.htm"
echo "Let's create some output files from a"
echo "laser wakefield (electron) accelerator (LWFA)"
echo "driven by a short, intense laser pulse!"
echo ""

# wait until server is up
@@ -15,10 +27,16 @@ sleep 5
# start PIConGPU
cd /opt/picInputs/lwfa
tbg \
-f \
-s "bash -l" \
-c etc/picongpu/1_isaac.cfg \
-c etc/picongpu/1.cfg \
-t etc/picongpu/bash/mpirun.tpl \
/tmp/lwfa_001
$output_dir

# kill the isaac server after tbg returns
kill $server_id
#kill $server_id

echo ""
echo "Simulation finished! See the created output in:"
echo " $output_dir"
echo ""
42 changes: 42 additions & 0 deletions share/picongpu/dockerfiles/ubuntu-1604/start_lwfa_4.sh
@@ -0,0 +1,42 @@
#!/bin/bash -l
#

# output directory from startup arguments
output_dir=${1:-"/tmp/lwfa4_001/"}

if [ "$output_dir" = "-h" ] || [ "$output_dir" = "--help" ]
then
echo "Usage:"
echo " $0 [output_directory]"
fi

#isaac &
#server_id=$!

echo ""
#echo "Let's watch a laser-plasma movie!"
#echo " http://laser.plasma.ninja/ngc/interface.htm"
echo "Let's create some output files from a"
echo "laser wakefield (electron) accelerator (LWFA)"
echo "driven by a short, intense laser pulse!"
echo ""

# wait until server is up
sleep 5

# start PIConGPU
cd /opt/picInputs/lwfa
tbg \
-f \
-s "bash -l" \
-c etc/picongpu/4.cfg \
-t etc/picongpu/bash/mpirun.tpl \
$output_dir

# kill the isaac server after tbg returns
#kill $server_id

echo ""
echo "Simulation finished! See the created output in:"
echo " $output_dir"
echo ""