Running Analysis Pipelines - A Guide to Containerized MPI-Parallel Workflows#
Introduction#
Analysis pipelines for large-scale astronomical data processing can consist of a variety of specialized software tools and libraries. These tools may have been developed under operating systems and software environments that differ from the actual runtime environment, i.e., a high-performance computing (HPC) cluster like Ramses. It may therefore be beneficial to wrap workflows in containers. Containerization is the packaging of all your analysis code together with just the operating system (OS) libraries and dependencies required to run it. A container is essentially a single lightweight executable that runs consistently on any infrastructure.
When the container concept is extended to MPI-parallel applications, some of the independence from the hosting OS is lost, owing mainly to the fact that MPI applications depend on low-level system libraries. For example, efficient MPI communication often relies on low-latency network hardware like InfiniBand, which requires specific drivers and libraries to be available inside the container at both build and runtime.
This guide describes the creation of workflow containers for MPI-parallel applications,
with a focus on mpi4py, the MPI bindings for Python, as a high-level programming interface.
Details are provided for building an Apptainer/Singularity image that features:
Use of the host's MPI, according to Apptainer's Bind model
mpi4py compiled within Apptainer
InfiniBand support via UCX/UCC
A prerequisite for the material covered below is a general understanding of
containerization concepts
MPI-parallel programming concepts
Apptainer/Singularity basics
Container basics and Apptainer/Singularity#
In principle, containerization involves three steps:
Define the container image (OS, dependencies, environment) via a definition file, i.e., what will the container do?
Build the container image from the definition file
Instantiate a container from that image and run it on the host system
Apptainer (formerly Singularity) is a containerization technology designed for HPC environments. It is better suited to HPC systems than Docker, mainly because root privileges are not required to run containers.
The first step in containerization is to create a container image.
A container image is built from an image definition file that specifies, among other things,
the base OS image. The Dockerfile is Docker's default name for such a definition file.
In Apptainer, the Dockerfile counterpart is called the Singularity Definition File (SDF).
SDFs typically have the file extension *.def.
A container image is then created from an SDF, producing a read-only template from which
containers are spawned. The container image encapsulates the filesystem, environment, and metadata.
In Apptainer, images are mostly stored in a format referred to as SIF (Singularity Image Format),
using the extension *.sif. A container, in turn, is an instance of an image that runs as a process
on the host system.
A container is not started the same way as a compiled binary, but rather
through a container engine, which is essentially a container manager. The engine provides the
execution environment for container images and virtualizes the resources for containerized applications.
For example, in the Windows OS, Docker Desktop is a container engine.
In Apptainer, the container engine is part of the apptainer command-line tool.
So a command like
apptainer exec myIMAGE.sif python myscript.py
instructs Apptainer to instantiate a container from the template myIMAGE.sif and execute the Python
script myscript.py; the latter runs inside the container, i.e., it uses the OS,
filesystem, and environment defined in the image.
Note that in the following, the words container and Apptainer are used synonymously in contexts that distinguish between operations happening inside a container and those happening outside of it (on the host), i.e., one may read inside/outside Apptainer ….
MPI and mpi4py (MPI for Python)#
Apptainer currently supports two open-source implementations of MPI: OpenMPI and MPICH. This document focuses on OpenMPI. MPI (Message Passing Interface) is a common standard for communication across parallel computing architectures, whether between compute nodes of a single system or across compute platforms. MPI comes in the form of libraries with a low-level set of commands that enable message passing, parallel I/O, etc. Owing to its low-level nature, MPI is often used in conjunction with compiled HPC-suitable programming languages like C/C++ or Fortran.
As opposed to more low-level scientific programming languages, Python is
an ideal candidate for implementing the higher-level parts of
compute-intensive applications. MPI for Python (mpi4py) has thus
evolved to provide MPI bindings for Python programs. mpi4py
builds on the MPI specification and provides an object-oriented
interface based on the MPI-2 C++ bindings (see also MPI for
Python).
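As a brief illustration of this interface, consider the following minimal mpi4py sketch (a generic example, not specific to this guide's container; the script name used below is hypothetical), which passes a Python object from rank 0 to rank 1:
from mpi4py import MPI

comm = MPI.COMM_WORLD            # communicator spanning all ranks
rank = comm.Get_rank()           # ID of this rank
size = comm.Get_size()           # total number of ranks

if rank == 0:
    data = {"msg": "hello", "value": 42}
    comm.send(data, dest=1, tag=11)      # pickle-based send of a Python object
elif rank == 1:
    data = comm.recv(source=0, tag=11)   # matching receive
    print(f"rank {rank} of {size} received {data}")
Such a script would be run with at least two ranks, e.g., mpirun -n 2 python send_recv.py (send_recv.py being the hypothetical script name).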
MPI + Apptainer#
According to the Apptainer+MPI documentation, there are two different ways of combining Apptainer with MPI. The first is referred to as the Host-MPI model, also called the Hybrid model, which is useful when shipping containerized MPI-parallel applications. This model involves a containerized MPI working in conjunction with the host's MPI. Therefore, the name Hybrid appears more suitable than Host-MPI, as the latter may suggest that only a host MPI is involved (personal opinion of the author).
The second method, referred to as the Bind model and explained in more detail below, involves mounting the host's MPI into Apptainer. Hence, the containerized application uses the host MPI directly, that is, without a containerized MPI layer.
Model 1: Hybrid model#
The essence of the Hybrid model is that one executes some containerized MPI program, which in practice is realized as follows:
$ mpirun -n <NRANKS> apptainer exec <IMAGE> </PATH/TO/MPI-PROGRAM/WITHIN/APPTAINER>
where <IMAGE> is the Apptainer image (residing on the host). Here,
mpirun is executed on the host and launches the apptainer command
itself. Hence, each MPI rank is launched as a separate container
process; in other words, the host’s process manager daemon (ORTED in
OpenMPI) launches Apptainer containers for each MPI rank. Inside
Apptainer, the application (/PATH/TO/MPI-PROGRAM/WITHIN/APPTAINER)
loads containerized MPI libraries which connect back to ORTED via PMI
(Process Management Interface). This procedure results in MPI
communication across container boundaries. This approach is necessary if
directly mounting the host MPI into Apptainer is not possible due to
security policies. Where policies are less strict, the second model,
called the Bind model, becomes an option.
Model 2: Bind model#
The essence of the Bind model involves mounting the host’s MPI into
Apptainer, again expressed via the mpirun command:
$ export MPI_DIR=<PATH/TO/HOST/MPI/DIRECTORY>
$ mpirun -n <NRANKS> apptainer exec --bind $MPI_DIR <IMAGE> </PATH/TO/MPI-PROGRAM/WITHIN/CONTAINER>
Similar to the Hybrid approach, the Bind approach starts the MPI
application by calling the MPI launcher (mpirun) from the host.
The major difference, however, is the bind-mount of the host
MPI via --bind $MPI_DIR. Figuratively speaking, the host sets
the stage by launching ranks, providing (MPI) libraries, etc., while the
container just brings its own environment, i.e. Python code and other
dependencies, not including an MPI layer. This can make such containers more lightweight, as they do
not include any MPI implementation.
Comparison Hybrid Model vs Bind Model in Apptainer#
The Hybrid model’s advantage of providing a higher degree of (containerized) autonomy comes with the disadvantage of a compatibility requirement between container MPI and host MPI. The keyword is ABI (Application Binary Interface) compatibility. An ABI is a low-level specification that defines how compiled programs interact with the system and with each other at the binary level. ABI compatibility of two (separately compiled) binaries ensures matching specifications on data type sizes and alignment, function calling conventions, system call numbers, register usage, stack layout, and exception handling. Incompatibility may manifest in elusive runtime errors or segmentation faults despite a seemingly successful compilation process.
| Feature | Hybrid Model | Bind Model |
|---------|--------------|------------|
| MPI inside container? | Containerized MPI installation interacts with host MPI | Container uses host MPI via bind mount |
| MPI launcher location | Host | Host |
| MPI process management | OpenMPI daemon (ORTED) launches containers and connects via Process Management Interface (PMI) | No ORTED inside container; host MPI handles everything |
| Compatibility needs | ⚠️ Container MPI must be ABI-compatible with host MPI | ✅ Compatibility given through host MPI |
| Performance tuning | ⚠️ Container MPI must be configured for host hardware, e.g., UCX, verbs | ✅ Host MPI already tuned for hardware |
| Container size | 📦 Larger - includes MPI stack | 📦 Smaller - no MPI inside container |
| Use case | When bind mounts are restricted or host MPI is not accessible; for portability and full container isolation | When bind mounts are allowed; for simplicity and lightweight containers |
This Howto focuses on the Bind model. As the Bind model’s mpirun
template above shows, this requires two steps:
Know where the MPI implementation on the host is installed (MPI_DIR)
Mount/bind the host MPI into the container in a location where the (container's) system will be able to find libraries and binaries.
To unlock the full MPI performance of a given HPC system, Step 2 brings up another layer of bind-mounts not yet mentioned, one that applies to probably the majority of HPC systems. On the hardware level, many HPC systems use fast, low-latency network technology like InfiniBand. It is desirable for MPI containers to stay on the InfiniBand highway, i.e., not to fall back to a slower network alternative. This requires the software translators that link your MPI programs to the InfiniBand hardware to be available inside the container. These translators are UCX and UCC.
What are UCX and UCC?#
UCX (Unified Communication X) is a high-performance messaging layer that accelerates data transfer across HPC systems. It supports transports like Infiniband, shared memory, TCP, and GPU interconnects. UCX acts as the backbone for MPI implementations, accelerating data movement between nodes and devices.
UCC (Unified Collective Communication) builds on UCX by optimizing collective operations like broadcast, reduce, and barrier, as these are essential for scaling parallel applications. UCC ensures that these operations run efficiently across modern interconnects like Infiniband and NVLink.
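For illustration, collective operations look as follows in mpi4py (a generic sketch, independent of the container setup described here); when UCX/UCC are available, such calls are mapped onto the fast interconnect:
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# broadcast: rank 0 distributes a value to all ranks
value = comm.bcast(42 if rank == 0 else None, root=0)

# reduce: sum all rank IDs on rank 0
total = comm.reduce(rank, op=MPI.SUM, root=0)

# barrier: synchronize all ranks
comm.Barrier()

if rank == 0:
    print(f"broadcast value: {value}, sum of ranks: {total}")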
Without going into more detail, the basic rule is:
To leverage high-performance messaging layers, that is, to use Infiniband via UCX/UCC,
an MPI-capable container requires the exposure of UCX/UCC-related libraries at build and runtime.
Here, build time refers to the compilation of mpi4py inside Apptainer, that is, one cannot
use a pre-built mpi4py that was compiled without UCX/UCC support.
Without UCX/UCC support, our MPI program (or mpi4py) would fall back to slower, less capable transports like TCP-only, or it
may fail to initialize. The following section describes the steps for building a UCX/UCC-capable
Apptainer image for the Bind-model approach.
Building a sample Bind-model container#
All files for this example are located under
/projects/sw/Apptainer/Build_Examples/build_mpi4py/
Step 1) Build base sandbox from Apptainer definition file#
The starting point is the following SDF (Singularity/Apptainer definition file), to be
found under
/projects/sw/Apptainer/Build_Examples/build_mpi4py/centos_stream9.def
Bootstrap: docker
From: rockylinux:9
%labels
Author Michael Commer @ ITCC
Purpose "Python + mpi4py container using host MPI/RDMA stack on RHEL9"
%environment
export MPI_PATH=/projects/sw/eb/arch/zen4/software/OpenMPI/4.1.5-GCC-12.3.0
export PATH=${MPI_PATH}/bin:/opt/venv/bin:$PATH
export LD_LIBRARY_PATH=${MPI_PATH}/lib:/opt/lib:/opt/ucc:/opt/ucx:/opt/gcc:\
/opt/hwloc:/opt/libfabric:/opt/numactl:/opt/gpfs:/opt/binutils:/opt/zlib:\
/opt/libxml2:/opt/libpciaccess:$LD_LIBRARY_PATH
export PYTHONPATH=/opt/venv/lib/python3.9/site-packages:$PYTHONPATH
%post
# Install system dependencies
dnf install -y --allowerasing \
python3 python3-pip python3-devel \
gcc gcc-c++ make \
util-linux git curl ca-certificates \
&& dnf clean all
# Create mount points for host MPI and libraries
export MPI_PATH=/projects/sw/eb/arch/zen4/software/OpenMPI/4.1.5-GCC-12.3.0
for d in lib ucc ucx gcc hwloc libfabric numactl gpfs binutils zlib libxml2 libpciaccess; do
mkdir -p /opt/$d
done
mkdir -p /etc/libibverbs.d $MPI_PATH
# Create and activate Python virtual environment
python3 -m venv /opt/venv
source /opt/venv/bin/activate
%files
# copy specific MPI-related/RDMA driver libs into the container
/lib64/libefa.so.1 /opt/lib/libefa.so.1
/lib64/libibverbs.so.1 /opt/lib/libibverbs.so.1
/lib64/libm.so.6 /opt/lib/libm.so.6
/lib64/libnl-3.so.200 /opt/lib/libnl-3.so.200
/lib64/libnl-route-3.so.200 /opt/lib/libnl-route-3.so.200
/lib64/libpmi.so.0 /opt/lib/libpmi.so.0
/lib64/libpmi2.so.0 /opt/lib/libpmi2.so.0
/lib64/librdmacm.so.1 /opt/lib/librdmacm.so.1
/lib64/libresolv.so.2 /opt/lib/libresolv.so.2
/lib64/libuuid.so.1 /opt/lib/libuuid.so.1
/lib64/libz.so.1 /opt/lib/libz.so.1
/usr/lib64/libibverbs/libmlx5-rdmav34.so /opt/lib/libmlx5-rdmav34.so
/usr/lib64/slurm/libslurm_pmi.so /opt/lib/libslurm_pmi.so
Building the sandbox is done as follows:
apptainer -v build --fakeroot --sandbox sandbox_centos_stream9 centos_stream9.def
As a shortcut, you can use the helper-script command ./build.sh bs
instead. When done, you will see an unpacked directory tree
sandbox_centos_stream9/, referred to as a sandbox (image). Such a
sandbox is useful for development and debugging because you can always
re-enter it via apptainer shell --writable sandbox_centos_stream9/
and install/fix things inside without rebuilding.
Step 2) Compile mpi4py inside sandbox#
Now, we want to rebuild mpi4py against the host OpenMPI+UCX+UCC
stack. This step involves
loading the appropriate MPI-module:
module purge && module load mpi/OpenMPI/4.1.5-GCC-12.3.0
bind-mounting MPI:
... --bind ${MPI_PATH}:${MPI_PATH}:ro
bind-mounting UCX/UCC and other MPI-related libraries, for example:
... --bind /projects/sw/eb/arch/zen4/software/UCX/1.14.1-GCCcore-12.3.0/lib:/opt/ucx:ro
You can perform these steps via the helper-script command ./build.sh is,
which essentially performs all necessary bind-mounts:
# Enter sandbox and manually compile mpi4py
MPI_PATH=/projects/sw/eb/arch/zen4/software/OpenMPI/4.1.5-GCC-12.3.0
FMOD=mpi/OpenMPI/4.1.5-GCC-12.3.0
module purge && module load $FMOD
# ...
apptainer shell --writable \
--bind ${MPI_PATH}:${MPI_PATH}:ro \
--bind /etc/libibverbs.d:/etc/libibverbs.d:ro \
--bind /projects/sw/eb/arch/zen4/software/UCX/1.14.1-GCCcore-12.3.0/lib:/opt/ucx:ro \
--bind /projects/sw/eb/arch/zen4/software/UCC/1.2.0-GCCcore-12.3.0/lib:/opt/ucc:ro \
--bind /projects/sw/eb/arch/zen4/software/GCCcore/12.3.0/lib/../lib64:/opt/gcc:ro \
--bind /projects/sw/eb/arch/zen4/software/hwloc/2.9.1-GCCcore-12.3.0/lib:/opt/hwloc:ro \
--bind /projects/sw/eb/arch/zen4/software/libfabric/1.21.0-GCCcore-12.3.0/lib:/opt/libfabric:ro \
--bind /projects/sw/eb/arch/zen4/software/numactl/2.0.16-GCCcore-12.3.0/lib:/opt/numactl:ro \
--bind /usr/lpp/mmfs/lib:/opt/gpfs:ro \
--bind /projects/sw/eb/arch/zen4/software/binutils/2.40-GCCcore-12.3.0/lib:/opt/binutils:ro \
--bind /projects/sw/eb/arch/zen4/software/zlib/1.2.13-GCCcore-12.3.0/lib:/opt/zlib:ro \
--bind /projects/sw/eb/arch/zen4/software/libxml2/2.11.4-GCCcore-12.3.0/lib:/opt/libxml2:ro \
--bind /projects/sw/eb/arch/zen4/software/libpciaccess/0.17-GCCcore-12.3.0/lib:/opt/libpciaccess:ro \
sandbox_centos_stream9
Once inside the sandbox (at the prompt Apptainer>), launch the
mpi4py compilation:
Apptainer> LDFLAGS="-L/opt/ucx -L/opt/ucc -lucp -lucs -luct -lucc" pip install --no-binary=mpi4py --no-cache-dir --force-reinstall mpi4py
Building mpi4py inside Apptainer involves compiling Python bindings
that link against the MPI libraries and their underlying (UCX and UCC)
communication frameworks. In this case, it was found that the compiler
and linker did not find the associated UCX/UCC libraries automatically,
despite adequate settings of LD_LIBRARY_PATH, as LD_LIBRARY_PATH is not
evaluated at compile time. Hence the additional
passing of LDFLAGS="-L/opt/ucx ..." to pip install. On other systems, and
with other versions of MPI/mpi4py, finding the correct linker flags may
require some trial and error.
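A quick way to see which MPI installation mpi4py was actually built against (a sketch; the exact output depends on your setup) is to query its build configuration from within the sandbox:
Apptainer> python
>>> import mpi4py
>>> print(mpi4py.get_config())   # shows, e.g., the mpicc used to compile mpi4py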
Step 2b, optional) Verify UCX/UCC transport inside Apptainer#
To make sure UCX libraries are present and contain required symbols:
Apptainer> nm -D /opt/ucx/libucs.so | grep ucs_mpool_params_reset
which should produce something like
00000000000233b0 T ucs_mpool_params_reset, where the T
indicates a defined symbol. If the symbol is missing, UCX may be
outdated, incompatible, or incorrectly linked. Afterwards, check that UCC
links against UCX:
Apptainer> ldd /opt/ucc/libucc.so.1 | grep libucs
which should produce something like
libucs.so.0 => /opt/ucx/libucs.so.0 (0x00007fa8d88da000). Finally,
you could try out mpi4py:
Apptainer> python
# Then run this 2-liner:
>>> from mpi4py import MPI
>>> print(f"Hello from rank {MPI.COMM_WORLD.Get_rank()} of {MPI.COMM_WORLD.Get_size()}")
Step 3) Build final image in Singularity image format (*.sif)#
When satisfied with the sandbox, you can build the final, immutable image:
apptainer build centos_stream9.sif sandbox_centos_stream9
Again, this step can also be run through the helper-script:
./build.sh bf. Generally, keep the sandbox as long as you are still
iterating and expect to configure and debug inside the container. Create
the final .sif image once the build works and you want something
stable and portable.
Step 3b, optional) Perform simple mpi4py benchmark#
You can run an interactive slurm job with two nodes in order to test
node-to-node communication using the mpi4py.bench module:
salloc --nodes=2 --ntasks-per-node=1 --job-name=n2xt1 --time=00:30:00
# when interactive slurm job has started:
module load mpi/OpenMPI/4.1.5-GCC-12.3.0
./build.sh r # runs "mpi4py.bench pingpong"
Assigning one task per node forces communication across the two nodes.
The helper-script command ./build.sh r is a shortcut to the command
m=67108868 # max packet size for benchmark test
appt_tools mpirun -n 2 python -m mpi4py.bench pingpong --max-size $m
where appt_tools is another shortcut to the above
mpirun ... apptainer exec --bind ... command. The tool
appt_tools is described further below. In the current case,
appt_tools tries to ease your life by creating and launching a
script mpirun.sh containing the following lengthy
mpirun-command:
MPI_DIR=/projects/sw/eb/arch/zen4/software/OpenMPI/4.1.5-GCC-12.3.0
FSIF=/projects/sw/Apptainer/SIF-files/centos_stream9+mpi4py.sif # copy of the final sif built above
FEXE="python -m mpi4py.bench pingpong --max-size 67108868" # command to run inside Apptainer
mpirun -n 2 apptainer exec \
--bind ${MPI_DIR}:${MPI_DIR}:ro \
--bind /etc/libibverbs.d:/etc/libibverbs.d:ro \
--bind /projects/sw/eb/arch/zen4/software/UCX/1.14.1-GCCcore-12.3.0/lib:/opt/ucx:ro \
--bind /projects/sw/eb/arch/zen4/software/UCC/1.2.0-GCCcore-12.3.0/lib:/opt/ucc:ro \
--bind /projects/sw/eb/arch/zen4/software/GCCcore/12.3.0/lib/../lib64:/opt/gcc:ro \
--bind /projects/sw/eb/arch/zen4/software/hwloc/2.9.1-GCCcore-12.3.0/lib:/opt/hwloc:ro \
--bind /projects/sw/eb/arch/zen4/software/libfabric/1.21.0-GCCcore-12.3.0/lib:/opt/libfabric:ro \
--bind /projects/sw/eb/arch/zen4/software/numactl/2.0.16-GCCcore-12.3.0/lib:/opt/numactl:ro \
--bind /projects/sw/eb/arch/zen4/software/binutils/2.40-GCCcore-12.3.0/lib:/opt/binutils:ro \
--bind /projects/sw/eb/arch/zen4/software/zlib/1.2.13-GCCcore-12.3.0/lib:/opt/zlib:ro \
--bind /projects/sw/eb/arch/zen4/software/libxml2/2.11.4-GCCcore-12.3.0/lib:/opt/libxml2:ro \
--bind /projects/sw/eb/arch/zen4/software/libpciaccess/0.17-GCCcore-12.3.0/lib:/opt/libpciaccess:ro \
--bind /usr/lpp/mmfs/lib:/opt/gpfs:ro \
$FSIF $FEXE
The pingpong test shows this kind of output:
# MPI PingPong Test
# Size [B] Bandwidth [MB/s] | Time Mean [s] ± StdDev [s] Samples
1 0.62 | 1.6172415e-06 ± 1.6920e-07 10000
2 1.23 | 1.6257193e-06 ± 1.9910e-07 10000
4 2.47 | 1.6217059e-06 ± 2.0745e-07 10000
8 4.85 | 1.6494426e-06 ± 1.5699e-07 10000
...
1048576 11412.46 | 9.1879942e-05 ± 5.1434e-07 1000
2097152 11831.73 | 1.7724815e-04 ± 4.8272e-07 10
4194304 12080.45 | 3.4719765e-04 ± 3.8304e-07 10
8388608 12209.32 | 6.8706595e-04 ± 3.6534e-07 10
16777216 12272.32 | 1.3670782e-03 ± 4.2741e-07 10
33554432 12296.29 | 2.7288256e-03 ± 6.3523e-06 10
67108864 12322.43 | 5.4460724e-03 ± 7.1724e-07 10
where the convergence of Bandwidth [MB/s] towards ~12000 MB/s indicates the bandwidth expected with UCX/UCC transport (this holds for a modest load on the compute nodes).
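For reference, the reported bandwidth is simply the message size divided by the mean time, which can be verified for the largest message in the table with a few lines of Python:
size_bytes = 67108864                        # largest message size in the table
time_mean = 5.4460724e-03                    # corresponding mean time in seconds
bandwidth = size_bytes / time_mean / 1e6     # MB/s (1 MB = 10^6 bytes)
print(f"{bandwidth:.2f} MB/s")               # ~12322 MB/s, matching the table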
A side remark is that one may see the following kind of message upon launching a containerized MPI-parallel application:
Open MPI's OFI driver detected multiple equidistant NICs from the current process,
but had insufficient information to ensure MPI processes fairly pick a NIC for use.
This may negatively impact performance. A more modern PMIx server is necessary to
resolve this issue.
Note: This message is displayed only when the OFI component's verbosity level is
-79797552 or higher.
OpenMPI uses PMIx to determine process locality, such as which CPU socket or NUMA domain a process is on. If PMIx cannot provide that info, e.g., due to an older version or missing integration, OpenMPI’s OFI (libfabric) transport layer fails to make smart NIC choices and falls back to round-robin or default behavior. Hence, this is only a performance advisory about seeing multiple NICs (e.g., Infiniband interfaces) that are equally close to the process. There is an internal setting
export FI_PROVIDER=verbs
export FI_VERBS_IFACE=ib0
carried out by appt_tools prior to launching mpirun, which should
help avoid the OFI/NIC-related advisory.
appt_tools - a helper tool for containerized/apptainerized mpirun/sbatch jobs#
One of the main purposes of appt_tools is to provide shortcuts to
common apptainer command lines that involve mpirun through the Bind
model, such as
$ mpirun -n <NRANKS> apptainer exec --bind "$MPI_DIR" --bind ... <IMAGE> </PATH/TO/MPI-PROGRAM/WITHIN/CONTAINER>
with potentially numerous --bind entries. While appt_tools has
other functionalities, here only the mpirun-feature is described. This
feature is activated via the mpirun subcommand:
appt_tools [OPTIONS] mpirun <YOUR-MPI-OPTIONS>
which creates and instantly launches a run script mpirun.sh. You can
use the option -o <OUTFILE> to choose a different output script
name. If you prefer not to launch the script right away, use
appt_tools [OPTIONS] mpirunx <YOUR-MPI-OPTIONS>
which will only create mpirun.sh. The helper-script build.sh
contains two examples:
#
# appt_tools - mpirun Example 1: mpi4py.bench
m=67108868 # max packet size for benchmark test
# writes a bash script mpi4py.bench.sh and launches it
appt_tools -V -o mpi4py.bench.sh mpirun -n 2 python -m mpi4py.bench pingpong --max-size $m
#
# appt_tools - mpirun Example 2: simple MPI ring-communication test
# writes a bash script mpirun.sh and launches it
exe=/projects/sw/Apptainer/usr/bin/mpiinitst.4.1.5-GCC-12.3.0.exe
appt_tools -V mpirun -n 2 $exe
which you can run with ./build.sh r1 and ./build.sh r2,
respectively.
appt_tools runtime parameters#
In the output of ./build.sh r1 or ./build.sh r2, one notes the
settings
MPI-module: mpi/OpenMPI/4.1.5-GCC-12.3.0 (already-loaded MPI-module mpi/* in parent-shell)
MPI_DIR: /projects/.../OpenMPI/4.1.5-GCC-12.3.0 (module prepend-path PATH)
SIFFILE: ./centos_stream9+mpi4py.sif (Apptainer/Singularity-image-format file)
MPI_DIR is required to mount the host’s MPI into Apptainer and is
evaluated from the setting for MPI-module. SIFFILE provides the
image file (<IMAGE>) for the apptainer exec command. Different
options exist for providing these values to appt_tools as outlined
in the following.
appt_tools configuration (*.ini) file#
This section describes runtime parameters that are needed when launching
apptainerized MPI programs. All runtime parameters for appt_tools
can be set either via command-line arguments or via configuration (*.ini)
file settings. The appt_tools configuration (*.ini) file format
is not exactly the same as the common INI file format; it has the
general format
[VAR] # value
<VALUE>
or
[VAR] # list of values
<VALUE_1>
...
<VALUE_N>
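For example, a complete configuration file combining the two parameters described in the following sections (with the values used throughout this guide) might look like:
[mod] # MPI-module
mpi/OpenMPI/4.1.5-GCC-12.3.0
[sif] # SIFFILE
/projects/sw/Apptainer/SIF-files/centos_stream9+mpi4py.sif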
[mod]: MPI-module / MPI_DIR#
MPI_DIR can be provided in different ways. The following 5 ways
are evaluated in the listed order until a setting for MPI-module is found.
1. Command-line option -m/--module: You can provide the MPI-module via the command line:
appt_tools -m mpi/OpenMPI/4.1.5-GCC-12.3.0 mpirun ...
This will let appt_tools extract MPI_DIR from the PATH information of the specified module, here mpi/OpenMPI/4.1.5-GCC-12.3.0.
2. module load line present in slurm file: Only applicable for runs that involve the sbatch/sbatchx subcommand. See the appt_tools documentation, to be invoked via appt_tools -doc.
3. Setting [mod] in the appt_tools configuration file. The configuration file is a file with the file suffix ini. An adequate entry would then be
[mod] # MPI-module
mpi/OpenMPI/4.1.5-GCC-12.3.0
4. MPI-module already loaded: If the above steps do not produce a setting for an MPI-module, the next attempt consists of figuring out whether an adequate MPI-module is already loaded in the current environment. This is done internally via the module list command.
5. Environment variable MOD_MPI: The last option for providing MPI-module information is a variable setting like (in bash)
export MOD_MPI=mpi/OpenMPI/4.1.5-GCC-12.3.0
If you always work with the same MPI-module, setting MOD_MPI in your shell initialization file (i.e., ~/.bashrc) might be useful.
[sif]: IMAGE file (*.sif)#
Providing the image file (<IMAGE>) can also be done in
different ways, which are evaluated in the following order.
1. Command-line option -s/--sif: You can provide an absolute or relative file path for <IMAGE>:
appt_tools -s /projects/sw/Apptainer/SIF-files/centos_stream9+mpi4py.sif mpirun ...
2. Setting [sif] in the appt_tools configuration (*.ini) file. An adequate entry would be
[sif] # SIFFILE
/projects/sw/Apptainer/SIF-files/centos_stream9+mpi4py.sif
3. Find the newest *.sif file (if multiple files are present) in the current directory.
4. Environment variable APT_SIFFILE: The last option for providing SIFFILE is a variable setting like (in bash)
export APT_SIFFILE=/path/to/your/file.sif
If you repeatedly work with the same SIFFILE, setting APT_SIFFILE in your shell initialization file (i.e., ~/.bashrc) might be useful.