Apptainer is an open-source containerization platform that is primarily designed for high-performance computing (HPC) and scientific computing environments. It focuses on providing secure and efficient containerization for running applications and workflows, particularly in shared and multi-user HPC clusters.
In the context of containerization technologies like Apptainer, “container” and “image” are two closely related concepts, but they have distinct meanings:
A container is a running instance of a container image.
A container image is a static, standalone package that contains all the necessary files and configurations needed to run an application or service.
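For example, pulling downloads an image as a static file on disk, while the shell subcommand starts a container from that image (both commands are described in detail below):
# Pull an image: a static .sif file stored on disk
apptainer pull pytorch_2.0.1.sif docker://pytorch/pytorch:2.0.1-cuda11.7-cudnn8-runtime
# Start a container: a running instance of that image
apptainer shell pytorch_2.0.1.sif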
For AI and ML work on Berzelius, where highly complex production environments and a high degree of user customizability are essential, NSC strongly recommends the use of a container environment. Apptainer and Enroot are the supported options, while Docker is not supported due to security considerations.
Employing a container environment offers several advantages, including enhanced portability and the ability to reproduce results across a wide range of systems, including laptops, Berzelius, and EuroHPC resources like LUMI. Additionally, it provides users with the flexibility to select their preferred operating system independently of the host environment, resulting in a more familiar and user-friendly experience.
Please be aware that Apptainer will not run from the /home/username directory. The reason for this is that Apptainer image files can be large, and there is no need to store Apptainer images in /home/username, which has a limited quota of 20 GB. Please run your images directly from your project directory.
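For example, assuming an illustrative project path /proj/your_proj/users/username, you can create a directory in the project storage and pull or build your images from there:
mkdir -p /proj/your_proj/users/username/containers
cd /proj/your_proj/users/username/containers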
Apptainer is available on both login nodes and compute nodes.
[xuan@node044 ~]$ apptainer --version
apptainer version 1.1.9-1.el8
You can check the available options and subcommands using --help:
apptainer --help
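The same flag also works for individual subcommands, for example:
apptainer pull --help
apptainer build --help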
Apptainer is compatible with Docker images. Docker Hub is a cloud-based platform and registry service provided by Docker. It serves as a central repository for container images, making it easy for developers to share, distribute, and collaborate on containerized applications and services. Images hosted on the hub can be downloaded by referencing them with a docker:// URL.
apptainer pull pytorch_2.0.1.sif docker://pytorch/pytorch:2.0.1-cuda11.7-cudnn8-runtime
The image is stored locally as a .sif file (pytorch_2.0.1.sif, in this case).
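You can view the metadata of the downloaded image with apptainer inspect:
apptainer inspect pytorch_2.0.1.sif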
NVIDIA GPU Cloud (NGC) is a platform and repository that provides a comprehensive set of GPU-optimized containers, pre-trained deep learning models, and AI software to accelerate and simplify AI and GPU-accelerated computing workflows.
apptainer pull tensorflow-20.03-tf2-py3.sif docker://nvcr.io/nvidia/tensorflow:20.03-tf2-py3
Running containers from publicly available images is not the only option. In many cases, you will need to build a new image from scratch.
An Apptainer definition file is a configuration file that provides instructions for building an Apptainer image. An example definition file is given below. Please refer to the Apptainer User Guide for more details.
Let's take a look at the definition file:
%environment is used to define environment variables available inside the container.
%post contains the lines to execute inside the container at build time.
We set the environment variable PYTHONNOUSERSITE=True to instruct Python in the container to ignore the user-specific site-packages directory on the host when searching for modules and packages. This is particularly useful when working in a container environment, as it ensures that the Python environment only uses packages installed inside the container and not packages in user-specific locations on the host.
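As a quick sanity check (a minimal sketch, run against the image built from the definition file below), you can ask Python inside the container whether the user site-packages directory is enabled; with PYTHONNOUSERSITE set, it should print False:
apptainer exec pytorch_2.0.1.sif python -c "import site; print(site.ENABLE_USER_SITE)"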
We first download the base image:
apptainer build cuda_11.7.1-cudnn8-devel-ubuntu22.04.sif docker://nvidia/cuda:11.7.1-cudnn8-devel-ubuntu22.04
We build the image from the following definition file.
Bootstrap: localimage
From: cuda_11.7.1-cudnn8-devel-ubuntu22.04.sif
%environment
export PATH=/opt/mambaforge/bin:$PATH
export PYTHONNOUSERSITE=True
%post
apt-get update && apt-get install -y --no-install-recommends \
git \
nano \
wget \
curl
# Install Mambaforge
cd /tmp
curl -L -O "https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-$(uname)-$(uname -m).sh"
bash Mambaforge-$(uname)-$(uname -m).sh -fp /opt/mambaforge -b
rm Mambaforge*sh
export PATH=/opt/mambaforge/bin:$PATH
mamba install python==3.10 pytorch==2.0.1 torchvision torchaudio torchdata torchtext pytorch-cuda=11.7 -c pytorch -c nvidia -y
# Pin packages
cat <<EOT > /opt/mambaforge/conda-meta/pinned
pytorch==2.0.1
EOT
mamba install matplotlib scipy pandas -y
Building an Apptainer image requires root access. You can build an image on Berzelius with a few restrictions using the fakeroot feature. The fakeroot feature allows an unprivileged user to run a container as a “fake root” user by leveraging user namespace UID/GID mapping. A “fake root” user has almost the same administrative rights as root, but only inside the container and the requested namespaces.
apptainer build --fakeroot pytorch_2.0.1.sif pytorch_2.0.1.def
The image can also be built directly from a registry, without first downloading the base image. You just need to change the definition header to:
Bootstrap: docker
From: nvidia/cuda:11.7.1-cudnn8-devel-ubuntu22.04
You can modify an existing image to suit your requirements.
The command apptainer build provides a --sandbox flag that creates a writable directory in your work directory.
apptainer build --fakeroot --sandbox pytorch_2.0.1 pytorch_2.0.1.sif
We then initialize an interactive session with the apptainer shell command, using the --writable flag so that we can write files within the sandbox directory.
apptainer shell --fakeroot --writable pytorch_2.0.1/
Apptainer> mamba install jupyterlab -y
Apptainer> apt update
Apptainer> apt install vim -y
Apptainer> exit
Finally, we save the modified image.
apptainer build pytorch_2.0.1_v2.sif pytorch_2.0.1
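As a quick check (adapt to whatever you installed), you can confirm that the packages added in the sandbox are present in the rebuilt image:
apptainer exec pytorch_2.0.1_v2.sif jupyter lab --version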
We may need to access outside directories when running a container. By default, Apptainer binds the user's home directory, the current working directory, /tmp, and a few system locations (such as /proc and /sys) into the container.
You can specify the directories to bind using the --bind or -B flag.
apptainer shell -B /proj/your_proj/users/username/data:/data pytorch_2.0.1.sif
Here, the colon : separates the path to the directory on the host from the mount point inside the container.
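Multiple bind specifications can be given as a comma-separated list; the paths below are illustrative:
apptainer shell -B /proj/your_proj/users/username/data:/data,/proj/your_proj/users/username/results:/results pytorch_2.0.1.sif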
The apptainer shell command initializes a new interactive shell inside the container. The --nv flag enables NVIDIA GPU support.
To quit the container, type exit or press Ctrl + D.
[xuan@node001 containers]$ apptainer shell --nv pytorch_2.0.1.sif
Apptainer> exit
exit
[xuan@node001 containers]$
Note that when you exit the container, all running processes are killed (stopped). Changes saved into bound directories are preserved; by default, anything else in the container is lost.
The command apptainer exec starts the container from a specified image and executes a command inside it.
[xuan@node001 containers]$ apptainer exec --nv pytorch_2.0.1.sif python -c "import torch; print('GPU Name: ' + torch.cuda.get_device_name(0))"
GPU Name: NVIDIA A100-SXM4-40GB
You can integrate containers into a batch job submission script. First, create a batch job script:
#!/bin/bash
#SBATCH -A your_proj_account
#SBATCH --gpus=1
#SBATCH --time 00:10:00
apptainer exec --nv -B /local_data_dir:/data apptainer_image.sif python some_script.py
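Assuming the script is saved as, for example, job_script.sh, submit it and monitor the job with the standard Slurm commands:
sbatch job_script.sh
squeue -u $USER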
| Operation | Command |
|---|---|
| Downloading images | apptainer pull pytorch_2.0.1.sif docker://pytorch/pytorch:2.0.1-cuda11.7-cudnn8-runtime |
| Building images | apptainer build --fakeroot apptainer_image.sif apptainer_image.def |
| Initializing a shell | apptainer shell --nv apptainer_image.sif |
| Executing commands | apptainer exec --nv -B /local_data_dir:/data apptainer_image.sif bash -c "python some_script.py" |