Running dockerized Jupyter Lab/Notebook on HPC systems

This illustrates the steps to run any dockerized Jupyter Lab/Notebook on an HPC system.

Creating the required Apptainer Container

For this example, we will use the NGIMS Metagenomics-Analysis-of-Biofilm-Microbiome project, as it has a Dockerfile with all the definitions and dependencies required to run the notebooks. The rest of the process assumes you have built the Docker image from the Dockerfile. Details on how to build a Docker image from a Dockerfile are found here.

Since most HPC systems do not support Docker containers (primarily for security reasons, as Docker requires root privileges while Apptainer does not), we need to create an Apptainer image from the Docker image; steps to create an Apptainer image based on Docker containers (either private or public) are found here.

If you have a Linux system and would like to try running Apptainer locally, the steps to install Apptainer on WSL (or any Linux/Debian distribution) are found here.

Apptainer Image from a Docker Image:

The following command converts a locally available Docker image to an Apptainer image (a common approach if you have both Docker and Apptainer available on your system):

apptainer build <apptainer_image_name>.sif docker-daemon://<docker_image_name>:<tag>
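For instance, with a hypothetical local image named biofilm-notebook tagged latest (substitute your own image name and tag), the command would look like:

apptainer build biofilm-notebook.sif docker-daemon://biofilm-notebook:latest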

To get the Docker Image from Docker Hub (Public Image)

apptainer pull docker://<docker_image>
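For example, the following pulls the public jupyter/datascience-notebook image used later in this guide; apptainer names the resulting file datascience-notebook_latest.sif:

apptainer pull docker://jupyter/datascience-notebook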

To get the Docker Image from Docker Hub (Private Image)

We first need to log in to Docker Hub to get the private Docker image (i.e., one tied to your account). To log in, we can use the following command:

apptainer remote login --username myuser docker://docker.io

On running the above command, a prompt appears asking for a token or password.

Note: This type of login stores the token/password in /home/myuser/.apptainer/remote.yaml.

We can use a one-off login if we do not want to store the token. For a one-off login, use the --docker-login flag:

apptainer pull --docker-login docker://myuser/private
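If the pull has to run non-interactively (e.g., from a batch job), credentials can also be supplied through environment variables; this is a sketch assuming the APPTAINER_DOCKER_USERNAME and APPTAINER_DOCKER_PASSWORD variables supported by Apptainer:

export APPTAINER_DOCKER_USERNAME=myuser
export APPTAINER_DOCKER_PASSWORD=<access_token>
apptainer pull docker://myuser/private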

Jupyter from the Container

Once the Apptainer image is built, the Jupyter Notebook can be launched from within the container. We need to create an SSH tunnel to access the notebook hosted on the HPC. The following script can be used to create the tunnel and launch the Lab/Notebook environment.

#!/bin/bash
PASSWORD_LOCATION=${HOME}/.jupyter/jupyter_notebook_config.json
CHECK_PASSWORD=true
TIMELIMIT="02:00:00"
CPUS_PER_TASK=2
MEM_PER_CPU=3800
JOB_NAME="apptainer-jupyter-notebook"
LOGIN_HOST=$(hostname -s)
NODE_TYPE="nodes"    # Slurm partition to submit to; change as per your cluster
GPU_TYPE="ampere"    # change the GPU type as per your cluster
NO_OF_GPU=1          # number of GPUs to request; change as per necessity
function check_python {
    if command -v python &> /dev/null ; then
        echo "found python"
        python_exe=python
    elif command -v python3 &> /dev/null ; then
        echo "found python3"
        python_exe=python3
    else
        echo "Missing python and python3, we need a python interpreter to continue."
        echo "Exiting..."
        exit 1
    fi
}

function password_set {
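    # Returns 0 if a 'password' key already exists under NotebookApp in the Jupyter config, 1 otherwise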
    $python_exe - << END
import sys
import json
with open("$PASSWORD_LOCATION") as config_file:
    data = json.load(config_file)
    for n in data['NotebookApp']:
        if 'password' == n:
            sys.exit(0)

    sys.exit(1)
END
}

# Countdown function to delay jupyter notebook startup but still show output to user
countdown()
(
  IFS=:
  set -- $*
  secs=$(( ${1#0} * 3600 + ${2#0} * 60 + ${3#0} ))
  while [ $secs -gt 0 ]
  do
    sleep 1 &
    printf "\r%02d:%02d:%02d" $((secs/3600)) $(( (secs/60)%60)) $((secs%60))
    secs=$(( $secs - 1 ))
    wait
  done
  echo
)

# export function to make sure it can be used in subshells
export -f countdown

# Get all the passed in options
while getopts "dhn:t:c:m:" opt; do
  case ${opt} in
    d )
      CHECK_PASSWORD=false
      ;;
    n )
      ENVIRONMENT=$OPTARG
      ;;
    h )
      echo "Usage: $(basename $0) [-h] [-d] [-n ENVIRONMENT] [-c CPUS] [-t TIMELIMIT] [-m MEM]"
      echo "    -h            Display this message"
      echo "    -d            Don't check to see if a password is set (default checks for password)"
      echo "    -n            Name of conda environment (default=jupyter) that jupyter is"
      echo "                   installed into.  If you pass -n base it will use the base environment"
      echo "    -t            Timelimit of job to run (default=${TIMELIMIT})"
      echo "    -c            Number of cpus per task to pass to srun (default=$CPUS_PER_TASK)"
      echo "    -m            Memory per cpu (default=${MEM_PER_CPU} megs)"
      exit 0
      ;;
    t )
      TIMELIMIT=$OPTARG
      ;;
    c )
      CPUS_PER_TASK=$OPTARG
      ;;
    m )
      MEM_PER_CPU=$OPTARG
      ;;
    \? )
      echo "Invalid option: $OPTARG" 1>&2
      exit 1
      ;;
    : )
      echo "Invalid option: $OPTARG requires an argument" 1>&2
      exit 1
      ;;
  esac
done
shift $((OPTIND -1))

# check for python or python3 and set the proper binary name
check_python

# Check for password
if $CHECK_PASSWORD ; then
    # Assume we do need to set it
    NEED_TO_SET=true

    # See if file exists
    if [ -d "$(dirname $PASSWORD_LOCATION)" ] && [ -f "$PASSWORD_LOCATION" ]; then
        if password_set; then
            # Password is already set
            NEED_TO_SET=false
        fi
    elif [ ! -d "$(dirname $PASSWORD_LOCATION)" ]; then
        # If the directory for the jupyter notebook config file that contains the
        # password does not exist, jupyter will probably crash and complain. Ensure the folder exists.
        mkdir -p "$(dirname "$PASSWORD_LOCATION")"
    fi

    if $NEED_TO_SET; then
        echo "You need to set a password for jupyter notebook before you begin"
        echo "Running \`jupyter notebook password\`"
        jupyter notebook password
    fi
fi

# Start an interactive Slurm job on which to run the Jupyter Notebook
srun -c $CPUS_PER_TASK -t $TIMELIMIT --mem-per-cpu $MEM_PER_CPU -J $JOB_NAME -p $NODE_TYPE --pty bash -c '
#source ~/anaconda3/etc/profile.d/conda.sh
#conda activate '$ENVIRONMENT'
module load apptainer
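# Derive a per-job local port above 50000 from the last four digits of the Slurm job ID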
LOCAL_JN_PORT=$(expr 50000 + ${SLURM_JOBID: -4})
SSH_CTL=$TMPDIR/.ssh-tunnel-control
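# Open a reverse tunnel from the login node back to this compute node, managed
# through a control socket so it can be shut down cleanly when Jupyter exits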
ssh -f -g -N -M -S $SSH_CTL \
    -R *:$LOCAL_JN_PORT:localhost:$LOCAL_JN_PORT \
     '$LOGIN_HOST'
echo
echo "========================================================"
echo
echo "Your Jupyter Notebook is now running"
echo "To Connect:"
echo "1) Mac/Linux/MobaXterm users: run the following command FROM A NEW LOCAL TERMINAL WINDOW (not this one)"
echo
echo "ssh -L${LOCAL_JN_PORT}:localhost:${LOCAL_JN_PORT} $USER@'$LOGIN_HOST'.lawrence.usd.edu"
echo
echo "For other users (PuTTY, etc) create a new SSH session and tunnel port $LOCAL_JN_PORT to localhost:$LOCAL_JN_PORT"
echo
echo "========================================================"
echo "Starting Jupyter Notebook in..."
countdown "00:00:10"
echo "========================================================"
echo
echo "Connect to your jupyter notebook with the following links."
echo "You need to use the password that was set for your jupyter notebook instance."
echo "  http://localhost:$LOCAL_JN_PORT/  or"
echo "  http://127.0.0.1:$LOCAL_JN_PORT/"
echo
echo "If you do not remember your password, you can set a new password by:"
echo "  1) exiting with Ctrl+c"
echo "  2) loading your conda jupyter environment"
echo "       conda activate '$ENVIRONMENT'"
echo "  3) setting a password"
echo "       jupyter notebook password"
echo "  4) re-running this startup script"
echo
sleep 5
echo "========================================================"
echo
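# Start Jupyter from inside the container; Apptainer bind-mounts $HOME by default,
# so notebooks in your home directory are visible inside the container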
apptainer run datascience-notebook_latest.sif jupyter notebook --no-browser --port=$LOCAL_JN_PORT
sleep 3
ssh -S $SSH_CTL -O exit localhost > /dev/null 2>&1
sleep 3
exit'

Important Notes: The script uses several parameters that can be modified as needed; an example customization is shown after the list.

  • TIMELIMIT defines how long the Jupyter Notebook will run; the time format is HH:MM:SS

  • CPUS_PER_TASK defines the number of CPU cores to be used

  • MEM_PER_CPU defines the RAM per CPU core; the value is specified in MB. Append G to the number to specify GB (e.g., 8G). To allocate all the available memory on the node, use 0
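For example, a session needing four cores, 8 GB of RAM per core, and an eight-hour limit would change the top of the script as follows (values are illustrative only):

TIMELIMIT="08:00:00"
CPUS_PER_TASK=4
MEM_PER_CPU=8G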

apptainer run datascience-notebook_latest.sif jupyter notebook --no-browser --port=$LOCAL_JN_PORT

This is the line in the script that actually starts the container, so the name of the .sif file should be changed as needed.
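For example, with the hypothetical biofilm-notebook.sif image built earlier, the line would become:

apptainer run biofilm-notebook.sif jupyter notebook --no-browser --port=$LOCAL_JN_PORT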

NB: Apptainer allows us to access GPUs; to use a GPU, the --nv flag must be provided when running the container. But first, we have to change how resources are allocated, as the srun command above requests an interactive job on a compute (CPU) node. To change the type of node where the container starts, modify the srun line as follows:

srun -c $CPUS_PER_TASK -t $TIMELIMIT --mem-per-cpu $MEM_PER_CPU -J $JOB_NAME -p $NODE_TYPE --gres=gpu:$GPU_TYPE:$NO_OF_GPU --pty bash -c '

Hence, the command to start the notebook with GPU access would be

apptainer run --nv datascience-notebook_latest.sif jupyter notebook --no-browser --port=$LOCAL_JN_PORT

Once all the changes are made, save the script with a .sh extension on the HPC cluster in the same directory as the .sif file.

You can also run a Jupyter Lab server instead, using:

apptainer run datascience-notebook_latest.sif jupyter lab --no-browser --port=$LOCAL_JN_PORT

Starting the notebook

To start the notebook, use the command

bash <bash_file_name>.sh

Creating an SSH Tunnel

To create an SSH tunnel, copy the line printed on screen and paste it into a new terminal in MobaXterm or any other terminal; the line starts with ssh and has the following pattern.

ssh -L5XXXX:localhost:5XXXX <usd_username>@'<login_node>'.lawrence.usd.edu
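For example, if the script printed port 50123 and the job was launched from login node login01 (both values illustrative):

ssh -L50123:localhost:50123 <usd_username>@login01.lawrence.usd.edu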

Launch the Notebook from the Local Machine

Once the tunnel is in place and Jupyter Lab starts, the Jupyter Notebook can be opened from the local machine using the link printed on screen, which has the following pattern.

http://127.0.0.1:5XXXX

P.S.: Thanks to Bill Conn of the USDRcg team for the core script.