
MPI Operator

Deploying MPI (Message Passing Interface) jobs on the platform using the Kubeflow MPI Operator

Although MPI jobs are traditionally associated with High-Performance Computing, our platform can run MPI jobs easily using the MPI Operator from Kubeflow. The MPI Operator simplifies running allreduce-style distributed training on Kubernetes. We have deployed a cluster-wide MPI Operator that allows you to create MPI jobs by defining a Kubernetes resource of kind MPIJob. Full documentation on the resource structure is available here. Additional documentation on the MPI Operator can be found at the Kubeflow site or in this blog post.

To run an MPI job, you need to complete two steps: prepare a Docker image and create an MPIJob manifest.

MPI Job Docker Image

The MPIJob requires a Docker image that includes a configured OpenSSH server and a dedicated user account. For Debian-family distributions, add the following fragment to your Dockerfile (download here):

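# Install Open MPI, the OpenSSH server and client, and the host DNS lookup utility
RUN apt-get update && \
    apt-get -y --no-install-recommends install openmpi-bin openssh-server openssh-client bind9-host && \
    apt-get clean && rm -rf /var/lib/apt/lists/*

# Runtime directory expected by sshd
RUN mkdir -p /var/run/sshd

# Unprivileged user (UID 1000) that runs the MPI processes
RUN useradd -m -u 1000 user

# SSH client: disable host key checking and connect on port 2222.
# SSH server: start from a fresh sshd_config that listens on the unprivileged
# port 2222, keeps its PID file in /tmp, and uses the key in ~/.ssh as host key.
RUN sed -i 's/[ #]\(.*StrictHostKeyChecking \).*/ \1no/g' /etc/ssh/ssh_config && \
    echo "    UserKnownHostsFile /dev/null" >> /etc/ssh/ssh_config && \
    echo "StrictModes no" > /etc/ssh/sshd_config && \
    echo "PidFile /tmp/sshd.pid" >> /etc/ssh/sshd_config && \
    echo "HostKey ~/.ssh/id_rsa" >> /etc/ssh/sshd_config && \
    echo "Port 2222" >> /etc/ssh/sshd_config && \
    echo "ListenAddress 0.0.0.0" >> /etc/ssh/sshd_config && \
    sed -i 's/.*Port 22.*/   Port 2222/g' /etc/ssh/ssh_config

WORKDIR /data

# Make the SSH client configuration owned by the unprivileged user
RUN chown 1000:1000 /etc/ssh/ssh_config

# Run sshd in the foreground (-D) with logging to stderr (-e)
CMD ["/usr/sbin/sshd", "-De"]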

You also need to add your application code and build on an appropriate base image, such as TensorFlow or PyTorch. The same image must be used for both the Launcher and the Worker Pods.

MPI Job Manifest

An example MPI Job manifest is available for download. The example is not functional as-is: you must fill in both the IMAGE and COMMAND fields. The user name in the sshAuthMountPath attribute must match the user created in the Docker image (user in the fragment above). The Worker.replicas attribute sets the number of worker Pods to spawn, which corresponds to the number of parallel processes that will run. The -n "2" parameter specified for the Launcher must match the value of Worker.replicas. Important: the number "2" must always be quoted, because container arguments must be strings and YAML parses an unquoted 2 as an integer.
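
To make the relationship between these fields concrete, the following shell snippet writes a minimal manifest sketch to mpijob.yaml and checks that the -n value matches Worker.replicas. It is not the downloadable example: the apiVersion kubeflow.org/v2beta1, the job name mpi-example, the container names, mpirun as the Launcher command, and slotsPerWorker: 1 are assumptions made for illustration, and the platform may require additional fields. IMAGE and COMMAND remain placeholders that you must replace.

```shell
# Write a minimal MPIJob manifest sketch to mpijob.yaml.
# IMAGE and COMMAND are placeholders; replace them before submitting the job.
cat > mpijob.yaml <<'EOF'
apiVersion: kubeflow.org/v2beta1      # assumption: v2beta1 API of the MPI Operator
kind: MPIJob
metadata:
  name: mpi-example                   # hypothetical job name
spec:
  slotsPerWorker: 1                   # assumption: one MPI process per worker Pod
  sshAuthMountPath: /home/user/.ssh   # "user" must match the user created in the image
  mpiReplicaSpecs:
    Launcher:
      replicas: 1
      template:
        spec:
          containers:
          - name: mpi-launcher        # hypothetical container name
            image: IMAGE
            command: ["mpirun"]       # assumption: the Launcher starts the job with mpirun
            args: ["-n", "2", "COMMAND"]
    Worker:
      replicas: 2
      template:
        spec:
          containers:
          - name: mpi-worker          # hypothetical container name
            image: IMAGE              # no command: the image's CMD starts sshd
EOF

# Sanity check: the quoted -n value must equal Worker.replicas.
workers=$(awk '/^ *Worker:/ {w=1} w && /replicas:/ {print $2; exit}' mpijob.yaml)
ranks=$(sed -n 's/.*"-n", "\([0-9]*\)".*/\1/p' mpijob.yaml)
if [ "$workers" = "$ranks" ]; then
    echo "OK: -n matches Worker.replicas ($workers)"
else
    echo "Mismatch: -n is $ranks but Worker.replicas is $workers" >&2
fi
```

When you change the number of workers, update both Worker.replicas and the quoted -n value, then rerun the check before submitting the manifest.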

Running the MPI Job

Once the Docker image and the manifest are ready, submit the MPI Job with:

kubectl create -f mpijob.yaml -n namespace

Replace namespace with your own namespace. The Launcher may fail repeatedly for a while after submission, typically until the Worker Pods are up and reachable; this is expected and normal.
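
While waiting for the job to settle, it can help to look at the MPIJob resource and its Pods. The helper below is a minimal sketch, not part of the platform tooling: the function name mpijob_status and the KUBECTL override variable are hypothetical, and the sketch assumes that the MPIJob custom resource can be listed as mpijob with the standard kubectl get subcommand.

```shell
# Hypothetical helper for inspecting an MPI Job.
# KUBECTL can be overridden, for example with "echo" for a dry run.
KUBECTL="${KUBECTL:-kubectl}"

mpijob_status() {
    ns="$1"
    # List the MPIJob resources in the namespace
    "$KUBECTL" get mpijob -n "$ns"
    # List the Launcher and Worker Pods together with their restart counts
    "$KUBECTL" get pods -n "$ns"
}
```

Call it as mpijob_status namespace. To follow the output of a particular Pod, kubectl logs -f POD_NAME -n namespace can be used, with POD_NAME taken from the Pod listing.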
