MPI Operator
Deploying MPI (Message Passing Interface) jobs on the platform using the Kubeflow MPI Operator
Although MPI jobs are traditionally associated with High-Performance Computing, our platform can run MPI jobs easily using the MPI Operator from Kubeflow. The MPI Operator simplifies running allreduce-style distributed training on Kubernetes. We have deployed a cluster-wide MPI Operator that allows you to create MPI jobs by defining a Kubernetes resource of kind MPIJob. Full documentation on the resource structure is available here. Additional documentation on the MPI Operator can be found at the Kubeflow site or in this blog post.
To run an MPI job, you need to complete two steps: prepare a Docker image and create an MPIJob manifest.
MPI Job Docker Image
The MPIJob requires a Docker image that includes Open MPI, a configured OpenSSH server, and a non-root user. For Debian-family distributions, add the following fragment to your Dockerfile (download here):
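# Install Open MPI, the SSH server and client, and the host DNS lookup tool
RUN apt-get update && \
    apt-get -y --no-install-recommends install openmpi-bin openssh-server openssh-client bind9-host && \
    apt-get clean && rm -rf /var/lib/apt/lists/*
# Privilege separation directory required by sshd
RUN mkdir -p /var/run/sshd
# User with UID 1000; its name must match the user in sshAuthMountPath
RUN useradd -m -u 1000 user
# ssh client: no host key checking, no known_hosts file, connect to port 2222
# sshd: replace the default config with a minimal one (PID file in /tmp,
# host key taken from ~/.ssh, listen on port 2222 on all interfaces)
RUN sed -i 's/[ #]\(.*StrictHostKeyChecking \).*/ \1no/g' /etc/ssh/ssh_config && \
    echo " UserKnownHostsFile /dev/null" >> /etc/ssh/ssh_config && \
    echo "StrictModes no" > /etc/ssh/sshd_config && \
    echo "PidFile /tmp/sshd.pid" >> /etc/ssh/sshd_config && \
    echo "HostKey ~/.ssh/id_rsa" >> /etc/ssh/sshd_config && \
    echo "Port 2222" >> /etc/ssh/sshd_config && \
    echo "ListenAddress 0.0.0.0" >> /etc/ssh/sshd_config && \
    sed -i 's/.*Port 22.*/ Port 2222/g' /etc/ssh/ssh_config
WORKDIR /data
RUN chown 1000:1000 /etc/ssh/ssh_config
# Run sshd in the foreground (-D) and log to stderr (-e)
CMD /usr/sbin/sshd -De
You will also need to add your application code and base the image on an appropriate framework image, such as a TensorFlow or PyTorch image. The same image must be used for both the Launcher and Worker Pods.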
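To make the ssh_config edits in the fragment easier to follow, the Python sketch below applies regular expressions equivalent to the two sed commands (plus the appended UserKnownHostsFile line) to a small sample. It assumes the stock Debian ssh_config contains the commented-out defaults "#   StrictHostKeyChecking ask" and "#   Port 22"; the sample text and the function name apply_fragment_edits are illustrative only, and the real file contains many more lines.

```python
import re

# Illustrative excerpt of a stock Debian /etc/ssh/ssh_config (assumption:
# the real file contains these commented-out default lines).
SAMPLE_SSH_CONFIG = """Host *
#   StrictHostKeyChecking ask
#   Port 22
"""


def apply_fragment_edits(text: str) -> str:
    """Mimic, in order, the edits the Dockerfile fragment makes to ssh_config."""
    # sed 's/[ #]\(.*StrictHostKeyChecking \).*/ \1no/g'
    text = re.sub(r"[ #](.*StrictHostKeyChecking ).*", r" \g<1>no", text)
    # echo " UserKnownHostsFile /dev/null" >> /etc/ssh/ssh_config
    text += " UserKnownHostsFile /dev/null\n"
    # sed 's/.*Port 22.*/ Port 2222/g'
    text = re.sub(r".*Port 22.*", " Port 2222", text)
    return text


if __name__ == "__main__":
    print(apply_fragment_edits(SAMPLE_SSH_CONFIG))
```

The result is a client configuration that never prompts about host keys, never writes a known_hosts file, and connects to port 2222, which is the port the fragment's sshd_config makes the server listen on.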
MPI Job Manifest
You can download an example of an MPI Job manifest. Note that this example does not work as-is: you must fill in both the IMAGE and COMMAND fields. The user name in the sshAuthMountPath attribute must match the user created in the Docker image (for the fragment above, that user is named user). The Worker.replicas attribute sets the number of Worker Pods to spawn, which corresponds to the number of parallel processes that will run. The -n "2" argument in the Launcher command, which sets the number of MPI processes, must match the value of Worker.replicas. Important: the number must always be quoted ("2", not 2), because container command arguments must be strings and YAML parses an unquoted 2 as an integer.
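The consistency rules above (same image for Launcher and Worker, matching user name, -n equal to Worker.replicas and written as a string) are easy to break when editing a manifest by hand. The Python sketch below generates a minimal MPIJob manifest with those values tied together and checks an existing manifest for the same rules. It assumes the kubeflow.org/v2beta1 API (the version that has sshAuthMountPath), one slot per worker, and a Launcher command that starts with mpirun; the function names, job name, image, and training command are hypothetical, and the platform's downloadable example may contain additional fields.

```python
import json
from typing import Dict, List


def build_mpijob(name: str, image: str, command: List[str], workers: int,
                 user: str = "user") -> Dict:
    """Build a minimal MPIJob manifest (assumed apiVersion kubeflow.org/v2beta1)."""

    def container(cmd=None) -> Dict:
        # Launcher and Workers share one image, as the platform requires.
        c = {"name": name, "image": image}
        if cmd is not None:
            c["command"] = cmd
        return c

    # -n is written as str(workers), so it is always a string and always
    # equals Worker.replicas (assuming one slot per worker).
    launcher_cmd = ["mpirun", "-n", str(workers)] + list(command)
    return {
        "apiVersion": "kubeflow.org/v2beta1",
        "kind": "MPIJob",
        "metadata": {"name": name},
        "spec": {
            "slotsPerWorker": 1,
            # Must point at the .ssh directory of the user created in the image.
            "sshAuthMountPath": f"/home/{user}/.ssh",
            "mpiReplicaSpecs": {
                "Launcher": {
                    "replicas": 1,
                    "template": {"spec": {"containers": [container(launcher_cmd)]}},
                },
                "Worker": {
                    "replicas": workers,
                    # No command: the image's CMD starts sshd.
                    "template": {"spec": {"containers": [container()]}},
                },
            },
        },
    }


def check_mpijob(manifest: Dict, user: str = "user") -> List[str]:
    """Return a list of rule violations found in an MPIJob manifest."""
    problems = []
    spec = manifest["spec"]
    replicas = spec["mpiReplicaSpecs"]
    launcher = replicas["Launcher"]["template"]["spec"]["containers"][0]
    worker = replicas["Worker"]["template"]["spec"]["containers"][0]
    if launcher["image"] != worker["image"]:
        problems.append("Launcher and Worker must use the same image")
    args = launcher.get("command", []) + launcher.get("args", [])
    if "-n" in args and args.index("-n") + 1 < len(args):
        n = args[args.index("-n") + 1]
        if not isinstance(n, str):
            problems.append("-n value must be a quoted string")
        elif n != str(replicas["Worker"]["replicas"]):
            problems.append("-n must equal Worker.replicas")
    if spec.get("sshAuthMountPath") != f"/home/{user}/.ssh":
        problems.append("sshAuthMountPath must match the image's user")
    return problems


if __name__ == "__main__":
    # Hypothetical values; replace with your own image and training command.
    job = build_mpijob("my-mpi-job", "registry.example.com/my-mpi-image:latest",
                       ["python", "/data/train.py"], workers=2)
    assert check_mpijob(job) == []
    print(json.dumps(job, indent=2))
```

Because JSON is valid YAML, the printed output can be saved as mpijob.yaml and submitted with kubectl as described below.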
Running the MPI Job
Once the Docker image and the manifest are ready, run the MPI Job with:
kubectl create -f mpijob.yaml -n namespace
replacing namespace with your namespace. The Launcher may fail and restart for a while at first, typically because the Worker Pods are not yet ready to accept SSH connections; this is expected.
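Because individual Launcher failures are expected, it is more useful to watch the MPIJob's own status than the Launcher Pod. The Python sketch below polls kubectl get mpijob and stops once the job reports a terminal condition. It assumes the status layout of the Kubeflow training job API (a conditions list whose entries have type and status fields, with Succeeded and Failed as terminal types); the function names are hypothetical.

```python
import json
import subprocess
import time
from typing import Optional

# Condition types treated as final (assumed from the Kubeflow job API).
TERMINAL_TYPES = ("Succeeded", "Failed")


def terminal_condition(mpijob: dict) -> Optional[str]:
    """Return "Succeeded" or "Failed" once the MPIJob has finished, else None."""
    for cond in mpijob.get("status", {}).get("conditions", []):
        if cond.get("type") in TERMINAL_TYPES and cond.get("status") == "True":
            return cond["type"]
    return None


def wait_for_mpijob(name: str, namespace: str, poll_seconds: float = 30.0) -> str:
    """Poll the MPIJob via kubectl until it reaches a terminal condition."""
    while True:
        result = subprocess.run(
            ["kubectl", "get", "mpijob", name, "-n", namespace, "-o", "json"],
            capture_output=True, text=True, check=True,
        )
        state = terminal_condition(json.loads(result.stdout))
        if state is not None:
            return state
        # Launcher failures and restarts while Workers start are expected,
        # so keep polling instead of reacting to individual Pod errors.
        time.sleep(poll_seconds)
```

For example, wait_for_mpijob("my-mpi-job", "my-namespace") blocks until the hypothetical job my-mpi-job finishes and returns "Succeeded" or "Failed".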
