While MPI jobs traditionally belong to the domain of High Performance Computing, our platform can run them easily using the MPI Operator from Kubeflow. The MPI Operator makes it easy to run allreduce-style distributed training on Kubernetes. We have deployed a cluster-wide MPI Operator, which allows you to create an MPI job by defining a Kubernetes resource of kind MPIJob. Full documentation on the structure of this kind is available here. Some documentation on the MPI Operator can be found at the Kubeflow site or in this blog post.
Running an MPI job requires two steps: preparing a specific Docker image and creating an MPIJob manifest.
MPI Job Docker Image
MPIJob expects a specific Docker image: the image must contain an OpenSSH server, the server must be configured, and a user must be created in the image. For Debian-family distributions, it is enough to add the following fragment to your Dockerfile (download here):
# Install Open MPI, the OpenSSH server and client, and the host DNS lookup tool
RUN apt-get update && \
    apt-get -y --no-install-recommends install openmpi-bin openssh-server openssh-client bind9-host && \
    apt-get clean && rm -rf /var/lib/apt/lists/*

# Runtime directory used by sshd
RUN mkdir -p /var/run/sshd

# User with UID 1000; the user name must match the one in sshAuthMountPath
RUN useradd -m -u 1000 user

# SSH client: disable host key checking and connect to port 2222
# SSH server: listen on port 2222 and use the key in ~/.ssh as the host key
RUN sed -i 's/[ #]\(.*StrictHostKeyChecking \).*/ \1no/g' /etc/ssh/ssh_config && \
    echo " UserKnownHostsFile /dev/null" >> /etc/ssh/ssh_config && \
    echo "StrictModes no" > /etc/ssh/sshd_config && \
    echo "PidFile /tmp/sshd.pid" >> /etc/ssh/sshd_config && \
    echo "HostKey ~/.ssh/id_rsa" >> /etc/ssh/sshd_config && \
    echo "Port 2222" >> /etc/ssh/sshd_config && \
    echo "ListenAddress 0.0.0.0" >> /etc/ssh/sshd_config && \
    sed -i 's/.*Port 22.*/ Port 2222/g' /etc/ssh/ssh_config

WORKDIR /data
RUN chown 1000:1000 /etc/ssh/ssh_config
CMD /usr/sbin/sshd -De
Of course, you also need to add your application and start from a suitable base image such as pytorch. The same image should be used for both the Launcher and the Worker.
MPI Job Manifest
You can download an example MPI Job manifest. The example does not work as is: you need to specify COMMAND. The user in the sshAuthMountPath attribute must match the user name created in the Docker image. The Worker.replicas attribute denotes how many workers to spawn, i.e., how many parallel jobs will run. The -n "2" parameter for the Launcher must match Worker.replicas. Note that the number 2 must always be quoted.
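To show how these attributes fit together, here is a minimal sketch of such a manifest. It is not the downloadable example itself. It assumes the kubeflow.org/v2beta1 API version (the one that provides sshAuthMountPath), and the job name, image reference, and securityContext setting are hypothetical placeholders that you need to adapt. COMMAND is left as a placeholder, exactly as in the text above.

```yaml
# Illustrative sketch only; values marked "assumption" are placeholders.
apiVersion: kubeflow.org/v2beta1        # assumption: v2beta1 MPI Operator API
kind: MPIJob
metadata:
  name: mpi-example                     # assumption: hypothetical job name
spec:
  slotsPerWorker: 1
  # "user" must be the user name created in the Docker image
  sshAuthMountPath: /home/user/.ssh
  mpiReplicaSpecs:
    Launcher:
      replicas: 1
      template:
        spec:
          containers:
          - name: launcher
            image: registry.example.org/my-mpi-image:latest   # assumption
            securityContext:
              runAsUser: 1000           # assumption: UID of the image user
            command:
            - mpirun
            args:
            - -n
            - "2"                       # quoted so YAML reads it as a string; equals Worker.replicas
            - COMMAND                   # placeholder: your application command
    Worker:
      replicas: 2                       # number of parallel workers
      template:
        spec:
          containers:
          - name: worker
            image: registry.example.org/my-mpi-image:latest   # assumption: same image as the Launcher
            securityContext:
              runAsUser: 1000           # assumption: UID of the image user
```

The quotes around 2 matter because container arguments must be strings; an unquoted 2 is parsed by YAML as an integer and the manifest is rejected. If you change Worker.replicas, change the value after -n to the same number.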
Running the MPI Job
Once the Docker image and the manifest are ready, run the MPI Job using:
kubectl create -f mpijob.yaml -n namespace
Replace namespace with your namespace. The Launcher jobs might keep failing for a while; this is expected and normal.