MPI Operator
MPI jobs are traditionally the domain of High Performance Computing, but our platform can also run them using the MPI Operator from Kubeflow. The MPI Operator makes it easy to run allreduce-style distributed training on Kubernetes. We have deployed a cluster-wide MPI Operator, which allows you to create an MPI job by defining a Kubernetes resource of kind MPIJob; full documentation on the kind's structure is available here. Further documentation on the MPI Operator can be found at the Kubeflow site or in this blogpost.
Running an MPI job takes two steps: prepare a suitable Docker image and create an MPIJob manifest.
MPI Job Docker Image
An MPIJob expects a specific kind of Docker image: the image must contain an OpenSSH server, that server must be configured, and a user must be created in the image. For Debian-family distributions, it is enough to add the following fragment to your Dockerfile (download here):
# Install Open MPI and the OpenSSH server and client.
RUN apt-get update && \
    apt-get -y --no-install-recommends install openmpi-bin openssh-server openssh-client bind9-host && \
    apt-get clean && rm -rf /var/lib/apt/lists/*
RUN mkdir -p /var/run/sshd
# Create the user; its name must match sshAuthMountPath in the manifest.
RUN useradd -m -u 1000 user
# Configure the SSH client and server: no host key checking, port 2222.
RUN sed -i 's/[ #]\(.*StrictHostKeyChecking \).*/ \1no/g' /etc/ssh/ssh_config && \
    echo " UserKnownHostsFile /dev/null" >> /etc/ssh/ssh_config && \
    echo "StrictModes no" > /etc/ssh/sshd_config && \
    echo "PidFile /tmp/sshd.pid" >> /etc/ssh/sshd_config && \
    echo "HostKey ~/.ssh/id_rsa" >> /etc/ssh/sshd_config && \
    echo "Port 2222" >> /etc/ssh/sshd_config && \
    echo "ListenAddress 0.0.0.0" >> /etc/ssh/sshd_config && \
    sed -i 's/.*Port 22.*/ Port 2222/g' /etc/ssh/ssh_config
WORKDIR /data
RUN chown 1000:1000 /etc/ssh/ssh_config
# Run the SSH server in the foreground, logging to stderr.
CMD /usr/sbin/sshd -De
You also need to add your application and start from a suitable base image such as tensorflow or pytorch. The same image should be used for both the Launcher and Worker Pods.
MPI Job Manifest
You can download an example MPI Job manifest. The example does not work as is: you need to fill in IMAGE and COMMAND. The user in the sshAuthMountPath attribute must match the user name created in the Docker image. The Worker.replicas attribute sets how many workers to spawn, i.e., how many parallel jobs will run. The -n "2" parameter of the Launcher must match Worker.replicas. Note that the number must always be quoted ("2"): container arguments in a Kubernetes manifest are strings, and an unquoted 2 is parsed by YAML as an integer.
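As a rough illustration of how the fields described above fit together, the following sketch writes a minimal manifest to a file. It is not the platform's downloadable example: the apiVersion kubeflow.org/v2beta1, the job name example-mpijob, the file name mpijob-sketch.yaml, the container names, and the mount path /home/user/.ssh (for the user named user) are all assumptions, and IMAGE and COMMAND remain placeholders. The real example may need further fields, such as resource requests or security settings.

```shell
# Write a minimal MPIJob manifest to a file.
# This is a sketch: apiVersion, names, and paths below are assumptions.
cat > mpijob-sketch.yaml <<'EOF'
apiVersion: kubeflow.org/v2beta1      # assumed API version
kind: MPIJob
metadata:
  name: example-mpijob                # hypothetical job name
spec:
  sshAuthMountPath: /home/user/.ssh   # "user" must match the user created in the image
  mpiReplicaSpecs:
    Launcher:
      replicas: 1
      template:
        spec:
          containers:
          - name: launcher
            image: IMAGE              # placeholder: your image
            command: ["mpirun"]
            # "2" is quoted and equals Worker.replicas below;
            # a multi-word COMMAND goes in as one list item per word.
            args: ["-n", "2", "COMMAND"]
    Worker:
      replicas: 2
      template:
        spec:
          containers:
          - name: worker
            image: IMAGE              # same image as the Launcher
EOF
```

If the number of workers changes, both Worker.replicas and the quoted value after -n need to change together.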
Running the MPI Job
Once the Docker image and the manifest are ready, run the MPI Job with:
kubectl create -f mpijob.yaml -n namespace
replacing namespace with your namespace. The Launcher may fail for a while at first, usually until the Worker Pods are up and reachable; this is expected and normal.
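To watch the job after submitting it, a small helper along these lines may be convenient. It is only a sketch: the function name mpijob_status is made up for illustration, and it assumes that kubectl is configured for the cluster and that the MPIJob resource can be addressed as mpijob.

```shell
# Hypothetical helper: show one MPIJob and the Pods in its namespace.
# Usage: mpijob_status JOB_NAME NAMESPACE
mpijob_status() {
    job="$1"
    ns="$2"
    # The MPIJob object itself.
    kubectl get mpijob "$job" -n "$ns"
    # All Pods in the namespace; Launcher and Worker Pods are named after the job.
    kubectl get pods -n "$ns"
}
```

For example, mpijob_status example-mpijob namespace (the job name is hypothetical) lists the job and its Pods, and kubectl logs POD -n namespace then prints the output of a Pod chosen from that list.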