COOL is a simple, generic structure for MPI collective operations that enables highly efficient designs for all collective operations in the cloud. We explore a COOL-based system design that implements frequently used collective operations. The design uses the intra-rack network efficiently while minimizing cross-rack communication, which improves application performance and scalability. We use recent software-defined networking capabilities to build optimal network paths for I/O-intensive collective operations.
Our analytical evaluation shows that our design imposes the least possible network overhead across racks. Furthermore, compared with OpenMPI and MPICH, our design reduces the number of steps to only three, decreases the number of exchanged messages by a factor of N, where N is the total number of processes, and reduces the network load by up to an order of magnitude. These significant improvements come at the cost of a modest increase in the computation load on a few processes.
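To give a feel for why a rack-aware, three-step layout can cut message counts, here is a minimal Python sketch of a sum-allreduce simulation. It is an illustration under stated assumptions, not COOL's actual algorithm, which this summary does not spell out: we assume one hypothetical "leader" process per rack, a naive all-to-all exchange as the baseline, and the function names `flat_message_count` and `three_step_allreduce` are made up for this example.

```python
def flat_message_count(n):
    """Baseline (assumed): every process sends its value to every other process."""
    return n * (n - 1)


def three_step_allreduce(values_by_rack):
    """Simulate a rack-aware sum-allreduce in three steps.

    values_by_rack maps a rack id to the list of values held by the
    processes in that rack. The first process of each rack acts as a
    hypothetical rack leader (an assumption of this sketch).
    Returns (results_by_rack, total_messages, cross_rack_messages).
    """
    racks = list(values_by_rack)
    total_messages = 0
    cross_rack_messages = 0

    # Step 1: non-leader processes send their values to the rack leader
    # over the intra-rack network; the leader computes a partial sum.
    partial = {}
    for rack in racks:
        vals = values_by_rack[rack]
        partial[rack] = sum(vals)
        total_messages += len(vals) - 1

    # Step 2: leaders exchange partial sums with every other leader.
    # This is the only step that crosses racks.
    totals = {}
    for rack in racks:
        totals[rack] = sum(partial[other] for other in racks)
        cross_rack_messages += len(racks) - 1
    total_messages += cross_rack_messages

    # Step 3: each leader sends the final result to its rack members.
    results = {}
    for rack in racks:
        n_local = len(values_by_rack[rack])
        results[rack] = [totals[rack]] * n_local
        total_messages += n_local - 1

    return results, total_messages, cross_rack_messages


if __name__ == "__main__":
    # 4 racks with 4 processes each (N = 16), holding the values 1..16.
    layout = {r: [4 * r + i + 1 for i in range(4)] for r in range(4)}
    results, total, cross = three_step_allreduce(layout)
    print("flat messages:", flat_message_count(16))
    print("three-step messages:", total, "of which cross-rack:", cross)
```

In this toy layout the extra work lands on the rack leaders, which sum their rack's values and handle all cross-rack traffic. That mirrors, in spirit only, the trade-off described above between fewer messages and a higher computation load on a few processes.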
People
Publications
[1] COOL: A Cloud-Optimized Structure for MPI Collective Operations
Mohammed Alfatafta, Zuhair AlSader, Samer Al-Kiswany
Proceedings of IEEE International Conference on Cloud Computing (Cloud), July 2018. [pdf]