|Home -> Research -> NB Collectives
Non-blocking MPI Collective Operations
What are Non-blocking Collective Operations?
Non-blocking collective operations (short: NBC) are an extension to the MPI-2 standard to combine the benefits of non-blocking point-to-point operations with the advantages of collective communication.
Non-blocking point-to-point operation allows overlapping of communication and computation to use the common parallelism in modern computer systems more efficiently. This enables the user to use the CPU even during ongoing message transmissions at the network level.
Collective operations allow the user to simplify his code and to use well tested and highly optimized routines for common collective communication patterns. These collective communication patterns can be highly optimized for the underlying hardware and yield a much better performance as naive implementations. Unfortunately, all these operations are only defined in a blocking manner, which disables explicit overlap.
Our new proposal combines advantages of non-blocking point-to-point and collective communication to enable non-blocking collective communication which can be used to overlap optimized collective communication patterns.The document : "Non-Blocking Collective Operations for MPI-2" provides a formal definition and may act as an MPI-2 standard-extension proposal.
Why should I use them?
The communication can consume a huge part of the running time of a parallel application. The communication time in those applications can be addressed as overhead because it does not progress the solution of the problem in most cases (with exception of reduce operations). Using overlapping techniques enables the user to move communication and the necessary synchonization in the background and use parts of the original communication time to perform useful computation.
However, the application code and the algorithm have to support overlapping. The user must find independent parts of the calculation to perform during a message transmission which do not depend on the result of this transmission.
How can I use them?There are many possibilities to use non-blocking collective operations. They are somehow similar to techniques known from non-blocking point-to-point communication. However, it depends on the problem and the implementation if nbcolls can decrease the running time of a parallel program. Some common scenarios are explained below.
Where can I get it?A first prototypic implementation focuses on portability and is based on MPI-1. It runs on every parallel computer which supports MPI-1. It uses optimized point-to-point algorithms. It runs in non-threaded environments, which forces the user to progress the operations by calling a test function. The prototype of LibNBC is available as source code at LibNBC. A description of LibNBC is available in "Design, Implementation, and Usage of LibNBC".
Are there any Examples available?A first example, a 3D CG Poisson solver is available at CG Solver. A detailed description and benchmark results are available in "Optimizing a Conjugate Gradient Solver with Non-Blocking Collective Operations".
|© Torsten Hoefler