ISC HPC Blog
Smart Acceleration for Clusters
The DEEP and DEEP-ER projects are different from other European exascale ventures: Of course, we push the envelope with regards to HPC system architecture, like almost everyone in this business does. But we also explore the outer limits when it comes to programming models. After all, according to our philosophy, exascale computing is all about the system as a whole.
Rather than following the conventional approach of “accelerated clusters” – CPUs closely paired with co-processors in a 1-to-n scheme – DEEP advances heterogeneity to the system level, in this case a general-purpose HPC cluster combined with a tightly coupled many-core system, what we call the Booster. This means applications can run on an optimal n-to-m combination of CPUs and co-processors. An easy-to-use and dynamic parallel offload model based on OmpSs enables applications to take maximum advantage of this and run code components on the parts of the system that best match their characteristics.
So far, so good – in theory. That’s why, some 25 months into the project, it’s about time for the first proof of our concept.
In the last couple of weeks DEEP has gone through a very exciting phase – basically the ultimate baptism of fire for our concept: The new hardware has first come to life. As expected, we’ve experienced some teething pains, but the prototype Intel Xeon Phi™-based Booster nodes now boot and run system-level and application code. A high-performance backplane connects up to eight of them in a small 2x2x2 “Proto-Booster” using an FPGA implementation of the EXTOLL network. Work now focuses on the production of improved versions and then the step-by-step integration of the DEEP Booster.
Additionally, we’ve made huge progress regarding the energy efficiency of the hardware. Eurotech has qualified their unique direct liquid cooling solution, which will allow inlet temperatures in excess of 40 °C and thus enable “free cooling” for all but a handful of hot summer days in Germany.
On the software side, the Cluster-Booster protocol that connects both parts of the DEEP system has shown its performance potential. It is head and shoulders above conventional approaches that involve a host CPU when bridging dissimilar networks, and it entirely avoids intermediate data copies, delivering 96 percent of the theoretical peak bandwidth. Plus, we made another great leap forward by integrating the protocol with ParTec’s ParaStation MPI implementation. This enables a global MPI communication substrate across the full DEEP system, which, in turn, has also opened up a migration path for those MPI applications where the OmpSs-style task offload may not be practical.
It may not be fully clear yet whether our DEEP journey ends exactly where we envisioned it when we first started out, but we’re excited to keep going. The next big step for us will be to prove the worth of the DEEP architecture by running our six pilot applications. The hardware and software of the DEEP system will be integrated and operational before December 1st, giving the applications team half a year to tune their performance.
If you want to know more about the details of our adventurous journey, meet us at our joint BoF together with other EC-funded exascale projects: “Exascale Research: The European Approach.” The BoF takes place Tuesday, June 24, 2014, 2:15pm – 3:15pm in Hall 5. We are also happy to welcome you at our booth #833.
Prof. Dr. Dr. Thomas Lippert received his diploma in Theoretical Physics in 1987 from the University of Würzburg. He completed Ph.D. theses in theoretical physics at Wuppertal University on simulations of quantum field theories and at Groningen University in the field of parallel computing. He is director of the Jülich Supercomputing Centre at Forschungszentrum Jülich, member of the board of directors of the John von Neumann Institute for Computing (NIC), and he holds the chair for Computational Theoretical Physics at the University of Wuppertal. His research interests include lattice gauge theories, quantum computing, numerical and parallel algorithms, and cluster computing.