ISC HPC Blog
Is co-design for exascale computing a false hope?
It has become a common, even required, mantra at conferences and workshops discussing exascale: that co-design is essential to achieving performance at that scale. The idea is that optimum performance for the science/mission within the constraints of cost, power, etc. is only achievable if we can modify both the hardware and application software roadmaps in conversation with each other.
However, as anyone who has heard me talk at those workshops and conferences in recent weeks will know, I am concerned that we are placing too much faith in co-design. In fact, we could describe co-design as the current “belief system” of the exascale hopeful.
I think the HPC community won’t secure any major changes to the next generation of computing hardware - especially processors (CPU or GPU) and memory. The computer components of the exascale era (~2020) will be more strongly influenced by the needs of the consumer industry (games, mobile devices, etc.) and the growth of analytics than by HPC.
We have been reassured by knowledgeable people who have spent productive time inside the vendor community that HPC is a valuable market segment, even for computing companies with much broader markets – Intel, IBM, NVIDIA, AMD, etc. – and so the hardware architects will listen to the needs of the HPC community. But, when pushed, the same authoritative voices agree that the amount of change in the hardware designs as a result of the HPC community’s voice will be small.
But, should we worry?
After all, thinking back to another great change in supercomputing technology, I don’t think the HPC community secured much influence over the design of “killer micros”. But, after a period of denying the inevitable change, we learned how to successfully integrate the technology of commodity PCs and servers into new kinds of supercomputers.
The lesson for exascale is surely the same – we should focus on predicting the likely technologies of ~2020, influence them as much as we can, and understand how they might perform. Then figure out how best to use that next generation of consumer-driven technology as components of supercomputer configurations, and as drivers of our application development roadmaps.
In addition, my predicted drivers of the technology (games, mobile and analytics) all have characteristics that suit the needs of HPC quite well – namely FLOPS, power consumption and memory performance.
I do agree that stronger interaction between the hardware system (rather than component) architects, the application developers and the mission drivers (e.g. the science goals) is required to progress in supercomputing.
But I also worry that the focus on the mantra of co-design may in fact be distracting us from some of the pressing challenges in our exascale quest.
First, the balance of effort in the exascale journey. Users of HPC, along with HPC professionals and advocates, have assembled a plethora of cases where exascale systems will be beneficial or even required to achieve the science or business goals. A broad programme of R&D into exascale technologies is underway by vendors, academic researchers and government. But, in my view, the exascale conversation is still too focused on FLOPS, and indeed on hardware in general, rather than the full ecosystem of hardware, software, applications, people, and more.
Second, the pervasiveness of the software challenge. There is still the risk that the users and developers of software won’t have the appetite for the major changes required by the computing technologies of the exascale era. And yet, what choice do they have? The same core technologies will be part of most HPC systems in the 2020 timeframe – not just the Top5 systems but also new HPC systems in the Top20 and Top100 – in fact, throughout the HPC space. We need to ensure funding agencies, users and code developers understand that most exascale challenges will affect all HPC systems.
Third, the scale of the software marathon. We must get a firm understanding of the balance of the software journey for the exascale era. How many of the existing software packages (applications especially) can sensibly be rewritten for new levels of performance? How many must be rewritten to exploit exascale computing? How many must be evolved (e.g. because of the many years of investment in validation against experimental data)? We need to understand the scale and balance of this epic software effort.
Fourth, on a technical level, co-design won’t answer the agreed basic challenge of exascale systems: we need to find another few orders of magnitude of concurrency in our applications. We need real research to discover how to usefully apply many more FLOPS to smaller lumps of data (reduced memory capacity per FLOP) under severe locality constraints (data movement is simply too expensive in both power and performance).
So, in summary, co-design won’t solve all – or even most – of the technical challenges of exascale. Nor will it be much help with the equally important cultural challenges – we underestimate the inertia and complexity of the applications software ecosystem.
However, co-design is an essential part of the exascale conversation – it is a mechanism by which we can co-ordinate our influence on the hardware architects – the technical issues of exascale era computing look so daunting that every little bit of HPC change secured in the hardware helps. Co-design is also an explicit recognition of the fact that software must be evolved and innovated with the predicted hardware technologies in mind in order to deliver useful exascale computing. Indeed, co-design’s illumination of the required software effort may be its biggest contribution.
Andrew can also be followed regularly at https://twitter.com/#!/hpcnotes
Andrew is Vice-President HPC Services and Consulting at the Numerical Algorithms Group (NAG). NAG provides HPC application performance services and impartial technology consulting to customers around the world. NAG is also a core part of the UK’s HECToR national supercomputing service, providing the Computational Science and Engineering (CSE) Support Service, including training. Andrew was originally a researcher using HPC and developing related software in government and industrial settings, later becoming involved in leadership of HPC services such as the UK’s CSAR national HPC service. Andrew has undertaken independent reviews of HPC services, was involved in the early technical management of PRACE, and was a theme co-chair in the European Exascale Software Initiative (EESI). Andrew is interested in future concerns of the HPC community, including exascale, application performance, skills development, and broadening usage.