Focusing LINPACK: The TOP500 Yardstick
Thursday, June 3, 2010, 9:00am – 10:30am, Hall B
- Dr. Erich Strohmaier, Head of Future Technology Group, Lawrence Berkeley National Laboratory, USA
LINPACK Benchmark with Time Limits on Multicore & GPU Based Accelerators
- Prof. Dr. Jack Dongarra, University Distinguished Professor of Computer Science, University of Tennessee & Oak Ridge National Laboratory, USA
The original LINPACK Benchmark is, in some sense, an accident. It was originally designed to assist users of the LINPACK package by providing information on the execution times required to solve a system of linear equations. The first “LINPACK Benchmark” report appeared as an appendix to the LINPACK Users’ Guide in 1979. The appendix consisted of data for one commonly used path in the LINPACK software package. Results were provided for a matrix problem of size 100 on a collection of 23 widely used computers, so that users could estimate the time required to solve their own matrix problems by extrapolation. Over the years, additional performance data was added, more as a hobby than anything else, and today the collection includes over 1300 different computer systems. As the number of computers has grown, the scope of the benchmark has also expanded. Today, one form of the LINPACK Benchmark is the basis of the TOP500 list. We will look at how and why the benchmark has changed over the past 30 years and discuss plans for another change to accommodate new technology and its limitations.
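As an illustration of the kind of measurement described above, the following is a minimal sketch in pure Python, not the LINPACK code itself. It times the solution of a dense system of order 100 by Gaussian elimination with partial pivoting and converts the time to a rate using the conventional operation count 2n^3/3 + 2n^2. All function names are hypothetical, and the extrapolation helper assumes the machine sustains the same rate at the larger size, which is only a rough approximation.

```python
import random
import time


def solve_dense(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    a = [row[:] for row in a]  # work on copies so the inputs are untouched
    b = b[:]
    for k in range(n):
        # Partial pivoting: pick the row with the largest entry in column k.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        if a[p][k] == 0.0:
            raise ValueError("matrix is singular")
        if p != k:
            a[k], a[p] = a[p], a[k]
            b[k], b[p] = b[p], b[k]
        row_k = a[k]
        for i in range(k + 1, n):
            m = a[i][k] / row_k[k]
            row_i = a[i]
            for j in range(k + 1, n):
                row_i[j] -= m * row_k[j]
            b[i] -= m * b[k]
    # Back substitution on the upper triangle.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = b[i] - sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / a[i][i]
    return x


def linpack_flops(n):
    """Conventional operation count for solving a dense system of order n."""
    return 2.0 * n**3 / 3.0 + 2.0 * n**2


def run_benchmark(n=100, seed=0):
    """Time one solve; return (seconds, Mflop/s, max error against x = 1)."""
    rng = random.Random(seed)
    a = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    b = [sum(row) for row in a]  # right-hand side chosen so that x = (1, ..., 1)
    start = time.perf_counter()
    x = solve_dense(a, b)
    elapsed = time.perf_counter() - start
    max_error = max(abs(xi - 1.0) for xi in x)
    return elapsed, linpack_flops(n) / elapsed / 1e6, max_error


def extrapolate_time(elapsed, n_measured, n_target):
    """Estimate the time at another size, ASSUMING a constant flop rate."""
    return elapsed * linpack_flops(n_target) / linpack_flops(n_measured)


if __name__ == "__main__":
    seconds, mflops, err = run_benchmark()
    print(f"n=100: {seconds:.4f} s, {mflops:.2f} Mflop/s, max error {err:.2e}")
    print(f"estimated time for n=1000: {extrapolate_time(seconds, 100, 1000):.2f} s")
```

Because the sketch is interpreted Python, the rates it reports are far below those of compiled code; it is meant only to show how a time for one problem size is turned into a rate and an estimate for another size.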
Linpack on Multicores and GPUs
We will provide a brief historical look at the development of dense linear algebra libraries, from LINPACK to LAPACK to ScaLAPACK. These packages served the community well for many years. Today we see new computer architectures emerging, namely manycore processors and accelerators, which will cause another change to the software landscape. These changes will once again necessitate changes to the linear algebra libraries. We have been developing two packages, PLASMA and MAGMA, for just these architectures.
The main motivation for the PLASMA (Parallel Linear Algebra Software for Multiprocessor Architectures) project is to create a new generation of dense linear algebra libraries that achieve the fastest possible time to an accurate solution on multicore systems. Specifically, PLASMA aims to outperform ScaLAPACK and LAPACK on distributed and shared memory systems, respectively, as well as leading vendor implementations (e.g. Intel’s MKL and AMD’s ACML) on top-of-the-line multicore systems. Another main goal of PLASMA is to provide a unified framework for different memory architectures, e.g. distributed memory systems (traditional clusters and tightly coupled MPPs), shared memory systems (traditional socket-level SMPs, multicores or CMPs, NUMA systems), and accelerator-based computing.
The main goals to be accomplished by the PLASMA project are as follows:
- Dynamic Scheduling: PLASMA will relieve the programmer of the scheduling of tasks by implementing dependency-driven (data-driven) dynamic scheduling. Tasks will be scheduled as their dependencies are satisfied and their input data becomes available.
- Communication & Memory Management: PLASMA will shield the algorithm developer from the specifics of a particular memory architecture. In particular, PLASMA will relieve the programmer of explicit message passing on distributed memory systems and of the allocation and management of communication data buffers.
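The dependency-driven scheduling named in the first goal can be sketched as follows. This is a minimal Python illustration, not PLASMA's scheduler or API: `run_task_graph` and `CHOLESKY_DEPS` are hypothetical names, the task bodies are empty placeholders, and the example graph assumes a right-looking tile Cholesky factorization of a 3 x 3 tile matrix. Each task is submitted to a thread pool only once every task it depends on has finished.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def run_task_graph(tasks, deps, max_workers=4):
    """Run each task as soon as all of its dependencies have finished.

    tasks: dict mapping a task name to a zero-argument callable
    deps:  dict mapping a task name to the names that must finish first
    Returns the task names in completion order.
    """
    remaining = {name: set(deps.get(name, ())) for name in tasks}
    unknown = set().union(*remaining.values()) - set(tasks)
    if unknown:
        raise ValueError(f"unknown dependencies: {sorted(unknown)}")

    # Reverse map: which tasks are waiting on a given task.
    dependents = {name: [] for name in tasks}
    for name, needed in remaining.items():
        for dep in needed:
            dependents[dep].append(name)

    order = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Start with every task that has no dependencies at all.
        running = {pool.submit(tasks[name]): name
                   for name, needed in remaining.items() if not needed}
        while running:
            done, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for future in done:
                name = running.pop(future)
                future.result()  # re-raise any exception from the task
                order.append(name)
                # Release tasks whose last dependency has just finished.
                for child in dependents[name]:
                    remaining[child].discard(name)
                    if not remaining[child]:
                        running[pool.submit(tasks[child])] = child
    if len(order) != len(tasks):
        raise ValueError("dependency cycle detected")
    return order


# Assumed example graph: tile Cholesky on a 3 x 3 tile matrix.
# Names are kernel(tile row, tile column[, update step]).
CHOLESKY_DEPS = {
    "potrf(0,0)": [],
    "trsm(1,0)": ["potrf(0,0)"],
    "trsm(2,0)": ["potrf(0,0)"],
    "syrk(1,1)": ["trsm(1,0)"],
    "gemm(2,1)": ["trsm(1,0)", "trsm(2,0)"],
    "syrk(2,2,0)": ["trsm(2,0)"],
    "potrf(1,1)": ["syrk(1,1)"],
    "trsm(2,1)": ["potrf(1,1)", "gemm(2,1)"],
    "syrk(2,2,1)": ["syrk(2,2,0)", "trsm(2,1)"],
    "potrf(2,2)": ["syrk(2,2,1)"],
}

if __name__ == "__main__":
    placeholder_tasks = {name: (lambda: None) for name in CHOLESKY_DEPS}
    for name in run_task_graph(placeholder_tasks, CHOLESKY_DEPS):
        print(name)
```

In this sketch the programmer only declares which tasks depend on which; the order of execution is decided at run time, so independent tasks such as the two `trsm` updates of the first tile column can run concurrently.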
The MAGMA project aims to develop a dense linear algebra library similar to LAPACK but for heterogeneous/hybrid architectures, starting with current "Multicore+GPU" systems.
The MAGMA research is based on the idea that, to address the complex challenges of the emerging hybrid environments, optimal software solutions will themselves have to hybridize, combining the strengths of different algorithms within a single framework. Building on this idea, we aim to design linear algebra algorithms and frameworks for hybrid manycore and GPU systems that enable applications to fully exploit the power that each of the hybrid components offers.