June 22–26, 2014
Leipzig, Germany

Session Details

Name: Tutorial 01: Node-Level Performance Engineering
Time: Sunday, June 22, 2014, 09:00 am - 06:00 pm
Room: Seminar Room 6/7
CCL - Congress Center Leipzig
Breaks: 08:00 am - 10:30 am Welcome Coffee
Presenters: Georg Hager, RRZE
  Jan Treibig, RRZE
  Gerhard Wellein, RRZE & University of Erlangen-Nuremberg
Abstract: This tutorial covers performance engineering approaches at the compute node level. “Performance engineering” is more than employing tools to identify hotspots and blindly applying textbook optimizations. It is about developing a thorough understanding of the interactions between software and hardware. This process starts at the core, socket, and node level, where the code that does the actual “work” gets executed. Once the architectural requirements of a code are understood and correlated with performance measurements, the potential benefit of optimizations can often be predicted.
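As a hedged illustration of how such a prediction can work (this example is not taken from the tutorial material), the roofline model bounds the attainable performance P of a loop by the peak arithmetic performance P_peak and by the product of the loop's computational intensity I (flops per byte of memory traffic) and the achievable memory bandwidth b_S. For the vector triad a[i] = b[i] + c[i]*d[i] in double precision, each iteration performs 2 flops and moves 40 bytes: three 8-byte loads, one 8-byte store, and an assumed 8-byte write-allocate transfer for a. The bandwidth of 40 GB/s used below is a hypothetical value chosen for round numbers.

```latex
\begin{aligned}
P &= \min\left(P_{\mathrm{peak}},\; I \cdot b_S\right), \qquad
I = \frac{2\ \mathrm{flop}}{40\ \mathrm{byte}} = 0.05\ \frac{\mathrm{flop}}{\mathrm{byte}} \\
P &\le 0.05\ \frac{\mathrm{flop}}{\mathrm{byte}} \times 40\ \frac{\mathrm{Gbyte}}{\mathrm{s}}
   = 2\ \frac{\mathrm{Gflop}}{\mathrm{s}}
\end{aligned}
```

If a measurement lands near this bound, optimizations that do not reduce data traffic are unlikely to help; if it lands far below, something other than memory bandwidth is the limiting factor.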
We start by giving an overview of modern processor and node architectures, including accelerators such as GPGPUs and Xeon Phi. Typical bottlenecks such as instruction throughput and data transfers are identified using kernel benchmarks and put into architectural context. The impact of optimizations such as SIMD vectorization, ccNUMA placement, and cache blocking is shown, and different aspects of a “holistic” node-level performance engineering strategy are demonstrated. Using the LIKWID multicore tools, we show the importance of topology awareness, affinity enforcement, and hardware metrics. The latter are used to support the performance engineering process by supplying information that can validate or falsify performance models. Case studies on sparse matrix-vector multiplication and a conjugate gradient solver conclude the tutorial.

Content Level
25% Introductory, 50% Intermediate, 25% Advanced

Audience Prerequisites
Some knowledge of MPI and OpenMP, and basic knowledge of typical processor and node architectures (cores, caches, sockets).