Name: Algorithms & Analysis
(6) Dynamically Executing Array Operations on GPGPU
Time: Monday, June 18, 2012, 3:00 PM - 8:30 PM
Room: Hall H, #911, CCH - Congress Center Hamburg
Speakers: Ashish Kumar Agarwal, Indian Institute of Technology Kanpur
Abstract: GPGPUs are well suited for HPC because they outperform CPUs on a large number of applications. However, the entry barrier to programming GPGPUs is high due to the complexity of the currently available programming models. Moreover, the wide variety of available hardware requires different optimizations for different devices, and the in-depth understanding of the architecture needed for optimal performance is too much of a digression from their domain areas for most researchers.
We have built a C++ library and runtime environment for Nvidia GPGPUs that addresses these issues. The library provides arrays as objects and element-wise array operations as methods on these objects, abstracting away hardware details. The programmer only needs to rewrite an existing C++ program using our API, which is a straightforward process. At runtime, the library queries the hardware configuration and decides the memory layout for data, applying optimizations such as memory access vectorization, coalescing, and bank conflict elimination. It dynamically compiles parallelizable code into CUDA assembly, using heuristics for work division to achieve close-to-optimal performance. Because all optimizations are performed at runtime, no recompilation is needed for different architectures, making GPGPU programs write once, run optimally everywhere.
We ran tests comparing fine-tuned CPU and GPGPU versions of several applications against code written with our library. The tests show that our library runs within 1x-5x the time of such hand-tuned CUDA code on both high-end and low-end GPGPU devices, and vastly outperforms the CPU versions running on all cores of high-end multi-core CPUs.