June 22–26, 2014
Leipzig, Germany

Presentation Details

Name: (19a) Large-Scale Multi-Level Sorting for GPU-Based Heterogeneous Architectures
Time: Thursday, June 26, 2014
10:30 am - 11:00 am
Room: Hall 4
CCL - Congress Center Leipzig
Breaks: 10:30 am - 11:00 am Coffee Break
07:30 am - 10:30 am Welcome Coffee
Presenter: Hideyuki Shamoto, Tokyo Institute of Technology
Abstract: Sorting is widely used in many applications, such as database and MapReduce operations on data from social networking services (SNS) and the Internet of Things (IoT). The recent growth of data in these fields demands ultra-fast sorting of extremely large-scale data collections. Meanwhile, thanks to the high memory bandwidth of GPU devices, supercomputers with heterogeneous architectures, such as TSUBAME2.5 and Titan, have recently emerged in the Top500 list. Using these commodity-based GPU devices in large-scale systems may drastically accelerate the sorting of extremely large-scale data collections. However, efficient implementations of parallel sorting on large-scale heterogeneous systems, together with detailed performance analysis, have not been investigated. To clarify the performance bottlenecks of existing parallel sorting algorithms, we first conducted preliminary performance studies of the HykSort and PSort algorithms on TSUBAME2.5. The results indicate that both methods share the same bottleneck: the local sort phase of the parallel sort. To remove this bottleneck, we offload the local sort phase to the GPUs. As a result, we achieved performance improvements of up to 4.6x and demonstrated the effectiveness of using GPU memory in parallel sorting.
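
To illustrate where a local sort phase sits inside a distributed sort, the following is a minimal single-process sketch of a generic sample sort. It is not HykSort, PSort, or the authors' implementation; the names `sample_sort`, `cpu_local_sort`, and `num_ranks` are hypothetical, and the "ranks" are simulated with plain Python lists rather than MPI processes. The `local_sort` parameter marks the phase that an approach like the one described above would hand to a GPU.

```python
import bisect
import heapq
from typing import Callable, List


def cpu_local_sort(chunk: List[int]) -> List[int]:
    """Stand-in for the per-rank local sort (the phase one might offload to a GPU)."""
    return sorted(chunk)


def sample_sort(
    data: List[int],
    num_ranks: int,
    local_sort: Callable[[List[int]], List[int]] = cpu_local_sort,
) -> List[int]:
    """Simulated sample sort over `num_ranks` ranks (illustrative assumption only)."""
    if num_ranks < 1:
        raise ValueError("num_ranks must be at least 1")
    if not data:
        return []

    # 1. Distribute the input across the simulated ranks.
    chunks = [data[r::num_ranks] for r in range(num_ranks)]

    # 2. Local sort phase: each rank sorts its own chunk independently.
    chunks = [local_sort(c) for c in chunks]

    # 3. Pick splitters from regularly spaced samples of every sorted chunk.
    samples: List[int] = []
    for c in chunks:
        step = max(1, len(c) // num_ranks)
        samples.extend(c[::step])
    samples.sort()
    splitters = [
        samples[(i * len(samples)) // num_ranks] for i in range(1, num_ranks)
    ]

    # 4. Partition each sorted chunk by the splitters (simulated all-to-all exchange).
    buckets: List[List[List[int]]] = [[] for _ in range(num_ranks)]
    for c in chunks:
        bounds = [0] + [bisect.bisect_right(c, s) for s in splitters] + [len(c)]
        for dest in range(num_ranks):
            buckets[dest].append(c[bounds[dest]:bounds[dest + 1]])

    # 5. Each rank merges the sorted runs it received; concatenate rank by rank.
    result: List[int] = []
    for runs in buckets:
        result.extend(heapq.merge(*runs))
    return result
```

In this sketch, passing a different callable as `local_sort` changes only step 2, which mirrors the idea of replacing the local sort with a GPU-backed routine while the sampling, exchange, and merge steps stay on the host.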

Hideyuki Shamoto, Koichi Shirahata, Aleksandr Drozd, Hitoshi Sato & Satoshi Matsuoka, Tokyo Institute of Technology