| Abstract: |
|
The data-movement cost over PCI Express is one of the biggest performance bottlenecks when accelerating data-intensive applications on traditional discrete GPU architectures. To address this bottleneck, AMD Fusion introduces a fused architecture that tightly integrates the CPU and GPU on the same die and connects them through a high-speed, on-chip memory controller. This novel architecture incorporates a memory space shared between the CPU and GPU, thus enabling several inter-device data-transfer techniques that are not available on discrete architectures. For instance, a kernel running on the GPU can now directly access a CPU-resident memory buffer, and vice versa. In this paper, we seek to understand the implications of the fused architecture for CPU-GPU heterogeneous computing by systematically characterizing several data-intensive kernels on two generations of AMD Fusion (i.e., Zacate E350 and Llano A8-3850). Our study reveals that, compared to discrete architectures, the fused architecture is very promising for leveraging the massive compute power of GPUs to accelerate data-intensive applications. |
|