Impacts of optimization strategies on performance, power/energy consumption of a GPU based parallel reduction

Thi Yen Phuong, Deok-Young Lee, Jeong-Gun Lee

Journal of Central South University, 2017, Vol. 24, Issue (11): 2624-2637. DOI: 10.1007/s11771-017-3676-5

Abstract

In the era of modern high-performance computing, GPUs are considered excellent accelerators for general-purpose, data-intensive parallel applications. Many performance-oriented optimization techniques have been proposed to achieve application speedup on GPUs. However, given the growing emphasis on power and energy consumption, power/energy-aware optimization of GPUs needs to be investigated in detail alongside performance-oriented optimization. In this work, to explore the impact of various optimization strategies on GPU performance, power and energy consumption, we evaluate a well-known application, parallel reduction, running on different commercial GPU devices under different optimization strategies. In particular, to observe more general performance and power consumption patterns of GPU-based acceleration, our evaluations are performed on three NVIDIA GPU generations (Fermi, Kepler and Maxwell architectures) at various core and memory clock frequencies. We analyze how GPU kernel execution is affected by optimization and which GPU architectural factors most strongly affect its performance and power/energy consumption. This paper also categorizes which optimization technique primarily improves which metric (i.e., performance, power or energy efficiency). Furthermore, voltage frequency scaling (VFS) is applied to examine the effect of changing the clock frequency on these metrics. In general, our work shows that effective GPU optimization strategies can improve application performance significantly without increasing power and energy consumption.
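
As a rough illustration of why a speedup at unchanged power also lowers energy, the standard first-order relations can be written out. This is a sketch, not the paper's own model; the symbols $P_{\text{avg}}$ (average power), $t$ (run time), $C$ (switched capacitance), $V$ (supply voltage), $f$ (clock frequency) and $s$ (speedup factor) are assumptions introduced here.

```latex
E = P_{\text{avg}} \cdot t, \qquad P_{\text{dyn}} \propto C V^{2} f
% If an optimization shortens run time from t to t/s at unchanged average power:
E' = P_{\text{avg}} \cdot \frac{t}{s} = \frac{E}{s}
```

Under VFS, lowering $f$ (and with it $V$) reduces $P_{\text{dyn}}$ but typically lengthens $t$, so the net effect on $E$ depends on which term dominates.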

Keywords

parallel reduction / GPU / code optimization / power / energy / voltage frequency scaling

Cite this article

Thi Yen Phuong, Deok-Young Lee, Jeong-Gun Lee. Impacts of optimization strategies on performance, power/energy consumption of a GPU based parallel reduction. Journal of Central South University, 2017, 24(11): 2624-2637. DOI: 10.1007/s11771-017-3676-5
