Energy optimization of representative barrier algorithms

Juan Chen; Yong Dong

doi:10.1007/s11771-012-1348-z

Journal of Central South University, 2012, Vol. 19, Issue 10: 2823-2831. DOI: 10.1007/s11771-012-1348-z


Abstract

Excessive energy consumption is widely recognized as a critical problem in large-scale parallel computing systems. A LogP-based energy-saving model and a frequency scaling method are proposed to reduce energy consumption, analytically and systematically, for two further representative barrier algorithms: the tournament barrier and the central counter barrier. The energy optimization methods for these two barrier algorithms were implemented on a parallel computing platform, and the experimental results validate their effectiveness. Energy savings of 67.12% and 70.95% are obtained for the tournament barrier and the central counter barrier, respectively, on platforms with 2048 processes, at a performance loss of 1.55%-8.80%. The LogP-based analytical energy-saving model for these two barrier algorithms is also highly accurate: the predicted energy savings are within 9.67% of the results obtained by simulation.
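To give a rough sense of where a tournament barrier leaves room for frequency scaling, the sketch below counts how many arrival rounds each process spends waiting after it loses its match. It is an illustration only, not the LogP-based model from the paper. The function names are hypothetical, and the pairing rule is an assumption taken from the usual textbook form of the algorithm: in round k, the rank whose lowest set bit is 2^k loses to rank minus 2^k, and rank 0 is the champion.

```python
def tournament_rounds(num_procs: int) -> int:
    """Number of pairwise arrival rounds, i.e. ceil(log2(num_procs))."""
    return (num_procs - 1).bit_length()


def losing_round(rank: int):
    """0-based round in which `rank` loses and starts waiting.

    Assumed pairing rule: a rank loses in the round given by the index
    of its lowest set bit. Rank 0 never loses, so None is returned.
    """
    if rank == 0:
        return None
    return (rank & -rank).bit_length() - 1


def idle_arrival_rounds(rank: int, num_procs: int) -> int:
    """Arrival rounds a process sits out after losing (simplified count).

    The wake-up phase is ignored here; a real model would add it.
    """
    lost = losing_round(rank)
    if lost is None:
        return 0
    return tournament_rounds(num_procs) - 1 - lost


if __name__ == "__main__":
    procs = 2048  # process count reported in the abstract
    print("arrival rounds:", tournament_rounds(procs))
    for rank in (0, 1, 2, 1024):
        print(rank, idle_arrival_rounds(rank, procs))
```

Under these assumptions, half of the processes lose in the first round and wait through every remaining round, which is the kind of idle time a frequency scaling method could target.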

Keywords

energy saving / tournament barrier / central counter barrier / LogP / Open MPI

