Energy optimization of representative barrier algorithms
Juan Chen, Yong Dong
Journal of Central South University, 2012, Vol. 19, Issue 10: 2823-2831.
Excessive energy consumption is widely recognized as a critical problem in large-scale parallel computing systems. A LogP-based energy-saving model and a frequency scaling method are proposed to reduce energy consumption, analytically and systematically, for two further representative barrier algorithms: the tournament barrier and the central counter barrier. The energy optimization methods for these two barrier algorithms were implemented on a parallel computing platform, and the experimental results validate their effectiveness: energy savings of 67.12% and 70.95% are obtained for the tournament barrier and the central counter barrier, respectively, on platforms with 2048 processes, at a performance loss of 1.55%-8.80%. The LogP-based analytical model for these two barrier algorithms is also highly accurate: its predicted energy savings are within 9.67% of the results obtained by simulation.
Keywords: energy saving, tournament barrier, central counter barrier, LogP, Open MPI
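For readers unfamiliar with the central counter barrier named above, the following is a minimal, generic sketch of the textbook sense-reversing variant. It is not the authors' Open MPI implementation and does not model frequency scaling or energy. The class name `CentralCounterBarrier` and the helper `run_demo` are hypothetical, introduced only for illustration. As a simplification, waiting threads block on a condition variable instead of spinning on the shared flag.

```python
import threading


class CentralCounterBarrier:
    """Sense-reversing central counter barrier (illustrative sketch only)."""

    def __init__(self, n):
        self.n = n                # number of participating threads
        self.count = 0            # shared arrival counter
        self.sense = False        # global sense flag, flipped once per phase
        self.cond = threading.Condition()

    def wait(self):
        with self.cond:
            my_sense = not self.sense   # value the flag takes when this phase ends
            self.count += 1
            if self.count == self.n:
                # Last arrival: reset the counter and release everyone.
                self.count = 0
                self.sense = my_sense
                self.cond.notify_all()
            else:
                # Earlier arrivals idle here until the flag flips.
                while self.sense != my_sense:
                    self.cond.wait()


def run_demo(n_threads=4, rounds=3):
    """Hypothetical check: after each barrier, every thread must have arrived."""
    barrier = CentralCounterBarrier(n_threads)
    arrivals = [0] * rounds
    lock = threading.Lock()
    ok = []

    def worker():
        for r in range(rounds):
            with lock:
                arrivals[r] += 1
            barrier.wait()
            with lock:
                ok.append(arrivals[r] == n_threads)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return ok
```

Every participant except the last spends the interval between its own arrival and the final arrival idle inside `wait()`. The abstract does not state where frequency scaling is applied, so this sketch makes no attempt to show it.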