A novel architecture for ahead branch prediction
Wenbing JIN, Feng SHI, Qiugui SONG, Yang ZHANG
In theory, branch predictors with more complicated algorithms and larger data structures provide more accurate predictions. In practice, overly large structures and excessively complicated algorithms cannot be implemented because of their long access delay. Many strategies have been proposed to balance delay against accuracy, but none has completely solved the problem. The architecture for ahead branch prediction (A2BP) separates a traditional predictor into two parts. The first is a small table located at the front-end of the pipeline, which keeps the prediction delay short enough even for some aggressive processors. The second comprises the complicated algorithms and large data structures needed for accurate predictions, all of which are moved to the back-end of the pipeline. An effective mechanism is introduced for ahead branch prediction in the back-end and for updating the small table in the front-end. To substantially improve prediction accuracy, an indirect branch prediction algorithm based on branch history and target path (BHTP) is implemented in A2BP. Experiments with the Standard Performance Evaluation Corporation (SPEC) benchmarks on the gem5 and SimpleScalar simulators demonstrate that A2BP improves average performance by 2.92% compared with a commonly used branch target buffer-based predictor. In addition, indirect branch misses with the BHTP algorithm are reduced by an average of 28.98% compared with the traditional algorithm.
branch prediction / branch speculation / branch target buffer / indirect branch / instruction pipeline
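The front-end/back-end split described in the abstract can be pictured with the toy model below. It is a minimal sketch under stated assumptions, not the authors' implementation: the class names, table sizes, the tuple-based index over branch history and target path, and the fallback to the last seen target are all hypothetical, since the abstract does not specify them.

```python
class BackEndPredictor:
    """Large, slow predictor (hypothetical): keyed by PC, branch history and target path."""

    def __init__(self, history_bits=4, path_len=2):
        self.history_bits = history_bits
        self.path_len = path_len
        self.history = 0   # recent taken/not-taken outcomes, newest in bit 0
        self.path = []     # most recent branch targets
        self.table = {}    # a dict stands in for a large hardware table

    def _index(self, pc):
        return (pc, self.history, tuple(self.path))

    def predict(self, pc):
        """Return the target stored for the current context, or None."""
        return self.table.get(self._index(pc))

    def train(self, pc, taken, target):
        """Record the resolved target, then shift history and path."""
        self.table[self._index(pc)] = target
        mask = (1 << self.history_bits) - 1
        self.history = ((self.history << 1) | int(taken)) & mask
        self.path = (self.path + [target])[-self.path_len:]


class FrontEndTable:
    """Small, fast tagged table consulted at fetch (hypothetical sizes)."""

    def __init__(self, entries=16):
        self.entries = entries
        self.slots = {}

    def lookup(self, pc):
        tag, target = self.slots.get(pc % self.entries, (None, None))
        return target if tag == pc else None

    def update(self, pc, target):
        self.slots[pc % self.entries] = (pc, target)


def run_ahead_sketch(trace):
    """Count misses when the back end refreshes the front-end table ahead of time."""
    front, back = FrontEndTable(), BackEndPredictor()
    misses = 0
    for pc, target in trace:
        if front.lookup(pc) != target:      # fetch sees only the small table
            misses += 1
        back.train(pc, True, target)        # branch resolves in the back end
        ahead = back.predict(pc)            # prediction for the next occurrence
        front.update(pc, ahead if ahead is not None else target)
    return misses


def run_last_target(trace):
    """Baseline: a BTB-like table that always predicts the last seen target."""
    last, misses = {}, 0
    for pc, target in trace:
        if last.get(pc) != target:
            misses += 1
        last[pc] = target
    return misses


# One indirect branch whose target alternates between two addresses.
trace = [(0x40, 0x100), (0x40, 0x200)] * 20
print(run_ahead_sketch(trace), run_last_target(trace))
```

On this alternating trace the last-target baseline mispredicts every branch, while the sketch stops missing once its history register saturates and both path contexts have been seen. The point of the design is that the slow, context-sensitive lookup happens after resolution, so the fetch stage only ever pays for the small table.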