by Jeff Andrews, application engineer, Intel Corp.
Proven techniques and Intel tools enable developers to minimize branch mispredictions and keep deeply pipelined processors fully utilized.
Modern microprocessors are pipelined so that more instructions complete in a given amount of time. In a pipelined processor, an instruction does not wait for the previous ones to finish before its own execution begins. Conditional branches, however, pose a problem for this approach. If the microprocessor encounters a conditional branch and the result of the condition has not yet been calculated, how does it know whether to take the branch or not? This is where branch prediction comes in.
Branch prediction is the mechanism the processor uses to guess whether a conditional branch will be taken before the condition has been evaluated. The guess needs to be as accurate as possible: after an incorrect prediction (a mispredict), the microprocessor must throw out all of the instructions it started along the wrong path and start over with the correct set of instructions. This recovery is particularly expensive on deeply pipelined processors, because more partially completed instructions have to be discarded.
This article introduces the various branch-prediction methods used by the microprocessor and provides some tips on how to avoid costly mispredicts. It assumes that the reader is familiar with programming in C and with IA-32 assembly-language instructions.
Branch examples
A pipelined processor can begin executing instructions before their predecessors have completed because it breaks instruction execution into stages. When it encounters a conditional branch, the microprocessor uses branch prediction to determine which direction the branch will take. The following are examples of C language constructs that cause conditional branches.
The first construct considered here that causes conditional branches is the if-else statement.