by Phil Kerly, senior software engineer, Software and Solutions Group, Intel Corp.
Hyper-Threading Technology-enabled processors contain multiple logical processors per physical processor package. The state information necessary to support each logical processor is replicated, while the underlying physical processor resources are shared. Given that most applications underutilize processor resources, Hyper-Threading Technology-enabled processors can improve overall application performance: multiple threads running in parallel can achieve higher utilization and increased throughput. Because logical processors share the underlying physical processor resources, interactions between threads that use those resources can have synergistic effects.
The various levels of cache are one such shared physical processor resource. This paper describes the levels of cache within the Intel NetBurst® microarchitecture, how exploiting data locality with a cache data blocking technique can improve application performance, and how to optimize this technique for Hyper-Threading Technology-enabled processors. Note that this paper does not target a specific processor, so the cache sizes and implementation details described here are subject to change in future processors. At the time this article is being published, Hyper-Threading Technology is enabled only in Intel® Xeon® processors, but Intel announced at the Spring Intel Developer Forum that this technology will be available on desktop processors sometime in 2003.
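As a rough illustration of what cache data blocking means in general, the sketch below contrasts a straightforward matrix transpose with a blocked (tiled) version that works on small square tiles so that the cache lines touched by one tile can be reused before they are evicted. This is a generic example and not code from this paper: the function names `transpose_naive` and `transpose_blocked` and the `block` parameter are hypothetical, and a suitable block size depends on the cache being targeted and has to be found by measurement.

```c
#include <assert.h>
#include <stddef.h>

/* Straightforward transpose of an n x n row-major matrix.
   Writes to dst are n elements apart, so for large n consecutive
   writes tend to land on different cache lines. */
void transpose_naive(const float *src, float *dst, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < n; ++j)
            dst[j * n + i] = src[i * n + j];
}

/* Blocked transpose (illustrative sketch): the matrix is processed in
   block x block tiles. Within one tile, the same few source and
   destination cache lines are reused before moving on.
   `block` is a tuning parameter assumed for this example. */
void transpose_blocked(const float *src, float *dst, size_t n, size_t block)
{
    assert(block > 0);
    for (size_t ii = 0; ii < n; ii += block) {
        /* Clamp the tile edge so n need not be a multiple of block. */
        size_t i_end = (ii + block < n) ? ii + block : n;
        for (size_t jj = 0; jj < n; jj += block) {
            size_t j_end = (jj + block < n) ? jj + block : n;
            for (size_t i = ii; i < i_end; ++i)
                for (size_t j = jj; j < j_end; ++j)
                    dst[j * n + i] = src[i * n + j];
        }
    }
}
```

Both functions produce the same result; only the order in which memory is visited differs. When two threads on logical processors of the same physical package share a cache, each thread has less of that cache to itself, which is one reason the block size may need to be revisited for Hyper-Threading Technology-enabled processors.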
Intel NetBurst microarchitecture: cache
The Intel NetBurst microarchitecture has two levels of cache (see Figure 1). Note that the Intel Xeon processor MP has an additional third-level cache, not shown in the figure. The first-level cache is split into an execution trace cache and a first-level data cache. The execution trace cache stores previously decoded micro-ops to remove decoder latencies from main execution loops. Both the trace cache and the first-level data cache are closely coupled to the processor pipeline. The second-level Advanced Transfer Cache ensures a steady supply of instructions to the front end and of data to the execution pipeline through the first-level data cache.

Figure 1: Intel NetBurst microarchitecture