Introduction
By Khang Nguyen
Contributors: Bob Valentine, Erik Niemeyer, Paul Lindberg
Currently, optimizing applications for the desktop platform is not the same as optimizing them for the mobile platform, because the two platforms have different usage models. The Intel® Core™ microarchitecture combines the best features of the desktop Intel NetBurst® microarchitecture and the mobile Pentium® M architecture. Because Intel will use a single architecture for both desktop and mobile platforms, the challenge is preparing your applications to run well on the Intel Core microarchitecture. What can we do with existing and new desktop and mobile applications so that they are ready when the new Intel processors reach the market? This paper does not attempt to show everything you can do to improve the performance of existing applications on the Intel Core microarchitecture. Instead, it suggests some techniques to improve, or at least maintain, the performance of an existing application running on systems with these new Intel® processors.
Techniques
Cache
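Data in cache is accessed much faster than data in main memory, so keep frequently used data in cache as much as possible. One feature of the Intel® Core™ microarchitecture is that the level 2 (L2) cache is shared between the cores on a die. The primary benefit of a shared L2 is that threads running on different cores of the same die can share data through the L2. This makes it worth reevaluating how an application's hot data sections are mapped to threads, so that data used by several threads is served from the shared L2 and L2 hits are maximized. Another advantage of a shared L2 cache is that if one core is disabled, the remaining core can use the entire L2 cache. To find the number of threads that share a given cache, execute the cpuid instruction with eax = 4 and ecx = 0, 1, 2, and so on, one ecx value per cache. Each ecx value selects a cache descriptor, not a cache level: eax[4:0] reports the cache type (data, instruction, or unified, with 0 meaning there are no more caches), and eax[7:5] reports the cache level. For example, on a typical dual-core part ecx = 0 returns the L1 data cache, ecx = 1 the L1 instruction cache, and ecx = 2 the L2 cache. The number of threads sharing the selected cache is the value in eax[25:14] plus 1 (strictly, the maximum number of logical processor IDs that can share it, so the actual count can be lower).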
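The enumeration and decoding steps above can be sketched as follows. Python's standard library cannot execute cpuid itself, so this sketch assumes a hypothetical `cpuid(leaf, subleaf)` callable that returns the raw (eax, ebx, ecx, edx) values, for example obtained from native code through a compiler intrinsic. The register values in the example are made up to resemble a dual-core part whose L2 is shared by both cores; they are not measurements.

```python
from typing import Callable, List, NamedTuple, Tuple

# Hypothetical cpuid(leaf, subleaf) -> (eax, ebx, ecx, edx); supplied by the caller.
CpuidFn = Callable[[int, int], Tuple[int, int, int, int]]

CACHE_TYPES = {1: "data", 2: "instruction", 3: "unified"}


class CacheInfo(NamedTuple):
    subleaf: int          # the ecx value used to select this cache
    level: int            # eax[7:5]
    kind: str             # decoded from eax[4:0]
    threads_sharing: int  # eax[25:14] + 1


def decode_leaf4_eax(subleaf: int, eax: int) -> CacheInfo:
    """Decode the eax value returned by cpuid with eax = 4, ecx = subleaf."""
    cache_type = eax & 0x1F                # eax[4:0]
    level = (eax >> 5) & 0x7               # eax[7:5]
    sharing = ((eax >> 14) & 0xFFF) + 1    # eax[25:14] + 1
    return CacheInfo(subleaf, level, CACHE_TYPES.get(cache_type, "unknown"), sharing)


def enumerate_caches(cpuid: CpuidFn) -> List[CacheInfo]:
    """Walk ecx = 0, 1, 2, ... until cpuid reports a null cache type."""
    caches = []
    subleaf = 0
    while True:
        eax = cpuid(4, subleaf)[0]
        if eax & 0x1F == 0:  # type 0: no more caches to enumerate
            break
        caches.append(decode_leaf4_eax(subleaf, eax))
        subleaf += 1
    return caches


# Illustrative values: L1 data, L1 instruction, and an L2 shared by 2 threads.
FAKE_LEAF4 = {0: 0x21, 1: 0x22, 2: 0x4043}


def fake_cpuid(leaf: int, subleaf: int) -> Tuple[int, int, int, int]:
    eax = FAKE_LEAF4.get(subleaf, 0) if leaf == 4 else 0
    return (eax, 0, 0, 0)


for cache in enumerate_caches(fake_cpuid):
    print(cache)
```

Passing the cpuid source in as a parameter keeps the bit-field decoding testable without real hardware; on a real system you would replace `fake_cpuid` with a function that actually executes the instruction.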
Note that some systems have a BIOS option to limit the CPUID "maximum input value"; you need to disable it. When enabled, this option limits the maximum input value reported by cpuid with eax = 0 (returned in eax) to 3. The limit is needed to boot Windows* NT 4.0, which otherwise fails with a blue screen (the "blue screen of death"). With the limit enabled, however, executing cpuid with eax = 4 does not return valid cache information, because the processor returns the data for its highest supported basic leaf instead. Also make sure that the "maximum input value" setting is the same on all processors.
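Before relying on leaf 4, an application can check the maximum input value that cpuid with eax = 0 reports. The sketch below assumes a hypothetical list of per-processor `cpuid(leaf, subleaf)` callables (for example, each one executing the instruction on a thread pinned to that processor); both the callables and the reported values are illustrative assumptions. It treats leaf 4 as usable only when every processor reports a maximum input value of at least 4 and all processors agree.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical cpuid(leaf, subleaf) -> (eax, ebx, ecx, edx) for one processor.
CpuidFn = Callable[[int, int], Tuple[int, int, int, int]]


def max_basic_leaf(cpuid: CpuidFn) -> int:
    # cpuid with eax = 0 returns the maximum supported basic input value in eax.
    return cpuid(0, 0)[0]


def leaf4_usable(per_cpu_cpuid: Sequence[CpuidFn]) -> bool:
    """True only if all processors report the same maximum input value, and it is >= 4."""
    leaves = {max_basic_leaf(cpuid) for cpuid in per_cpu_cpuid}
    return len(leaves) == 1 and min(leaves) >= 4


# Illustrative processors: one with the BIOS limit enabled, one without.
def limited_cpuid(leaf: int, subleaf: int) -> Tuple[int, int, int, int]:
    return (3, 0, 0, 0)   # BIOS "maximum input value" limit enabled


def full_cpuid(leaf: int, subleaf: int) -> Tuple[int, int, int, int]:
    return (10, 0, 0, 0)  # made-up value above 4


print(leaf4_usable([full_cpuid, full_cpuid]))        # True
print(leaf4_usable([limited_cpuid, limited_cpuid]))  # False
print(leaf4_usable([full_cpuid, limited_cpuid]))     # False
```

If the check fails, the application can fall back to another way of discovering the cache topology instead of trusting data returned for eax = 4.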

