- Extending Xen* with Intel® Virtualization Technology
- ENERGY STAR* System Implementation
- Competitive Comparison: Dual-Core Intel® Xeon®: Processor-based Platforms vs. AMD Opteron*
- CMP Implementation in Systems Based on the Intel® Core™ Duo processor
- Software Company Plans for Multi-Core: How Epic Games, Adobe Systems, and IBM use Multi-Core Capability
- How to use all of CPUID for x64 platforms under Microsoft Visual Studio .NET 2005
- Flash and .NET Integration using ASP.NET
- Build and consume an ASP.NET Web service
- Multithreaded .NET Web service clients: threads and responsiveness
- High performance image processing and visualization in .NET client applications: Intel Integrated Performance Primitives (IPP)
Welcome to the Intel® Software Dispatch Subscription Program
by Andrew Binstock, principal analyst, Pacific Data Works LLC. Intel Corp.
The previous two installments of this column have discussed adjustments that programmers need to make when moving from single-threaded to multi-threaded programming. I paid specific attention to the aspects that apply to Hyper-Threading Technology, which allows multiple threads to execute on the same physical processor. In the first installment I discussed the need for the new conception of program and data design based around threads as units of execution. Then, in the second installment I presented an overview of the challenges caused by thread interactions, with specific focus on how operating system APIs and special processor instructions are effective ways of synchronizing threads if done correctly. If done incorrectly, however, I noted these techniques are punished severely by the system with poor performance. The key is to know what you're doing and to thoroughly test performance of the finished software.
While that discussion focused on thread synchronization via synchronization variables, it is here expanded to the problem of data that is modified by threads on different physical processors.
Sharing data between physical processors
Threads on separate physical processors that modify common data encounter both physical and programmatic challenges. The programmatic challenge arises from a thread's need to control the data in order to modify it. If several threads are accessing the same data item, the thread must observe an operating-system–specific protocol for getting solo access to the data item in order to update it. All modern operating systems have extensive tools and APIs that enable programmers to accomplish this task. They rely on mutexes, locks, and critical sections—all concepts that manage access among threads—to handle the problem. Because these tools operate at the operating-system level, they tend to be implemented differently, and the literature specific to the given operating system should be consulted. Some useful references are included at the end of this article.
However, while operating system APIs for handling shared data are effective, they tend to be slow. In addition, the necessary interactions between threads can be complicated and it is easy to have threads affect each other detrimentally. Hence, programmers are well advised to redesign programs, even change algorithms to minimize sharing of modifiable data.
The silicon-level challenge caused by this sharing is the effect that data updates have on the individual processors. This problem is distinct from the ownership contention just discussed. When a shared data item is modified, all processors that have a copy of the item in cache are notified their copy is invalid, and they are required to refresh their cache with the new data. This activity ensures cache coherency—a principle that obliges all cache instances of a data item on the system to be identical.
The overhead caused by cache coherency is substantial and can be reduced only by diminishing its frequency. One way that works in a subset of cases is to have each thread maintain its own copy of the data and update the common data only when necessary. For example, the common data might contain fields of varying importance, such that some fields do not have to be updated immediately upon modification. In such cases, the private data can hold the updated version and refresh the common data all at once when it is imperative to do so.
The problem of cache coherency, however, can have a costly side effect that occurs even when two programs are not sharing common data. This problem is called false sharing.
False Sharing
On Pentium® 4 and Xeon™ processors, the minimum size of cache blocks is 128 bytes. (The term cache line is frequently used to refer to the minimum block size loaded into cache.) This block concept means that when a variable is loaded into cache, the entire 128-byte block in which it's located is moved into cache. The cache itself does not know or care which specific bytes of the 128 are to be used by the thread; the cache simply fetches the whole block.
False sharing occurs when two threads store data items private to each of them in the same 128-byte data block. Since the block with these different variables is read as a whole into cache, each time one thread updates its own variable the entire block is updated in all the various caches. As a result, a processor is forced to update the block in its cache even though the data element it's interested in has not changed. This activity is caused by the purely accidental placement of thread 1's variable in the same cache block that contains thread 2's data.
The penalty for false sharing can be extremely high. In many cases it can degrade performance by as much as 50%. To solve the problem of false sharing, programmers should make sure that frequently used data items assigned to different threads are separated in memory by more than the size of a cache line, that is by more than 128 bytes for Intel® Pentium 4 and Intel Xeon processors. Padding out the space is the simple way to guarantee this.
As described here, false sharing is primarily a concern when separate physical processors share the same cache line. However, on Hyper-Threading Technology enabled systems—where two threads share the same cache—the problem exists in a different form. Due to the way that the Intel NetBurst™ microarchitecture sequences the reads and writes as it speculatively executes instructions, false sharing can induce a "machine clear" exception on Hyper-Threading Technology enabled systems. This event forces the processor to retire all pending instructions and restart the execution pipeline. As we have seen in previous columns, restarting an execution pipeline is an expensive proposition that causes an immediate drag on performance. Data from Intel suggests that false sharing in such systems can cause a tenfold increase in machine clear counts. Avoid false sharing at all costs.
![]()
If you're interested in this topic, these articles may be helpful:
![]() | Multithreaded Technology and Multi-Core Processors Introduction by Craig Szydlowski Infrastructure Processor Divi... |
![]() | Getting Started with OpenMP* Abstract By Richard Gerber As you probably know by now, to get the m... |
![]() | Backgrounder: Intel Government-Assisted PC Programs Introduction Increasingly, governments are realising that Information... |
![]() | Threading Games for High Performance on Intel Processors The evolution of the multi-threaded processor design is the trend for ... |
![]() | Transforming Education with Technology: An overview of Intel’s Worldwide Programs Intel: More Than 35 Years of Revolutionary Technology By enabling ... |
![]()
Related Jobs:
![]() | Security And Development Manager #951 - NJ - Montvale - RCG Information Technology Description: All applicants must have a minimum of 3 years IT Indus... |
![]() | SPADOC/SDS Subject Matter Expert (IBM/UNIX) #361276 - CA - Lompoc - Zel Technologies, LLC Placement will be immediate with job candidates required to report w... |
![]() | Enterprise Architect #508082 - VA - Chantilly - BAE Systems Description: Act in the capacity of the BAE SYSTEMS Technical Lead; r... |
![]() | System Program Analyst (ISC2 / NASA / GSFC) #358349 - CA - Lompoc - Zel Technologies, LLC Collect and analyze satellite catalog data to determine the qualit... |
![]() | Data Architect #3101985 - TX - Houston - Ajilon Description : Ajilon Consulting, a leader in the placement of Informa... |
![]() | Business Analyst - PI #972 - MO- Fenton - Maritz Inc. This position will report to the Director, Marketing and Business Inte... |
![]() | Sr. J2EE/Java Developer #3115096 - TX - Dallas - Ajilon Description : GREAT PAY - FULL BENEFITS MUST BE US CITIZEN or... |
![]() | Senior Search Development Engineer #A3E58343BFB3E61D - NJ - Edison - Ask Jeeves,Inc. Title Senior Search Development Engineer City Edison State NJ Zi... |
![]() | Biztalk Technical Project Manager/Lead #952 - NJ - Montvale - Radiant Systems, Inc Description: All applicants must have a minimum of 3 years IT Indus... |
![]() | Sr. J2EE/Java Developer #3265527 - CA - Irvine - Ajilon Description : OUTSTANDING PAY - FULL BENEFITS MUST BE US CITI... |


