Welcome to the Intel® Software Dispatch Subscription Program
by Knud J. Kirkegaard, principal engineer, Paul Winalski, senior software engineer, and David C. Sehr, compiler architect, Intel Corp.
Many applications developed for Linux suffer performance degradation from a little-known and even less frequently used feature. That feature is symbol preemption, which is used by some developers of shared objects. Linux implements the Executable and Linking Format (ELF) object file format, which provides options to control the impact of symbol preemption. This paper describes the use of compiler options for the Intel® compiler and GCC that enable full use of the ELF symbol visibility features. This common set of compiler directives and command-line options was developed in collaboration with the GCC team at Red Hat and will be implemented in GCC from version 3.5. Our experience with several large applications running on the Intel Itanium® Processor Family has shown that very significant performance gains can be had with relatively little change to the customer application and build environment. What follows consists of four sections. The first discusses the class of applications that might encounter the overhead from preemption. The second describes the user model and options used to control preemption on a symbol-by-symbol basis. The third section presents an example and describes how performance is improved. The final section presents some conclusions.
Definitions
At run time, an application consists of one or more files that are mapped into a process address space by the runtime loader. Each distinct file is called a component of the application. There are two types of components. Every application has exactly one main program component, which is always the first file loaded. Usually other components, called shareable objects, are loaded with the application. As the name implies, a shareable object may be a component of more than one program. An example is libc.so, which is the shareable object version of the C run-time library.
A symbol is a name that represents a numeric value defined in an object file or a component file. Symbols typically represent the addresses of data items or routines. The linker (ld) is the program that builds components from object files produced by a compiler or by the assembler. One of the linker's main jobs is to resolve symbolic references between the object files that comprise a component. References to symbols in other components are resolved at execution time by the runtime loader (ld.so).
One key feature of symbol resolution on Linux is symbol preemption. By default, all global symbols in a component are visible to all other components. When the runtime loader loads a component, if the new component defines a symbol that already exists in a previously-loaded component, the definition in the new component is overridden (preempted) by the existing definition. The runtime loader re-binds references to the symbol in the new component to refer to the existing definition. Thus, if the runtime loader is loading component x.so that defines a routine foo(), and a previously-loaded component of the application has already defined foo(), calls to foo() in x.so are modified to call the existing definition of foo(), not the one in x.so. Note that symbols defined in the main program of an application cannot be preempted, since the main program is always loaded first.
Because of symbol preemption, the final value of a global symbol might not be known until run time. This inhibits many useful code optimizations. Fortunately, as we shall see, there are several techniques available to avoid the performance penalties that symbol preemption imposes.
The global offset table (GOT) is a data structure that contains a list of addresses of symbols in a component. All references to symbols that are preemptable must be made indirectly, by first loading the symbol's address from the GOT. This allows the runtime loader to preempt all of the component's references to a symbol simply by changing the value in the symbol's GOT entry.
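The rebinding described above can be illustrated with a toy model in C. This is only a sketch under stated assumptions, not how ld.so is implemented: the structures, the resolve() lookup, and the names main_foo, x_foo, x_bar and x_got are all hypothetical and exist only for illustration. The model keeps components in load order, binds each name to the first definition found, and stores the result in a small table that stands in for the GOT of x.so.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model only: a "component" is a file name plus its symbol definitions. */
typedef int (*func_t)(void);

struct definition { const char *name; func_t addr; };
struct component  { const char *file; const struct definition *defs; size_t ndefs; };

static int main_foo(void) { return 1; }  /* foo() as defined by the main program */
static int x_foo(void)    { return 2; }  /* foo() as defined by x.so */
static int x_bar(void)    { return 3; }  /* bar() is defined only by x.so */

static const struct definition main_defs[] = { { "foo", main_foo } };
static const struct definition x_defs[]    = { { "foo", x_foo }, { "bar", x_bar } };

/* Components in load order: the main program always comes first. */
static const struct component load_order[] = {
    { "main", main_defs, 1 },
    { "x.so", x_defs,    2 },
};

/* Bind a name to the first definition found in load order. */
static func_t resolve(const char *name)
{
    for (size_t c = 0; c < sizeof load_order / sizeof load_order[0]; c++)
        for (size_t d = 0; d < load_order[c].ndefs; d++)
            if (strcmp(load_order[c].defs[d].name, name) == 0)
                return load_order[c].defs[d].addr;
    return NULL;
}

/* Toy GOT of x.so: one slot per preemptable symbol that x.so references. */
enum { GOT_FOO, GOT_BAR, GOT_SIZE };
static func_t x_got[GOT_SIZE];

/* What the loader does, in this model, when x.so is mapped. */
static void load_x(void)
{
    x_got[GOT_FOO] = resolve("foo");
    x_got[GOT_BAR] = resolve("bar");
    assert(x_got[GOT_FOO] != NULL && x_got[GOT_BAR] != NULL);
}

/* Code inside x.so reaches foo() and bar() only through its GOT slots. */
static int x_calls_foo(void) { return x_got[GOT_FOO](); }
static int x_calls_bar(void) { return x_got[GOT_BAR](); }
```

After load_x(), x_calls_foo() reaches the main program's foo() because that definition was loaded first, while x_calls_bar() still reaches the definition in x.so. Every call pays for one extra load from the table, which is the cost the rest of this paper sets out to avoid for symbols that never need to be preempted.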
Analysis to Identify Performance Opportunities
In general, the applications most heavily impacted by preemption overheads are those that frequently call global functions, or reference global data, that never need to be preempted but are still compiled as preemptable. To diagnose whether an application is incurring overheads due to preemption, one should begin by examining the hottest functions in the application, as determined by gprof or a similar execution time profiler. The hottest functions typically contain the largest opportunities for improvement. Once the hottest functions have been identified, one should inspect them to determine whether they perform large numbers of calls to functions that should have been marked as non-preemptable, or large numbers of direct references to global data items that should have been marked non-preemptable. A direct reference is a reference without any "*" dereference operators in C or C++, and a global data item is a data item declared outside any function without using the "static" keyword.
A direct reference to a preemptable global data item requires two levels of indirection to access the data value. First the offset of the symbol's entry in the global offset table must be determined, then the pointer to the global object is loaded from that entry, and finally the object value itself is loaded through that pointer. In addition to increasing code size, the two levels of indirection also increase the pressure on the data caches. By creating assembly listings for hot functions, using gcc or the Intel compiler with the -S option, one can get an idea of the number of references through the global offset table by looking for the ltoff relocations. Through the use of the software models described in the following section it may be possible to reduce both code size and data cache pressure.
Position-independent code is a requirement for a symbol to be preemptable. However, once it has been determined that a symbol will not be preempted, it may also be possible to use position-dependent code to reference the symbol if it will be linked into the main executable. It is, of course, not possible to use position-dependent code for any symbol in an object that will be used in a shared object. To analyze the effect of marking symbols as non-preemptable, and possibly as referenced by position-dependent code, one should look for the static number of ltoff references to decrease in assembly and disassembly listings and for the number of gprel and movl references to increase.
Calling a preemptable function has two costs: the global domain pointer must be saved and restored across the call, and the linker must resolve the relocation in such a way that preemption can still occur. When a global function is marked non-preemptable, it is known to be bound within the same global domain, so it is no longer necessary to save and restore the global domain pointer across the call site. The linker also benefits from knowing that a global function is not preemptable, as it may bind the symbol at link time instead of generating an import stub for the call site that has to be resolved at load time. On the other hand, if it is known that a call to a global function will always go to another shared object, the software model options described in the following section make it possible to mark the symbol as always resolved from another component, and the import stub can be inlined for better code locality.
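A small test file can make the difference visible in an assembly listing. The sketch below assumes GCC's `__attribute__((visibility("hidden")))` spelling for marking a data item non-preemptable; whether that is the exact directive the authors use is an assumption, and the variable and function names (requests_total, requests_local, count_preemptable, count_hidden) are hypothetical. Compiling it with something like `gcc -O2 -fPIC -S` and comparing the two functions should show the GOT reference only in the first one. On Itanium that reference appears as an ltoff relocation; on x86-64 it appears as `@GOTPCREL`.

```c
#include <assert.h>

/* Default visibility: the symbol is preemptable, so position-independent
 * code must first load its address from the GOT, then load the value. */
int requests_total = 0;

/* Hidden visibility (GCC attribute; assumed here as the marking mechanism):
 * the symbol cannot be preempted, so it can be addressed directly. */
__attribute__((visibility("hidden"))) int requests_local = 0;

/* Direct references to a preemptable global: one GOT load per address. */
int count_preemptable(int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++)
        requests_total += i;
    return requests_total;
}

/* The same loop on a non-preemptable global: no GOT load is needed. */
int count_hidden(int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++)
        requests_local += i;
    return requests_local;
}
```

Both functions compute the same result; only the generated addressing code differs, which is why the comparison has to be made in the assembly listing rather than in program output.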
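The same comparison can be made for function calls. This is a minimal sketch, again assuming GCC's visibility attribute as the way to mark a function non-preemptable; the function names are hypothetical. When the file is compiled as position-independent code, the call to scale_default() has to allow for preemption, while the call to scale_hidden() is known to stay inside the component and can be bound at link time.

```c
#include <assert.h>
#include <stddef.h>

/* Default visibility: another component may preempt this definition, so a
 * call from inside a shared object has to go through the import mechanism. */
int scale_default(int x) { return 3 * x; }

/* Hidden visibility (GCC attribute; assumed marking mechanism): the call is
 * always bound within this component. */
__attribute__((visibility("hidden"))) int scale_hidden(int x) { return 3 * x; }

/* Hot loop calling the preemptable function. */
int sum_scaled_default(const int *v, int n)
{
    assert(v != NULL || n == 0);
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += scale_default(v[i]);
    return sum;
}

/* Identical hot loop calling the non-preemptable function. */
int sum_scaled_hidden(const int *v, int n)
{
    assert(v != NULL || n == 0);
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += scale_hidden(v[i]);
    return sum;
}
```

A hidden symbol is no longer callable from other components, so this marking suits internal helpers only. Functions that form a shared object's public interface need a marking that keeps them exported.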