Welcome to the Intel® Software Dispatch Subscription Program
by Alan Zeichick
The Intel VTune Performance Analyzer is more than a code profiler—it actually gets inside the code to show you where the bottlenecks are, and helps identify areas where code may have scalability problems. This article discusses the latest version of this utility, released to support the Microsoft .NET Framework.
Remember that classic line from Star Trek III: The Search for Spock, when Scotty sabotages a brand-new Federation starship, and quips, "The more they overthink the plumbing, the easier it is to stop up the drain"? That's a lesson that software developers, as well as starship engineers, take to heart, especially when it comes to today's ever-more-complex execution stacks, such as Microsoft .NET. Say you're developing a reasonably sophisticated application, and it's just not running as fast as you think it ought to. Where do you look for the bottleneck?
Consider the amount of code vying for CPU time. You have the operating system (such as Windows 2000 or Windows XP) and its myriad device drivers. You have Microsoft .NET Framework and its Common Language Runtime. There are components and class libraries galore, plus background tasks, network stacks, parsers, interrupt handling routines, and so on. Then there's your own precious application source code, nestled gently within that runtime cocoon.
As I wrote earlier this year (Your Fastest Route to Speeding Up Application Performance), the Intel VTune Performance Analyzer can take the mystery out of runtime performance problems. It does more than trace the execution of the application; you can think of it as a "code review in a box." It's not going to replace the need for your developers to get together occasionally and walk through their routines during live code reviews, but it can help your coders write better code more quickly.
For most of my recent development work I had been using Intel VTune Performance Analyzer version 5.0 on top of Windows 2000 for analyzing and tuning C++ and Fortran applications written for the Win32 APIs. While powerful, those languages aren't enough for all situations—like many developers, I've long built Windows applications in Visual Basic.
Then came .NET, which was formally introduced earlier this year. I've recently begun working with .NET development using Visual Basic .NET and Visual C# .NET, and my most recent acquisition was a license for Intel's updated Intel VTune Performance Analyzer version 6.1, which can work with any .NET language. Given the breadth and scope of the profiler's new capabilities, it's time for a second look at what has become an essential piece of many enterprise developers' toolboxes.
Performance Data Collection and Analysis
It's perhaps easiest to think of the Intel VTune Performance Analyzer as having two pieces: a runtime monitor and a post-execution analysis suite. While working on your application, right from within Visual Studio .NET, you can configure the analyzer to take snapshots of the actual execution. Those snapshots can be taken at user-definable time slices (so programmers can see which modules are taking up all the CPU cycles) or at specific breakpoints (such as a jump to a troublesome routine, or at any movement between code modules). While the application runs, Intel VTune Performance Analyzer's data collector builds a large database that comprises a complete profile of the application's runtime behavior.
The second piece of the Intel VTune Performance Analyzer is the analysis suite. That's where you'll see all the usual views common to any source-level profiler. Such reports can help you trace the most common causes of software slowdowns, such as leaving loop-invariant code inside a loop when it could be computed once outside it. You have a choice between viewing just the source code, just the disassembled object code, or a mixed view of both source and disassembled code at the same time.
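To make the loop example concrete, here is a minimal sketch in Python (chosen only as a neutral illustration language; the article's own targets are .NET, C++, and Fortran). The function names and the `scale` parameter are hypothetical. A source-level profiler would typically attribute a disproportionate share of samples to the invariant line in the first version.

```python
import math

def normalize_slow(values, scale):
    """Recomputes a loop-invariant factor on every iteration."""
    out = []
    for v in values:
        # Invariant: this expression does not depend on v,
        # yet it is evaluated once per element.
        factor = math.sqrt(scale) / 100.0
        out.append(v * factor)
    return out

def normalize_fast(values, scale):
    """Hoists the invariant computation out of the loop."""
    factor = math.sqrt(scale) / 100.0  # computed exactly once
    return [v * factor for v in values]
```

Both functions return the same results; only the amount of repeated work differs, which is exactly the kind of difference that shows up in a per-line hot-spot view.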
While views of individual lines of source code are essential for tracking down the root cause of very subtle errors, I find Intel VTune Performance Analyzer's Call Graph viewer to be even more valuable, because it shows runtime information about specific modules. And as any .NET programmer knows, coding and debugging take place at the modular level, such as a specific DLL or component.
From within the Call Graph viewer, you can get an instant understanding of which functions call other functions, how often they call them, and when. That lets you check on dependencies, as well as understand which functions are consuming the most CPU cycles.
Speaking of CPU cycles, one of the biggest improvements between Intel VTune Performance Analyzer versions 5.0 and 6.0 is in the tool's support for multiple threads and multiple processors. Data collection on multi-threaded applications is now fully automatic; you don't have to jump through any hoops to catch and track the threads. While that's probably of little interest for programmers building applications for end-user workstations (which typically have single processors), it's vital for ensuring that server-side applications work and scale gracefully.
One of my favorite uses of the Intel VTune Performance Analyzer is to run it twice (the first time with the application under a minimal test load, and the second time with it handling many transactions) and then compare the differences using Call Graph. The new version 6.1 makes that easy by having a built-in viewer for comparing the results of multiple execution runs. By spending a little time with Intel VTune Performance Analyzer, you can figure out which parts are scaling nicely and predictably at O(n), which are a little worrisome at O(n log n), and which are behaving at an unfortunate O(n^2). That, in turn, tells you which parts might be poorly architected for a real-world Web transaction server.
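The two-run comparison can be turned into a rough number. Assuming a function's time grows roughly as t ~ n^k, two measurements at different loads give an estimate of k. The sketch below uses made-up timings and hypothetical function names purely for illustration; they are not output from the analyzer.

```python
import math

def scaling_exponent(n_small, t_small, n_large, t_large):
    """Estimate k in t ~ n**k from two (load, time) measurements."""
    return math.log(t_large / t_small) / math.log(n_large / n_small)

# Made-up per-function times in seconds, for illustration only:
# a light run with 100 transactions and a heavy run with 1000.
light = {"parse": 0.010, "lookup": 0.020}
heavy = {"parse": 0.100, "lookup": 2.000}

for name in light:
    k = scaling_exponent(100, light[name], 1000, heavy[name])
    print(f"{name}: estimated exponent {k:.2f}")
```

An exponent near 1 suggests linear scaling and one near 2 suggests quadratic behavior. O(n log n) shows up as a value only slightly above 1, so two data points cannot reliably separate it from O(n); more runs at different loads would be needed for that.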