Turn performance killers into performance enablers
Gigaflop C++ Floating Point Binary Number Conversion: It's amazing the impact that a simple float-to-integer conversion can have on performance, or what an invisible store-forward violation might do at runtime, or even the effect of touching the processor's carry flag.

by Alan Zeichick, principal analyst, Camden Associates. Intel Corp.

It's fast, efficient, and ideal for desktop and server applications, but Intel Corp.'s Pentium® 4 processor, like all microprocessors, has its quirks. That means that some application code, whether written in C, C++, or assembler, might look great, and might indeed execute correctly. But the app might be v-e-r-y slow, due to specific features of the processor's architecture or (more commonly) the design of the compiler.

Floating away
Take the task of converting a floating-point number to a 32-bit integer. That's a common enough operation, which, according to the ANSI C/C++ definitions, should be handled by simply truncating the fractional portion of the number.
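
As a minimal illustration of the truncation rule just described, the following C++ sketch uses a plain cast. The function name truncate_to_int is hypothetical and exists only for this example.

```cpp
// Hypothetical helper illustrating the language rule: a float-to-int cast
// discards the fractional part, which means it rounds toward zero.
int truncate_to_int(float value) {
    return static_cast<int>(value);
}
```

Under this rule 2.7f becomes 2 and -2.7f becomes -2; neither value is rounded to the nearest integer.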

Microsoft Visual C++, version 6.0, handles this operation by issuing a call to the _ftol (float to long int) function, which is included in the Microsoft C runtime library. And indeed, a properly ANSI-compliant integer does appear after you make the call.

That _ftol function is a general-purpose call, which can handle both 32-bit and 64-bit conversion. It temporarily switches the floating-point rounding mode to truncation, performs the type conversion, and then sets the rounding mode back to its previous state. This method, according to Intel research, is a long-latency operation, much longer than you might think.
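
The save, truncate, and restore sequence described above can be sketched with the standard <cfenv> facilities. This is not Microsoft's _ftol source; it is a stand-in that mirrors the same steps, and the name convert_like_ftol is an assumption made for illustration.

```cpp
#include <cfenv>
#include <cmath>

// Illustrative stand-in for the sequence described in the text. It mirrors
// the steps with standard library calls; it is NOT the real _ftol.
long convert_like_ftol(float value) {
    const int saved_mode = std::fegetround();   // 1. remember the current rounding mode
    std::fesetround(FE_TOWARDZERO);             // 2. switch to truncation

    // volatile keeps the optimizer from moving the conversion outside the
    // window in which truncation is active.
    volatile float input = value;
    volatile long result = std::lrintf(input);  // 3. convert using the active mode

    std::fesetround(saved_mode);                // 4. put the previous mode back
    return result;
}
```

Every conversion pays for two rounding-mode changes in addition to the conversion itself, which is the kind of per-call overhead the text attributes to _ftol.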

How can you see if your application has this problem? There are a couple of ways. You can create and scan an assembly listing, and look for an explicit float-to-int cast that's being compiled as a call to _ftol. You could also check the compiler warnings for implicit float-to-int conversions, which end up in the same call.

If you have float-to-int operations in your code, there are several steps you can take. One is to enable the /QIfist compiler switch. This switch suppresses the use of _ftol when performing conversions, and makes the operation go much faster. That's the good news. The bad news is that the conversion then uses whatever rounding mode is currently in effect, which by default is round-to-nearest, so the results may not conform to the ANSI C standard unless you change the rounding mode to "truncate" using the _controlfp C runtime function declared in the float.h header. However, use this trick with caution: changing the rounding mode might have unwanted side effects elsewhere in your application, so be sure to test the app thoroughly.
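
One way to limit those side effects is to change the rounding mode only for a bounded scope and restore it afterward. The sketch below is a minimal example under stated assumptions: the class TruncateRoundingScope and the function convert_with_current_mode are hypothetical names, std::lrintf stands in for a conversion that honors the active rounding mode (it is not the code that /QIfist emits), and the <cfenv> branch is a portable substitute for _controlfp on compilers other than Visual C++.

```cpp
#include <cfenv>
#include <cmath>
#if defined(_MSC_VER)
#include <float.h>   // _controlfp, _RC_CHOP, _MCW_RC
#endif

// Hypothetical RAII helper: selects "truncate" rounding for its lifetime and
// restores the previous mode when it goes out of scope.
class TruncateRoundingScope {
public:
    TruncateRoundingScope() {
#if defined(_MSC_VER)
        saved_ = _controlfp(0, 0) & _MCW_RC;   // read the current rounding bits
        _controlfp(_RC_CHOP, _MCW_RC);         // round toward zero
#else
        saved_ = static_cast<unsigned int>(std::fegetround());
        std::fesetround(FE_TOWARDZERO);
#endif
    }
    ~TruncateRoundingScope() {
#if defined(_MSC_VER)
        _controlfp(saved_, _MCW_RC);
#else
        std::fesetround(static_cast<int>(saved_));
#endif
    }
    TruncateRoundingScope(const TruncateRoundingScope&) = delete;
    TruncateRoundingScope& operator=(const TruncateRoundingScope&) = delete;

private:
    unsigned int saved_;
};

// Stand-in for a fast conversion that uses whatever rounding mode is active.
// volatile keeps the optimizer from moving the conversion across mode changes.
long convert_with_current_mode(float value) {
    volatile float input = value;
    volatile long result = std::lrintf(input);
    return result;
}
```

With the default round-to-nearest mode, convert_with_current_mode(2.7f) yields 3 rather than the 2 that a standard cast produces; inside a block that holds a TruncateRoundingScope object it yields 2. Any other floating-point work done inside that block is also affected by the changed mode, which is why thorough testing matters.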

Another option is to use the Streaming SIMD Extensions to accelerate the operation. The truncating conversion instructions are CVTTSS2SI for single-precision values, which belongs to the original SSE set, and CVTTSD2SI for double-precision values, which requires the Streaming SIMD Extensions 2 (SSE2) found in the Intel® Pentium 4 and Intel Xeon™ processors. Intel recommends that you implement the conversion using intrinsics, rather than inlining the assembly code. The only problem here, of course, is that you'll have to set up code branches for older processors that lack the required extensions; the Intel Pentium III chip, for example, lacks SSE2.
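
A minimal sketch of the intrinsic-based approach follows. The wrapper names float_to_int_sse and double_to_int_sse2 and the EXAMPLE_HAVE_SSE2 macro are hypothetical. The intrinsics themselves, _mm_cvttss_si32 (declared in xmmintrin.h) and _mm_cvttsd_si32 (declared in emmintrin.h), are the ones intended to map to the truncating conversion instructions for single and double precision respectively.

```cpp
// Compile-time check only: use the intrinsics when the target is known to
// support SSE2, otherwise fall back to an ordinary cast.
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>   // SSE2 intrinsics; also brings in the SSE ones
#define EXAMPLE_HAVE_SSE2 1
#else
#define EXAMPLE_HAVE_SSE2 0
#endif

// Hypothetical wrapper: single-precision float to int with truncation.
int float_to_int_sse(float value) {
#if EXAMPLE_HAVE_SSE2
    return _mm_cvttss_si32(_mm_set_ss(value));   // truncating scalar conversion
#else
    return static_cast<int>(value);              // portable fallback
#endif
}

// Hypothetical wrapper: double-precision float to int with truncation.
int double_to_int_sse2(double value) {
#if EXAMPLE_HAVE_SSE2
    return _mm_cvttsd_si32(_mm_set_sd(value));   // truncating scalar conversion
#else
    return static_cast<int>(value);              // portable fallback
#endif
}
```

The preprocessor guard decides at build time, so it is not a substitute for the code branches mentioned above: a single binary that must also run on older processors would need a runtime CPU feature check, which this sketch does not show.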

For more, see Microsoft's online C++ reference on SSE2-based numerical conversions. Another resource is Microsoft's "Floating-Point Intrinsics Using Streaming SIMD Extensions."
