Turn performance killers into performance enablers
It's amazing the impact that a simple float-to-integer conversion can have on performance, or what an invisible store-forward violation might do at runtime, or even the effect of touching the processor's carry flag.

by Alan Zeichick, principal analyst, Camden Associates. Intel Corp.

It's fast, efficient, and ideal for desktop and server applications, but Intel Corp.'s Pentium® 4 processor, like all microprocessors, has its quirks. Some application code, whether written in C, C++, or assembly language, might look great and might indeed execute correctly, yet the app still runs v-e-r-y slowly because of specific features of the processor's architecture or (more commonly) the design of the compiler.

Floating away
Take the task of converting a floating-point number to a 32-bit integer. It's a common operation which, according to the ANSI C/C++ definitions, should be handled by simply truncating the fractional portion of the number.
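To make the truncation rule concrete, here is a minimal sketch of the conversion as a plain cast. The function name truncate_to_int32 is made up for illustration. The cast discards the fractional part, so the result moves toward zero for both positive and negative inputs; the value must fit in a 32-bit integer, otherwise the behavior is undefined.

```cpp
#include <cstdint>

// Hypothetical helper: a plain cast drops the fractional part,
// so 2.7f becomes 2 and -2.7f becomes -2 (rounding toward zero).
// The input must be representable as a 32-bit integer.
std::int32_t truncate_to_int32(float value) {
    return static_cast<std::int32_t>(value);
}
```

This is the innocent-looking line of source code that the rest of this section is about.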

Microsoft Visual C++, version 6.0, handles this operation by issuing a call to the _ftol (float to long int) function, which is included in the Microsoft C runtime library. And indeed, the call does return a correctly truncated, ANSI-compliant integer.

That _ftol function is a general-purpose call, which can handle both 32-bit and 64-bit conversion. It temporarily changes the floating-point rounding mode to straight truncation, performs the conversion, and then sets the rounding mode back to its previous state. This method, according to Intel research, is a long-latency operation, much longer than you might think.
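The save, switch, convert, restore sequence can be sketched in portable C++ with the standard <cfenv> facilities. This is not Microsoft's _ftol, only an illustration of the pattern; the function name convert_with_mode_switch is made up, and std::lrint stands in for a conversion instruction that honors the current rounding mode.

```cpp
#include <cfenv>
#include <cmath>

// Illustrative only: mimics the pattern described for _ftol,
// not the actual runtime library code.
long convert_with_mode_switch(float value) {
    const int saved_mode = std::fegetround();   // 1. remember the caller's rounding mode
    std::fesetround(FE_TOWARDZERO);             // 2. switch to truncation

    // volatile keeps the compiler from folding the conversion at compile
    // time or moving it outside the window in which truncation is active.
    volatile float input = value;
    volatile long result = std::lrint(input);   // 3. convert using the current mode

    std::fesetround(saved_mode);                // 4. put the old mode back
    return result;
}
```

Every conversion pays for two rounding-mode changes in addition to the conversion itself, which is where the extra latency comes from.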

How can you see if your application has this problem? There are a couple of ways. You can create and scan an assembly listing, and look for an explicit float-to-int cast that's being compiled as a call to _ftol. You can also check the compiler messages for warnings about implicit float-to-int conversions, which go through the same call.

If you have float-to-int operations in your code, there are several steps you can take. One is to enable the /QIfist compiler switch. This switch suppresses the use of _ftol when performing conversions, and makes the operation go much faster. That's the good news. The bad news is that the results may not quite conform to the ANSI C standard, unless you change the default rounding mode to "truncate" using the _controlfp C runtime function in the float.h header. However, use this trick with caution: Changing that default might have unwanted side effects in your application. So be sure to test the app thoroughly.
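One way to limit those side effects is to confine the rounding-mode change to a small scope. The sketch below is an assumption about how you might do that, not a recipe from Intel or Microsoft. The class name ScopedTruncation is made up. On Microsoft compilers it uses _controlfp with the _RC_CHOP and _MCW_RC constants from float.h; on other compilers it falls back to the standard <cfenv> calls as a stand-in.

```cpp
#include <cfenv>
#ifdef _MSC_VER
#include <float.h>
#endif

// Hypothetical helper: selects "truncate" rounding for the lifetime of
// the object and restores the previous mode when it goes out of scope.
class ScopedTruncation {
public:
    ScopedTruncation() {
#ifdef _MSC_VER
        saved_ = _controlfp(0, 0);                // mask 0: read the control word
        _controlfp(_RC_CHOP, _MCW_RC);            // rounding control = truncate
#else
        saved_ = std::fegetround();
        std::fesetround(FE_TOWARDZERO);
#endif
    }

    ~ScopedTruncation() {
#ifdef _MSC_VER
        _controlfp(saved_ & _MCW_RC, _MCW_RC);    // restore only the rounding bits
#else
        std::fesetround(saved_);
#endif
    }

    ScopedTruncation(const ScopedTruncation&) = delete;
    ScopedTruncation& operator=(const ScopedTruncation&) = delete;

private:
#ifdef _MSC_VER
    unsigned int saved_;
#else
    int saved_;
#endif
};
```

With /QIfist enabled, you would create one ScopedTruncation object around a block of conversions, so the mode is changed once per block rather than once per conversion, and code outside the block still sees the original rounding mode. Newer Microsoft toolchains may steer you toward _controlfp_s instead of _controlfp.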

Another option is to use the Streaming SIMD Extensions (SSE and SSE2), found in the Intel® Pentium 4 and Intel Xeon™ processors, to accelerate the operation. For single-precision values, the appropriate instruction is CVTTSS2SI, which belongs to the original SSE set; its double-precision counterpart, CVTTSD2SI, requires SSE2. Intel recommends that you implement the conversion using intrinsics, rather than inlining the assembly code. The only problem here, of course, is that you'll have to set up code branches for older processors that lack the extension you rely on; the Intel Pentium III chip, for example, lacks the SSE2 extensions.
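A minimal sketch of the intrinsics approach follows. The intrinsics _mm_cvttss_si32 and _mm_cvttsd_si32 map to CVTTSS2SI and CVTTSD2SI; the function names and the EXAMPLE_HAVE_SSE2 macro are made up for illustration. The branch for processors without the extension is done at compile time here, falling back to a plain cast; a runtime CPU check is left out to keep the example short.

```cpp
#include <cstdint>

// Assumption: SSE2 support is detected from the compiler's own
// predefined macros (GCC/Clang: __SSE2__, MSVC: _M_X64 or _M_IX86_FP).
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>   // SSE2 intrinsics; also brings in the SSE ones
#define EXAMPLE_HAVE_SSE2 1
#else
#define EXAMPLE_HAVE_SSE2 0
#endif

// Single precision: truncating conversion via CVTTSS2SI when available.
std::int32_t float_to_int32(float value) {
#if EXAMPLE_HAVE_SSE2
    return _mm_cvttss_si32(_mm_set_ss(value));
#else
    return static_cast<std::int32_t>(value);   // fallback for other targets
#endif
}

// Double precision: truncating conversion via CVTTSD2SI when available.
std::int32_t double_to_int32(double value) {
#if EXAMPLE_HAVE_SSE2
    return _mm_cvttsd_si32(_mm_set_sd(value));
#else
    return static_cast<std::int32_t>(value);   // fallback for other targets
#endif
}
```

Both instructions always truncate, regardless of the current rounding mode, so no mode switching is needed around them.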

For more, see Microsoft's online C++ reference on SSE2-based numerical conversions. Another resource is Microsoft's "Floating-Point Intrinsics Using Streaming SIMD Extensions."
