by Blake Thompson, application engineer, Intel Corp.
When threads are used in a performance-critical application, the overhead of creating and destroying them can overwhelm the benefits of threading. One way to mitigate this problem is to use a thread pool. This document defines thread pools and provides examples of them, suggests when and when not to use one, and provides a sample implementation of a thread pool in C++. It does not show implementations in any other language, although the concepts described here apply to other languages as well.
Background
With the advent of Hyper-Threading Technology, threading has become more important for applications that need to fully utilize a modern processor. A thread pool is a group of threads that are allocated near the beginning of program execution and then reused again and again. Reusing threads in this way helps avoid some of the overhead associated with creating and destroying them.
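To make the idea concrete, here is a minimal sketch of a thread pool. It is not the article's sample code: it assumes a C++11 compiler and uses the standard `std::thread` facilities rather than the Win32 API, and the names `ThreadPool`, `submit`, `wait_idle`, and `pooled_sum` are hypothetical, chosen only for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical minimal pool: workers are created once and reused for every task.
class ThreadPool {
public:
    explicit ThreadPool(std::size_t worker_count) {
        assert(worker_count > 0);
        for (std::size_t i = 0; i < worker_count; ++i)
            workers_.emplace_back([this] { worker_loop(); });
    }

    // Lets the workers drain the queue, then stops and joins them.
    ~ThreadPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        work_ready_.notify_all();
        for (std::thread& t : workers_) t.join();
    }

    ThreadPool(const ThreadPool&) = delete;
    ThreadPool& operator=(const ThreadPool&) = delete;

    // Queue a task; an idle worker picks it up without creating a new thread.
    void submit(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push(std::move(task));
            ++pending_;
        }
        work_ready_.notify_one();
    }

    // Block until every task submitted so far has finished running.
    void wait_idle() {
        std::unique_lock<std::mutex> lock(mutex_);
        all_done_.wait(lock, [this] { return pending_ == 0; });
    }

private:
    void worker_loop() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                work_ready_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // stopping, and nothing left to run
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // run outside the lock so other workers can proceed
            std::lock_guard<std::mutex> lock(mutex_);
            if (--pending_ == 0) all_done_.notify_all();
        }
    }

    std::mutex mutex_;
    std::condition_variable work_ready_;
    std::condition_variable all_done_;
    std::queue<std::function<void()>> tasks_;
    std::size_t pending_ = 0;  // tasks queued or currently running
    bool stopping_ = false;
    std::vector<std::thread> workers_;
};

// Example use: sum 1..n with one small task per number on four reused workers.
long pooled_sum(int n) {
    std::atomic<long> total(0);
    ThreadPool pool(4);
    for (int i = 1; i <= n; ++i)
        pool.submit([&total, i] { total += i; });
    pool.wait_idle();
    return total.load();
}
```

In this sketch, `pooled_sum(100)` runs one hundred tiny tasks but creates only four threads, which is the saving a thread pool is meant to provide.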
There is no definitive rule to determine whether a thread pool will benefit an application. As a general guide, applications that use many threads, or that continually create and destroy threads to do small amounts of work, are good candidates for a thread pool. Unfortunately, the best way to know is to implement a thread pool and test its performance in the application. The cost to the programmer of doing this can be minimized by using classes that abstract the thread pool, so that the code to implement a thread pool is written once and then reused many times. The details of properly abstracting a thread-pool class are beyond the scope of this document.
Sample code
The source code included here contains a simple implementation of a thread pool. Its Main() routine compares three methods of thread control to do a trivial piece of work.
The first method uses the thread pool to do the work. How the thread pool implementation works is discussed later in this document. The second method creates four threads and waits for each of them to complete execution before looping and creating another four threads; this is repeated until all of the necessary work is completed. The third method simply creates threads as fast as it can, with no bound on the number of threads.
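The second and third methods can be sketched as follows. This is an illustration under assumptions, not the article's source: it uses C++11 `std::thread` instead of Win32 threads, and the function names `run_in_batches`, `run_unbounded`, `completed_in_batches`, and `completed_unbounded` are hypothetical.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Method two: start up to batch_size threads, join them all, then start the next batch.
void run_in_batches(std::size_t task_count, std::size_t batch_size,
                    const std::function<void()>& work) {
    assert(batch_size > 0);
    for (std::size_t started = 0; started < task_count; started += batch_size) {
        const std::size_t n = std::min(batch_size, task_count - started);
        std::vector<std::thread> batch;
        for (std::size_t i = 0; i < n; ++i) batch.emplace_back(work);
        for (std::thread& t : batch) t.join();  // wait before creating more threads
    }
}

// Method three: one thread per task with no cap on how many exist at once.
// Thread creation can fail with std::system_error if the system runs out of
// resources; this sketch does not handle that case.
void run_unbounded(std::size_t task_count, const std::function<void()>& work) {
    std::vector<std::thread> threads;
    threads.reserve(task_count);
    for (std::size_t i = 0; i < task_count; ++i) threads.emplace_back(work);
    for (std::thread& t : threads) t.join();
}

// Helpers that count how many times a trivial work item ran under each method.
std::size_t completed_in_batches(std::size_t task_count) {
    std::atomic<std::size_t> done(0);
    run_in_batches(task_count, 4, [&done] { ++done; });
    return done.load();
}

std::size_t completed_unbounded(std::size_t task_count) {
    std::atomic<std::size_t> done(0);
    run_unbounded(task_count, [&done] { ++done; });
    return done.load();
}
```

Both functions pay the cost of creating and destroying a thread for every task, which is the overhead the thread-pool method avoids.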
To run the example, you must first compile the source code. This code was written for Win32 using Microsoft Visual C++ 6.0. It may work with other compilers and operating systems with some modifications. After compiling, execute the generated .exe file.
Sample code result
Each method executes in turn and prints its runtime in milliseconds. The output will vary with the speed of the processor on which the example runs. In general, the thread-pool method should be the fastest. Although the difference in execution times is small in this example, one should expect to see significant differences in a larger, real-life implementation (for example, in a video encoder).
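One way to produce per-method timings of this kind is sketched below. It assumes C++11 and `std::chrono::steady_clock`, which may differ from the timer the article's sample uses; the helper names `elapsed_ms` and `report` are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <iostream>
#include <string>

// Hypothetical helper: run `method` once and return its wall-clock time in milliseconds.
long long elapsed_ms(const std::function<void()>& method) {
    assert(method && "method must be callable");
    const auto start = std::chrono::steady_clock::now();
    method();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
}

// Print one result line for a method, labeled with its name.
void report(const std::string& label, const std::function<void()>& method) {
    std::cout << label << ": " << elapsed_ms(method) << " ms\n";
}
```

Because the work in each task is trivial, a single run may round to only a few milliseconds; repeating each method several times and comparing totals gives a steadier picture.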