by Kiefer Kuah, Intel Corp.
Speeding up matrix-vector multiplication with SSE instructions, threading, and data restructuring
Single-instruction, multiple-data (SIMD) instructions, threading, and data restructuring are three common optimization methods. This article investigates the performance impact of each method on matrix-vector multiplication. The different implementations were tested on three hardware configurations and compared with the version written in C. All three methods produced measurable gains. As expected, the multithreaded version improved performance on the Hyper-Threading–technology–enabled system and the dual-processor Intel Xeon system, but showed no benefit on the uniprocessor system.
Introduction
Multiplying a matrix by a vector is a common operation in applications such as 3D graphics games. We investigated a few ways to code this operation and assessed the performance of each version on a 2.8-GHz uniprocessor Hyper-Threading–technology–enabled Intel Pentium 4 processor system, as well as on a 2.4-GHz Pentium 4 Xeon system.
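For orientation, the operation under study can be sketched in plain C as follows. This is only an illustrative baseline, not the code measured in the article: the function name `mat_vec_mul`, the fixed 4x4 size (typical of 3D transforms), and the flat row-major `float` layout are all assumptions made for the example.

```c
#include <stddef.h>

/* Assumed dimension: 4x4 matrices are typical for 3D graphics transforms. */
#define DIM 4

/*
 * Hypothetical scalar baseline: out = m * v.
 * m is a DIM x DIM matrix stored row-major in a flat array (assumption),
 * v is the input vector, and out receives the result.
 */
void mat_vec_mul(const float *m, const float *v, float *out)
{
    for (size_t row = 0; row < DIM; ++row) {
        float sum = 0.0f;
        /* Dot product of one matrix row with the vector. */
        for (size_t col = 0; col < DIM; ++col) {
            sum += m[row * DIM + col] * v[col];
        }
        out[row] = sum;
    }
}
```

Each output element is the dot product of one matrix row with the vector, so the inner loop is a natural candidate for SIMD instructions, while independent rows (or independent vectors) are candidates for threading.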
If you're interested in this topic, these articles may be helpful:
For-loop threading methods by Jeff Andrews, application engineer, Intel Corp. Explore differen...
Multi-Threading for Experts: Inside a Parallel Application by Sergey N. Zheltov, project manager, and Stanislav V. Bratanov, soft...
Utilizing thread pools in performance-critical applications by Blake Thompson, application engineer, Intel Corp. When using ...
Multiple approaches to multithreaded applications by George Walsh, freelance researcher and writer. Intel Corp. The...
Combining Linux Message Passing and Threading in High-Performance Computing by Andrew Binstock, principal analyst, Pacific Data Works LLC. Intel C...