by Dean Macri, Solutions Enabling Group, Intel Corp.
Welcome to the first installment of the game developer's maximum performance column! Once a month we'll deliver concise information to help you achieve peak performance (and frame rates) in the games you develop for Intel® architectures. In this premiere column we'll examine some peculiarities of AGP memory and show how reducing bus transactions in your 3D games can yield speedups of 20% or more in some cases. In the coming months we'll cover a range of topics, including an introduction to game development for Intel StrongARM and Intel XScale™ processors. We want this column to meet your needs as game developers, so drop us a line if you have suggestions for topics you'd like to see us address. Now let's peel back the fancy layers of high-level 3D APIs and see what's really happening between the processor, the graphics card, and this mysterious beast known as AGP memory.

Figure 1. PC Architecture with AGP 4x support
AGP history
Introduced with the Intel Pentium® II processor in 1997, the Accelerated Graphics Port (AGP) interface provides a high-speed mechanism for video cards to access main system memory. AGP debuted at a peak transfer rate of 266 MB/sec; the current specification, known as AGP 4x, allows for a peak transfer rate of 1067 MB/sec, eight times the peak rate of the PCI bus found in desktop systems. Figure 1 depicts the architecture of a PC with support for AGP 4x. Notice that the graphics card has a direct pathway along the AGP bus, through the chipset, to the system memory. For a system with a 133-MHz front-side bus, the peak AGP 4x transfer rate equals that of the memory bus, but it still falls short of the 3.2 GB/sec peak bus bandwidth of Intel Pentium 4 processor-based systems.
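The peak figures quoted above follow from simple arithmetic: clock rate times bus width times transfers per clock. The short Python sketch below works through it. The clock rates and bus widths are commonly cited nominal values that are not stated in this column (33.33 MHz and 32 bits for PCI, 66.67 MHz and 32 bits for AGP, a 64-bit memory bus, and a 100-MHz quad-pumped 64-bit Pentium 4 bus), so treat them as assumptions. The helper name `peak_mb_per_sec` is purely illustrative.

```python
def peak_mb_per_sec(clock_mhz, bus_width_bits, transfers_per_clock=1):
    """Peak transfer rate in MB/sec, with 1 MB taken as 10**6 bytes."""
    return clock_mhz * (bus_width_bits / 8) * transfers_per_clock


# Assumed nominal clocks (not given in the column itself).
PCI_CLOCK_MHZ = 100 / 3      # about 33.33 MHz
AGP_CLOCK_MHZ = 200 / 3      # about 66.67 MHz
MEMORY_CLOCK_MHZ = 400 / 3   # about 133.33 MHz

pci = peak_mb_per_sec(PCI_CLOCK_MHZ, 32)
agp_1x = peak_mb_per_sec(AGP_CLOCK_MHZ, 32, transfers_per_clock=1)
agp_4x = peak_mb_per_sec(AGP_CLOCK_MHZ, 32, transfers_per_clock=4)
memory_bus = peak_mb_per_sec(MEMORY_CLOCK_MHZ, 64)
p4_bus = peak_mb_per_sec(100, 64, transfers_per_clock=4)

if __name__ == "__main__":
    for name, rate in [
        ("PCI", pci),
        ("AGP 1x", agp_1x),
        ("AGP 4x", agp_4x),
        ("133-MHz memory bus", memory_bus),
        ("Pentium 4 bus", p4_bus),
    ]:
        print(f"{name:>20}: {rate:7.0f} MB/sec")
    print(f"AGP 4x / PCI ratio: {agp_4x / pci:.1f}")
```

These are theoretical peaks only; sustained rates in a real game depend on access patterns, which is what the rest of this column is about.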
In addition to the high transfer speed, requests on the AGP bus can be pipelined, whereas the PCI bus supports only sequential transfers, with a special case for bursting. Figure 2 shows the benefit of the pipelined accesses provided by the AGP specification. On the PCI bus, data item D1, requested from address A1, must arrive before a request for memory at address A2 can be submitted. The AGP bus is more efficient: it can accept requests for addresses A2 through An while waiting for data item D1 to arrive. As a result, data items D2 through Dn arrive much sooner than in a non-pipelined scheme like that of the PCI bus. Later we'll talk about burst operations, which both the AGP and PCI buses support. In a burst operation, a small number of data items (4 to 8) can be sent across the bus with a single memory address request.
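To get a feel for why pipelining matters, here is a deliberately idealized timing model in Python. It is a sketch, not a model of the actual bus protocols: it assumes every request has the same fixed latency and every data item takes the same time to transfer, and it ignores bursting and arbitration. The function names and the cycle counts in the example are made up for illustration.

```python
def sequential_arrivals(n_requests, latency, transfer):
    """Arrival time of each data item when a request cannot be issued
    until the previous item has arrived (the non-pipelined case)."""
    return [(i + 1) * (latency + transfer) for i in range(n_requests)]


def pipelined_arrivals(n_requests, latency, transfer):
    """Arrival time of each data item when requests are queued
    back-to-back, so the latency is paid once and the data then
    streams in at the transfer rate."""
    return [latency + (i + 1) * transfer for i in range(n_requests)]


if __name__ == "__main__":
    # Hypothetical numbers: 8 requests, 10 cycles of latency each,
    # 2 cycles to move one data item across the bus.
    seq = sequential_arrivals(8, latency=10, transfer=2)
    pipe = pipelined_arrivals(8, latency=10, transfer=2)
    print("sequential:", seq)
    print("pipelined: ", pipe)
```

In this model D1 arrives at the same time either way; the gain shows up in D2 through Dn, where the pipelined bus overlaps each request's latency with the transfers already in flight.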


