Maximum FPS: fast AGP writes for dynamic vertex data

Figure 2. AGP Pipelining vs. PCI transfers

The AGP bus also supports sideband addressing, which lets new address requests overlap with data transfers for a further performance gain. We don't need to know any more about the underlying protocol and addressing scheme of the AGP specification, but the description just provided should help us understand what's going on under the hood of the machine. The essential point is that the graphics card uses the AGP interface to access data in the system's main memory. It's this data in main memory that we're going to look at in detail.

Pre-requisites for AGP memory
For applications to take advantage of the AGP bus, the operating system and the hardware must perform several housekeeping tasks. First, because system memory is used primarily by ordinary applications, it can become considerably fragmented, depending on which applications are running and which memory allocations have occurred. A request to allocate 256 KB of memory for the graphics card to access over the AGP bus could result in several small, non-contiguous chunks of main memory being allocated. To make these chunks appear to the graphics card as one contiguous 256 KB block, the chipset, which controls communication between the processor and main memory as well as the PCI bus, provides a Graphics Address Re-Mapping Table (GART). The GART maps a linear range of virtual memory addresses onto multiple 4 KB physical pages in main memory. Figure 3 shows an example of a region of linear memory visible to the graphics card being mapped to a non-contiguous set of pages in physical memory. The amount of memory available for remapping by the GART is often determined by a BIOS setting known as the AGP aperture. Values vary from system to system, but there is usually a maximum limit of half the available system memory.


Figure 3. Remapping of Memory Addresses through the GART

We've covered the tasks the chipset must perform. Next, because the graphics card will be accessing the memory regularly and performance is critical, the operating system must make sure that memory accessed over the AGP bus is not swapped out to disk. The operating system does this by locking the pages of memory. Finally, because current processors have one or more levels of cache to improve the performance of typical applications, something must be done to make sure the graphics card sees the most up-to-date data. To achieve this without requiring the graphics card to "snoop" the processor's caches, the memory used for AGP transfers is marked as uncacheable by the processor. This uncacheable memory is at the heart of the performance improvements we're going to pursue in 3D applications. To understand it, we need to examine a feature that Intel processors have carried for several generations.

Memory types
Starting with the Intel Pentium Pro processor, Intel processors have provided a mechanism for changing the access characteristics of regions of memory using a set of registers known as the Memory Type Range Registers (MTRRs). These registers are not accessible to a normal application and can only be changed by the operating system or a device driver. However, the drivers that support AGP memory use the MTRRs to give the memory accessed by the graphics card the right characteristics. Let's look at the various options that can be selected using the MTRRs.

