This section describes how to place the CPU-side version of a constant buffer in memory most efficiently.
The memory you use to build your CPU-side version of a constant buffer shouldn’t just be a local variable.
DX11.1 adds API features that allow an application to manage constant buffer memory directly.
It allows the creation of big constant buffers that can be filled with data for multiple rendering operations in one go.
This is possible in DX11.1 because the constant buffers for shader stages can be set so that a shader uses only a subsection of a larger constant buffer, e.g. by using PSSetConstantBuffers1() for the pixel stage.
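As a rough illustration of the sub-range idea, the sketch below carves one big constant buffer into per-draw ranges. It is plain C++ and does not touch the D3D API. `ConstantRange`, `ConstantRangeAllocator` and `AlignToBlock` are hypothetical names introduced only for this example. The one API fact it relies on is that `PSSetConstantBuffers1()` takes its offset and size in shader constants of 16 bytes each, and that both values must be multiples of 16 constants (256 bytes).

```cpp
#include <cassert>
#include <cstdint>

// D3D11.1 addresses constant buffer sub-ranges in shader constants (16 bytes
// each). Offset and count must both be multiples of 16 constants (256 bytes).
const std::uint32_t kBytesPerConstant  = 16;
const std::uint32_t kConstantsPerBlock = 16;
const std::uint32_t kBytesPerBlock     = kBytesPerConstant * kConstantsPerBlock;

// Hypothetical helper type: the two values passed per buffer to
// PSSetConstantBuffers1() as pFirstConstant and pNumConstants.
struct ConstantRange {
    std::uint32_t firstConstant;
    std::uint32_t numConstants;
};

// Rounds a byte size up to the next multiple of 256 bytes.
inline std::uint32_t AlignToBlock(std::uint32_t bytes) {
    return (bytes + kBytesPerBlock - 1) / kBytesPerBlock * kBytesPerBlock;
}

// Hypothetical linear sub-allocator over one big constant buffer.
// It only tracks offsets; the actual buffer memory lives elsewhere.
class ConstantRangeAllocator {
public:
    explicit ConstantRangeAllocator(std::uint32_t capacityBytes)
        : capacityBytes_(capacityBytes), cursorBytes_(0) {
        assert(capacityBytes % kBytesPerBlock == 0);
    }

    // Reserves space for one draw call's constants and returns the range
    // expressed in shader constants.
    ConstantRange Allocate(std::uint32_t sizeBytes) {
        const std::uint32_t aligned = AlignToBlock(sizeBytes);
        assert(aligned > 0 && cursorBytes_ + aligned <= capacityBytes_);
        ConstantRange range;
        range.firstConstant = cursorBytes_ / kBytesPerConstant;
        range.numConstants  = aligned / kBytesPerConstant;
        cursorBytes_ += aligned;
        return range;
    }

    // Starts over at the beginning of the buffer, e.g. for a new frame.
    void Reset() { cursorBytes_ = 0; }

private:
    std::uint32_t capacityBytes_;
    std::uint32_t cursorBytes_;
};

// With a real ID3D11DeviceContext1* ctx and ID3D11Buffer* bigCB (both assumed
// to exist in the application), a range r would be bound roughly like this:
//   ctx->PSSetConstantBuffers1(0, 1, &bigCB, &r.firstConstant, &r.numConstants);
```

For example, allocating 2048 bytes and then 100 bytes from a fresh allocator yields the ranges (first 0, count 128) and (first 128, count 16), because the second request is padded up to one full 256-byte block.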
Figure 1 – Renaming a sequence of constant buffer updates

Example: 4096 bytes of constant buffer data (2048 bytes each for the pixel stage and the vertex stage), updated for 10000 draw calls per frame, get renamed to ~156MB (assuming 4 frames are in flight). This clearly exceeds the above-mentioned limit of 128MB, causing the driver to throttle down.
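The arithmetic behind that figure can be checked with a few lines of C++. This is only a back-of-the-envelope sketch: the function names are made up for this example, and the 128MB limit is simply the value quoted in the text, not something queried from a driver.

```cpp
#include <cassert>
#include <cstdint>

// Renaming limit quoted in the text (128MB), taken here as 128 * 2^20 bytes.
const std::uint64_t kRenameLimitBytes = 128ull * 1024ull * 1024ull;

// Total memory consumed by renaming: every draw call in every in-flight
// frame gets its own copy of the updated constant data.
inline std::uint64_t RenamedBytes(std::uint64_t bytesPerDraw,
                                  std::uint64_t drawsPerFrame,
                                  std::uint64_t framesInFlight) {
    assert(framesInFlight > 0);
    return bytesPerDraw * drawsPerFrame * framesInFlight;
}

// Converts a byte count to megabytes (2^20 bytes).
inline double ToMegabytes(std::uint64_t bytes) {
    return static_cast<double>(bytes) / (1024.0 * 1024.0);
}

// True if the renamed memory would exceed the quoted limit.
inline bool ExceedsRenameLimit(std::uint64_t renamedBytes) {
    return renamedBytes > kRenameLimitBytes;
}
```

Plugging in the numbers from the example, `RenamedBytes(4096, 10000, 4)` gives 163,840,000 bytes, which `ToMegabytes` reports as 156.25, and `ExceedsRenameLimit` returns true for that value.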
The accumulated memory size of all rename operations can grow so much that this causes the graphics driver to run out of renaming space.
Unfortunately it turns out that a bunch of small updates per draw call can create severe bottlenecks.
The goal of this post is to revisit constant buffer usage patterns and to take a look at some of the superior new partial constant buffer updates that DirectX 11.1™ in Windows 8™ allows for.
Under DirectX 9™, approximately 80% of all command buffer traffic sent to the GPU was shader constant data.
Smart game engines avoided falling into this trap by performing partial constant updates, while other engines would send all updates for every Draw() call, which is obviously extremely wasteful.