Lecture 004

Performance Indicator:

Challenge of Game Rendering:

Profiling: a system that automatically checks a game's CPU usage on each game platform at release time.

Rendering Outline

Cartoon rendering, 2D rendering, subsurface scattering, and hair/fur are not covered in this course.

Overview of Rendering Pipeline

Projection and Rasterization

Different parts of GPU in shader code

GPU Architecture

SIMD vs SIMT

SIMD (single instruction, multiple data): common in CPUs. Data is packed into a vector, and one instruction operates on the whole vector.

SIMT (single instruction, multiple threads): many cores execute the same instruction, each on different data.
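
As a rough mental model only (plain Python has neither vector units nor GPU threads), the two styles can be contrasted as below. The names `simd_add`, `kernel`, and `simt_launch` are hypothetical and exist only for this sketch.

```python
# Conceptual model only: real SIMD/SIMT run in hardware, not in a Python loop.

def simd_add(a, b):
    """SIMD style: ONE instruction consumes whole vectors (all lanes at once)."""
    assert len(a) == len(b)
    return [x + y for x, y in zip(a, b)]  # stands in for a single vector add


def kernel(thread_id, a, b, out):
    """SIMT style: scalar code written for one thread; thread_id picks its data."""
    out[thread_id] = a[thread_id] + b[thread_id]


def simt_launch(kernel_fn, num_threads, *args):
    """Hypothetical launcher: every thread runs the same instruction stream."""
    for thread_id in range(num_threads):  # hardware would run these together
        kernel_fn(thread_id, *args)


a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 20.0, 30.0, 40.0]

simd_result = simd_add(a, b)

simt_result = [0.0] * len(a)
simt_launch(kernel, len(a), a, b, simt_result)
```

Both paths compute the same sums. The difference is in how the work is expressed: as one operation on a packed vector, or as per-thread scalar code indexed by a thread id.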

GPC (Graphics Processing Cluster): a dedicated hardware block on the graphics card that handles graphics work.

For more on GPU architecture, see CUDA.md.

Von Neumann architecture: separates storage from computation. This simplifies hardware design but costs speed, because data must travel between memory and the processor.

Dataflow Bottleneck: minimize latency by reducing CPU-GPU dependence (in particular, avoid transferring data back from the GPU to the CPU)

Nvidia and AMD GPU Memory Latency

CPU and GPU Memory Latency

Application speed/latency is limited by:

Modern Hardware Pipeline

Different Hardware:

Xbox Series X SOC Unified Memory Architecture

Other State-of-the-Art Architectures: support tile-based rendering

Rendering

Mesh Primitive: a naive way to store mesh

Mesh Primitive: store vertex and index buffer

Instead of storing the vertices of every triangle directly, we store each vertex once and describe triangles by indices into the vertex list, as an .obj file does. This reduces storage by about 6x. There are various ways to do this:
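
A minimal sketch of the idea, assuming positions are plain `(x, y, z)` tuples. The helpers `build_indexed_mesh` and `grid_triangles` are hypothetical names for this illustration, and the flat grid is just a convenient test mesh.

```python
def build_indexed_mesh(triangles):
    """Turn a list of triangles (3 position tuples each) into vertex + index buffers."""
    vertex_buffer = []   # each unique position stored once
    index_buffer = []    # three indices per triangle
    seen = {}            # position -> its index in vertex_buffer
    for tri in triangles:
        for v in tri:
            if v not in seen:
                seen[v] = len(vertex_buffer)
                vertex_buffer.append(v)
            index_buffer.append(seen[v])
    return vertex_buffer, index_buffer


def grid_triangles(n):
    """Hypothetical test mesh: an n x n grid of quads, two triangles per quad."""
    tris = []
    for i in range(n):
        for j in range(n):
            p00, p10 = (i, j, 0), (i + 1, j, 0)
            p01, p11 = (i, j + 1, 0), (i + 1, j + 1, 0)
            tris.append((p00, p10, p11))
            tris.append((p00, p11, p01))
    return tris


tris = grid_triangles(64)
vertices, indices = build_indexed_mesh(tris)

naive_vertex_count = 3 * len(tris)                  # every corner stored again
sharing_ratio = naive_vertex_count / len(vertices)  # approaches 6 on large grids
```

On a large regular mesh each interior vertex is shared by about six triangles, which is where a figure near 6x comes from. The ratio here counts stored vertices only; the index buffer adds its own, smaller, cost.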

Note that we need to store a normal for each vertex, because one vertex can be shared by arbitrarily many triangles, so its normal cannot be read off any single face.
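
To illustrate why a shared vertex has no single obvious normal, the sketch below averages the normals of all faces touching each vertex. Averaging is one common convention, assumed here for illustration and not necessarily the method used in the lecture; all function names are hypothetical.

```python
import math


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(v):
    length = math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)
    if length == 0.0:
        return (0.0, 0.0, 0.0)  # degenerate input: leave a zero normal
    return (v[0] / length, v[1] / length, v[2] / length)


def face_normal(p0, p1, p2):
    return normalize(cross(sub(p1, p0), sub(p2, p0)))


def vertex_normals(vertices, indices):
    """Average the normals of every face that touches each vertex."""
    sums = [[0.0, 0.0, 0.0] for _ in vertices]
    for t in range(0, len(indices), 3):
        i0, i1, i2 = indices[t:t + 3]
        n = face_normal(vertices[i0], vertices[i1], vertices[i2])
        for i in (i0, i1, i2):
            for k in range(3):
                sums[i][k] += n[k]
    return [normalize(tuple(s)) for s in sums]


# Two triangles folded at 90 degrees along the edge between vertices 0 and 1.
vertices = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
indices = [0, 1, 2,   # floor triangle, face normal +z
           0, 3, 1]   # wall triangle, face normal +y
normals = vertex_normals(vertices, indices)
```

Vertices 0 and 1 sit on the fold and receive a blend of both face normals, while vertices 2 and 3 each keep the normal of their only face. For a hard edge an artist may want something other than the average, which is another reason normals are stored explicitly.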

Material: defines how a mesh looks. There is more than one material model.

Phong Model, PBR Model, SubSurface Profile

Combining texture maps into a material using a shader

A shader is treated as data by the game engine, even though it is itself code.

When a mesh corresponds to more than one shader, we split it into several submeshes. All submeshes are bound to the same set of UVs and vertex positions, but each has its own offset, shader, and texture.
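
One possible layout, sketched under the assumption that each submesh is a contiguous range of the shared index buffer. The class and field names (`SubMesh`, `index_offset`, `index_count`, `material`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SubMesh:
    index_offset: int   # first index of this range in the shared index buffer
    index_count: int    # number of indices in the range
    material: str       # hypothetical material name (shader + textures)


@dataclass
class Mesh:
    positions: list     # shared by all submeshes
    uvs: list           # shared by all submeshes
    indices: list       # shared index buffer
    submeshes: list

    def draw_calls(self):
        """Yield one (material, index range) pair per submesh."""
        for sm in self.submeshes:
            end = sm.index_offset + sm.index_count
            yield sm.material, self.indices[sm.index_offset:end]


mesh = Mesh(
    positions=[(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0), (2, 0, 0)],
    uvs=[(0, 0), (1, 0), (0, 1), (1, 1), (2, 0)],
    indices=[0, 1, 2, 2, 1, 3, 3, 1, 4],
    submeshes=[SubMesh(0, 6, "skin"), SubMesh(6, 3, "cloth")],
)
calls = list(mesh.draw_calls())
```

Vertex data is stored once; only the offset, count, and material reference differ between submeshes.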

Submesh

Mesh, Shader, and Texture Pools: to save memory, each mesh stores only references to specific instances in the mesh, shader, and texture pools
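
A pool of this kind can be sketched as a cache that hands out small integer handles. `ResourcePool`, `acquire`, `get`, and the fake loader below are hypothetical names for illustration, not an engine API.

```python
class ResourcePool:
    """Loads each resource once and hands out integer handles to it."""

    def __init__(self, loader):
        self._loader = loader    # function: key -> loaded resource
        self._handles = {}       # key -> handle
        self._resources = []     # handle -> resource

    def acquire(self, key):
        if key not in self._handles:
            self._handles[key] = len(self._resources)
            self._resources.append(self._loader(key))
        return self._handles[key]

    def get(self, handle):
        return self._resources[handle]


load_log = []


def fake_texture_loader(path):
    """Stand-in for real file loading; records how often it is called."""
    load_log.append(path)
    return f"<texture data for {path}>"


textures = ResourcePool(fake_texture_loader)
handle_a = textures.acquire("bark.png")   # first request loads the texture
handle_b = textures.acquire("bark.png")   # second request reuses it
handle_c = textures.acquire("leaf.png")
```

Two objects that ask for the same texture end up holding the same handle, so the texture data exists only once in memory.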

Material Sorting: we group meshes that share a material or texture so the GPU can render them as one batch, which avoids re-sending the material from CPU to GPU between draws
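
The effect of sorting can be seen by counting how often the bound material changes across a draw list. This is a toy sketch; the draw list and `count_material_switches` are hypothetical.

```python
def count_material_switches(draws):
    """Count how many times the bound material changes while walking the list."""
    switches = 0
    current = None
    for material, _mesh in draws:
        if material != current:
            switches += 1
            current = material
    return switches


draws = [
    ("wood", "chair"),
    ("metal", "lamp"),
    ("wood", "table"),
    ("metal", "door"),
    ("wood", "shelf"),
]

# Sorting by material places draws with the same material next to each other.
sorted_draws = sorted(draws, key=lambda d: d[0])

unsorted_switches = count_material_switches(draws)
sorted_switches = count_material_switches(sorted_draws)
```

In this list the unsorted order binds a material five times, while the sorted order binds one only twice, once per distinct material.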

Instancing Example: drawing the same tree multiple times
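
On the CPU side, instancing amounts to collecting the per-object transforms of everything that shares a mesh and material, so that one draw can cover them all. A minimal sketch, with hypothetical names and positions standing in for full transforms:

```python
from collections import defaultdict


def build_instanced_batches(objects):
    """Group (mesh, material, transform) records into one batch per (mesh, material)."""
    batches = defaultdict(list)
    for mesh, material, transform in objects:
        batches[(mesh, material)].append(transform)
    return dict(batches)


# 1000 copies of the same tree at different positions, plus one rock.
objects = [("tree", "bark", (float(x), 0.0, 0.0)) for x in range(1000)]
objects.append(("rock", "stone", (5.0, 0.0, 5.0)))

batches = build_instanced_batches(objects)
draw_count = len(batches)   # one instanced draw per batch
```

The 1001 objects collapse into two batches; the tree mesh is submitted once together with a list of 1000 transforms.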

Culling: discarding renderable objects that lie outside the view frustum, to reduce the amount of work the GPU has to draw

The bounding box of an object is used to make the culling test cheap
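
A minimal sketch of such a test, using a bounding sphere against six frustum planes. The frustum here is a hypothetical one (camera at the origin looking down +z with a 90-degree field of view), chosen so the planes are easy to write by hand.

```python
import math


def make_frustum_planes(near, far):
    """Hypothetical frustum: camera at origin, looking down +z, 90-degree FOV.

    Each plane is (normal, d); a point p is inside when dot(normal, p) + d >= 0.
    """
    s = 1.0 / math.sqrt(2.0)
    return [
        ((0.0, 0.0, 1.0), -near),   # near:   z >= near
        ((0.0, 0.0, -1.0), far),    # far:    z <= far
        ((s, 0.0, s), 0.0),         # left:   x >= -z
        ((-s, 0.0, s), 0.0),        # right:  x <= z
        ((0.0, s, s), 0.0),         # bottom: y >= -z
        ((0.0, -s, s), 0.0),        # top:    y <= z
    ]


def sphere_visible(center, radius, planes):
    """Conservative test: cull only if the sphere is fully behind some plane."""
    for normal, d in planes:
        signed_dist = sum(n * c for n, c in zip(normal, center)) + d
        if signed_dist < -radius:
            return False
    return True


planes = make_frustum_planes(near=1.0, far=100.0)
objects = {
    "in_front": ((0.0, 0.0, 10.0), 1.0),
    "behind_camera": ((0.0, 0.0, -5.0), 1.0),
    "far_right": ((20.0, 0.0, 10.0), 1.0),
    "touching_edge": ((10.5, 0.0, 10.0), 1.0),
}
to_draw = [name for name, (c, r) in objects.items() if sphere_visible(c, r, planes)]
```

The test costs six dot products per object regardless of how many triangles the object has. It is conservative: a sphere near a frustum corner may be kept even though it is not actually visible, which is safe but slightly wasteful.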

There are different approaches to bounding boxes:

Bounding Boxes
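
Two simple bounding volumes can be computed from a vertex list as follows. The axis-aligned box and the sphere are used here as familiar examples and are an assumption of this sketch; the sphere is a cheap, non-minimal one.

```python
import math


def aabb(points):
    """Axis-aligned bounding box as (min corner, max corner)."""
    xs, ys, zs = zip(*points)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))


def bounding_sphere(points):
    """Simple (not minimal) sphere: AABB center, radius to the farthest point."""
    lo, hi = aabb(points)
    center = tuple((a + b) / 2 for a, b in zip(lo, hi))
    radius = max(math.dist(center, p) for p in points)
    return center, radius


points = [(0, 0, 0), (2, 0, 0), (2, 4, 0), (0, 4, 6)]
box_min, box_max = aabb(points)
center, radius = bounding_sphere(points)
```

A sphere is the cheapest to test but can fit long, thin objects poorly; a box fits tighter at the price of a slightly more expensive test.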

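BVH (Bounding Volume Hierarchy) Culling Example: we construct the tree by merging spherical bounding volumes. Then, to find which objects need to be drawn, we traverse from the root toward the leaves and skip any subtree whose bounding volume is not visible.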

There are many ways to construct a BVH. A BVH suits dynamic scenes because construction is relatively cheap.
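
A minimal sketch of sphere merging and root-to-leaf traversal. The build strategy (pair neighbours along x) is a deliberately naive assumption, and a query sphere stands in for the real visibility test; all names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    center: tuple
    radius: float
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    name: Optional[str] = None   # set on leaves only


def merge(a, b):
    """Smallest sphere enclosing spheres a and b, with a and b as children."""
    d = math.dist(a.center, b.center)
    if d + b.radius <= a.radius:          # b already inside a
        return Node(a.center, a.radius, a, b)
    if d + a.radius <= b.radius:          # a already inside b
        return Node(b.center, b.radius, a, b)
    r = (d + a.radius + b.radius) / 2
    t = (r - a.radius) / d                # how far to move from a toward b
    center = tuple(ca + (cb - ca) * t for ca, cb in zip(a.center, b.center))
    return Node(center, r, a, b)


def build_bvh(leaves):
    """Naive bottom-up build: sort along x, then merge neighbours pairwise."""
    nodes = sorted(leaves, key=lambda n: n.center[0])
    while len(nodes) > 1:
        merged = [merge(nodes[i], nodes[i + 1]) for i in range(0, len(nodes) - 1, 2)]
        if len(nodes) % 2:
            merged.append(nodes[-1])
        nodes = merged
    return nodes[0]


def query(node, is_visible, out):
    """Traverse from the root; an invisible node culls its whole subtree."""
    if not is_visible(node.center, node.radius):
        return out
    if node.name is not None:
        out.append(node.name)
    else:
        query(node.left, is_visible, out)
        query(node.right, is_visible, out)
    return out


leaves = [Node((x, 0.0, 0.0), 1.0, name=n)
          for x, n in [(0.0, "a"), (10.0, "b"), (20.0, "c"), (30.0, "d")]]
root = build_bvh(leaves)

# Stand-in visibility test: does the node touch a sphere of radius 5 at the origin?
def near_origin(center, radius):
    return math.dist(center, (0.0, 0.0, 0.0)) <= radius + 5.0

visible = query(root, near_origin, [])
```

In this scene the subtree holding `c` and `d` is rejected with a single test at its parent, so neither leaf is visited.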

PVS (Potentially Visible Set): a technique for culling and resource preloading. Each room can see only a subset of the other rooms through portals (in red). The algorithm for automatically partitioning space is complex. Almost no game engine uses PVS directly today, but the idea remains.
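
At runtime the idea reduces to a table lookup. In the sketch below the table is written by hand; the room names, the `PVS` table, and `visible_objects` are hypothetical, and the expensive offline step that would compute the table is not shown.

```python
# Hypothetical precomputed table: room -> rooms visible from it through portals.
PVS = {
    "hall": {"hall", "kitchen", "corridor"},
    "kitchen": {"kitchen", "hall"},
    "corridor": {"corridor", "hall", "vault"},
    "vault": {"vault", "corridor"},
}


def visible_objects(camera_room, objects_by_room, pvs=PVS):
    """Return only the objects in rooms potentially visible from camera_room."""
    visible = []
    for room in sorted(pvs[camera_room]):   # sorted for a stable order
        visible.extend(objects_by_room.get(room, []))
    return visible


objects_by_room = {
    "hall": ["statue"],
    "kitchen": ["stove", "table"],
    "corridor": ["torch"],
    "vault": ["chest"],
}

from_kitchen = visible_objects("kitchen", objects_by_room)
```

From the kitchen, the torch and the chest are never submitted for drawing. The same table can drive preloading: the resources of every room in the current set are candidates to load ahead of time.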

PVS in early games

GPU Culling: use the GPU to run the culling test for each object. The GPU returns only a true/false result per object. Its speed is comparable to PVS and many other culling methods, and it is widely adopted in industry because GPUs compute quickly.

An example of using PVS to generate regions in a semi-linear scene
