Lecture_006_Perspective_Projection_and_Texture_Mapping

Transformations

Transforming the camera is equivalent to applying the inverse transformation to all triangles

Rotation: inverse rotation is transpose
Translation: inverse translation is negative translation
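A minimal sketch of the two inverse rules, assuming the camera is placed in the world by a rotation $R$ followed by a translation $t$ (so $p_{world} = R\,p_{cam} + t$). The types `Vec3`, `Mat3` and the function names are hypothetical. The world-to-view transform then only needs a transpose and a negated translation, never a general matrix inverse.

```cpp
#include <array>
#include <cassert>

// Hypothetical minimal types: a 3-vector and a row-major 3x3 rotation matrix.
using Vec3 = std::array<float, 3>;
using Mat3 = std::array<std::array<float, 3>, 3>;

// Inverse of a rotation is its transpose (rotation matrices are orthonormal).
Mat3 transpose(const Mat3& r) {
    Mat3 t{};
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            t[i][j] = r[j][i];
    return t;
}

Vec3 mul(const Mat3& m, const Vec3& v) {
    Vec3 out{};
    for (int i = 0; i < 3; ++i)
        out[i] = m[i][0] * v[0] + m[i][1] * v[1] + m[i][2] * v[2];
    return out;
}

// World-to-view for a camera posed by p_world = R * p_cam + t:
// undo the translation (subtract t), then undo the rotation (multiply by R^T).
Vec3 world_to_view(const Mat3& cam_rot, const Vec3& cam_pos, const Vec3& p_world) {
    Vec3 d = {p_world[0] - cam_pos[0], p_world[1] - cam_pos[1], p_world[2] - cam_pos[2]};
    return mul(transpose(cam_rot), d);
}
```

For example, a camera at $(1, 0, 0)$ rotated $90^\circ$ about $y$ looks toward $-x$, so the world origin ends up one unit in front of it, at view-space $(0, 0, -1)$.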

Perspective Projection

Notice that all previous transformations (local to world, world to view) are done with matrices. The transformation from view space to clip space can also be done with a matrix.

Local to World: transform geometry (possibly attached to other geometry) into world coordinates
World to View: transform the entire world so that the camera is at the origin, facing the negative z direction
View to Clip: after this point $w$ has no geometric meaning, so think of it as an extra slot for storing an attribute instead.
- apply a perspective matrix (see the matrix below and its invariants)
  - copy the original z to w (otherwise we lose the correct depth needed for interpolation and for the later perspective division by z)
  - scale z non-linearly so that the near/far planes land on the boundary of the [-w, w] cube
  - use the focal length to shrink the x and y region we are supposed to see into [-w, w]
- clip: during clipping, we clip x, y, z against the original w
- z is nicer to use for clipping than w because all valid z values are now in [-w, w]; it also has the nice property that things closer to us get better resolution, which helps depth-buffer testing.
- after clipping: replace w with inv_w = 1/w
- multiply only x, y, z by inv_w (which stores the inverse of the original depth)
- rasterization happens
Clip to NDC (Normalized Device Coordinates): We have $x_d = x_c / w_c$, $y_d = y_c / w_c$, $z_d = z_c / w_c$, but $w_d = 1/w_c$. This $w_d$ is used for perspective-correct interpolation. Note that this transformation is no longer a matrix (division by $w_c$ is not linear).
NDC to Screen: We have $x_s = x_{width}(x_d + 1)/2 + x_{offset}$, $y_s = y_{height}(y_d + 1)/2 + y_{offset}$, and $z_s = (z_d + 1)/2$, which remaps depth from $[-1, 1]$ to $[0, 1]$, the range stored in the depth buffer. Rasterization and interpolation are done in screen space. (Open question: can $1/w$ be used for the frame buffer? TODO: z=1/z, w=z.)
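The clip-to-NDC-to-screen steps can be sketched as follows. `ScreenVertex` and `clip_to_screen` are hypothetical names; the sketch uses exactly the formulas above (no y-flip) and assumes the vertex already passed clipping, so $w_c > 0$.

```cpp
#include <cassert>

// Hypothetical vertex record after the clip -> NDC -> screen steps.
struct ScreenVertex {
    float x, y, z;  // screen-space x, y and depth in [0, 1]
    float inv_w;    // 1 / w_c, kept for perspective-correct interpolation
};

// Assumes the vertex survived clipping, so w_c > 0.
ScreenVertex clip_to_screen(float xc, float yc, float zc, float wc,
                            float width, float height,
                            float x_offset, float y_offset) {
    float inv_w = 1.0f / wc;
    // Clip -> NDC: divide x, y, z by w_c (not expressible as a matrix).
    float xd = xc * inv_w, yd = yc * inv_w, zd = zc * inv_w;
    // NDC -> screen: map [-1, 1] to [0, width] / [0, height] plus offsets;
    // depth is remapped from [-1, 1] to [0, 1].
    return {width * (xd + 1.0f) / 2.0f + x_offset,
            height * (yd + 1.0f) / 2.0f + y_offset,
            (zd + 1.0f) / 2.0f,
            inv_w};
}
```

For instance, clip coordinates $(2, -2, 0, 2)$ on an 800×600 viewport land at screen $(800, 0)$ with depth $0.5$ and `inv_w` $= 0.5$.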

Here is a very good video to explain all the details of transformation after View Space.

Our 4x4 perspective transform matrix can be constructed as follows. Note that the matrix itself does not perform the perspective division (division by $z$).

```cpp
/// Compute perspective projection matrix with
///  fov: vertical field of view (in degrees),
///   ar: aspect ratio (x/y)
///    n: near plane.
///
/// The camera is located at the origin looking down the -z axis
///  with y up and x right.
/// The far plane is at infinity.
///
/// This projection maps (x,y,z,1) to (x',y',z',w') such that:
///  - all visible points have w'>0 and (x',y',z')/w' in [-1,1]^3
///  - points on the near plane (z=-n) map to points with z'/w'= -1.0
///  - points on the far 'plane' (z=-inf) map to points with z'/w'= 1.0
///  - objects are closer if their mapped depth is lower
///  - w' must be -z for perspective-correct interpolation
static Mat4 perspective(float fov, float ar, float n);
```
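One possible body for the declaration above, written as a sketch. `Mat4` here is a hypothetical plain row-major struct, not the course's actual type, and the matrix is the common OpenGL-style infinite-far-plane projection with $w' = -z$, which is not necessarily the course's own solution. The helper `project` (also hypothetical) applies the matrix and performs the perspective division so the listed invariants can be checked.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical row-major 4x4 matrix, for illustration only.
struct Mat4 {
    float m[4][4];
};

// Infinite far plane, camera looking down -z.
Mat4 perspective(float fov, float ar, float n) {
    const float pi = 3.14159265358979f;
    float f = 1.0f / std::tan(fov * pi / 360.0f);  // cot(fov / 2), fov in degrees
    return Mat4{{{f / ar, 0.0f, 0.0f, 0.0f},
                 {0.0f, f, 0.0f, 0.0f},
                 {0.0f, 0.0f, -1.0f, -2.0f * n},   // z' = -z - 2n
                 {0.0f, 0.0f, -1.0f, 0.0f}}};      // w' = -z > 0 in front of camera
}

// Applies M to (x, y, z, 1) and writes (x'/w', y'/w', z'/w') to out.
void project(const Mat4& M, float x, float y, float z, float out[3]) {
    float v[4] = {x, y, z, 1.0f};
    float r[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            r[i] += M.m[i][j] * v[j];
    for (int i = 0; i < 3; ++i) out[i] = r[i] / r[3];
}
```

With `perspective(90.0f, 1.0f, 0.1f)`, a point on the near plane ($z = -0.1$) maps to depth $-1$, depth approaches $+1$ as $z \to -\infty$, and $(0, 1, -1)$ lands on the top edge $y'/w' = 1$.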

$\begin{bmatrix} c_x & 0 & 0 & 0\\ 0 & c_y & 0 & 0\\ 0 & 0 & m_1 & m_2\\ 0 & 0 & 1 & 0\\ \end{bmatrix} \cdot \begin{bmatrix} x\\ y\\ z\\ 1\\ \end{bmatrix} = \begin{bmatrix} c_x x\\ c_y y\\ m_1 z + m_2\\ z\\ \end{bmatrix}$

Note that once constructed, this matrix stays constant for the entire duration of rendering. This lets the GPU speed up the transformation, since the same matrix is applied to every vertex. Therefore we can't just write our $z$ into the matrix to save a $z$ value, since $z$ differs from vertex to vertex.

Explanation:

$c_x = \frac{z_{focal}}{2w_{vp}}$: $z_{focal}$ is the focal length of the camera and $w_{vp}$ is the width of the viewport (screen width).
$c_y = \frac{z_{focal}}{2h_{vp}}$: $z_{focal}$ is the focal length of the camera and $h_{vp}$ is the height of the viewport (screen height).
A bigger focal length (smaller view angle) stretches all coordinates away from the origin. Therefore, more geometry is clipped outside $-1 \leq x \leq 1$ and $-1 \leq y \leq 1$ (or, more accurately, $-w \leq x \leq w$ and $-w \leq y \leq w$, since we don't assume $w$ is always $1$: in homogeneous coordinates $(\forall c \neq 0)(c\langle x, y, z, w\rangle \sim \langle x, y, z, w\rangle)$).
There are two steps in perspective projection: multiplying by the focal length and dividing by $z$. The matrix multiplication does the first; the division by $z$ happens in a later step. So this matrix is only half of the perspective transformation.
Since we want the image to keep the same size when we scale the viewport, we divide by the viewport width and height. A wider viewport means fewer objects get clipped.
It is impossible to keep $z$ at its original value after applying a fixed matrix. This is because the equation $m_1 z + m_2 = z^2$ (we want $z^2$ here because we divide by $z$ later and want the result after division to still equal $z$) has at most two real solutions. So we can choose two $z$ values that keep their original value, and all other $z$ values are transformed quadratically. We choose to fix $z = z_{near}$ and $z = z_{far}$. Solving the equation gives $m_1 = z_{far}+z_{near}$ and $m_2 = -z_{far}z_{near}$. This non-linear mapping of $z$ is desirable since it gives more depth resolution for $z$ close to $0$ (near the camera). This is explained in this video.
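The algebra behind these choices, written out with $z$ positive as in the matrix above (signs flip for a camera looking down $-z$):

$$
\begin{aligned}
m_1 z + m_2 = z^2 &\iff z^2 - m_1 z - m_2 = 0 \\
z^2 - m_1 z - m_2 = (z - z_{near})(z - z_{far}) &\implies m_1 = z_{near} + z_{far}, \quad m_2 = -z_{near} z_{far} \\
\frac{m_1 z + m_2}{z} = z_{near} + z_{far} - \frac{z_{near} z_{far}}{z}, &\qquad \frac{d}{dz}\left(\frac{m_1 z + m_2}{z}\right) = \frac{z_{near} z_{far}}{z^2}
\end{aligned}
$$

The derivative is largest at $z = z_{near}$ and falls off as $1/z^2$, so equal steps in stored depth cover smaller distances near the camera.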
Also note that in view space the camera is at the origin, while we want our clipping box to be a cube of side $2w$ centered at the origin $(0, 0, 0, 0)$, extending by $w$ in every direction (i.e. by $1$ after division by $w$). However, we cannot see what is behind us. So if our near/far planes in view space are $z_{near}, z_{far}$, we want to transform $z$ so that the near/far planes in clip space become $-z_{near}, z_{far}$. This way, geometry behind the camera does not remain in our viewport.

CAREFUL! In the formulas above, some $z$ must be negated since the camera looks down the $-z$ axis instead of the $+z$ axis. Also make sure the viewport aspect ratio matches the view plane aspect ratio.

Focal Length: If we assume a focal length of 1 unit, then the view angle would be 30 degrees.

Instead of dividing by $-z$ (assuming we face the $-z$ axis), if we want a half view angle of $\theta$, we divide by $-z \cdot \tan \theta$. Dividing by $-z$ alone assumes $\theta = 45^\circ$ (a $90^\circ$ field of view), since $\tan 45^\circ = 1$.
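A small sketch of this division; `project_point` is a hypothetical helper, and `theta_rad` is assumed to be the half view angle in radians, so a point exactly on the edge of the view maps to $\pm 1$.

```cpp
#include <cassert>
#include <cmath>

// Projects a view-space point (camera looking down -z) onto the image plane so that
// the half view angle theta_rad maps to the edge of [-1, 1].
void project_point(float x, float y, float z, float theta_rad, float& xp, float& yp) {
    float denom = -z * std::tan(theta_rad);  // -z > 0 for points in front of the camera
    xp = x / denom;
    yp = y / denom;
}
```

With $\theta = 45^\circ$, the point $(1, 0.5, -1)$ projects to $(1, 0.5)$, i.e. it sits on the right edge of the view.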

Perspective Correct Interpolation

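```cpp
// 2D (screen-space) barycentric coordinates of the fragment
Vec3 barycentric = barycentric_coordinate(coord, va, vb, vc);
// interpolate 1/w linearly in screen space, then invert to recover the depth
float z = 1.0f/barycentric_interpolate_2D(barycentric, a.inv_w, b.inv_w, c.inv_w);
// for each attribute i: interpolate attribute * inv_w, then multiply back by the depth
frag.attributes[i] = z * barycentric_interpolate_2D(barycentric, a.attributes[i] * a.inv_w, b.attributes[i] * b.inv_w, c.attributes[i] * c.inv_w);
```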

Watch this video

Clipping

Clipping: eliminating triangles outside the view frustum so we don't waste time rasterizing them

discarding fragments is expensive
it makes sense to toss out whole primitives

Clipping Partially Overlapping Triangles by Splitting the Remaining Shape into Triangles
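A sketch of splitting a partially visible triangle, assuming OpenGL-style clip space where the near plane is $z \geq -w$. It uses Sutherland-Hodgman clipping against that single plane and then fan-triangulates the resulting polygon (at most a quadrilateral). The type `V4` and the function names are hypothetical; a full clipper would repeat the same step for all six frustum planes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical clip-space vertex.
struct V4 { float x, y, z, w; };

// Inside the near plane when z + w >= 0 (assumed OpenGL-style clip space).
static float near_dist(const V4& v) { return v.z + v.w; }

static V4 lerp(const V4& a, const V4& b, float t) {
    return {a.x + t * (b.x - a.x), a.y + t * (b.y - a.y),
            a.z + t * (b.z - a.z), a.w + t * (b.w - a.w)};
}

// Clips one triangle against the near plane, then fan-triangulates.
// Returns 0, 1, or 2 triangles, flattened as 3 vertices each.
std::vector<V4> clip_triangle_near(const V4 tri[3]) {
    std::vector<V4> poly;
    for (int i = 0; i < 3; ++i) {
        const V4& a = tri[i];
        const V4& b = tri[(i + 1) % 3];
        float da = near_dist(a), db = near_dist(b);
        if (da >= 0) poly.push_back(a);                     // keep inside vertex
        if ((da > 0 && db < 0) || (da < 0 && db > 0))       // edge strictly crosses
            poly.push_back(lerp(a, b, da / (da - db)));     // intersection point
    }
    std::vector<V4> out;
    for (std::size_t i = 1; i + 1 < poly.size(); ++i) {     // fan from poly[0]
        out.push_back(poly[0]);
        out.push_back(poly[i]);
        out.push_back(poly[i + 1]);
    }
    return out;
}
```

A triangle with one vertex behind the near plane becomes a quadrilateral, which is returned as two triangles; a fully visible triangle comes back unchanged and a fully hidden one disappears.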

Why near/far clipping planes?

hard to rasterize a triangle with vertices both in front of and behind the camera
the depth buffer does not have infinite precision

Z-Fighting Gets Worse without Enough Precision in the Depth Buffer (don't make the near-to-far range too large)
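To see why the near plane matters for precision, this sketch computes window-space depth with the standard OpenGL finite near/far projection (an assumption; it is not the $m_1, m_2$ choice above) and quantizes it to a fixed number of bits. Function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Window-space depth in [0, 1] for view-space z < 0, using the standard OpenGL
// finite near/far projection (assumed here).
double window_depth(double z, double n, double f) {
    double z_clip = -(f + n) / (f - n) * z - 2.0 * f * n / (f - n);
    double w_clip = -z;
    double z_ndc = z_clip / w_clip;
    return (z_ndc + 1.0) / 2.0;
}

// Quantizes depth to an unsigned integer with the given number of bits.
std::uint32_t quantize_depth(double d, int bits) {
    double max_val = std::ldexp(1.0, bits) - 1.0;  // 2^bits - 1
    return static_cast<std::uint32_t>(std::lround(d * max_val));
}
```

With $n = 0.1$, $f = 1000$ and a 16-bit depth buffer, view-space depths $-500$ and $-501$ land on the same stored value, so surfaces there z-fight; moving the near plane out to $n = 1$ separates them.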

Non-Perspective Frustum Transformation: it is not yet perspective because it only warps space to align with the camera's view

To get perspective again: copy $z$ to $w$ in homogeneous coordinates.

Full Perspective Matrix: takes view frustum and projection into account

Screen Transformation

Transform the 2D viewing plane to pixel coordinates:

reflect about x-axis
translate by $(1, 1)$
scale by $(W / 2, H / 2)$
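The three steps above, sketched for a single point. It assumes NDC in $[-1, 1]^2$ and pixel coordinates with the origin at the top-left and $y$ growing downward (the reason for the reflection); `ndc_to_pixel` is a hypothetical name.

```cpp
#include <cassert>

// Maps NDC (x, y) in [-1, 1]^2 to pixel coordinates with a top-left origin.
void ndc_to_pixel(float x_ndc, float y_ndc, float W, float H, float& px, float& py) {
    float xr = x_ndc, yr = -y_ndc;          // reflect about the x-axis
    float xt = xr + 1.0f, yt = yr + 1.0f;   // translate by (1, 1): now in [0, 2]
    px = xt * (W / 2.0f);                   // scale by (W/2, H/2)
    py = yt * (H / 2.0f);
}
```

The NDC corner $(-1, 1)$ maps to pixel $(0, 0)$ and $(1, -1)$ maps to $(W, H)$.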

Color Interpolation

Linear Interpolation in 1D is a Linear Combination of Two Functions

In 1D: since we have equation $\hat{f}(t) = (1 - t)f_i + tf_j$ , we can think of this as a linear combination of two functions.

Linear Interpolation in 2D:

we have a 2D function plotted in 3D space, as in the image above
the function is $\hat{f}(x, y) = ax + by + c$
to interpolate, we need to find coefficients such that the function matches the sample values at the sample points: $\hat{f}(x_n, y_n) = f_n, n \in \{i, j, k\}$

$\begin{bmatrix} a\\ b\\ c\\ \end{bmatrix} = \frac{1}{(x_jy_i - x_iy_j)+(x_ky_j - x_jy_k)+(x_iy_k - x_ky_i)} \begin{bmatrix} f_i(y_k - y_j) + f_j(y_i - y_k) + f_k(y_j - y_i)\\ f_i(x_j - x_k) + f_j(x_k - x_i) + f_k(x_i - x_j)\\ f_i(x_ky_j - x_jy_k) + f_j(x_iy_k - x_ky_i) + f_k(x_jy_i - x_iy_j)\\ \end{bmatrix}$
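A direct transcription of the closed form into code, useful for checking it on concrete numbers. `Plane` and `fit_linear` are hypothetical names; a zero denominator means the three sample points are collinear.

```cpp
#include <cassert>
#include <cmath>

// f(x, y) = a x + b y + c
struct Plane { double a, b, c; };

// Fits the plane through (x_n, y_n, f_n), n in {i, j, k}, using the closed form.
Plane fit_linear(double xi, double yi, double fi,
                 double xj, double yj, double fj,
                 double xk, double yk, double fk) {
    double den = (xj * yi - xi * yj) + (xk * yj - xj * yk) + (xi * yk - xk * yi);
    double a = fi * (yk - yj) + fj * (yi - yk) + fk * (yj - yi);
    double b = fi * (xj - xk) + fj * (xk - xi) + fk * (xi - xj);
    double c = fi * (xk * yj - xj * yk) + fj * (xi * yk - xk * yi) + fk * (xj * yi - xi * yj);
    return {a / den, b / den, c / den};  // den == 0: points are collinear
}
```

For the samples $(0, 0, 1)$, $(1, 0, 3)$, $(0, 1, 5)$ it recovers $\hat{f}(x, y) = 2x + 4y + 1$.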

Linear Interpolation in 2D is a Linear Combination of Three Functions

These pictures describe the same thing as the complicated formula above.

Interpolate based on the three triangular areas created by a point inside the triangle

Barycentric Coordinates: $\phi_i(x), \phi_j(x), \phi_k(x)$

$\text{color}(x) = \text{color}(x_i)\phi_i + \text{color}(x_j)\phi_j + \text{color}(x_k)\phi_k$
it is used to interpolate attributes associated with vertices
the needed quantity is already computed in the triangle half-plane test (for example, to check whether point $P$ is inside triangle $ABC$, we check the sign of $\overrightarrow{AB} \times \overrightarrow{AP}$; its magnitude is twice the area of triangle $ABP$, which is proportional to the distance between line $AB$ and point $P$, and dividing by the area of $ABC$ gives a barycentric coordinate)
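A sketch of computing barycentric coordinates from the same edge functions used in the half-plane test: each weight is the signed area of the sub-triangle opposite a vertex, divided by the signed area of the whole triangle, so the result does not depend on winding. Function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// 2D cross product (b - a) x (p - a): twice the signed area of triangle (a, b, p).
float edge(float ax, float ay, float bx, float by, float px, float py) {
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax);
}

// Barycentric coordinates of p with respect to triangle (i, j, k).
void barycentric(float xi, float yi, float xj, float yj, float xk, float yk,
                 float px, float py, float& phi_i, float& phi_j, float& phi_k) {
    float area = edge(xi, yi, xj, yj, xk, yk);
    phi_i = edge(xj, yj, xk, yk, px, py) / area;  // sub-triangle (j, k, p)
    phi_j = edge(xk, yk, xi, yi, px, py) / area;  // sub-triangle (k, i, p)
    phi_k = edge(xi, yi, xj, yj, px, py) / area;  // sub-triangle (i, j, p)
}
```

For the triangle $(0,0), (4,0), (0,4)$ and $P = (1,1)$ this gives $(0.5, 0.25, 0.25)$; plugging these into the color formula above interpolates any per-vertex attribute.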

We should interpolate not in screen space but in 3D space. How do we solve this problem?

Perspective Incorrect Interpolation: computing barycentric coordinates from 2D coordinates leads to derivative discontinuities

Perspective Correct Interpolation: we interpolate attribute $\phi$

compute depth $z$ at each vertex
interpolate $\frac{1}{z}$ and $\frac{\phi}{z}$ using 2D barycentric coordinates to give perspective
divide interpolated $\frac{\phi}{z}$ by interpolated $\frac{1}{z}$
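The three steps above, sketched for one attribute along with the naive version for comparison. The per-vertex depth is passed as $w = -z > 0$ (the value stored in $w$ after the perspective matrix), and the names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// b0..b2: 2D screen-space barycentric weights; w0..w2: per-vertex depth (> 0);
// phi0..phi2: attribute values at the vertices.
float interpolate_perspective(float b0, float b1, float b2,
                              float w0, float w1, float w2,
                              float phi0, float phi1, float phi2) {
    // Interpolate 1/w and phi/w linearly in screen space...
    float inv_w = b0 / w0 + b1 / w1 + b2 / w2;
    float phi_over_w = b0 * phi0 / w0 + b1 * phi1 / w1 + b2 * phi2 / w2;
    // ...then divide to recover phi.
    return phi_over_w / inv_w;
}

// Naive (perspective-incorrect) screen-space interpolation, for comparison.
float interpolate_affine(float b0, float b1, float b2,
                         float phi0, float phi1, float phi2) {
    return b0 * phi0 + b1 * phi1 + b2 * phi2;
}
```

On an edge from a vertex at depth 1 with $\phi = 0$ to a vertex at depth 3 with $\phi = 1$, the screen-space midpoint gets $\phi = 0.25$ with the perspective-correct version but $0.5$ with the naive one, because the nearer part of the edge covers more of the screen.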

Texture Mapping

for color
for wetness attribute
for shininess
for normal map
for displacement mapping
for baked ambient occlusion
for reflection bulb
etc...

Given a model with UV:

for each pixel in rasterized image (screen space)
- interpolate $(u, v)$ coordinates across the triangle
- sample texture at interpolated $(u, v)$
- set color of fragment to sampled texture value
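A minimal nearest-neighbor version of the sampling step, assuming a single-channel texture stored row by row and clamp-to-edge addressing; `Texture` and `sample_nearest` are hypothetical (C++17 for `std::clamp`).

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical single-channel texture, row-major.
struct Texture {
    int width, height;
    std::vector<float> texels;  // size width * height
    float at(int x, int y) const { return texels[y * width + x]; }
};

// Maps (u, v) in [0, 1]^2 to the texel containing it, clamping to the edge.
float sample_nearest(const Texture& tex, float u, float v) {
    int x = static_cast<int>(u * tex.width);   // texel i covers u in [i/W, (i+1)/W)
    int y = static_cast<int>(v * tex.height);
    x = std::clamp(x, 0, tex.width - 1);
    y = std::clamp(y, 0, tex.height - 1);
    return tex.at(x, y);
}
```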

The Sampled Texture Space May Be Warped, Which Makes Aliasing Hard to Avoid

Magnification: camera too close to object

Problem: single pixel on screen maps to less than a pixel in texture
Solution: interpolate value at pixel center
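A sketch of interpolating at pixel centers with bilinear filtering, assuming texel centers at $((i + 0.5)/W, (j + 0.5)/H)$ and clamp-to-edge addressing; the type and names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

struct Texture {
    int width, height;
    std::vector<float> texels;  // row-major, size width * height
    float at(int x, int y) const {
        x = std::clamp(x, 0, width - 1);  // clamp-to-edge addressing
        y = std::clamp(y, 0, height - 1);
        return texels[y * width + x];
    }
};

// Shift by 0.5 so integer positions fall on texel centers, then blend the
// four surrounding texels by the fractional offsets s and t.
float sample_bilinear(const Texture& tex, float u, float v) {
    float x = u * tex.width - 0.5f;
    float y = v * tex.height - 0.5f;
    int x0 = static_cast<int>(std::floor(x));
    int y0 = static_cast<int>(std::floor(y));
    float s = x - x0, t = y - y0;
    float top = (1 - s) * tex.at(x0, y0) + s * tex.at(x0 + 1, y0);
    float bottom = (1 - s) * tex.at(x0, y0 + 1) + s * tex.at(x0 + 1, y0 + 1);
    return (1 - t) * top + t * bottom;
}
```

At a texel center the result equals that texel; halfway between four texels it is their average.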

Minification: camera too far from object

Problem: single pixel on screen maps to large region in texture
Solution: need to compute the texture average over the pixel (but averaging kills performance)
Prefiltering: compute the averages at build time, not at run time, by downsampling the texture to multiple resolutions in advance

Mip Map

Since averaging over a large texture area is very costly, we pre-compute a mip map instead. Each mip level is half the width and half the height of the previous one (a factor of $2^2 = 4$ in area), so each texel of a level can be computed from exactly 4 texels of the level below.
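Building one MIP level from the previous one with a 2×2 box filter, as a sketch; it assumes even dimensions and a single channel, and `downsample` is a hypothetical name.

```cpp
#include <cassert>
#include <vector>

// Averages each 2x2 block of src (row-major, width x height) into one texel.
std::vector<float> downsample(const std::vector<float>& src, int width, int height) {
    int w2 = width / 2, h2 = height / 2;
    std::vector<float> dst(w2 * h2);
    for (int y = 0; y < h2; ++y)
        for (int x = 0; x < w2; ++x) {
            float sum = src[(2 * y) * width + 2 * x] + src[(2 * y) * width + 2 * x + 1] +
                        src[(2 * y + 1) * width + 2 * x] + src[(2 * y + 1) * width + 2 * x + 1];
            dst[y * w2 + x] = sum / 4.0f;
        }
    return dst;
}
```

Applying it repeatedly until the image is 1×1 yields the full mip chain.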

Mip Map: a specific prefiltering technique

MIP map (L. Williams 83): stores a prefiltered image at every scale, each level half the resolution of the previous one

Calculating the MIP Map Level by Estimating the Covered Region Using Partial Derivatives

The result in the picture above expresses $u, v$ in texel coordinates $[0, W] \times [0, H]$, not normalized texture coordinates $[0, 1] \times [0, 1]$. This matters because the right level depends on the texture's resolution: a very small base texture viewed from far away may still map to at most about one texel per screen pixel, so it should be shown un-minified.
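A sketch of the level estimate, assuming the common formula $L = \max\left(\sqrt{(\partial u/\partial x)^2 + (\partial v/\partial x)^2}, \sqrt{(\partial u/\partial y)^2 + (\partial v/\partial y)^2}\right)$, $d = \log_2 L$, with the derivatives approximated by differences to neighboring pixels and scaled to texel units as described above. `mip_level` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Estimates the MIP level d from the (u, v) of a pixel and its right and lower
// neighbors, after scaling normalized coordinates to texel units.
float mip_level(float u00, float v00,   // (u, v) at pixel (x, y)
                float u10, float v10,   // (u, v) at pixel (x + 1, y)
                float u01, float v01,   // (u, v) at pixel (x, y + 1)
                int tex_w, int tex_h) {
    float du_dx = (u10 - u00) * tex_w, dv_dx = (v10 - v00) * tex_h;
    float du_dy = (u01 - u00) * tex_w, dv_dy = (v01 - v00) * tex_h;
    float Lx = std::sqrt(du_dx * du_dx + dv_dx * dv_dx);
    float Ly = std::sqrt(du_dy * du_dy + dv_dy * dv_dy);
    float L = std::max(Lx, Ly);            // texels covered per screen pixel
    return std::max(0.0f, std::log2(L));   // level 0 when magnifying
}
```

On a 256×256 texture where $u$ and $v$ advance by $4/256$ per pixel this gives $d = 2$; the same screen-space step on an 8×8 texture gives $\log_2 0.125 = -3$, clamped to level 0, i.e. no minification.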

But we don't want jumps between MIP Map levels

Tri-linear Interpolation: interpolation after interpolation
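A sketch of trilinear filtering: bilinear lookups in the two MIP levels around $d$, blended by the fractional part of $d$. The `Level` struct and names are hypothetical, and `mips` is assumed non-empty and ordered from the full-resolution level 0 downward.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// One MIP level: row-major single-channel texels with clamp-to-edge lookup.
struct Level {
    int width, height;
    std::vector<float> texels;
    float at(int x, int y) const {
        x = std::clamp(x, 0, width - 1);
        y = std::clamp(y, 0, height - 1);
        return texels[y * width + x];
    }
    float bilinear(float u, float v) const {
        float x = u * width - 0.5f, y = v * height - 0.5f;
        int x0 = static_cast<int>(std::floor(x)), y0 = static_cast<int>(std::floor(y));
        float s = x - x0, t = y - y0;
        float top = (1 - s) * at(x0, y0) + s * at(x0 + 1, y0);
        float bot = (1 - s) * at(x0, y0 + 1) + s * at(x0 + 1, y0 + 1);
        return (1 - t) * top + t * bot;
    }
};

// Bilinear in levels floor(d) and floor(d) + 1, then linear blend by frac(d).
float sample_trilinear(const std::vector<Level>& mips, float u, float v, float d) {
    float max_level = static_cast<float>(mips.size() - 1);
    d = std::clamp(d, 0.0f, max_level);
    int l0 = static_cast<int>(std::floor(d));
    int l1 = std::min(l0 + 1, static_cast<int>(mips.size()) - 1);
    float t = d - l0;
    return (1 - t) * mips[l0].bilinear(u, v) + t * mips[l1].bilinear(u, v);
}
```

Because the blend weight varies continuously with $d$, moving the camera no longer produces a visible jump when the level changes.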

Isotropic Filtering (Trilinear) vs. Anisotropic Filtering

Pipeline:

from screen space $(x, y)$ to barycentric coordinates
using barycentric coordinates to interpolate $(u, v)$ stored in vertices
approximate $\frac{du}{dx}, \frac{du}{dy}, \frac{dv}{dx}, \frac{dv}{dy}$ by taking differences of screen-adjacent samples and compute mip map level $d$
convert normalized texture coordinates $(u, v) \in [0, 1]^2$ to pixel locations in the texture image $(U, V) \in [0, W] \times [0, H]$
determine the addresses of the texels needed by the filter (8 neighbors for trilinear)
Load texels into local registers
tri-linear interpolation according to $(U, V, d)$
maybe anisotropic filtering
