# Lecture_006_Perspective_Projection_and_Texture_Mapping

## Transformations

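Transforming the camera is equivalent to applying the inverse transformation to all triangles, so we need the inverses of the basic transforms:

• Rotation: the inverse of a rotation matrix is its transpose

• Translation: the inverse of a translation by $t$ is a translation by $-t$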
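A minimal C++ sketch of these two rules. It assumes a hypothetical rigid transform type (a 3x3 rotation `R` plus a translation `t`, not taken from any particular engine): inverting the camera's placement $p \mapsto Rp + t$ gives the view transform $p \mapsto R^T p - R^T t$.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Hypothetical minimal types for illustration, not from a particular engine.
using Vec3 = std::array<float, 3>;
using Mat3 = std::array<std::array<float, 3>, 3>;  // row-major

// Rigid transform p -> R p + t, e.g. the camera's placement in the world.
struct RigidTransform {
    Mat3 R;  // rotation (orthonormal, so R^-1 = R^T)
    Vec3 t;  // translation
};

Vec3 apply(const RigidTransform& T, const Vec3& p) {
    Vec3 out = T.t;
    for (int r = 0; r < 3; ++r)
        for (int c = 0; c < 3; ++c) out[r] += T.R[r][c] * p[c];
    return out;
}

// Inverse of p -> R p + t is p -> R^T p - R^T t:
// the transpose undoes the rotation, the negated (rotated) translation undoes the shift.
RigidTransform inverse(const RigidTransform& T) {
    RigidTransform inv{};  // value-initialized: inv.t starts at zero
    for (int r = 0; r < 3; ++r)
        for (int c = 0; c < 3; ++c) inv.R[r][c] = T.R[c][r];
    for (int r = 0; r < 3; ++r)
        for (int c = 0; c < 3; ++c) inv.t[r] -= inv.R[r][c] * T.t[c];
    return inv;
}

bool approx_equal(const Vec3& a, const Vec3& b, float eps = 1e-5f) {
    for (int i = 0; i < 3; ++i)
        if (std::fabs(a[i] - b[i]) > eps) return false;
    return true;
}
```

Applying `inverse(camera_to_world)` to every vertex moves the camera's position to the origin, which is exactly the statement that moving the camera is the same as inversely moving all triangles.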

## Perspective Projection

Notice that all the previous transformations (local to world, world to view) are done with matrices. The transformation from view space to clip space can also be done with a matrix.

• Local to World: transform geometry (which may be attached to other geometry) into world coordinates

• World to View: transform the entire world so that the camera sits at the origin, facing the $-z$ direction

• View to Clip: after this point $w$ no longer has a geometric meaning, so think of it as an extra slot for storing an attribute instead.

• apply the perspective matrix (see the matrix below and its invariants)
• copy the original z into w (otherwise we lose the correct depth needed for interpolation and for the later perspective division by z)
• rescale z so that, after the division by w, the near/far planes land on the faces of the $[-w, w]$ cube (after the division, the mapping is non-linear in z)
• use the focal length to shrink the x and y region we are supposed to see into $[-w, w]$
• clip: during clipping, we clip x, y, z against the original w
• clipping against the remapped z (rather than the raw depth stored in w) is convenient because all valid z values now lie in $[-w, w]$; it also gives things closer to us more depth resolution, which helps depth-buffer testing.
• after clipping: replace w with inv_w $= 1/w$
• multiply only x, y, z by inv_w (which stores the inverse of the original z)
• rasterization happens
• Clip to NDC (Normalized Device Coordinates): We have $x_d = x_c / w_c, y_d = y_c / w_c, z_d = z_c / w_c$, but $w_d = 1/w_c$. This $w_d$ is kept for perspective-correct interpolation. Note that this step is a per-vertex division, not a matrix transformation.

• NDC to Screen: We have $x_s = x_{width}(x_d + 1)/2 + x_{offset}$, $y_s = y_{height}(y_d + 1)/2 + y_{offset}$, and $z_s = (z_d + 1) / 2$, which maps depth from $[-1, 1]$ to the $[0, 1]$ range stored in the depth buffer. Rasterization and interpolation are done in screen space. (Open questions: can $1/w$ be used for the frame buffer? TODO: $z = 1/z$, $w = z$.)
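The last two steps can be sketched per vertex in C++ as below. The vertex structs and function names are hypothetical; the sketch follows the formulas above literally (no y flip, although conventions for the pixel origin differ between APIs).

```cpp
#include <cassert>

// Hypothetical vertex layouts for illustration.
struct ClipVertex { float x, y, z, w; };
struct NdcVertex { float x, y, z, inv_w; };
struct ScreenVertex { float x, y, z, inv_w; };

// Clip -> NDC: divide x, y, z by w but keep 1/w (instead of 1) for
// perspective-correct interpolation. A per-vertex division, not a matrix.
NdcVertex clip_to_ndc(const ClipVertex& c) {
    assert(c.w > 0.0f && "points with w <= 0 should have been clipped");
    float inv_w = 1.0f / c.w;
    return {c.x * inv_w, c.y * inv_w, c.z * inv_w, inv_w};
}

// NDC -> screen, following the formulas above; depth is mapped to [0, 1].
ScreenVertex ndc_to_screen(const NdcVertex& d, float width, float height,
                           float x_offset = 0.0f, float y_offset = 0.0f) {
    return {width * (d.x + 1.0f) * 0.5f + x_offset,
            height * (d.y + 1.0f) * 0.5f + y_offset,
            (d.z + 1.0f) * 0.5f,
            d.inv_w};  // carried along unchanged for interpolation
}
```

For example, the clip-space vertex $(2, -4, 1, 4)$ becomes $(0.5, -1, 0.25)$ in NDC with `inv_w = 0.25`, and on a 640x480 viewport offset by $(10, 20)$ it lands at pixel $(490, 20)$ with depth $0.625$.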

Here is a very good video to explain all the details of transformation after View Space.

Our 4x4 perspective transform matrix can be constructed as follows. Note that the matrix does not perform the perspective division (by $z$) itself.

```cpp
/// Compute perspective projection matrix with
///  fov: vertical field of view (in degrees),
///   ar: aspect ratio (x/y)
///    n: near plane.
///
/// The camera is located at the origin looking down the -z axis
///  with y up and x right.
/// The far plane is at infinity.
///
/// This projection maps (x,y,z,1) to (x',y',z',w') such that:
///  - all visible points have w'>0 and (x',y',z')/w' in [-1,1]^3
///  - points on the near plane (z=-n) map to points with z'/w'= -1.0
///  - points on the far 'plane' (z=-inf) map to points with z'/w'= 1.0
///  - objects are closer if their mapped depth is lower
///  - w' must be -z (the distance in front of the camera) for perspective-correct interpolation
static Mat4 perspective(float fov, float ar, float n);
```

$$
\begin{bmatrix} c_x & 0 & 0 & 0\\ 0 & c_y & 0 & 0\\ 0 & 0 & m_1 & m_2\\ 0 & 0 & 1 & 0\\ \end{bmatrix} \cdot \begin{bmatrix} x\\ y\\ z\\ 1\\ \end{bmatrix} = \begin{bmatrix} c_x x\\ c_y y\\ m_1 z + m_2\\ z\\ \end{bmatrix}
$$

Note that once constructed, this matrix stays constant for the entire duration of rendering. This lets the GPU speed up the transformation, since the same matrix is applied to every vertex. Therefore we can't simply put a $z$ value into the matrix to save it, since $z$ is geometry-dependent.

Explanation:

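• $c_x = \frac{z_{focal}}{w_{vp}/2} = \frac{2z_{focal}}{w_{vp}}$: $z_{focal}$ is the focal length of the camera and $w_{vp}$ is the width of the viewport (screen width), in the same units as the focal length. With this choice, a point on the right edge of the viewport ($x = w_{vp}/2$ at depth $z = z_{focal}$) maps to $c_x x / z = 1$.

• $c_y = \frac{z_{focal}}{h_{vp}/2} = \frac{2z_{focal}}{h_{vp}}$: $z_{focal}$ is the focal length of the camera and $h_{vp}$ is the height of the viewport (screen height).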

• A bigger focal length (smaller view angle) stretches all coordinates away from the origin. Therefore, more geometry is clipped outside $-1 \leq x \leq 1$ and $-1 \leq y \leq 1$ (or, more precisely, outside $-w \leq x \leq w$ and $-w \leq y \leq w$, since we don't assume $w$ is always $1$: for any $c \neq 0$, $c\langle x, y, z, w\rangle$ and $\langle x, y, z, w\rangle$ represent the same point).

• There are two steps in perspective projection: multiply by the focal length and divide by $z$. The matrix multiplication does the first; the division by $z$ happens in a later step. So the matrix is only half of the perspective transformation.

• Since we want our image to stay the same size when we scale the viewport, we divide by the viewport width and height. A wider viewport means fewer objects get clipped.

• It is impossible to keep every $z$ at its original value with a fixed matrix. We would want $m_1 z + m_2 = z^2$ (we want $z^2$ because we divide by $z$ later, and we want the result after the division to still be $z$), but this quadratic equation has at most two real solutions. So we can fix two $z$ values to their original values, and all other $z$ values are transformed non-linearly. We choose to fix $z = z_{near}$ and $z = z_{far}$: the roots of $z^2 - m_1 z - m_2 = 0$ should be $z_{near}$ and $z_{far}$, so $m_1 = z_{far}+z_{near}$ and $m_2 = -z_{far}z_{near}$. After the division, the depth becomes $m_1 + m_2/z = z_{far} + z_{near} - \frac{z_{far}z_{near}}{z}$, which is non-linear in $z$. This is desirable since it gives more resolution at $z$ closer to $0$ (near our camera). This is explained in this video.

• Also note that in view space the camera is at the origin, but we want our clipping box to be a cube of side $2w$ centered at the origin (the unit cube $[-1, 1]^3$ after dividing by $w$). However, we can't see what's behind us. So if our near/far planes in view space are at $z_{near}$ and $z_{far}$, we transform $z$ so that in clip space the near plane lands at $-z_{near}$ and the far plane at $z_{far}$ (that is, at $-w$ and $w$, since $w = z$). This way, anything behind the camera falls outside the clip volume.
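A small C++ check of the choice $m_1 = z_{far} + z_{near}$, $m_2 = -z_{far}z_{near}$ from the bullets above. It uses the positive-$z$ convention of that derivation (see the sign warning that follows), and the helper names are hypothetical.

```cpp
#include <cassert>

// Depth row of the matrix above: z' = m1 * z + m2, with w' = z.
struct DepthRow { float m1, m2; };

// m1, m2 chosen so that z^2 - m1 z - m2 = 0 has roots z_near and z_far.
DepthRow depth_row(float z_near, float z_far) {
    assert(z_near > 0.0f && z_far > z_near);
    return {z_far + z_near, -z_far * z_near};
}

// Depth after the perspective division by w' = z, i.e. m1 + m2 / z.
float depth_after_divide(const DepthRow& r, float z) {
    assert(z != 0.0f);
    return (r.m1 * z + r.m2) / z;
}
```

With $z_{near} = 1$ and $z_{far} = 10$, depths 1 and 10 come back unchanged, while 2 maps to 6 and 5 maps to 9: more than half of the output range is spent on the first unit of distance in front of the near plane.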

CAREFUL! In the formulas above, some $z$ must be flipped, since the camera looks down the $-z$ axis instead of the positive $z$ axis. Also make sure the viewport aspect ratio is the same as the view-plane aspect ratio.

Focal length: if we assume a focal length of 1 unit, then the angle would be 30 degrees.

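Instead of dividing by $-z$ (assuming we are facing the $-z$ axis), if we want a view angle of $2\theta$ (so $\theta$ is the half-angle between the view direction and the edge of the view), we need to divide by $-z \cdot \tan \theta$. Dividing by $-z$ alone assumes $\tan\theta = 1$, i.e. $\theta = 45^\circ$ (a $90^\circ$ field of view).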
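One possible way to fill in the `perspective(fov, ar, n)` declaration above, sketched in C++. It assumes a hypothetical row-major `Mat4` defined here (not the course framework's type) and follows the declaration's comment rather than the earlier $m_1, m_2$ derivation: $w' = -z$, $x$ and $y$ are divided by $\tan\theta$ with $\theta$ half the vertical field of view, the near plane maps to depth $-1$, and the infinite far plane maps to depth $1$.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Hypothetical minimal 4x4 row-major matrix, for illustration only.
using Mat4 = std::array<std::array<float, 4>, 4>;
using Vec4 = std::array<float, 4>;

Vec4 mul(const Mat4& m, const Vec4& v) {
    Vec4 out{};
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c) out[r] += m[r][c] * v[c];
    return out;
}

// fov: vertical field of view in degrees, ar: aspect ratio (x/y), n: near plane.
// Camera at the origin looking down -z; far plane at infinity.
Mat4 perspective(float fov, float ar, float n) {
    assert(fov > 0.0f && fov < 180.0f && ar > 0.0f && n > 0.0f);
    const float pi = 3.14159265358979f;
    // theta = half the vertical view angle
    float f = 1.0f / std::tan(0.5f * fov * pi / 180.0f);
    Mat4 m{};             // all zeros
    m[0][0] = f / ar;     // x' = x / (ar * tan(theta))
    m[1][1] = f;          // y' = y / tan(theta)
    m[2][2] = -1.0f;      // z' = -z - 2n: z'/w' = -1 at z = -n, -> 1 as z -> -inf
    m[2][3] = -2.0f * n;
    m[3][2] = -1.0f;      // w' = -z, positive in front of the camera
    return m;
}
```

With `fov = 90`, `ar = 2`, `n = 0.5`, the point $(0, 0, -0.5)$ on the near plane gets depth $z'/w' = -1$, and the top-right corner of the view at $z = -1$, namely $(2, 1, -1)$, maps to $x'/w' = y'/w' = 1$.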

## Perspective Correct Interpolation

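```cpp
Vec3 barycentric = barycentric_coordinate(coord, va, vb, vc);  // screen-space weights
// Interpolate 1/w linearly in screen space, then invert it to recover the fragment's depth
float z = 1.0f/barycentric_interpolate_2D(barycentric, a.inv_w, b.inv_w, c.inv_w);
// attribute * inv_w is also linear in screen space; multiply back by the recovered depth
frag.attributes[i] = z * barycentric_interpolate_2D(barycentric, a.attributes[i] * a.inv_w, b.attributes[i] * b.inv_w, c.attributes[i] * c.inv_w);
```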


Watch this video

## Clipping

Clipping: eliminating triangles outside the view frustum so that we don't spend time rasterizing primitives that can't be seen

• it makes sense to toss out whole primitives

Why near/far clipping planes?

• it is hard to rasterize a primitive with vertices both in front of and behind the camera

• we don't have infinite depth-buffer precision: the z-fighting effect gets worse when the depth buffer lacks precision, so don't make the near-to-far range too large

Non-perspective frustum transformation: warping the frustum into the clip volume is not yet perspective; it only warps space so that it aligns with the camera's view.

To get perspective again: copy $z$ into $w$ of the homogeneous coordinate.
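To see how depth-buffer precision runs out, here is a small C++ sketch. It assumes the common OpenGL-style mapping of view-space distance $d \in [n, f]$ to NDC depth $[-1, 1]$ followed by $(z_{ndc} + 1)/2$, which combines to $\text{depth}(d) = \frac{f(d - n)}{(f - n)d}$; the function names and the 16-bit buffer are illustrative assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Window-space depth in [0, 1] for a point at view-space distance d in front of the
// camera, ASSUMING [n, f] -> NDC [-1, 1] -> (z_ndc + 1) / 2.
double window_depth(double d, double n, double f) {
    assert(n > 0.0 && f > n && d >= n && d <= f);
    return f * (d - n) / ((f - n) * d);
}

// Store a depth in [0, 1] as an unsigned integer with `bits` bits of precision.
std::uint32_t quantize(double depth, int bits) {
    assert(bits > 0 && bits <= 32);
    double max_value = std::ldexp(1.0, bits) - 1.0;  // 2^bits - 1
    return static_cast<std::uint32_t>(std::llround(depth * max_value));
}
```

With $n = 0.1$ and $f = 1000$, points at distances 500 and 501 land on the same 16-bit depth value, so they z-fight, while points at distances 1 and 2 are far apart. Since the slope of $\text{depth}(d)$ is roughly $n/d^2$ when $f \gg n$, a very small near plane hurts precision much more than a large far plane.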

## Screen Transformation

Transform the 2D viewing plane to pixel coordinates:

2. translate by $(1, 1)$
3. scale by $(W / 2, H / 2)$

## Color Interpolation

Color interpolation: we want to use the vertex colors to interpolate the color inside the triangle.

In 1D: we have $\hat{f}(t) = (1 - t)f_i + tf_j$, so we can think of this as a linear combination of two basis functions, $(1 - t)$ and $t$, weighted by the sample values $f_i$ and $f_j$.

Linear Interpolation in 2D:

• we want a function of two variables whose graph is a plane in 3D space

• the function is $\hat{f}(x, y) = ax + by + c$

• to interpolate, we need to find coefficients such that the function matches the sample values at the sample points: $\hat{f}(x_n, y_n) = f_n, n \in \{i, j, k\}$

$$
\begin{bmatrix} a\\ b\\ c\\ \end{bmatrix} = \frac{1}{(x_jy_i - x_iy_j)+(x_ky_j - x_jy_k)+(x_iy_k - x_ky_i)} \begin{bmatrix} f_i(y_k - y_j) + f_j(y_i - y_k) + f_k(y_j - y_i)\\ f_i(x_j - x_k) + f_j(x_k - x_i) + f_k(x_i - x_j)\\ f_i(x_ky_j - x_jy_k) + f_j(x_iy_k - x_ky_i) + f_k(x_jy_i - x_iy_j)\\ \end{bmatrix}
$$
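The closed form above translates directly into C++. This is a minimal sketch (the function name is hypothetical) that returns $(a, b, c)$ for three samples; feeding it samples of $2x + 3y + 1$ recovers those coefficients.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Coefficients (a, b, c) of f(x, y) = a x + b y + c through three samples,
// using the closed-form expression above.
std::array<double, 3> plane_coefficients(double xi, double yi, double fi,
                                         double xj, double yj, double fj,
                                         double xk, double yk, double fk) {
    double denom = (xj * yi - xi * yj) + (xk * yj - xj * yk) + (xi * yk - xk * yi);
    assert(std::fabs(denom) > 1e-12 && "degenerate triangle");
    double a = (fi * (yk - yj) + fj * (yi - yk) + fk * (yj - yi)) / denom;
    double b = (fi * (xj - xk) + fj * (xk - xi) + fk * (xi - xj)) / denom;
    double c = (fi * (xk * yj - xj * yk) + fj * (xi * yk - xk * yi) +
                fk * (xj * yi - xi * yj)) / denom;
    return {a, b, c};
}
```

For example, the samples $(0, 0) \mapsto 1$, $(1, 0) \mapsto 3$, $(0, 1) \mapsto 4$ give $(a, b, c) = (2, 3, 1)$.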

Barycentric coordinates describe the same interpolation as the formula above, more intuitively: interpolate based on the areas of the three sub-triangles formed by a point inside the triangle.

Barycentric Coordinates: $\phi_i(x), \phi_j(x), \phi_k(x)$

• $\text{color}(x) = \text{color}(x_i)\phi_i + \text{color}(x_j)\phi_j + \text{color}(x_k)\phi_k$

• it is used to interpolate attributes associated with vertices

• the distance is already calculated in the triangle half-plane test. For example, to check whether point $P$ is in triangle $ABC$, we check the sign of $\overrightarrow{AP} \times \overrightarrow{AB}$; its magnitude is twice the area of triangle $ABP$ (the length of $AB$ times the distance from $P$ to line $AB$), and dividing it by twice the area of $ABC$ gives the barycentric coordinate of $C$.

However, we should not interpolate in screen space but in 3D space, because under perspective the screen-space barycentric coordinates differ from the 3D ones. How do we solve this problem?
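A minimal C++ sketch of barycentric coordinates computed from the same 2D cross products used in the half-plane test (names are hypothetical). Here `cross(a, b, p)` is $\overrightarrow{AB} \times \overrightarrow{AP}$, the negative of the operand order above, so only the sign convention differs; dividing by the signed total area makes the result independent of winding.

```cpp
#include <array>
#include <cassert>

struct Vec2 { float x, y; };

// 2D cross product of (b - a) and (p - a): twice the signed area of triangle (a, b, p).
float cross(Vec2 a, Vec2 b, Vec2 p) {
    return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x);
}

// Barycentric coordinates (phi_i, phi_j, phi_k) of p in triangle (vi, vj, vk):
// each is the signed area of the sub-triangle opposite that vertex over the total area.
std::array<float, 3> barycentric(Vec2 p, Vec2 vi, Vec2 vj, Vec2 vk) {
    float area = cross(vi, vj, vk);
    assert(area != 0.0f && "degenerate triangle");
    float phi_i = cross(vj, vk, p) / area;  // sub-triangle (vj, vk, p)
    float phi_j = cross(vk, vi, p) / area;  // sub-triangle (vk, vi, p)
    float phi_k = cross(vi, vj, p) / area;  // sub-triangle (vi, vj, p)
    return {phi_i, phi_j, phi_k};
}

// p is inside the triangle (or on an edge) iff all coordinates are >= 0, for either winding.
bool inside(const std::array<float, 3>& phi) {
    return phi[0] >= 0.0f && phi[1] >= 0.0f && phi[2] >= 0.0f;
}
```

For the triangle $(0,0), (4,0), (0,4)$, the point $(1,1)$ gets coordinates $(0.5, 0.25, 0.25)$, and $\text{color}(x)$ is then the weighted sum of the three vertex colors with those weights.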

Perspective-correct interpolation: to interpolate an attribute $\phi$,

1. compute the depth $z$ at each vertex
2. interpolate $\frac{1}{z}$ and $\frac{\phi}{z}$ using the 2D (screen-space) barycentric coordinates; unlike $z$ and $\phi$ themselves, these quantities vary linearly in screen space
3. divide the interpolated $\frac{\phi}{z}$ by the interpolated $\frac{1}{z}$
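The three steps can be sketched in C++ for a single scalar attribute. The function names are hypothetical, and `z` stands for the per-vertex depth kept through the pipeline (the clip-space $w$).

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Perspective-correct interpolation of a scalar attribute over a triangle.
//   bary: screen-space barycentric coordinates of the fragment
//   z:    per-vertex depth (positive distance, e.g. the clip-space w)
//   phi:  per-vertex attribute values
float perspective_correct(const std::array<float, 3>& bary,
                          const std::array<float, 3>& z,
                          const std::array<float, 3>& phi) {
    float inv_z = 0.0f, phi_over_z = 0.0f;
    for (int v = 0; v < 3; ++v) {
        assert(z[v] > 0.0f);
        inv_z += bary[v] / z[v];                // step 2: interpolate 1/z
        phi_over_z += bary[v] * phi[v] / z[v];  // step 2: interpolate phi/z
    }
    return phi_over_z / inv_z;                  // step 3: divide
}

// Plain screen-space interpolation, for comparison.
float screen_space(const std::array<float, 3>& bary, const std::array<float, 3>& phi) {
    return bary[0] * phi[0] + bary[1] * phi[1] + bary[2] * phi[2];
}
```

With `bary = {0.5, 0.5, 0}`, depths `{1, 3, 3}` and attribute values `{0, 1, 1}`, plain interpolation gives 0.5 at the screen-space midpoint of the edge, while the perspective-correct version gives 0.25: the nearer half of the edge covers more of the screen, so the screen midpoint is only a quarter of the way along the edge in 3D.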

### Texture Mapping

Uses of texture mapping:

• for color

• for wetness attribute

• for shininess

• for normal map

• for displacement mapping

• for baked ambient occlusion

• for reflection bulb

• etc...

Given a model with UV:

• for each pixel in rasterized image (screen space)
• interpolate $(u, v)$ coordinates across the triangle
• sample the texture at the interpolated $(u, v)$
• set the color of the fragment to the sampled texture value

The sampled texture space might be warped, so it is hard to avoid aliasing.

Magnification: camera too close to object

• Problem: single pixel on screen maps to less than a pixel in texture

• Solution: interpolate value at pixel center
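Interpolating between texel centers is usually done bilinearly. A C++ sketch, assuming a hypothetical single-channel texture type with clamp-to-edge addressing and texel-space coordinates $(U, V) \in [0, W] \times [0, H]$:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical single-channel texture, row-major; texel (i, j) covers
// [i, i + 1] x [j, j + 1] in texel space, so its center is at (i + 0.5, j + 0.5).
struct Texture {
    int width, height;
    std::vector<float> texels;
    float at(int i, int j) const {
        i = std::clamp(i, 0, width - 1);  // clamp-to-edge addressing (an assumption)
        j = std::clamp(j, 0, height - 1);
        return texels[static_cast<std::size_t>(j) * width + i];
    }
};

// Bilinear interpolation between the four texel centers around (U, V).
float sample_bilinear(const Texture& t, float U, float V) {
    assert(t.width > 0 && t.height > 0);
    assert(t.texels.size() == static_cast<std::size_t>(t.width) * t.height);
    float x = U - 0.5f, y = V - 0.5f;  // shift so texel centers sit on integers
    int i = static_cast<int>(std::floor(x));
    int j = static_cast<int>(std::floor(y));
    float s = x - i, r = y - j;        // fractional weights in [0, 1)
    float top = (1 - s) * t.at(i, j) + s * t.at(i + 1, j);
    float bottom = (1 - s) * t.at(i, j + 1) + s * t.at(i + 1, j + 1);
    return (1 - r) * top + r * bottom;
}
```

On a 2x2 texture with values `{0, 1, 2, 3}`, sampling exactly at a texel center returns that texel, and sampling at $(1, 1)$, midway between all four centers, returns their average 1.5.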

Minification: camera too far from the object

• Problem: a single pixel on screen maps to a large region of the texture

• Solution: compute the texture average over the pixel's footprint (but averaging kills performance)

• Prefiltering: compute the averages at build time, not at run time, by downsampling the texture ahead of time to multiple resolutions

### Mip Map

Since averaging over a large texture area is very costly, we precompute a mip map. Each mip level is smaller than the previous one by a factor of 2 in each dimension ($2^2 = 4$ in area), so every texel in a level is computed from exactly 4 texels of the level below.
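Building the whole chain this way can be sketched in C++, assuming a hypothetical square, single-channel image with power-of-two size and a plain 2x2 box filter:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical square, single-channel image whose size is a power of two.
struct Image {
    int size;                   // width == height == size
    std::vector<float> pixels;  // row-major, size * size values
};

// Builds the full mip chain: each level halves the resolution, and each texel
// is the average (box filter) of the 2x2 block beneath it in the previous level.
std::vector<Image> build_mip_chain(const Image& base) {
    assert(base.size > 0 && (base.size & (base.size - 1)) == 0);
    assert(base.pixels.size() == static_cast<std::size_t>(base.size) * base.size);
    std::vector<Image> levels{base};
    while (levels.back().size > 1) {
        const Image& prev = levels.back();
        int n = prev.size / 2;
        Image next{n, std::vector<float>(static_cast<std::size_t>(n) * n)};
        for (int y = 0; y < n; ++y)
            for (int x = 0; x < n; ++x) {
                auto px = [&](int dx, int dy) {
                    return prev.pixels[(2 * y + dy) * prev.size + (2 * x + dx)];
                };
                next.pixels[y * n + x] = 0.25f * (px(0, 0) + px(1, 0) + px(0, 1) + px(1, 1));
            }
        levels.push_back(std::move(next));  // prev is not used after this point
    }
    return levels;
}
```

A 4x4 base image produces levels of size 4, 2 and 1, and the single texel of the last level is the average of the whole image.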

Mip map: a specific prefiltering technique. A MIP map (L. Williams, 1983) stores the prefiltered image at a sequence of scales, each half the resolution of the previous one. The MIP map level is calculated by estimating the texture region a pixel covers, using partial derivatives.

The level estimate expresses $u, v$ in pixel coordinates $[0, W] \times [0, H]$, not texture coordinates $[0, 1] \times [0, 1]$. This matters because the level must depend on the texture's resolution: a very small base texture viewed from far away may still be shown un-minified.
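One common way to turn those derivatives into a level, sketched in C++ as an assumption rather than the exact formula from the lecture: take the longer of the two screen-axis footprints in texel units and use its base-2 logarithm. Derivatives of normalized $(u, v)$ must first be scaled to texel units, e.g. $\frac{dU}{dx} = W\frac{du}{dx}$ and $\frac{dV}{dx} = H\frac{dv}{dx}$.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Mip level d from screen-space derivatives of the texel-space coordinates (U, V):
//   L = max( length(dU/dx, dV/dx), length(dU/dy, dV/dy) ),   d = log2(L)
// L is roughly how many texels one screen pixel spans. Clamping to level 0
// when L <= 1 (magnification) is a design choice of this sketch.
float mip_level(float dUdx, float dVdx, float dUdy, float dVdy) {
    float Lx = std::sqrt(dUdx * dUdx + dVdx * dVdx);
    float Ly = std::sqrt(dUdy * dUdy + dVdy * dVdy);
    float L = std::max(Lx, Ly);
    assert(std::isfinite(L));
    return L <= 1.0f ? 0.0f : std::log2(L);
}
```

If one screen pixel spans 4 texels along each axis, the sketch picks level 2; if it spans half a texel, it stays at level 0.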

1. from screen space $(x, y)$ to barycentric coordinates
2. using barycentric coordinates to interpolate $(u, v)$ stored in vertices
3. approximate $\frac{du}{dx}, \frac{du}{dy}, \frac{dv}{dx}, \frac{dv}{dy}$ by taking differences of screen-adjacent samples and compute mip map level $d$
4. convert the normalized texture coordinates $(u, v) \in [0, 1]^2$ to pixel locations in the texture image $(U, V) \in [0, W] \times [0, H]$.
7. tri-linear interpolation according to $(U, V, d)$
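A C++ sketch of the final lookup, covering the conversion in step 4 and the trilinear interpolation in step 7. The `Level` type and clamp-to-edge addressing are assumptions for illustration: bilinear samples are taken in the two mip levels nearest to $d$ and blended linearly by the fractional part of $d$.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical single-channel mip level, row-major.
struct Level {
    int width, height;
    std::vector<float> texels;
    float at(int i, int j) const {  // clamp-to-edge addressing (an assumption)
        i = std::clamp(i, 0, width - 1);
        j = std::clamp(j, 0, height - 1);
        return texels[static_cast<std::size_t>(j) * width + i];
    }
};

// Bilinear lookup at normalized (u, v): step 4 converts to texel space (U, V).
float bilinear(const Level& l, float u, float v) {
    float x = u * l.width - 0.5f, y = v * l.height - 0.5f;  // texel centers on integers
    int i = static_cast<int>(std::floor(x));
    int j = static_cast<int>(std::floor(y));
    float s = x - i, t = y - j;
    float top = (1 - s) * l.at(i, j) + s * l.at(i + 1, j);
    float bot = (1 - s) * l.at(i, j + 1) + s * l.at(i + 1, j + 1);
    return (1 - t) * top + t * bot;
}

// Step 7: bilinear in the two nearest levels, then linear in d.
float trilinear(const std::vector<Level>& mips, float u, float v, float d) {
    assert(!mips.empty());
    float max_d = static_cast<float>(mips.size() - 1);
    d = std::clamp(d, 0.0f, max_d);
    int d0 = static_cast<int>(std::floor(d));
    int d1 = std::min(d0 + 1, static_cast<int>(mips.size()) - 1);
    float t = d - d0;
    return (1 - t) * bilinear(mips[d0], u, v) + t * bilinear(mips[d1], u, v);
}
```

With a 2x2 base level `{0, 1, 2, 3}` and its 1x1 average `{1.5}`, sampling at $(0.25, 0.25)$ gives 0 at $d = 0$, 1.5 at $d = 1$, and 0.75 halfway between.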