Transforming the camera is the same as applying the inverse transformation to all triangles
Rotation: inverse rotation is transpose
Translation: inverse translation is negative translation
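The two inverse rules above can be combined into a world-to-view transform. The following is a minimal sketch, assuming the camera pose is given as a rotation R and a translation t with p_world = R * p_view + t; the types `Vec3`, `Mat3` and the function names are hypothetical and only for illustration.

```cpp
#include <array>

// Hypothetical minimal types for illustration; not from the original notes.
using Vec3 = std::array<float, 3>;
using Mat3 = std::array<Vec3, 3>; // row-major: m[row][col]

// Transpose of a 3x3 matrix; for a pure rotation this is also its inverse.
Mat3 transpose(const Mat3& m) {
    Mat3 t{};
    for (int r = 0; r < 3; ++r)
        for (int c = 0; c < 3; ++c)
            t[c][r] = m[r][c];
    return t;
}

// Matrix-vector product.
Vec3 mul(const Mat3& m, const Vec3& v) {
    Vec3 out{};
    for (int r = 0; r < 3; ++r)
        out[r] = m[r][0] * v[0] + m[r][1] * v[1] + m[r][2] * v[2];
    return out;
}

// Assumed camera pose in world space: p_world = R * p_view + t.
// World-to-view undoes it: negative translation first, then transposed rotation.
Vec3 world_to_view(const Mat3& R, const Vec3& t, const Vec3& p_world) {
    Vec3 d{{p_world[0] - t[0], p_world[1] - t[1], p_world[2] - t[2]}};
    return mul(transpose(R), d);
}
```

Applying `world_to_view` to the camera position itself gives the origin, which is what the World to View stage expects.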
Notice that all the previous transformations, from local to world to view space, are done with matrices. The transformation from view space to clip space can also be done with a matrix.
Local to World: transform geometry that is attached to other geometry into world coordinates
World to View: transform the entire world so that the camera sits at the origin, facing the negative z direction
View to Clip: after this point w no longer has a geometric meaning, so think of it as an extra slot that stores an attribute instead.
    Copy z to w (if we don't, we lose the correct depth for interpolation and for the later perspective division by z).
    Transform z so that the near/far planes land on the w cube.
    Scale the x and y region that we are supposed to see to the [-w, w] region using the focal length.
    Clip x, y, z against the original w.
    z is nicer to use for clipping than w because all valid z values are now in [-w, w]. It also has the nice property that things closer to us get better resolution, which helps depth buffer testing.
    Invert w to get inv_w.
    Multiply x, y, z by inv_w (which stores the inverse of the original z).
Clip to NDC (Normalized Device Coordinate): we have x_d = x_c / w_c, y_d = y_c / w_c, z_d = z_c / w_c, but w_d = 1/w_c. This w_d is kept for perspective-correct interpolation. Note that this transformation is no longer a matrix.
NDC to Screen: we have x_s = x_{width}(x_d + 1)/2 + x_{offset}, y_s = y_{height}(y_d + 1)/2 + y_{offset}, and, for some mysterious reason, z_s = (z_d + 1) / 2. // QUESTION: why?
Rasterization and interpolation are done in screen space.
// QUESTION: can 1/w be used for the frame buffer?
// TODO: z=1/z, w=z.
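The NDC to Screen formulas can be written down directly. This is a small sketch; the `Viewport` and `Vec3f` structs are hypothetical names introduced here. One plausible reason for the z_s remapping is that depth buffers conventionally store values in [0, 1], but that is an assumption, not something these notes confirm.

```cpp
// Hypothetical structs for illustration.
struct Viewport { float x_offset, y_offset, width, height; };
struct Vec3f { float x, y, z; };

// Map NDC in [-1, 1]^3 to screen space: x and y go to pixel coordinates
// inside the viewport rectangle, z goes to a depth value in [0, 1].
Vec3f ndc_to_screen(const Vec3f& ndc, const Viewport& vp) {
    return Vec3f{
        vp.width  * (ndc.x + 1.0f) / 2.0f + vp.x_offset,
        vp.height * (ndc.y + 1.0f) / 2.0f + vp.y_offset,
        (ndc.z + 1.0f) / 2.0f
    };
}
```

With a 640 by 480 viewport at offset (0, 0), the NDC origin lands on the pixel coordinate (320, 240) with depth 0.5.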
Here is a very good video to explain all the details of transformation after View Space.
Our 4x4 perspective transform matrix can be constructed as follows. Notice that the matrix does not do the perspective projection itself (the division by z).
/// Compute perspective projection matrix with
/// fov: vertical field of view (in degrees),
/// ar: aspect ratio (x/y)
/// n: near plane.
///
/// The camera is located at the origin looking down the -z axis
/// with y up and x right.
/// The far plane is at infinity.
///
/// This projection maps (x,y,z,1) to (x',y',z',w') such that:
/// - all visible points have w'>0 and (x',y',z')/w' in [-1,1]^3
/// - points on the near plane (z=-n) map to points with z'/w'= -1.0
/// - points on the far 'plane' (z=-inf) map to points with z'/w'= 1.0
/// - objects are closer if their mapped depth is lower
/// - w' must be -z (positive for visible points) for perspective-correct interpolation
static Mat4 perspective(float fov, float ar, float n);
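One matrix that satisfies the properties listed in the comment above can be derived by requiring w' = -z and z' = a z + b with z'/w' = -1 at z = -n and z'/w' approaching 1 as z goes to -infinity, which gives a = -1 and b = -2n. The sketch below is an assumption about what such an implementation might look like, not the actual `Mat4::perspective`; `Vec4`, `Mat4x4`, `perspective_sketch` and `transform` are hypothetical names.

```cpp
#include <array>
#include <cmath>

using Vec4 = std::array<float, 4>;
using Mat4x4 = std::array<Vec4, 4>; // row-major: m[row][col]

// Sketch of an infinite-far-plane perspective matrix (hypothetical name).
// fov_deg: vertical field of view in degrees, ar: aspect ratio (x/y), n: near plane.
Mat4x4 perspective_sketch(float fov_deg, float ar, float n) {
    const float pi = 3.14159265358979f;
    const float s = 1.0f / std::tan(fov_deg * pi / 360.0f); // cot(fov / 2)
    Mat4x4 m{};            // all zeros
    m[0][0] = s / ar;      // x' = x * s / ar
    m[1][1] = s;           // y' = y * s
    m[2][2] = -1.0f;       // z' = -z - 2n: z'/w' = -1 at z = -n,
    m[2][3] = -2.0f * n;   //               z'/w' -> 1 as z -> -inf
    m[3][2] = -1.0f;       // w' = -z (positive in front of the camera)
    return m;
}

// Matrix times column vector.
Vec4 transform(const Mat4x4& m, const Vec4& v) {
    Vec4 out{};
    for (int r = 0; r < 4; ++r)
        out[r] = m[r][0] * v[0] + m[r][1] * v[1] + m[r][2] * v[2] + m[r][3] * v[3];
    return out;
}
```

The division by w' is deliberately left out of `transform`, matching the note that the matrix only does half of the perspective transformation.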
Note that this matrix is fixed as a constant once it is constructed, for the entire duration of rendering. This lets the GPU speed up the transformation (the same matrix is used for all geometry coordinates). Therefore we can't just copy a particular z into the matrix to preserve a z value, since z is geometry-dependent.
Explanation:
c_x = \frac{z_{focal}}{2w_{vp}}: z_{focal} is the focal length of the camera. w_{vp} is the width of the view port (screen width).
c_y = \frac{z_{focal}}{2h_{vp}}: z_{focal} is the focal length of the camera. h_{vp} is the height of the view port (screen height).
A bigger focal length (smaller view angle) stretches all coordinates away from the origin. Therefore, more stuff will be clipped outside of -1 \leq x \leq 1 and -1 \leq y \leq 1 (or, more accurately, -w \leq x \leq w and -w \leq y \leq w if we don't assume w is always 1, since homogeneous coordinates are defined up to scale: (\forall c \neq 0)(c\langle x, y, z, w\rangle \sim \langle x, y, z, w\rangle)).
There are two steps in perspective projection: multiply by the focal length and divide by z. The matrix multiplication does the first; the division by z is done in a later step. So this matrix is only half of the perspective transformation.
Since we want our image to stay the same size when we scale the viewport, we need to divide by the viewport width and height. A wider viewport means fewer objects get clipped.
It is impossible to keep z at its original value after applying a fixed matrix and then dividing by z. Keeping z would require m_1 z + m_2 = z^2 (we want z^2 here because we divide by z later and want the result of that division to still be z), and this equation has only two real solutions. So we can pin two z values to their original values, and every other z is remapped nonlinearly (after the division, z' = m_1 + m_2 / z). We choose to pin z = z_{near} and z = z_{far}. Since z_{near} and z_{far} must then be the two roots of z^2 - m_1 z - m_2 = 0, we set m_1 = z_{far}+z_{near}, m_2 = -z_{far}z_{near}. The nonlinear z is desirable since it gives us more resolution for z closer to 0 (near our camera). This is explained in this video
Also note that in view space the camera is at the origin, while we want our clipping box to be a cube of side 2w with the origin at its center, extending by 1 in every direction once divided by w. However, we can't see what is behind us. So if our near/far planes in view space are z_{near}, z_{far}, we want to transform the z value so that the near/far planes in clip space become -z_{near}, z_{far}. This way, stuff behind the camera will not remain in our viewport.
CAREFUL! In the formulas above, some z must be negated since the camera looks down the -z axis instead of the positive z axis. Also make sure the viewport aspect ratio is the same as the view plane aspect ratio.
Instead of dividing by -z (assuming we are facing the -z axis), if we want a view angle of \theta we need to divide by -z \cdot \tan \theta. Dividing by just -z assumes a view angle of 45^\circ.
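// Barycentric weights of the fragment inside the screen-space triangle.
Vec3 barycentric = barycentric_coordinate(coord, va, vb, vc);
// 1/w is linear in screen space: interpolate it, then invert to recover depth.
float z = 1.0f/barycentric_interpolate_2D(barycentric, a.inv_w, b.inv_w, c.inv_w);
// Interpolate attribute/w linearly, then multiply by depth to undo the division.
frag.attributes[i] = z * barycentric_interpolate_2D(barycentric, a.attributes[i] * a.inv_w, b.attributes[i] * b.inv_w, c.attributes[i] * c.inv_w);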
Watch this video
Clipping: eliminating triangles that are not in the view frustum, so that we avoid rasterizing them
discarding individual fragments is expensive
it makes sense to toss out whole primitives
Why near/far clipping planes?
it is hard to rasterize a triangle with vertices both in front of and behind the camera
we don't have infinite precision in the depth buffer
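Tossing out whole primitives can be sketched as a conservative test in clip space: if all three vertices lie outside the same frustum plane, the triangle cannot be visible. This is only the trivial-reject part of clipping, not a full clipper, and `Clip4`, `inside_frustum` and `trivially_outside` are hypothetical names.

```cpp
// Hypothetical clip-space vertex.
struct Clip4 { float x, y, z, w; };

// True if the clip-space point satisfies -w <= x, y, z <= w.
bool inside_frustum(const Clip4& p) {
    return -p.w <= p.x && p.x <= p.w &&
           -p.w <= p.y && p.y <= p.w &&
           -p.w <= p.z && p.z <= p.w;
}

// Conservative trivial reject: all three vertices are outside the same plane.
// Triangles that straddle a plane return false and still need real clipping.
bool trivially_outside(const Clip4& a, const Clip4& b, const Clip4& c) {
    const Clip4 v[3] = {a, b, c};
    bool left = true, right = true, below = true, above = true;
    bool near_side = true, far_side = true;
    for (const Clip4& p : v) {
        left      = left      && p.x < -p.w;
        right     = right     && p.x >  p.w;
        below     = below     && p.y < -p.w;
        above     = above     && p.y >  p.w;
        near_side = near_side && p.z < -p.w;
        far_side  = far_side  && p.z >  p.w;
    }
    return left || right || below || above || near_side || far_side;
}
```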
To get perspective again: copy z to w in homogeneous coordinates.
Translate 2D viewing plane to pixel coordinates
In 1D: since we have equation \hat{f}(t) = (1 - t)f_i + tf_j, we can think of this as a linear combination of two functions.
Linear Interpolation in 2D:
we have a 2D function in 3D space, as in the image above
the function is \hat{f}(x, y) = ax + by + c
to interpolate, we need to find the coefficients a, b, c such that the function matches the sample values at the sample points: \hat{f}(x_n, y_n) = f_n, n \in \{i, j, k\}
These pictures describe the same thing as the complicated function above.
Barycentric Coordinates: \phi_i(x), \phi_j(x), \phi_k(x)
\text{color}(x) = \text{color}(x_i)\phi_i + \text{color}(x_j)\phi_j + \text{color}(x_k)\phi_k
it is used to interpolate attributes associated with vertices
the distance is already calculated in the triangular half-plane test (for example, to check whether point P is in triangle ABC, we check whether \overrightarrow{AP} \times \overrightarrow{AB} is positive, and this check gives distance between line AB and point P that can be used as Barycentric Coordinates)
Perspective Correct Interpolation: we interpolate attribute \phi
Texture Mapping
for color
for wetness attribute
for shininess
for normal map
for displacement mapping
for baked ambient occlusion
for reflection bulb
etc...
Given a model with UV:
Magnification: camera too close to object
Problem: single pixel on screen maps to less than a pixel in texture
Solution: interpolate value at pixel center
Minification: camera too far from object
Problem: single pixel on screen maps to a large region in texture
Solution: need to compute the texture average over the pixel (but averaging kills performance)
Prefiltering: compute the averages at build time, not at run time; we downsample the texture at build time for multiple region sizes
Since averaging over a large texture area is very costly, we tend to pre-compute a mip map. Each mip level is smaller than the previous one by a factor of 2 in each dimension (2^2 in area), because this way we only need the values of 4 pixels to compute the value of 1 pixel in the next level
Mip Map: a specific prefiltering technique
The result in the picture above expresses u, v in pixel coordinates [0, W] \times [0, H], not in texture coordinates [0, 1] \times [0, 1]. Imagine you have a very small base picture: even if you look at it from far away, you would still show the un-minified texture.
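The prefiltering step described above, where each mip level averages 2x2 blocks of the previous one, can be sketched as follows. The single-channel `Image` struct and the `downsample` function are hypothetical, and the sketch assumes even width and height.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical single-channel image, stored row by row.
struct Image {
    std::size_t width, height;
    std::vector<float> texels;
};

// Build the next mip level with a 2x2 box filter.
// Assumes src.width and src.height are even.
Image downsample(const Image& src) {
    Image dst{src.width / 2, src.height / 2, {}};
    dst.texels.resize(dst.width * dst.height);
    for (std::size_t y = 0; y < dst.height; ++y) {
        for (std::size_t x = 0; x < dst.width; ++x) {
            const std::size_t sx = 2 * x, sy = 2 * y;
            const float sum = src.texels[sy * src.width + sx]
                            + src.texels[sy * src.width + sx + 1]
                            + src.texels[(sy + 1) * src.width + sx]
                            + src.texels[(sy + 1) * src.width + sx + 1];
            dst.texels[y * dst.width + x] = 0.25f * sum; // average of 4 texels
        }
    }
    return dst;
}
```

Calling `downsample` repeatedly until the image is 1x1 would produce the whole mip chain at build time.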
Pipeline: