NeRF

A good paper feeds millions of researchers, who can plagiarize some simple idea from it to publish papers.

CT Scan Reconstruction

Links: CT Scan Reconstruction · MRI Reconstruction Course · Deep Learning for MRI · MRI Basics Part 1 - Image Formation

It uses an analytical method that relies on signal processing (e.g., filtered back-projection).

NeRF: Neural Radiance Fields for View Synthesis

NeRF Paper Reading: YouTube

Big Idea of NeRF: You get a bunch of images with labeled shooting directions. The algorithm gives you back a function that can be sampled to retrieve geometry information.

NeRF's objective is to reconstruct 3D geometry from an array of images. However, the product of NeRF is not an actual geometry, but a neural network that represents both the rendering function and the geometry.

Radiance Fields: a field over \mathbb{R}^3 in which each point carries a spherical function instead of a single value. This essentially represents a 3D volume with 2D view-dependent appearance, basically geometry data viewed from different angles. At each point, it is a function from view direction to color.

Approach

Input: ((\theta, \phi), (x, y, z)). Output: ((r, g, b), \alpha), where \alpha acts as a volume density. Loss: the difference between the ground-truth pixel color (r, g, b) and the rendered color \hat{C} = \int_D T(t)\,\alpha(t)\,c(t)\,dt, where D is the line along the ray, c(t) is the network's color output, and T(t) = \exp(-\int_0^t \alpha(s)\,ds) is the accumulated transmittance.

Procedural:

  1. Prepare a bunch of images with shooting directions. First break each image down into pixels, so you have a bunch of pixels, each associated with a 3D vector representing the shooting direction and the pixel's RGB value.
  2. Randomly initialize the weights of the neural network. The network represents the rendering function (viewing direction and location as input, color as output) with the geometry baked in. The network can be viewed as a generic radiance-field function except for the extra parameter (x, y, z) and the extra output \alpha, which exist for architectural (loss-calculation) purposes. Initializing the weights amounts to picking a starting point in the function space.
  3. For each pixel, we feed the network one viewing direction (\theta, \phi) as constants and integrate the network function along (x, y, z) \in D (accumulating the results over sampled inputs (x, y, z)), weighting each sample by the estimated density \alpha from the output, to get a final color (r, g, b) from the network. We compare it with the ground-truth color and take the negative gradient.

// QUESTION: I'm kinda not sure how exactly to integrate \alpha.
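
For reference, the original paper integrates \alpha (its volume density \sigma) with a quadrature over N samples along the ray:

\hat{C} = \sum_{i=1}^{N} T_i (1 - e^{-\alpha_i \delta_i}) c_i, \quad T_i = \exp(-\sum_{j<i} \alpha_j \delta_j), \quad \delta_i = t_{i+1} - t_i

Each sample's color c_i is weighted by its opacity 1 - e^{-\alpha_i \delta_i} and by the transmittance T_i accumulated in front of it, and the whole sum is differentiable.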

We don't select sample locations uniformly. We do two passes: the first (coarse) pass uses uniformly placed samples, and the second (fine) pass concentrates samples near the surface of the object; see the sketch below.
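
A minimal sketch of the second pass (my own illustration using inverse-transform sampling over the coarse weights, not the paper's code; assumes at least one coarse weight is positive):

#include <random>
#include <vector>

// Draw n_fine new sample positions along the ray, concentrated in the bins
// where the coarse pass produced high weights w_i over bin edges t_i.
std::vector<float> sample_fine(const std::vector<float>& t, // bin edges, size n+1
                               const std::vector<float>& w, // coarse weights, size n
                               int n_fine, std::mt19937& rng) {
  std::vector<float> cdf(w.size() + 1, 0.0f); // cumulative distribution of weights
  for (size_t i = 0; i < w.size(); ++i) cdf[i + 1] = cdf[i] + w[i];
  for (float& c : cdf) c /= cdf.back();
  std::uniform_real_distribution<float> uni(0.0f, 1.0f);
  std::vector<float> out;
  for (int k = 0; k < n_fine; ++k) {
    float u = uni(rng);
    size_t i = 0;
    while (i + 1 < cdf.size() - 1 && cdf[i + 1] < u) ++i; // bin containing u
    float frac = (u - cdf[i]) / (cdf[i + 1] - cdf[i] + 1e-8f);
    out.push_back(t[i] + frac * (t[i + 1] - t[i])); // place the sample inside the bin
  }
  return out;
}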

However, if you just do that, the result will be poor, because plain networks have a hard time overfitting high-frequency detail. Below is an example of a network trained with input (x, y) and output (r, g, b); the result is not great.

Sinusoid Encoding explained in "Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains"

So the idea is to split the input signal into different frequency bands to, in a sense, amplify the loss at higher frequencies. The same strategy can be found in Transformers.
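
A sketch of the fix (the positional encoding from the NeRF paper, applied to each input coordinate before it enters the MLP; the paper uses L = 10 frequency bands for position and L = 4 for direction):

#include <cmath>
#include <vector>

// gamma(p) = (sin(2^0 pi p), cos(2^0 pi p), ..., sin(2^{L-1} pi p), cos(2^{L-1} pi p))
std::vector<float> positional_encoding(float p, int num_freqs) {
  const float kPi = 3.14159265358979f;
  std::vector<float> out;
  out.reserve(2 * num_freqs);
  float freq = kPi; // 2^0 * pi
  for (int l = 0; l < num_freqs; ++l) {
    out.push_back(std::sin(freq * p));
    out.push_back(std::cos(freq * p));
    freq *= 2.0f; // next band: 2^(l+1) * pi
  }
  return out;
}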

Further Readings: Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains

NTK: an infinitely wide fully connected network, initialized with reasonable weights and trained with infinitesimally small steps. It is a good mathematical tool that gives useful insights into fully connected layers.

Mip-NeRF: A Multiscale Representation for Anti-Aliasing Neural Radiance Fields (ICCV 2021)

Mip-NeRF: It integrates over the entire cone cross-section using a weighted 3D Gaussian-like distribution

Big Idea: deal with small, real-life images that suffer from aliasing and poor resolution. It also improves NeRF by not requiring the camera to be centered on the object's central location.

Mip-NeRF Sampling Result: it keeps low frequencies while averaging high frequencies. This way we can calculate the expected high-frequency content more accurately, especially with non-uniform sampling.

// TODO: what is blurpool

It uses the characteristic function of the cone to determine whether a point is inside it, and then calculates the expectation of a 3D Gaussian over the Fourier-transformed (positionally encoded) coordinates. // TODO: think about math if you have time

NeRF++: Analyzing and Improving Neural Radiance Fields

Idea: the reason the MLP works is that the layers serve as a prior that assumes smoothness of color (and therefore shape) on the geometry surface.

Data flow design of original NeRF

In the original design of NeRF, the color c is smoother with respect to the direction d than with respect to the position x. Therefore, feeding input d into later layers helps the accuracy of the model.

Background and Foreground Tradeoff in Original NeRF

For 360 degree captures of unbounded scenes, NeRF’s parameterization of space either models only a portion of the scene, leading to significant artifacts in background elements (a), or models the full scene and suffers from an overall loss of detail due to finite sampling resolution (b).

So NeRF++ separates NeRF into two MLPs: one for the foreground and one for the background.

Re-Parameterization for Background Scene

We re-parameterize the location encoding for the background scene so that space outside the unit sphere is encoded as a 4D coordinate (x/r, y/r, z/r, 1/r), where the 4th coordinate 1/r decreases with distance. The idea is that the originally sparse encoding of the background becomes denser, so more images can contribute to the color of the far background and resolve background ambiguity.
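
A minimal sketch of that re-parameterization (the inverted-sphere coordinates described in the NeRF++ paper):

#include <cmath>

struct Vec4 { float x, y, z, w; };

// A background point at distance r = |p| > 1 maps to (x/r, y/r, z/r, 1/r):
// the unbounded exterior [1, inf) compresses into (0, 1] in the 4th coordinate.
Vec4 invert_sphere(float x, float y, float z) {
  float r = std::sqrt(x * x + y * y + z * z); // assumes r > 1 (background point)
  return {x / r, y / r, z / r, 1.0f / r};
}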

Mip-NeRF 360: Unbounded Anti-Aliased Neural Radiance Fields (CVPR 2022)

Re-Parameterization: shrink background but keep monotonicity in sphere of radius 2

Parameterization: roughly Mip-NeRF plus another way to do NeRF++.
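
For reference, the contraction that shrinks the background while keeping monotonicity inside the radius-2 sphere is, as a small sketch:

#include <cmath>

struct Vec3 { float x, y, z; };

// contract(p) = p                        if |p| <= 1
//             = (2 - 1/|p|) * (p / |p|)  otherwise
// maps all of space into a ball of radius 2, monotonically along each ray.
Vec3 contract(Vec3 p) {
  float n = std::sqrt(p.x * p.x + p.y * p.y + p.z * p.z);
  if (n <= 1.0f) return p;
  float s = (2.0f - 1.0f / n) / n; // scale implementing (2 - 1/|p|) * p/|p|
  return {p.x * s, p.y * s, p.z * s};
}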

The first network only provides density

Convergence: separate into two networks but with one gradient flow; the first provides only density, and the second provides density and color, using the first network's density (to decide where to sample) to reduce training cost.

// QUESTION: don't quite understand how this would work

Distillation: // TODO: didn't read

PlenOctrees for Real-time Rendering of Neural Radiance Fields

Background and Survey

Representing Geometry as Multi-plane Images

Representing Geometry as Voxels

The above methods are topology-free and can be rendered in real time, but they are memory-intensive (they can't capture detailed resolution).

Coordinate-Based Neural Networks

NeRFs can be sampled at arbitrary resolution since the function is continuous. However, they are slow to train and test. Methods to accelerate NeRF include

  1. train on datasets of similar scenes
  2. skip empty region during testing (Neural Sparse Voxel Fields)
  3. decompose scene into smaller networks (Decomposed Radiance Fields)
  4. Quality Tradeoff (AutoInt)
  5. ...

Actual Work

Octree Geometry Compression

Spherical harmonics: used to speed up the conversion from NeRF to PlenOctree, because the view-dependent calculation is deferred to evaluation time instead of PlenOctree-conversion time (it also makes the model look cleaner).
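
As a sketch, evaluating a stored SH color at a view direction d is just a dot product with the SH basis. A degree-1 example for one color channel (the constants are the standard real SH basis; sign conventions vary between libraries, so treat the signs as illustrative):

// k[0..3] are the stored SH coefficients for this voxel and channel;
// (dx, dy, dz) is the unit view direction.
float eval_sh_deg1(const float k[4], float dx, float dy, float dz) {
  const float C0 = 0.28209479f; // Y_00
  const float C1 = 0.48860251f; // magnitude of the Y_1m terms
  return C0 * k[0] - C1 * dy * k[1] + C1 * dz * k[2] - C1 * dx * k[3];
}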

Comments:

Plenoxels: Radiance Fields without Neural Networks

Normalized device coordinates: ?

multi-sphere images: ?

Trilinear interpolation is crucial: it converts the discrete representation into a continuous one to minimize reconstruction loss.
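
A self-contained sketch of trilinear interpolation on a dense grid (illustrative, not Plenoxels' code; bounds checking omitted):

#include <cmath>
#include <cstddef>

// grid is indexed as grid[(z * ny + y) * nx + x]; (px, py, pz) is in voxel units.
float trilerp(const float* grid, int nx, int ny, float px, float py, float pz) {
  int x0 = (int)std::floor(px), y0 = (int)std::floor(py), z0 = (int)std::floor(pz);
  float fx = px - x0, fy = py - y0, fz = pz - z0;
  auto at = [&](int x, int y, int z) {
    return grid[((std::size_t)z * ny + y) * nx + x];
  };
  // interpolate along x, then y, then z
  float c00 = at(x0, y0, z0) * (1 - fx) + at(x0 + 1, y0, z0) * fx;
  float c10 = at(x0, y0 + 1, z0) * (1 - fx) + at(x0 + 1, y0 + 1, z0) * fx;
  float c01 = at(x0, y0, z0 + 1) * (1 - fx) + at(x0 + 1, y0, z0 + 1) * fx;
  float c11 = at(x0, y0 + 1, z0 + 1) * (1 - fx) + at(x0 + 1, y0 + 1, z0 + 1) * fx;
  float c0 = c00 * (1 - fy) + c10 * fy;
  float c1 = c01 * (1 - fy) + c11 * fy;
  return c0 * (1 - fz) + c1 * fz;
}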

// QUESTION: I don't understand how optimizing the voxel coefficients and the regularization formula work; haven't read into it.

Comments:

Instant Neural Graphics Primitives with a Multiresolution Hash Encoding

Learned Hash Table

Procedural:

  1. We integrate locations along the ray D the same way as in NeRF; however, for each point (x, y, z) we compute its value by interpolating the values of its nearby vertices (with a coarse-to-fine multi-level scheme), which are randomly distributed in a hash table.
  2. Since the entries are randomly distributed, we need the neural network to sort out which of the available hash cells each vertex maps to. This way, we let the neural network decide where each hash cell should be used in the geometry, and also compute the RGB and density targets from the interpolated latent space (a sketch of the spatial hash follows below).
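
A minimal sketch of the per-level spatial hash described in the Instant-NGP paper (the prime constants come from the paper; table_size is assumed to be a power of two):

#include <cstdint>

// h(x) = (x * p1 XOR y * p2 XOR z * p3) mod T for a grid vertex (x, y, z).
inline uint32_t spatial_hash(uint32_t x, uint32_t y, uint32_t z, uint32_t table_size) {
  return (x * 1u ^ y * 2654435761u ^ z * 805459861u) & (table_size - 1);
}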

Morton Code

Morton code maps an n-dimensional space to a linear space. It defines a Z-shaped space-filling curve, which preserves n-dimensional locality.

Octree Using Morton Code

Assume we have already stored a 3D map, where each cell is an int, in a Morton-coded array, and we want to extract the int value at (x, y, z) = (5, 9, 1) = (0101b, 1001b, 0001b). There is an easy method to find the array position to look up: it is (010 001 000 111b). This is because, going from the most significant bits to the least significant bits, we interleave one bit from each of z, y, x at every position.

Morton Code 3D Conversion: above, the 3D coordinates; below, the corresponding 1D coordinates

Note that to invert a Morton code back to a 3D coordinate, we only need code >> 0 for x, code >> 1 for y, and code >> 2 for z, passing each to the same bit-compaction function.
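
In code, that inversion looks like this (a small sketch built on the morton3D_invert routine shown later in the octree snippet):

#include <stdint.h>

uint32_t morton3D_invert(uint32_t x); // defined in the octree code below

// Recover (x, y, z) from a 30-bit Morton code by compacting every third bit.
inline void morton3D_decode(uint32_t code, uint32_t& x, uint32_t& y, uint32_t& z) {
  x = morton3D_invert(code >> 0);
  y = morton3D_invert(code >> 1);
  z = morton3D_invert(code >> 2);
}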

With a for loop, we can write the encoder like this:

#include <stdint.h>
#include <limits.h>
using namespace std;

inline uint64_t mortonEncode_for(unsigned int x, unsigned int y, unsigned int z) {
  uint64_t answer = 0;
  // bit i of x goes to output bit 3i, bit i of y to 3i+1, bit i of z to 3i+2
  for (uint64_t i = 0; i < (sizeof(uint64_t) * CHAR_BIT) / 3; ++i) {
    answer |= ((x & ((uint64_t)1 << i)) << 2 * i) |
              ((y & ((uint64_t)1 << i)) << (2 * i + 1)) |
              ((z & ((uint64_t)1 << i)) << (2 * i + 2));
  }
  return answer;
}

To achieve better performance, we could use magic bits:

#include <stdint.h>
#include <limits.h>
using namespace std;

// method to separate the bits of a given integer 3 positions apart
inline uint64_t splitBy3(unsigned int a){
  uint64_t x = a & 0x1fffff; // we only look at the first 21 bits
  x = (x | x << 32) & 0x1f00000000ffff;  // shift left 32 bits, OR with self, and mask
  x = (x | x << 16) & 0x1f0000ff0000ff;  // shift left 16 bits, OR with self, and mask
  x = (x | x << 8) & 0x100f00f00f00f00f; // shift left 8 bits, OR with self, and mask
  x = (x | x << 4) & 0x10c30c30c30c30c3; // shift left 4 bits, OR with self, and mask
  x = (x | x << 2) & 0x1249249249249249; // shift left 2 bits, OR with self, and mask
  return x;
}

inline uint64_t mortonEncode_magicbits(unsigned int x, unsigned int y, unsigned int z){
  uint64_t answer = 0;
  answer |= splitBy3(x) | splitBy3(y) << 1 | splitBy3(z) << 2;
  return answer;
}

Or we can use a giant lookup table to achieve the best performance:

#include <stdint.h>
#include <limits.h>
using namespace std;

static const uint32_t morton256_x[256] = {
0x00000000,
0x00000001, 0x00000008, 0x00000009, 0x00000040, 0x00000041, 0x00000048, 0x00000049, 0x00000200,
0x00000201, 0x00000208, 0x00000209, 0x00000240, 0x00000241, 0x00000248, 0x00000249, 0x00001000,
0x00001001, 0x00001008, 0x00001009, 0x00001040, 0x00001041, 0x00001048, 0x00001049, 0x00001200,
0x00001201, 0x00001208, 0x00001209, 0x00001240, 0x00001241, 0x00001248, 0x00001249, 0x00008000,
0x00008001, 0x00008008, 0x00008009, 0x00008040, 0x00008041, 0x00008048, 0x00008049, 0x00008200,
0x00008201, 0x00008208, 0x00008209, 0x00008240, 0x00008241, 0x00008248, 0x00008249, 0x00009000,
0x00009001, 0x00009008, 0x00009009, 0x00009040, 0x00009041, 0x00009048, 0x00009049, 0x00009200,
0x00009201, 0x00009208, 0x00009209, 0x00009240, 0x00009241, 0x00009248, 0x00009249, 0x00040000,
0x00040001, 0x00040008, 0x00040009, 0x00040040, 0x00040041, 0x00040048, 0x00040049, 0x00040200,
0x00040201, 0x00040208, 0x00040209, 0x00040240, 0x00040241, 0x00040248, 0x00040249, 0x00041000,
0x00041001, 0x00041008, 0x00041009, 0x00041040, 0x00041041, 0x00041048, 0x00041049, 0x00041200,
0x00041201, 0x00041208, 0x00041209, 0x00041240, 0x00041241, 0x00041248, 0x00041249, 0x00048000,
0x00048001, 0x00048008, 0x00048009, 0x00048040, 0x00048041, 0x00048048, 0x00048049, 0x00048200,
0x00048201, 0x00048208, 0x00048209, 0x00048240, 0x00048241, 0x00048248, 0x00048249, 0x00049000,
0x00049001, 0x00049008, 0x00049009, 0x00049040, 0x00049041, 0x00049048, 0x00049049, 0x00049200,
0x00049201, 0x00049208, 0x00049209, 0x00049240, 0x00049241, 0x00049248, 0x00049249, 0x00200000,
0x00200001, 0x00200008, 0x00200009, 0x00200040, 0x00200041, 0x00200048, 0x00200049, 0x00200200,
0x00200201, 0x00200208, 0x00200209, 0x00200240, 0x00200241, 0x00200248, 0x00200249, 0x00201000,
0x00201001, 0x00201008, 0x00201009, 0x00201040, 0x00201041, 0x00201048, 0x00201049, 0x00201200,
0x00201201, 0x00201208, 0x00201209, 0x00201240, 0x00201241, 0x00201248, 0x00201249, 0x00208000,
0x00208001, 0x00208008, 0x00208009, 0x00208040, 0x00208041, 0x00208048, 0x00208049, 0x00208200,
0x00208201, 0x00208208, 0x00208209, 0x00208240, 0x00208241, 0x00208248, 0x00208249, 0x00209000,
0x00209001, 0x00209008, 0x00209009, 0x00209040, 0x00209041, 0x00209048, 0x00209049, 0x00209200,
0x00209201, 0x00209208, 0x00209209, 0x00209240, 0x00209241, 0x00209248, 0x00209249, 0x00240000,
0x00240001, 0x00240008, 0x00240009, 0x00240040, 0x00240041, 0x00240048, 0x00240049, 0x00240200,
0x00240201, 0x00240208, 0x00240209, 0x00240240, 0x00240241, 0x00240248, 0x00240249, 0x00241000,
0x00241001, 0x00241008, 0x00241009, 0x00241040, 0x00241041, 0x00241048, 0x00241049, 0x00241200,
0x00241201, 0x00241208, 0x00241209, 0x00241240, 0x00241241, 0x00241248, 0x00241249, 0x00248000,
0x00248001, 0x00248008, 0x00248009, 0x00248040, 0x00248041, 0x00248048, 0x00248049, 0x00248200,
0x00248201, 0x00248208, 0x00248209, 0x00248240, 0x00248241, 0x00248248, 0x00248249, 0x00249000,
0x00249001, 0x00249008, 0x00249009, 0x00249040, 0x00249041, 0x00249048, 0x00249049, 0x00249200,
0x00249201, 0x00249208, 0x00249209, 0x00249240, 0x00249241, 0x00249248, 0x00249249
};

// pre-shifted table for Y coordinates (1 bit to the left)
static const uint32_t morton256_y[256] = {
0x00000000,
0x00000002, 0x00000010, 0x00000012, 0x00000080, 0x00000082, 0x00000090, 0x00000092, 0x00000400,
0x00000402, 0x00000410, 0x00000412, 0x00000480, 0x00000482, 0x00000490, 0x00000492, 0x00002000,
0x00002002, 0x00002010, 0x00002012, 0x00002080, 0x00002082, 0x00002090, 0x00002092, 0x00002400,
0x00002402, 0x00002410, 0x00002412, 0x00002480, 0x00002482, 0x00002490, 0x00002492, 0x00010000,
0x00010002, 0x00010010, 0x00010012, 0x00010080, 0x00010082, 0x00010090, 0x00010092, 0x00010400,
0x00010402, 0x00010410, 0x00010412, 0x00010480, 0x00010482, 0x00010490, 0x00010492, 0x00012000,
0x00012002, 0x00012010, 0x00012012, 0x00012080, 0x00012082, 0x00012090, 0x00012092, 0x00012400,
0x00012402, 0x00012410, 0x00012412, 0x00012480, 0x00012482, 0x00012490, 0x00012492, 0x00080000,
0x00080002, 0x00080010, 0x00080012, 0x00080080, 0x00080082, 0x00080090, 0x00080092, 0x00080400,
0x00080402, 0x00080410, 0x00080412, 0x00080480, 0x00080482, 0x00080490, 0x00080492, 0x00082000,
0x00082002, 0x00082010, 0x00082012, 0x00082080, 0x00082082, 0x00082090, 0x00082092, 0x00082400,
0x00082402, 0x00082410, 0x00082412, 0x00082480, 0x00082482, 0x00082490, 0x00082492, 0x00090000,
0x00090002, 0x00090010, 0x00090012, 0x00090080, 0x00090082, 0x00090090, 0x00090092, 0x00090400,
0x00090402, 0x00090410, 0x00090412, 0x00090480, 0x00090482, 0x00090490, 0x00090492, 0x00092000,
0x00092002, 0x00092010, 0x00092012, 0x00092080, 0x00092082, 0x00092090, 0x00092092, 0x00092400,
0x00092402, 0x00092410, 0x00092412, 0x00092480, 0x00092482, 0x00092490, 0x00092492, 0x00400000,
0x00400002, 0x00400010, 0x00400012, 0x00400080, 0x00400082, 0x00400090, 0x00400092, 0x00400400,
0x00400402, 0x00400410, 0x00400412, 0x00400480, 0x00400482, 0x00400490, 0x00400492, 0x00402000,
0x00402002, 0x00402010, 0x00402012, 0x00402080, 0x00402082, 0x00402090, 0x00402092, 0x00402400,
0x00402402, 0x00402410, 0x00402412, 0x00402480, 0x00402482, 0x00402490, 0x00402492, 0x00410000,
0x00410002, 0x00410010, 0x00410012, 0x00410080, 0x00410082, 0x00410090, 0x00410092, 0x00410400,
0x00410402, 0x00410410, 0x00410412, 0x00410480, 0x00410482, 0x00410490, 0x00410492, 0x00412000,
0x00412002, 0x00412010, 0x00412012, 0x00412080, 0x00412082, 0x00412090, 0x00412092, 0x00412400,
0x00412402, 0x00412410, 0x00412412, 0x00412480, 0x00412482, 0x00412490, 0x00412492, 0x00480000,
0x00480002, 0x00480010, 0x00480012, 0x00480080, 0x00480082, 0x00480090, 0x00480092, 0x00480400,
0x00480402, 0x00480410, 0x00480412, 0x00480480, 0x00480482, 0x00480490, 0x00480492, 0x00482000,
0x00482002, 0x00482010, 0x00482012, 0x00482080, 0x00482082, 0x00482090, 0x00482092, 0x00482400,
0x00482402, 0x00482410, 0x00482412, 0x00482480, 0x00482482, 0x00482490, 0x00482492, 0x00490000,
0x00490002, 0x00490010, 0x00490012, 0x00490080, 0x00490082, 0x00490090, 0x00490092, 0x00490400,
0x00490402, 0x00490410, 0x00490412, 0x00490480, 0x00490482, 0x00490490, 0x00490492, 0x00492000,
0x00492002, 0x00492010, 0x00492012, 0x00492080, 0x00492082, 0x00492090, 0x00492092, 0x00492400,
0x00492402, 0x00492410, 0x00492412, 0x00492480, 0x00492482, 0x00492490, 0x00492492
};

// Pre-shifted table for z (2 bits to the left)
static const uint32_t morton256_z[256] = {
0x00000000,
0x00000004, 0x00000020, 0x00000024, 0x00000100, 0x00000104, 0x00000120, 0x00000124, 0x00000800,
0x00000804, 0x00000820, 0x00000824, 0x00000900, 0x00000904, 0x00000920, 0x00000924, 0x00004000,
0x00004004, 0x00004020, 0x00004024, 0x00004100, 0x00004104, 0x00004120, 0x00004124, 0x00004800,
0x00004804, 0x00004820, 0x00004824, 0x00004900, 0x00004904, 0x00004920, 0x00004924, 0x00020000,
0x00020004, 0x00020020, 0x00020024, 0x00020100, 0x00020104, 0x00020120, 0x00020124, 0x00020800,
0x00020804, 0x00020820, 0x00020824, 0x00020900, 0x00020904, 0x00020920, 0x00020924, 0x00024000,
0x00024004, 0x00024020, 0x00024024, 0x00024100, 0x00024104, 0x00024120, 0x00024124, 0x00024800,
0x00024804, 0x00024820, 0x00024824, 0x00024900, 0x00024904, 0x00024920, 0x00024924, 0x00100000,
0x00100004, 0x00100020, 0x00100024, 0x00100100, 0x00100104, 0x00100120, 0x00100124, 0x00100800,
0x00100804, 0x00100820, 0x00100824, 0x00100900, 0x00100904, 0x00100920, 0x00100924, 0x00104000,
0x00104004, 0x00104020, 0x00104024, 0x00104100, 0x00104104, 0x00104120, 0x00104124, 0x00104800,
0x00104804, 0x00104820, 0x00104824, 0x00104900, 0x00104904, 0x00104920, 0x00104924, 0x00120000,
0x00120004, 0x00120020, 0x00120024, 0x00120100, 0x00120104, 0x00120120, 0x00120124, 0x00120800,
0x00120804, 0x00120820, 0x00120824, 0x00120900, 0x00120904, 0x00120920, 0x00120924, 0x00124000,
0x00124004, 0x00124020, 0x00124024, 0x00124100, 0x00124104, 0x00124120, 0x00124124, 0x00124800,
0x00124804, 0x00124820, 0x00124824, 0x00124900, 0x00124904, 0x00124920, 0x00124924, 0x00800000,
0x00800004, 0x00800020, 0x00800024, 0x00800100, 0x00800104, 0x00800120, 0x00800124, 0x00800800,
0x00800804, 0x00800820, 0x00800824, 0x00800900, 0x00800904, 0x00800920, 0x00800924, 0x00804000,
0x00804004, 0x00804020, 0x00804024, 0x00804100, 0x00804104, 0x00804120, 0x00804124, 0x00804800,
0x00804804, 0x00804820, 0x00804824, 0x00804900, 0x00804904, 0x00804920, 0x00804924, 0x00820000,
0x00820004, 0x00820020, 0x00820024, 0x00820100, 0x00820104, 0x00820120, 0x00820124, 0x00820800,
0x00820804, 0x00820820, 0x00820824, 0x00820900, 0x00820904, 0x00820920, 0x00820924, 0x00824000,
0x00824004, 0x00824020, 0x00824024, 0x00824100, 0x00824104, 0x00824120, 0x00824124, 0x00824800,
0x00824804, 0x00824820, 0x00824824, 0x00824900, 0x00824904, 0x00824920, 0x00824924, 0x00900000,
0x00900004, 0x00900020, 0x00900024, 0x00900100, 0x00900104, 0x00900120, 0x00900124, 0x00900800,
0x00900804, 0x00900820, 0x00900824, 0x00900900, 0x00900904, 0x00900920, 0x00900924, 0x00904000,
0x00904004, 0x00904020, 0x00904024, 0x00904100, 0x00904104, 0x00904120, 0x00904124, 0x00904800,
0x00904804, 0x00904820, 0x00904824, 0x00904900, 0x00904904, 0x00904920, 0x00904924, 0x00920000,
0x00920004, 0x00920020, 0x00920024, 0x00920100, 0x00920104, 0x00920120, 0x00920124, 0x00920800,
0x00920804, 0x00920820, 0x00920824, 0x00920900, 0x00920904, 0x00920920, 0x00920924, 0x00924000,
0x00924004, 0x00924020, 0x00924024, 0x00924100, 0x00924104, 0x00924120, 0x00924124, 0x00924800,
0x00924804, 0x00924820, 0x00924824, 0x00924900, 0x00924904, 0x00924920, 0x00924924
};

inline uint64_t mortonEncode_LUT(unsigned int x, unsigned int y, unsigned int z){
  uint64_t answer = 0;
  answer = morton256_z[(z >> 16) & 0xFF] | // we start with the third byte, since we only look at the first 21 bits
           morton256_y[(y >> 16) & 0xFF] |
           morton256_x[(x >> 16) & 0xFF];
  answer = answer << 24 | // each input byte yields 24 interleaved bits, so shift by 24 per step
           morton256_z[(z >> 8) & 0xFF] | // second byte
           morton256_y[(y >> 8) & 0xFF] |
           morton256_x[(x >> 8) & 0xFF];
  answer = answer << 24 |
           morton256_z[z & 0xFF] | // first byte
           morton256_y[y & 0xFF] |
           morton256_x[x & 0xFF];
  return answer;
}

Here is the source code for the octree:

// Expands a 10-bit integer into 30 bits
// by inserting 2 zeros after each bit.
__host__ __device__ inline uint32_t expand_bits(uint32_t v) {
  v = (v * 0x00010001u) & 0xFF0000FFu;
  v = (v * 0x00000101u) & 0x0F00F00Fu;
  v = (v * 0x00000011u) & 0xC30C30C3u;
  v = (v * 0x00000005u) & 0x49249249u;
  return v;
}

// Calculates a 30-bit Morton code for the
// given 3D point with 10-bit integer coordinates.
__host__ __device__ inline uint32_t morton3D(uint32_t x, uint32_t y, uint32_t z) {
  uint32_t xx = expand_bits(x);
  uint32_t yy = expand_bits(y);
  uint32_t zz = expand_bits(z);
  return xx | (yy << 1) | (zz << 2);
}

__host__ __device__ inline uint32_t morton3D_invert(uint32_t x) {
  x = x               & 0x49249249;
  x = (x | (x >> 2))  & 0xc30c30c3;
  x = (x | (x >> 4))  & 0x0f00f00f;
  x = (x | (x >> 8))  & 0xff0000ff;
  x = (x | (x >> 16)) & 0x0000ffff;
  return x;
}

For details, read Out-of-Core Construction of Sparse Voxel Octrees and Morton encoding/decoding through bit interleaving: Implementations

Linear Congruential Generator

idx = ((i+step*n_elements) * 56924617 + j * 19349663 + 96925573) % (NERF_GRIDSIZE()*NERF_GRIDSIZE()*NERF_GRIDSIZE());

A linear congruential generator (LCG) is an algorithm that yields a sequence of pseudo-randomized numbers calculated with a discontinuous piecewise linear equation. The method represents one of the oldest and best-known pseudorandom number generator algorithms. The theory behind them is relatively easy to understand, and they are easily implemented and fast, especially on computer hardware which can provide modular arithmetic by storage-bit truncation. However, the statistical properties are bad.

Here, we don't actually care much about its statistical properties. Rather, we care about its property of producing a permutation: this use case distributes the density grid update samples more-or-less uniformly over space (due to the pseudo-random nature), but ensures good coverage by never visiting a grid cell twice without having visited all other cells (due to the permutation property).
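
A small self-contained check of the permutation property (hypothetical small modulus; the real code uses NERF_GRIDSIZE()^3): the map i -> (a*i + c) mod m visits every cell exactly once as long as gcd(a, m) = 1.

#include <cassert>
#include <cstdint>
#include <vector>

int main() {
  const uint64_t m = 1000, a = 617, c = 573; // gcd(a, m) = 1
  std::vector<bool> seen(m, false);
  for (uint64_t i = 0; i < m; ++i) {
    uint64_t idx = (a * i + c) % m;
    assert(!seen[idx]); // never revisits a cell before covering all of them
    seen[idx] = true;
  }
  return 0;
}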

MIT Vision and Graphics Seminar - Atlas Wang

Background: general modality models (transformers, MLP mixer, Perceiver), can they beat domain-specific architecture?

over-simplified rendering equation

over-simplified spherical harmonics (NeRF for reflection materials: NeRF: integrate optical reflection model into volume rendering, Ref-NeRF: Efficient directional encoding for structured view dependence)

NeRF: data distillation by overfitting, cross-view interpolation

PixelNeRF & IBRNet: acquire the RGBA of each point by a weighted sum of the image features of its 2D projections

MVSNeRF: predict RGBA of each point from a cost volume induced by MVSNet using 2 cameras

Side note: the Transformer ("Attention Is All You Need") replaced LSTMs and earlier RNNs.

Generalizable NeRF Transformer (GNT):

// QUESTION: yes, the

// TODO: at 20min https://mit.zoom.us/rec/play/O-E4BZQZLc4km4Xd9EFXrMleMBPVoxK73HzZwo7iEmndSZb--QJXH

Tianshu Huang 10:36 PM Specifically last 5 minutes

And More

There are 9 additional articles in my reading list.

// QUESTION: Benchmark for Nvidia - how many images, and what is the training time?

Paper:

  1. Breaking up images into pixel rays is clever. It creates more data and makes good use of the assumption that every pixel shares the same rendering function (with the 3D object baked into the rendering function).
    1. Could we break the rendering function down even further by separating the model part from the true rendering function part? Then we could generalize the rendering function across different models, freeze it, and only overfit the model part, training fewer parameters for each new model.
  2. Instead of "x" being the center of the ray, what if you let it be the camera's position?
    1. If it were the camera position, the input would contain data irrelevant to the object, which decreases generalization for a given voxel.
  3. There is no separation between "base color" and "resulting color"; we only ask for the resulting color. So the environment is baked in, which is bad.
    1. The paper assumes "opacity" to be a voxel's internal property. Why not base color, specular, etc.? That might increase the output space and make generalization harder.
    2. Because the paper uses a voxel representation, most voxels are "completely transparent", so the model generalizes better when opacity is assumed to be the model's basic property.
      1. Is there a way to avoid the voxel representation, since most voxels are blank? Maybe we can shrink (regularize, or smooth) the representation space to achieve better generalization?
  4. For a ray, do you query one point or all points? Is transparency an issue?
    1. For opaque objects, if we do not use a voxel representation, we might not need to query every point.
    2. For opaque objects, if we do use a voxel representation, we only need to query everything up to the first opaque voxel (this assumes our representation is already very good: it satisfies only the stationary equation. But in training, we don't know for sure whether a voxel is opaque or not, so you need a majority vote on opacity. Could we do a hard-coded majority vote first? Because we are in 3D: if we imagine querying a transparent voxel just outside a cube, then 2 views will vote opaque and 4 will vote transparent. However, for a transparent voxel inside a bowl, all 6 will vote opaque. This is a problem.)
    3. The integration along the ray to get a color should be weighted by transparency. However, transparency is the very thing we are training... Does a large batch size help?
  5. The overfitted model can't guess unseen voxels.
  6. Can we actually implement this in games, and what problems would arise?
  7. What, the paper is ancient technology.
  8. Why do you bake in the rendering function? You could have directly optimized the model.
  9. Tricks
    1. separate coarse and fine networks
    2. sin/cos decomposition of the voxel representation (positional encoding)

How many cameras? How many GPUs?

Report

  1. 16 cameras at uniform (semi-random) angles over the hemisphere is the minimum to get an okay-looking result.
  2. Training time is independent of the number of cameras; quality depends on it.
  3. With 1 GPU, 3 seconds is enough to get an okay-looking result. That means that, with the naive original implementation (one model per frame), we would need at least 90 GPUs to achieve 30 FPS if no optimization were done. As you may realize, this naive implementation is not practical. One intuitive next step is to keep training the same model on new datasets. This is what I am working on.

Hands-on Experience without Code

Ideas and Questions

Questions

  1. What kind of sampling do we need for a large open environment? (I guess we cannot pre-define it.)
  2. Game engine: alternatives to the octree representation? (moving vs. non-moving objects, objects changing shape?)

...

Video De-blur Methods Categories
