Auto-regressive:
Advantage: stable training
Disadvantage:
Modeling Distribution
VAE: optimizes only a lower bound (the ELBO) on log p(x); the L2/Gaussian reconstruction term regresses to the mean, so outputs are blurry
Normalizing flow: exact p(x), but the architecture is restricted (invertible layers with tractable Jacobians)
Diffusion: learns the score function \nabla_x \log p_t(x) (denoising score matching sidesteps the divergence term of explicit score matching)
Deterministic ODE (probability flow ODE, Yang Song et al.): dx = [f(x, t) - \frac{1}{2} g(t)^2 \nabla_x \log p_t(x)] dt # QUESTION: why is this the right ODE? (see the sampler sketch below)
Langevin dynamics: adds noise at each step; used as a corrector # QUESTION: does it correct numerical error? why Langevin?
Heun step: a second-order solver, more accurate per step than first-order Euler
Cascaded diffusion models: a base model generates at low resolution, then super-resolution stages upsample
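A runnable sketch of the probability-flow ODE sampler above, showing plain Euler steps and the second-order Heun correction. Everything here is an illustrative assumption: the score is the analytic score of a Gaussian (a stand-in for a learned s_theta(x, sigma)), and the schedule sigma(t) = t (as in Karras et al.) reduces the ODE to dx/dt = -sigma * score(x, sigma).

```python
import numpy as np

def score(x, sigma):
    # Stand-in for a learned score model s_theta(x, sigma) ~ grad_x log p_sigma(x).
    # For data ~ N(0, 1), the noised density is N(0, 1 + sigma^2), so the score
    # has this closed form and the sampler runs end to end.
    return -x / (1.0 + sigma**2)

def ode_dxdt(x, sigma):
    # Probability-flow ODE under sigma(t) = t:  dx/dt = -sigma * score(x, sigma)
    return -sigma * score(x, sigma)

def heun_sampler(x, sigmas):
    # Integrate from sigmas[0] (large) down to sigmas[-1] (zero).
    for s_cur, s_next in zip(sigmas[:-1], sigmas[1:]):
        d_cur = ode_dxdt(x, s_cur)
        x_euler = x + (s_next - s_cur) * d_cur                 # first-order Euler step
        if s_next > 0:
            d_next = ode_dxdt(x_euler, s_next)                 # slope at predicted point
            x = x + (s_next - s_cur) * 0.5 * (d_cur + d_next)  # Heun: average slopes
        else:
            x = x_euler                                        # final step stays Euler
    return x

sigmas = np.linspace(10.0, 0.0, 30)
x0 = np.sqrt(1.0 + sigmas[0] ** 2) * np.random.randn(100_000)  # prior p_{sigma_max}
print(heun_sampler(x0, sigmas).std())  # ~1.0: recovers the data distribution's std
```

Heun reuses the Euler prediction to average two slopes, giving better accuracy per step at the cost of one extra score evaluation.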
Generative models are hard because the loss function is hard to specify:
L2 loss averages over modes (colors), so it can't generate sharp, colorful outputs; see the demo after this list
classification loss
perceptual loss
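Why L2 averages: a tiny demo (setup and names are mine) where the target "pixel" is pure red or pure blue with equal probability. The L2-optimal prediction is the mean of the targets, a murky purple that matches neither mode, which is exactly the blurry/desaturated failure described above.

```python
import numpy as np

rng = np.random.default_rng(0)
red = np.array([1.0, 0.0, 0.0])
blue = np.array([0.0, 0.0, 1.0])
targets = np.where(rng.random((10_000, 1)) < 0.5, red, blue)  # bimodal data

# argmin_y E||y - target||^2 is the conditional mean, never a sample:
l2_optimal = targets.mean(axis=0)
print(l2_optimal)  # ~[0.5, 0.0, 0.5]: purple, not red or blue
```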
Measures of distribution distance:
FID (the W2 distance between Gaussians fit to Inception features) -> GitHub: clean-fid; sketch after this list
JSD (symmetric, but saturates when supports don't overlap, so worse than W2)
KL (asymmetric)
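FID's core is the closed-form W2 (Fréchet) distance between two Gaussians fitted to feature sets. A sketch below, with random features standing in for Inception activations (clean-fid additionally pins down the image preprocessing, which is where implementations usually disagree):

```python
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(feats1, feats2):
    # ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^{1/2})
    mu1, mu2 = feats1.mean(axis=0), feats2.mean(axis=0)
    s1 = np.cov(feats1, rowvar=False)
    s2 = np.cov(feats2, rowvar=False)
    covmean = sqrtm(s1 @ s2)
    if np.iscomplexobj(covmean):
        covmean = covmean.real          # drop numerical-noise imaginary parts
    return np.sum((mu1 - mu2) ** 2) + np.trace(s1 + s2 - 2.0 * covmean)

rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, size=(5000, 8))
b = rng.normal(0.5, 1.0, size=(5000, 8))
print(frechet_distance(a, a))   # ~0
print(frechet_distance(a, b))   # ~8 * 0.5^2 = 2.0
```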
Losses in practice: don't use the original minimax classification loss. Hinge loss or least-squares loss + an R1 penalty work well, and so does W2. Paper: Are GANs Created Equal? (Lucic et al.). A sketch of the hinge loss and R1 penalty follows.
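Assuming logit-valued discriminator outputs; the function names are mine, the formulas are the standard hinge loss and the R1 penalty of Mescheder et al.:

```python
import torch
import torch.nn.functional as F

def d_hinge_loss(d_real, d_fake):
    # Discriminator: push real logits above +1 and fake logits below -1.
    return F.relu(1.0 - d_real).mean() + F.relu(1.0 + d_fake).mean()

def g_hinge_loss(d_fake):
    # Generator: raise the discriminator's logits on generated samples.
    return -d_fake.mean()

def r1_penalty(discriminator, real, gamma=10.0):
    # R1: (gamma / 2) * E[ ||grad_x D(x)||^2 ] on real data only.
    real = real.detach().requires_grad_(True)
    (grad,) = torch.autograd.grad(discriminator(real).sum(), real, create_graph=True)
    return 0.5 * gamma * grad.flatten(1).square().sum(dim=1).mean()
```

R1 is added to the discriminator loss, often only every few steps ("lazy regularization" in StyleGAN2).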
StyleGAN: AdaIN(x) replaces batch normalization and injects the style vector; nowadays people use cross-attention for conditioning instead (sketch below)
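A minimal AdaIN layer in the spirit of (early) StyleGAN: instance-normalize each feature map, then scale and shift it with an affine projection of the style vector (layer and variable names are mine):

```python
import torch
import torch.nn as nn

class AdaIN(nn.Module):
    def __init__(self, num_channels, style_dim):
        super().__init__()
        self.norm = nn.InstanceNorm2d(num_channels, affine=False)
        self.affine = nn.Linear(style_dim, 2 * num_channels)

    def forward(self, x, w):
        # Per-channel scale/shift predicted from the style vector w.
        scale, shift = self.affine(w).chunk(2, dim=1)
        scale = scale[:, :, None, None] + 1.0   # initialize near identity
        shift = shift[:, :, None, None]
        return scale * self.norm(x) + shift

x = torch.randn(4, 64, 32, 32)      # feature maps
w = torch.randn(4, 512)             # style vector
print(AdaIN(64, 512)(x, w).shape)   # torch.Size([4, 64, 32, 32])
```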
Is the upsampler in diffusion pipelines a GAN? Transformers aren't directly compatible with GAN training, so L2 attention is used instead of dot-product attention; GigaGAN also uses sample-adaptive filter banks for its convolutions (L2-attention sketch below)
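A sketch of L2 (distance-based) attention: replace the dot-product logit with the negative squared Euclidean distance between queries and keys, motivated by Lipschitz-ness and stability in adversarial training. This shows the idea only, not GigaGAN's exact layer:

```python
import torch
import torch.nn.functional as F

def l2_attention(q, k, v):
    # q, k, v: (B, N, D). Logit(i, j) = -||q_i - k_j||^2 / sqrt(D),
    # expanded as ||q||^2 - 2 q.k + ||k||^2 to avoid a (B, N, N, D) tensor.
    d = q.shape[-1]
    sq_dist = (q.square().sum(-1, keepdim=True)
               - 2.0 * q @ k.transpose(-2, -1)
               + k.square().sum(-1).unsqueeze(-2))
    attn = F.softmax(-sq_dist / d**0.5, dim=-1)
    return attn @ v

q = k = v = torch.randn(2, 16, 32)
print(l2_attention(q, k, v).shape)  # torch.Size([2, 16, 32])
```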
Comparison between different models: Sauer et al.