Auto-regressive:
Advantage: stable training
Disadvantage:
Modeling Distribution
VAE: optimizes only a lower bound (the ELBO) on log p(x); the L2/Gaussian reconstruction term regresses to the mean, so outputs are blurry
Normalizing flow: exact p(x), but the architecture is restricted (invertible layers with tractable Jacobians)
Diffusion: learns the score function \nabla_x \log p_t(x) (denoising score matching sidesteps the divergence term of explicit score matching)
Deterministic ODE (probability flow ODE, Yang Song et al.): dx = [f(x, t) - \frac{1}{2} g(t)^2 \nabla_x \log p_t(x)] dt # QUESTION: why is this the right ODE? (see the sampler sketch below)
Langevin dynamics: adds noise at each step; used as a corrector # QUESTION: does it correct numerical error? why Langevin?
Heun step: a second-order solver, more accurate per step than first-order Euler
Cascaded diffusion models: a base model generates at low resolution, then super-resolution stages upsample
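A runnable sketch of the probability-flow ODE sampler above, showing plain Euler steps and the second-order Heun correction. Everything here is an illustrative assumption: the score is the analytic score of a Gaussian (a stand-in for a learned s_theta(x, sigma)), and the schedule sigma(t) = t (as in Karras et al.) reduces the ODE to dx/dt = -sigma * score(x, sigma).

```python
import numpy as np

def score(x, sigma):
    # Stand-in for a learned score model s_theta(x, sigma) ~ grad_x log p_sigma(x).
    # For data ~ N(0, 1), the noised density is N(0, 1 + sigma^2), so the score
    # has this closed form and the sampler runs end to end.
    return -x / (1.0 + sigma**2)

def ode_dxdt(x, sigma):
    # Probability-flow ODE under sigma(t) = t:  dx/dt = -sigma * score(x, sigma)
    return -sigma * score(x, sigma)

def heun_sampler(x, sigmas):
    # Integrate from sigmas[0] (large) down to sigmas[-1] (zero).
    for s_cur, s_next in zip(sigmas[:-1], sigmas[1:]):
        d_cur = ode_dxdt(x, s_cur)
        x_euler = x + (s_next - s_cur) * d_cur                 # first-order Euler step
        if s_next > 0:
            d_next = ode_dxdt(x_euler, s_next)                 # slope at predicted point
            x = x + (s_next - s_cur) * 0.5 * (d_cur + d_next)  # Heun: average slopes
        else:
            x = x_euler                                        # final step stays Euler
    return x

sigmas = np.linspace(10.0, 0.0, 30)
x0 = np.sqrt(1.0 + sigmas[0] ** 2) * np.random.randn(100_000)  # prior p_{sigma_max}
print(heun_sampler(x0, sigmas).std())  # ~1.0: recovers the data distribution's std
```

Heun reuses the Euler prediction to average two slopes, giving better accuracy per step at the cost of one extra score evaluation.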
Generative models are hard because the loss function is hard to specify:
L2 loss averages over modes (colors), so it can't generate sharp, colorful outputs; see the demo after this list
classification loss
perceptual loss
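Why L2 averages: a tiny demo (setup and names are mine) where the target "pixel" is pure red or pure blue with equal probability. The L2-optimal prediction is the mean of the targets, a murky purple that matches neither mode, which is exactly the blurry/desaturated failure described above.

```python
import numpy as np

rng = np.random.default_rng(0)
red = np.array([1.0, 0.0, 0.0])
blue = np.array([0.0, 0.0, 1.0])
targets = np.where(rng.random((10_000, 1)) < 0.5, red, blue)  # bimodal data

# argmin_y E||y - target||^2 is the conditional mean, never a sample:
l2_optimal = targets.mean(axis=0)
print(l2_optimal)  # ~[0.5, 0.0, 0.5]: purple, not red or blue
```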
Measures of distribution distance:
FID (the W2 distance between Gaussians fit to Inception features) -> GitHub: clean-fid; sketch after this list
JSD (symmetric, but saturates when supports don't overlap, so worse than W2)
KL (asymmetric)
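FID's core is the closed-form W2 (Fréchet) distance between two Gaussians fitted to feature sets. A sketch below, with random features standing in for Inception activations (clean-fid additionally pins down the image preprocessing, which is where implementations usually disagree):

```python
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(feats1, feats2):
    # ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^{1/2})
    mu1, mu2 = feats1.mean(axis=0), feats2.mean(axis=0)
    s1 = np.cov(feats1, rowvar=False)
    s2 = np.cov(feats2, rowvar=False)
    covmean = sqrtm(s1 @ s2)
    if np.iscomplexobj(covmean):
        covmean = covmean.real          # drop numerical-noise imaginary parts
    return np.sum((mu1 - mu2) ** 2) + np.trace(s1 + s2 - 2.0 * covmean)

rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, size=(5000, 8))
b = rng.normal(0.5, 1.0, size=(5000, 8))
print(frechet_distance(a, a))   # ~0
print(frechet_distance(a, b))   # ~8 * 0.5^2 = 2.0
```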
Losses in practice: don't use the original minimax classification loss. Hinge loss or least-squares loss + an R1 penalty work well, and so does W2. Paper: Are GANs Created Equal? (Lucic et al.). A sketch of the hinge loss and R1 penalty follows.
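Assuming logit-valued discriminator outputs; the function names are mine, the formulas are the standard hinge loss and the R1 penalty of Mescheder et al.:

```python
import torch
import torch.nn.functional as F

def d_hinge_loss(d_real, d_fake):
    # Discriminator: push real logits above +1 and fake logits below -1.
    return F.relu(1.0 - d_real).mean() + F.relu(1.0 + d_fake).mean()

def g_hinge_loss(d_fake):
    # Generator: raise the discriminator's logits on generated samples.
    return -d_fake.mean()

def r1_penalty(discriminator, real, gamma=10.0):
    # R1: (gamma / 2) * E[ ||grad_x D(x)||^2 ] on real data only.
    real = real.detach().requires_grad_(True)
    (grad,) = torch.autograd.grad(discriminator(real).sum(), real, create_graph=True)
    return 0.5 * gamma * grad.flatten(1).square().sum(dim=1).mean()
```

R1 is added to the discriminator loss, often only every few steps ("lazy regularization" in StyleGAN2).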
StyleGAN: AdaIN(x) replaces batch normalization and injects the style vector; nowadays people use cross-attention for conditioning instead (sketch below)
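A minimal AdaIN layer in the spirit of (early) StyleGAN: instance-normalize each feature map, then scale and shift it with an affine projection of the style vector (layer and variable names are mine):

```python
import torch
import torch.nn as nn

class AdaIN(nn.Module):
    def __init__(self, num_channels, style_dim):
        super().__init__()
        self.norm = nn.InstanceNorm2d(num_channels, affine=False)
        self.affine = nn.Linear(style_dim, 2 * num_channels)

    def forward(self, x, w):
        # Per-channel scale/shift predicted from the style vector w.
        scale, shift = self.affine(w).chunk(2, dim=1)
        scale = scale[:, :, None, None] + 1.0   # initialize near identity
        shift = shift[:, :, None, None]
        return scale * self.norm(x) + shift

x = torch.randn(4, 64, 32, 32)      # feature maps
w = torch.randn(4, 512)             # style vector
print(AdaIN(64, 512)(x, w).shape)   # torch.Size([4, 64, 32, 32])
```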
Is the upsampler in diffusion pipelines a GAN? Transformers aren't directly compatible with GAN training, so L2 attention is used instead of dot-product attention; GigaGAN also uses sample-adaptive filter banks for its convolutions (L2-attention sketch below)
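A sketch of L2 (distance-based) attention: replace the dot-product logit with the negative squared Euclidean distance between queries and keys, motivated by Lipschitz-ness and stability in adversarial training. This shows the idea only, not GigaGAN's exact layer:

```python
import torch
import torch.nn.functional as F

def l2_attention(q, k, v):
    # q, k, v: (B, N, D). Logit(i, j) = -||q_i - k_j||^2 / sqrt(D),
    # expanded as ||q||^2 - 2 q.k + ||k||^2 to avoid a (B, N, N, D) tensor.
    d = q.shape[-1]
    sq_dist = (q.square().sum(-1, keepdim=True)
               - 2.0 * q @ k.transpose(-2, -1)
               + k.square().sum(-1).unsqueeze(-2))
    attn = F.softmax(-sq_dist / d**0.5, dim=-1)
    return attn @ v

q = k = v = torch.randn(2, 16, 32)
print(l2_attention(q, k, v).shape)  # torch.Size([2, 16, 32])
```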
Comparison between different models: Sauer et al.