Lecture 003

Speaker Presentation

Speaker: imisra

https://cvpr2022-tutorial-diffusion-models.github.io

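Stable Diffusion: the fully noised latent may still leak information about the input image. The beta schedule matters here, in particular \beta_T: at \beta_T = 1 the final latent should carry no information about the input, which Stable Diffusion's schedule apparently does not guarantee (a bug in Stable Diffusion?). The noise level should also depend on resolution: a bigger image has more signal, so should more noise be added?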
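To make the leak concrete, here is a minimal sketch that computes how much of the input survives at the last step, i.e. \sqrt{\bar\alpha_T} with \bar\alpha_T = \prod_t (1 - \beta_t). The schedule values (scaled-linear, beta_start = 0.00085, beta_end = 0.012, T = 1000) are assumed to match the commonly published Stable Diffusion configuration; they are not from the lecture, and the function names are made up for illustration.

```python
import math

def scaled_linear_betas(beta_start, beta_end, num_steps):
    """Betas that are linear in sqrt(beta) (assumed Stable Diffusion style schedule)."""
    lo, hi = math.sqrt(beta_start), math.sqrt(beta_end)
    return [(lo + (hi - lo) * i / (num_steps - 1)) ** 2 for i in range(num_steps)]

def terminal_signal_scale(betas):
    """sqrt(alpha_bar_T): the factor multiplying the clean input in the final latent."""
    alpha_bar_T = math.prod(1.0 - b for b in betas)
    return math.sqrt(alpha_bar_T)

# Assumed schedule values, not taken from the lecture.
betas = scaled_linear_betas(0.00085, 0.012, 1000)
leak = terminal_signal_scale(betas)

# Forcing beta_T = 1 makes alpha_T = 0, so no signal survives the last step.
betas_fixed = betas[:-1] + [1.0]
no_leak = terminal_signal_scale(betas_fixed)

print(f"signal scale at T with assumed schedule: {leak:.4f}")
print(f"signal scale at T with beta_T = 1:       {no_leak:.4f}")
```

A nonzero scale at step T means the final latent is not pure noise, while sampling starts from pure noise, so training and inference disagree at the last step.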

Using more than one step (T \neq 1) in the diffusion design acts as a kind of frequency decomposition of the image: low-frequency detail is reconstructed first, then high-frequency detail.
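One way to see the frequency argument is a back-of-the-envelope sketch under two assumptions that are not from the lecture: the image power spectrum falls off as 1/f^2 (a rough model of natural images, in arbitrary units), and the added Gaussian noise is white (flat in frequency). A frequency is treated as recoverable while the scaled signal power \bar\alpha / f^2 exceeds the noise power 1 - \bar\alpha, which gives a cutoff f_c = \sqrt{\bar\alpha / (1 - \bar\alpha)}.

```python
import math

def cutoff_frequency(alpha_bar):
    """Frequency where signal power alpha_bar / f**2 equals noise power 1 - alpha_bar.

    Assumes a 1/f**2 image power spectrum and white noise (illustrative only).
    """
    return math.sqrt(alpha_bar / (1.0 - alpha_bar))

# alpha_bar shrinks as the forward process adds noise (t goes from 0 to T).
levels = [0.99, 0.9, 0.5, 0.1, 0.01]
cutoffs = [cutoff_frequency(a) for a in levels]

for a, fc in zip(levels, cutoffs):
    print(f"alpha_bar = {a:<5} -> frequencies below about {fc:.3f} still above the noise")
```

The cutoff shrinks as noise is added, so high frequencies are destroyed first in the forward process and, run in reverse, restored last.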

FID: compares feature statistics from an Inception network trained on ImageNet to measure fidelity (problematic: generated images can be OOD for that network). Text conditioning uses CLIP embeddings.
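For reference, FID is the squared Fréchet distance between two Gaussians fitted to the feature statistics: ||\mu_1 - \mu_2||^2 + Tr(\Sigma_1 + \Sigma_2 - 2(\Sigma_1 \Sigma_2)^{1/2}). The sketch below is a simplification that assumes diagonal covariances, so the matrix square root reduces to an elementwise one. Real FID uses full covariances of Inception features, and the function name here is made up for illustration.

```python
import math

def frechet_distance_diag(mu1, var1, mu2, var2):
    """Squared Frechet distance between two Gaussians with diagonal covariance.

    mu1, mu2: per-dimension feature means; var1, var2: per-dimension variances.
    Simplified stand-in for FID, which uses full covariance matrices.
    """
    mean_term = sum((a - b) ** 2 for a, b in zip(mu1, mu2))
    cov_term = sum(v1 + v2 - 2.0 * math.sqrt(v1 * v2) for v1, v2 in zip(var1, var2))
    return mean_term + cov_term

# Toy 2-D "feature statistics" for real and generated images (made-up numbers).
real_mu, real_var = [0.0, 1.0], [1.0, 1.0]
fake_mu, fake_var = [1.0, 1.0], [1.0, 4.0]

print(frechet_distance_diag(real_mu, real_var, real_mu, real_var))
print(frechet_distance_diag(real_mu, real_var, fake_mu, fake_var))
```

The score only sees the mean and covariance of the features, so it is only as meaningful as the feature extractor is for the images being compared.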

Other papers:

Super-resolution: not as good in practice because of OOD inputs; most training is done at low resolution.

Text2Video-Zero

Make-A-Video

Localizing Object-level Shape Variations with Text-to-Image Diffusion Models
