Speaker: imisra
https://cvpr2022-tutorial-diffusion-models.github.io
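Stable Diffusion: the most-noised latent x_T may leak information about the input image because of the beta schedule (\beta_T matters). The last step should carry no information about the image. That requires \bar\alpha_T = \prod_{t=1}^{T} (1 - \beta_t) = 0, i.e. \beta_T = 1, and Stable Diffusion's schedule does not reach this (a bug in Stable Diffusion?). Noise should also depend on resolution: a bigger image has more redundant signal (neighboring pixels are correlated), so more noise should be added?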
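Both points above can be checked with a minimal numpy sketch. It assumes a Stable Diffusion-style "scaled linear" schedule (T = 1000, \beta from 0.00085 to 0.012, linear in \sqrt{\beta}); treat those values as assumptions. `rescale_zero_terminal_snr` follows the shift-and-scale of \sqrt{\bar\alpha_t} from "Common Diffusion Noise Schedules and Sample Steps are Flawed" (Lin et al., 2023). It forces \bar\alpha_T = 0 and therefore \beta_T = 1. `pooled_snr_gain` is a toy check of the resolution point. Average-pooling a smooth image plus white noise by a factor s cuts the noise variance by about s^2 while barely changing the signal. At the same per-pixel noise level, a larger image therefore keeps more recoverable signal. All function names are illustrative.

```python
import numpy as np


def scaled_linear_betas(T=1000, beta_start=0.00085, beta_end=0.012):
    # Assumed Stable Diffusion-style schedule: linear in sqrt(beta), then squared.
    return np.linspace(beta_start ** 0.5, beta_end ** 0.5, T) ** 2


def snr(betas):
    # alpha_bar_t = prod_{s<=t} (1 - beta_s);  SNR_t = alpha_bar_t / (1 - alpha_bar_t)
    alpha_bar = np.cumprod(1.0 - betas)
    return alpha_bar / (1.0 - alpha_bar)


def rescale_zero_terminal_snr(betas):
    # Shift and scale sqrt(alpha_bar) so the last value is exactly 0
    # (x_T carries no signal) while the first value is (numerically) unchanged.
    sqrt_ab = np.sqrt(np.cumprod(1.0 - betas))
    first, last = sqrt_ab[0], sqrt_ab[-1]
    sqrt_ab = (sqrt_ab - last) * first / (first - last)
    alpha_bar = sqrt_ab ** 2
    # Recover per-step alphas from the cumulative product, then betas.
    alphas = np.concatenate([alpha_bar[:1], alpha_bar[1:] / alpha_bar[:-1]])
    return 1.0 - alphas


def pooled_snr_gain(n=256, factor=4, sigma=0.5, seed=0):
    # Toy resolution check: smooth ramp image plus white Gaussian noise.
    rng = np.random.default_rng(seed)
    signal = np.tile(np.linspace(-1.0, 1.0, n), (n, 1))
    noise = sigma * rng.standard_normal((n, n))

    def pool(x):
        # Average over non-overlapping factor x factor blocks (downsampling).
        m = n // factor
        return x.reshape(m, factor, m, factor).mean(axis=(1, 3))

    def power(x):
        return np.mean(x ** 2)

    snr_full = power(signal) / power(noise)
    snr_pooled = power(pool(signal)) / power(pool(noise))
    return snr_pooled / snr_full  # roughly factor**2 for a smooth signal


betas = scaled_linear_betas()
print("terminal SNR, original:", snr(betas)[-1])
print("terminal SNR, rescaled:", snr(rescale_zero_terminal_snr(betas))[-1])
print("SNR gain from 4x downsampling:", pooled_snr_gain())
```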
Using T \neq 1 (many denoising steps) in the diffusion design acts roughly like a frequency decomposition of the image: the sampler first reconstructs low-frequency structure, then high-frequency detail.
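The frequency view can be made concrete with a toy numpy sketch. It assumes a synthetic image whose power spectrum falls off as 1/f^2 (a common rough model of natural images) and white Gaussian noise, whose spectrum is flat. In each radial frequency band, the signal-to-noise ratio is highest at low frequencies. As the noise level \sigma grows, the band where noise overtakes signal moves toward lower frequencies. Running the reverse process from high to low noise therefore resolves coarse structure before fine detail. All function names are illustrative.

```python
import numpy as np


def pink_image(n=128, seed=0):
    # Synthetic stand-in for a natural image: amplitude ~ 1/f, so power ~ 1/f^2.
    rng = np.random.default_rng(seed)
    fy = np.fft.fftfreq(n)[:, None]
    fx = np.fft.fftfreq(n)[None, :]
    f = np.hypot(fy, fx)
    f[0, 0] = 1.0  # avoid dividing by zero at the DC term
    spectrum = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / f
    img = np.real(np.fft.ifft2(spectrum))
    return img / img.std()


def radial_power(x):
    # Mean |FFT|^2 over rings of integer radius around the (shifted) DC term.
    n = x.shape[0]
    p = np.abs(np.fft.fftshift(np.fft.fft2(x))) ** 2
    yy, xx = np.indices(p.shape)
    r = np.hypot(yy - n // 2, xx - n // 2).astype(int).ravel()
    counts = np.bincount(r)
    return np.bincount(r, weights=p.ravel()) / np.maximum(counts, 1)


def per_band_snr(img, sigma, seed=1):
    # Signal power / noise power in frequency bands of radius 1 .. n/2 - 1.
    noise = sigma * np.random.default_rng(seed).standard_normal(img.shape)
    bands = slice(1, img.shape[0] // 2)
    return radial_power(img)[bands] / radial_power(noise)[bands]


def cutoff_radius(img, sigma):
    # Lowest band where noise power exceeds signal power (n/2 if none does).
    below = np.nonzero(per_band_snr(img, sigma) < 1.0)[0]
    return int(below[0]) + 1 if below.size else img.shape[0] // 2


img = pink_image()
for sigma in (0.5, 2.0, 8.0):
    print(f"sigma={sigma}: noise dominates from band {cutoff_radius(img, sigma)}")
```

The cutoff band should shrink as `sigma` grows. At high noise, only the lowest frequencies are still recoverable.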
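FID: compares statistics (mean and covariance) of Inception features of real vs. generated images to measure fidelity. Inception is trained on ImageNet, so this is problematic for out-of-distribution (non-ImageNet-like) images. For text-conditioned models, CLIP embeddings are used (e.g. to score text-image alignment).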
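FID reports the Fréchet distance between two Gaussians fitted to feature sets: \|\mu_r - \mu_g\|^2 + \mathrm{Tr}(\Sigma_r + \Sigma_g - 2(\Sigma_r\Sigma_g)^{1/2}). The numpy-only sketch below computes it from precomputed features. It does not load Inception, so you would need to supply the feature arrays yourself (normally Inception-v3 pool features of real and generated images). The matrix square root is taken through the symmetric matrix \Sigma_r^{1/2}\Sigma_g\Sigma_r^{1/2}, whose eigenvalues match those of \Sigma_r\Sigma_g. This avoids a general (non-symmetric) matrix square root. Function names are illustrative.

```python
import numpy as np


def feature_stats(features):
    # features: (num_images, dim) array, e.g. Inception-v3 pool features (assumed input).
    return features.mean(axis=0), np.cov(features, rowvar=False)


def _sqrt_psd(a):
    # Square root of a symmetric positive semi-definite matrix via eigendecomposition.
    w, v = np.linalg.eigh(a)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.T


def frechet_distance(mu1, cov1, mu2, cov2):
    # ||mu1 - mu2||^2 + Tr(cov1 + cov2 - 2 (cov1 cov2)^{1/2}).
    # Tr((cov1 cov2)^{1/2}) is computed as the sum of square roots of the
    # eigenvalues of the symmetric PSD matrix cov1^{1/2} cov2 cov1^{1/2}.
    s1 = _sqrt_psd(cov1)
    eig = np.linalg.eigvalsh(s1 @ cov2 @ s1)
    tr_covmean = np.sqrt(np.clip(eig, 0.0, None)).sum()
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(cov1) + np.trace(cov2) - 2.0 * tr_covmean)


# Usage with random stand-in features (real FID uses Inception features of images).
rng = np.random.default_rng(0)
real = rng.standard_normal((1000, 16))
fake = rng.standard_normal((1000, 16)) + 0.5
print(frechet_distance(*feature_stats(real), *feature_stats(fake)))
```

The score depends only on what the feature extractor encodes. Images far from ImageNet's distribution can therefore get misleading scores, which is the OOD concern above.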
Other papers:
Make-A-Video (Meta): the first of these to be made public
Imagen Video (Google): pixel-space (non-latent) diffusion; trained on LAION-400M plus internal data (http://cs231n.stanford.edu/slides/2023/lecture_15.pdf)
Align your Latents (NVIDIA)
Preserve Your Own Correlation (NVIDIA): mixture of experts in the image-diffusion model, with different experts handling different timestep ranges
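Timestep-based routing can be sketched minimally as below. It assumes equal-width timestep ranges and generic denoiser callables. `TimestepExperts` and its methods are hypothetical and are not NVIDIA's code. In a real system, each expert would be a trained network specialized for its range of noise levels.

```python
from typing import Callable, Sequence

import numpy as np

# A denoiser maps (noisy input x_t, timestep t) to a prediction.
Denoiser = Callable[[np.ndarray, int], np.ndarray]


class TimestepExperts:
    """Hypothetical mixture of denoisers, each owning one contiguous timestep range."""

    def __init__(self, experts: Sequence[Denoiser], num_timesteps: int = 1000):
        self.experts = list(experts)
        self.num_timesteps = num_timesteps

    def expert_index(self, t: int) -> int:
        # Equal-width ranges over [0, num_timesteps): low t (nearly clean) goes to
        # expert 0, high t (mostly noise) goes to the last expert.
        k = len(self.experts)
        return min(t * k // self.num_timesteps, k - 1)

    def __call__(self, x_t: np.ndarray, t: int) -> np.ndarray:
        # Exactly one expert runs per step, so per-step cost is that of one expert.
        return self.experts[self.expert_index(t)](x_t, t)


# Usage with dummy experts that just tag their output with their index.
experts = [lambda x, t, i=i: np.full_like(x, float(i)) for i in range(3)]
model = TimestepExperts(experts)
x = np.zeros(4)
print([model.expert_index(t) for t in (0, 400, 999)])
print(model(x, 999))
```

This design adds model capacity without increasing per-step compute, because each sampling step uses a single expert.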
Super-resolution: works less well in practice because of distribution shift (OOD), since most training is done at low resolution.
Table of Contents