This article is about DreamFusion-related 3D generation techniques such as SweetDreamer, DreamCraft3D, and Text2Tex. The major open problems are diversity, texture quality, and efficiency.
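Since almost every entry below builds on DreamFusion's Score Distillation Sampling (SDS), here is a minimal sketch of the core gradient. `unet`, `alphas_cumprod`, `render_latents`, and `text_emb` are placeholders (a frozen diffusion denoiser, its noise schedule, the encoded render of the 3D scene, and text embeddings), not any specific paper's API.

```python
import torch

def sds_grad(unet, alphas_cumprod, render_latents, text_emb):
    # Sample a timestep away from the extremes, as DreamFusion does.
    t = torch.randint(20, 980, (1,), device=render_latents.device)
    alpha_bar = alphas_cumprod[t].view(-1, 1, 1, 1)
    noise = torch.randn_like(render_latents)
    # Forward-diffuse the rendered latents to timestep t.
    noisy = alpha_bar.sqrt() * render_latents + (1 - alpha_bar).sqrt() * noise
    with torch.no_grad():
        eps_pred = unet(noisy, t, text_emb)  # frozen score network
    w = 1.0 - alpha_bar                      # one common weighting choice
    # SDS skips the U-Net Jacobian: this gradient is injected directly.
    return w * (eps_pred - noise)

# Training step: loss = (render_latents * sds_grad(...).detach()).sum();
# loss.backward() then pushes gradients into the NeRF/DMTet parameters.
```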
3D:
!DreamCraft3D
!Z123++
!FocalDreamer: geometry editing, fine result
Consist3D: not as good, conditioned on pointcloud for consistency
Progressive3D: generation composition
!Wonder3D: impressive, generate normal map first
DreamStone: not as good
3D-GPT
!IPDreamer: line art conditioning for 3D generation
!ConsistNet: fine
SceneDreamer: generate terrain
DreamGaussian: bad result
PonderV2: realistic 3D pre-training
!Consistent123-L: better
!Consistent123-W: good (only does image generation; the images are good but the 3D is bad; can't rely on image consistency alone without considering distillation)
!SyncDreamer: good result, no DMTet, 3D conv for multiview
Uni3D: 3D task training framework
What Does Stable Diffusion Know about the 3D Scene: Diffusion understands 3D
!SweetDreamer: good result, mostly DMTet, but introduced normal map condition
TextField3D: 3D generation with variations
PointGPT: another Point-E
Cap3D: generating caption from 3D
!MVDiffusion: image generation
!Text2Tex: texture generation (not consistent)
DMRF: integrates NeRF with meshes that support light interactions
CityDreamer: GAN
!MVDream: yup
Diffusion:
MDM
FreeMask
Reverse Stable Diffusion
PaintSeg
State of the Art on Diffusion Models for Visual Computing
Video:
=========================================== All categories are listed newest first
Outdated Score: 💩
Quality Score: ⭐
Impact Score: 💥
Paper:
PatchFusion: good depth estimation
Marigold: good depth estimation
MVGFormer: multi-view 3D pose estimation
OmniSeg3D: hierarchical 3D segmentation via contrastive learning. (⭐⭐)
Paper:
EfficientSAM: 20x speedup SAM (!)
X-Adapter: convert SD-LoRA into SDXL-LoRA (!)
Kandinsky 3.0: big text-to-image model
SDXL-turbo: super-fast SD (sd-turbo: 100fps) ()
HiDiffusion: generate large 4096px images without re-training ()
ElasticDiffusion: training-free arbitrary-size diffusion ()
Diffusion360: generate 360 panoramic image. (💥)
ZipLoRA: merge content LoRA with style LoRA. (💥)
ConceptSliders: add concept slider to image diffusion.
UFOGen: one-step image generation using a diffusion-GAN hybrid initialized from Stable Diffusion
LCM: Latent Consistency Model (see the usage sketch after this list). (⭐⭐💥💥)
IGN: one-step generation that... only experimented on a faces dataset.
TransFusion: I2I diffusion process that fixes the image by increasing the transparency of abnormal regions. (💥💥)
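A usage sketch for LCM via diffusers: the checkpoint id `SimianLuo/LCM_Dreamshaper_v7` is the public community LCM, swap in your own. The point is the tiny step count.

```python
import torch
from diffusers import DiffusionPipeline

# LCM distills the diffusion trajectory into a consistency model,
# so ~4 steps replace the usual 25-50.
pipe = DiffusionPipeline.from_pretrained(
    "SimianLuo/LCM_Dreamshaper_v7", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    "a photo of an astronaut riding a horse",
    num_inference_steps=4,  # the LCM selling point
    guidance_scale=8.0,     # LCM's CFG-distilled guidance range
).images[0]
image.save("lcm_sample.png")
```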
Tool:
Krita+ComfyUI+LCM: in-paint, out-paint, generate, refine, controlnet, lcm, upscale, job-queue, history in Krita. (💥💥)
InstantLoRA: a workflow built upon IPAdapter and ClipVision to achieve LoRA quality with no training.
Paper:
Efficient3Dim: training speedup for Z123 (⭐💥)
MotionCtrl: video generation with very good camera control ()
WonderJourney: creating a zoom-out stack of images for video with camera control
PowerPaint: inpaint, outpaint, controlnet with one word (💥💥💥)
GenDeF: generate video by warping image (⭐⭐⭐)
SparseCtrl: depth or sketch guided video diffusion ()
AnimateAnyone: pose + image to video, with some success for the back of a human ()
Breathing Life Into Sketches Using Text-to-Video Priors: trains local and global displacement fields, parameterized to animate vector sketches, using SDS from video diffusion models. (⭐💥)
Stable Video Diffusion: training an open-sourced video diffusion with multi-view capability. (⭐⭐💥💥)
Generation Paper:
HyperDreamer: 3D generation with relighting, 3D segmentation editing (!)
LooseControl: time-consistent image generation conditioned on a coarse-box depth map ()
Inpaint3D: inpaint in 3D w/ a 2D region; viewing range too narrow (💩)
NeRFiller: inpaint in 3D w/ 3D region, looks great (💥💥)
Free3D: little more consistent than z123 ()
Cascade-Z123: similar (bad) quality to Z123 (💩)
DreamPropeller: 4.7x speedup SDS (!)
CustomNeRF: 3D scene editing (!)
ImageDream: image prompt MVDream ()
AvatarStudio: edit dynamic avatar nerf ()
4D-fy: text to 4D w/ GS (!)
SceneTex: in-door room texture generation ()
XCube: large scale generation with hierarchy voxel (⭐💥)
PF-LRM: creates 3D from 2~4 photos with no pose info. (💩)
SuGaR: Gaussian Splatting to mesh in Blender. (💥💥)
LucidDreamer: Interval Score Matching (ISM) to counteract over-smoothing. (⭐⭐💥)
PDD: generate large scene coarse to fine. (💥)
CityDreamer: generating cities using a height map and a building-instance generator.
GaussianDiffusion: variational Gaussian splatting and Gaussian noise distribution. (💩💥💥)
MetaDreamer: 20 min generation with quality texture.
FastHuman: multiview images to mesh in minutes. (💩)
AdaptiveShells: Adaptive Shells for Efficient Neural Radiance Field Rendering. (⭐⭐💥💥)
TheChosenOne: a diffusion model that maps a prompt (and its variation) to the same identity. (💥💥)
NeRF paper:
ReconFusion: reconstruction w/ diffusion ()
SAGA: 3D gaussian segmentation (!)
GaussianGrouping: jointly reconstruct and segment w/ gaussian splatting (!)
LightGaussian: GS but 15x compressed, 200+ FPS ()
Compact3D: GS 20x compression via vector quantization; see the sketch below ()
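To make the compression idea concrete, a minimal k-means vector-quantization sketch in the spirit of Compact3D; this is not the paper's code, and `params`, `k`, and `iters` are illustrative.

```python
import torch

def vq_compress(params: torch.Tensor, k: int = 4096, iters: int = 10):
    """params: (N, D) per-Gaussian attributes, e.g. SH color coefficients.
    Returns a (k, D) codebook plus one index per Gaussian, so N*D floats
    shrink to k*D floats + N small integers."""
    codebook = params[torch.randperm(params.shape[0])[:k]].clone()
    for _ in range(iters):
        assign = torch.cdist(params, codebook).argmin(dim=1)  # (N,)
        # Recompute each centroid as the mean of its assigned Gaussians.
        sums = torch.zeros_like(codebook).index_add_(0, assign, params)
        counts = torch.bincount(assign, minlength=k).clamp_min(1)
        codebook = sums / counts.unsqueeze(1)
    return codebook, assign

# Decode at render time: params_hat = codebook[assign]
```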
Showreel:
Tool:
LM Studio: discover, download, and run local LLMs.
gpt-fast: <1000 lines of PyTorch code; quantization, low-latency GPT (see the quantization sketch below)
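In the gpt-fast spirit (weight-only int8 with on-the-fly dequantization), a hedged sketch of the idea; this is not gpt-fast's actual code, and `Int8Linear` is a made-up name.

```python
import torch
import torch.nn.functional as F

def quantize_int8(weight: torch.Tensor):
    # Per-output-channel absmax scaling to int8.
    scale = weight.abs().amax(dim=1, keepdim=True) / 127.0
    q = (weight / scale).round().clamp(-128, 127).to(torch.int8)
    return q, scale

class Int8Linear(torch.nn.Module):
    """Drop-in replacement storing int8 weights, 4x smaller than fp32."""
    def __init__(self, linear: torch.nn.Linear):
        super().__init__()
        q, scale = quantize_int8(linear.weight.data)
        self.register_buffer("q_weight", q)
        self.register_buffer("scale", scale)
        self.bias = linear.bias

    def forward(self, x):
        # Dequantize in the matmul path; the win is memory bandwidth.
        w = self.q_weight.to(x.dtype) * self.scale.to(x.dtype)
        return F.linear(x, w, self.bias)

# Usage (hypothetical): model.lm_head = Int8Linear(model.lm_head)
```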
Paper:
GPT4Motion: asks GPT to create and animate a Blender scene, rendered with a diffusion model. (💩)
NeuroPrompts: trained a language model to produce prompts (💩)
MagicPrompt: same as above, but better (⭐💥)
Hyperspace: A BitTorrent-like Network for AI
L3 Lab at CMU: Recruiting PhD students for fall 2024.
Idea: scan the entire world! You scan it, you own it. You need to define a NeRF file format. (cf. Taobao listings of 200,000 movies and TV shows, IMDb/Douban Top250, sold on 1TB and 3TB portable mechanical hard drives)
Commercial: Luma's GS w/ React Three.js: https://twitter.com/lumalabsai/status/1732447521039888718?s=12
Project:
3D Gaussian in Unity Graph: https://x.com/van_eng622/status/1733773482440507562?s=20
Pose to Video (swimming): https://twitter.com/moveai_/status/1733172947831902570?s=46
Video generation with lots of motion (Comfy): https://twitter.com/aiwarper/status/1733344112605384734?s=46
Meta's relightable gaussian avatar: https://www.bilibili.com/video/BV15a4y1d77t/?share_source=copy_web&vd_source=f78a38561d7567eb91e3b787c1f608d4
Directly paint texture in Blender: https://twitter.com/faruqtawsif/status/1732109179274526968?s=46
https://github.com/InstantID/InstantID
https://github.com/bytedance/ImageDream
https://github.com/Anttwo/SuGaR
https://github.com/postech-ami/paint-it
https://github.com/tzco/VolumeDiffusion
https://github.com/Sanster/lama-cleaner
https://github.com/OpenTexture/Paint3D
https://dev-discuss.pytorch.org/t/cudagraphs-in-pytorch-2-0/1428
https://github.com/eliphatfs/zerorf
https://smerf-3d.github.io/
https://github.com/horseee/DeepCache
https://github.com/isl-org/ZoeDepth
https://github.com/upfusion3d/upfusion
https://github.com/bmaltais/kohya_ss
https://github.com/bentoml/OneDiffusion
https://huggingface.co/stabilityai/stable-zero123
https://github.com/3DTopia/threefiner
https://github.com/3DTopia/3DTopia
https://github.com/LiheYoung/Depth-Anything
https://github.com/3DTopia/GPTEval3D
https://github.com/modelscope/richdreamer
https://github.com/modelscope/normal-depth-diffusion
https://github.com/lucidrains/meshgpt-pytorch
https://github.com/PKU-YuanGroup/repaint123
https://github.com/dcharatan/pixelsplat
A1111 magic: https://github.com/lllyasviel/ControlNet-v1-1-nightly/issues/50
v1.1 problems: https://github.com/huggingface/diffusers/issues/3095#issuecomment-1516718198
shuffle and guess mode: https://github.com/huggingface/diffusers/issues/3251