Generative media compute
Image and video generation depend on GPU throughput, denoising iteration cost, and latent-space compression.
13 papers
Regime 7 of 10
Device/setup
GPU and TPU training setups for high-dimensional image, video, and audio generation, where sampling cost and memory-heavy U-Net and Transformer backbones dominate.
Bottleneck
Training stability, high-resolution synthesis, latent-space efficiency, the number of diffusion sampling steps, and multimodal data throughput.
Methods that fit
VAEs; GANs (DCGAN, pix2pix, CycleGAN, StyleGAN); diffusion models (DDPM, improved DDPM, SDE-based diffusion, latent diffusion, DiT); and DALL-E-style text-to-image models. Each adapts generation to the available accelerator budget.
Methods that became obsolete or less central
Pixel-space generation without compression and adversarial-only pipelines became less central where diffusion, latent-space, and Transformer-based setups proved easier to scale.
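As a rough illustration of why uncompressed pixel-space generation is costly, the sketch below compares how many values a denoiser must process per sample in pixel space and in a compressed latent space. The 512x512 RGB image, the 8x spatial downsampling factor, and the 4 latent channels are illustrative assumptions, not figures taken from the papers in this regime; the function names are hypothetical.

```python
def element_count(height, width, channels):
    """Number of scalar values in one sample of the given shape."""
    return height * width * channels


def compression_ratio(pixel_shape, latent_shape):
    """How many times fewer values the latent holds than the pixel image."""
    return element_count(*pixel_shape) / element_count(*latent_shape)


# Assumed example shapes (height, width, channels); not from the source.
pixel_shape = (512, 512, 3)   # RGB image
latent_shape = (64, 64, 4)    # 8x downsampling per side, 4 latent channels

ratio = compression_ratio(pixel_shape, latent_shape)
print(f"pixel values:  {element_count(*pixel_shape)}")
print(f"latent values: {element_count(*latent_shape)}")
print(f"compression:   {ratio}x")  # 48.0x for these assumed shapes
```

Under these assumptions every denoising step operates on 48 times fewer values, which is one way to read the shift toward latent-space setups.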
Representative papers
Open questions
- Separate training-compute improvements from inference/sampling-compute improvements across media models.
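One hedged way to make that separation concrete is to account for the two costs with different formulas. The sketch below is a back-of-the-envelope model, not a method from any of the listed papers: it assumes the cost of one denoiser forward pass is known, treats the backward pass as roughly twice the forward pass (a common rule of thumb), and counts sampling cost as forward passes per step times the number of steps. All names and numbers are hypothetical.

```python
def training_flops(forward_flops, samples_seen, backward_multiplier=2.0):
    """Training cost: one forward plus one backward pass per training sample.

    backward_multiplier is an assumed rule of thumb (backward ~ 2x forward).
    """
    return forward_flops * (1 + backward_multiplier) * samples_seen


def sampling_flops(forward_flops, steps, evals_per_step=1):
    """Inference cost for ONE generated sample: forward passes only.

    evals_per_step lets the caller model samplers that call the
    denoiser more than once per step (an assumption, set by the caller).
    """
    return forward_flops * steps * evals_per_step


# Assumed numbers for illustration only.
forward = 10**9          # FLOPs per denoiser forward pass
train = training_flops(forward, samples_seen=1000)
per_sample_50 = sampling_flops(forward, steps=50)
per_sample_10 = sampling_flops(forward, steps=10)

print(f"training total:       {train:.3e} FLOPs")
print(f"sampling at 50 steps: {per_sample_50:.3e} FLOPs per sample")
print(f"sampling at 10 steps: {per_sample_10:.3e} FLOPs per sample")
```

In this model a method that cuts sampling steps changes only the second quantity, while a method that cuts the per-pass cost (for example by working in a smaller latent) changes both, which is why the two kinds of improvement are worth reporting separately.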