Generative media compute

Device/setup

GPU and TPU setups for training high-dimensional image, video, and audio generators, where sampling cost and the memory footprint of U-Net and Transformer backbones are the prominent concerns.

Bottleneck

Stable generative training, high-resolution synthesis, latent-space efficiency, the number of diffusion sampling steps, and multimodal data throughput.

Methods that fit

VAEs; GANs and their variants (DCGAN, pix2pix, CycleGAN, StyleGAN); DDPM and SDE-based diffusion, including improved DDPM; latent diffusion; DALL-E-style text-to-image; and DiT. Each adapts generation to the available accelerator budget.

Methods that became obsolete or less central

Uncompressed pixel-space generation and adversarial-only pipelines became less central where diffusion, latent-space, and Transformer-based setups were easier to scale.
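
As a rough illustration of why uncompressed pixel-space generation is harder to scale, the sketch below counts how many values a denoiser would process per sample in pixel space versus a compressed latent space. The 512x512 RGB input, the spatial downsampling factor of 8, and the 4 latent channels are illustrative assumptions, not figures from these notes, and `tensor_elements` is a hypothetical helper.

```python
def tensor_elements(height, width, channels, downsample=1):
    """Count the values in an image or latent tensor after spatial downsampling."""
    if height % downsample or width % downsample:
        raise ValueError("height and width must be divisible by downsample")
    return (height // downsample) * (width // downsample) * channels


# Assumed example: a 512x512 RGB image versus a latent with
# spatial factor 8 and 4 channels (illustrative values only).
pixel_elems = tensor_elements(512, 512, 3)
latent_elems = tensor_elements(512, 512, 4, downsample=8)
reduction = pixel_elems / latent_elems

print(pixel_elems, latent_elems, reduction)
```

Under these assumptions the denoiser sees 48 times fewer values per sample in latent space. This counts tensor elements only; actual compute savings depend on the backbone and are not estimated here.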

Representative papers

Rank	Year	Paper	Priority	Status
75	2020	Denoising Diffusion Probabilistic Models	5	downloaded / read_complete
76	2021	High-Resolution Image Synthesis with Latent Diffusion Models	5	downloaded / read_complete
77	2013	Auto-Encoding Variational Bayes	4	downloaded / read_complete
78	2014	Generative Adversarial Nets	4	downloaded / read_complete
79	2018	A Style-Based Generator Architecture for Generative Adversarial Networks	4	downloaded / read_complete
80	2020	Score-Based Generative Modeling through Stochastic Differential Equations	4	downloaded / read_complete
81	2021	Zero-Shot Text-to-Image Generation	4	downloaded / read_complete
82	2022	Scalable Diffusion Models with Transformers	4	downloaded / read_complete
83	2015	Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks	3	downloaded / read_complete
84	2016	Image-to-Image Translation with Conditional Adversarial Networks	3	downloaded / read_complete
85	2017	Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks	3	downloaded / read_complete
86	2021	Improved Denoising Diffusion Probabilistic Models	3	downloaded / read_complete
132	2026	Qwen3.5-Omni Technical Report	4	downloaded / read_complete

Open questions

How can training-compute improvements be separated from inference/sampling-compute improvements across media models?
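
One possible way to keep the two budgets apart is to account for them with separate formulas. The sketch below is a back-of-the-envelope model, not a method from any of the listed papers: it assumes a backward pass costs about twice a forward pass (a common rule of thumb), and every number and function name in it is hypothetical.

```python
def training_flops(forward_flops, examples_seen, backward_multiplier=2):
    """Approximate training cost: one forward and one backward pass per example.

    backward_multiplier=2 is an assumed rule of thumb, not a measured value.
    """
    return forward_flops * (1 + backward_multiplier) * examples_seen


def sampling_flops(forward_flops, num_steps, num_samples, evals_per_step=1):
    """Approximate sampling cost: one or more network evaluations per denoising step."""
    return forward_flops * num_steps * evals_per_step * num_samples


# Hypothetical figures, chosen only to make the arithmetic easy to follow.
FORWARD = 10**11  # FLOPs per network evaluation (assumed)

train_cost = training_flops(FORWARD, examples_seen=10**8)
sample_cost_1000 = sampling_flops(FORWARD, num_steps=1000, num_samples=10**6)
sample_cost_50 = sampling_flops(FORWARD, num_steps=50, num_samples=10**6)

print(train_cost, sample_cost_1000, sample_cost_50)
print(sample_cost_1000 / sample_cost_50)  # step reduction changes sampling cost only
```

In this model, reducing the step count changes only the sampling term, while a cheaper backbone (smaller `forward_flops`) lowers both terms. Keeping the terms separate makes it easier to say which budget a given improvement targets.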