Compute Bottlenecks Ledger
This ledger tracks cross-regime bottlenecks. Entries below are draft syntheses backed by reading cards or source reports; use the linked cards for evidence details.
| Bottleneck | Regime | Evidence | Method adaptation | Status |
|---|---|---|---|---|
| CPU-era optimization and feature scale | Pre-2012 CPU and statistical foundations | Support-vector networks, large-scale SGD | SVMs, backpropagation, and SGD are sized for smaller CPU-era datasets and feature pipelines, predating accelerator-scale dense training | card-backed draft |
| Dense convolution throughput and GPU memory | Single-GPU deep learning | AlexNet, VGG, GoogLeNet | CNNs, ReLU-style training, dropout, and compact convolution modules exploit commodity GPU dense arithmetic | card-backed draft |
| Training stability under depth and batch-size constraints | Multi-GPU dense training | ResNet, batch normalization, group normalization | Residual connections and normalization methods make deeper or distributed dense networks trainable | card-backed draft |
| Distributed communication and model fit | Multi-GPU dense training / hyperscale dense LLM training | ImageNet in 1 hour, Megatron-LM, ZeRO | Large-batch SGD, tensor parallelism, pipeline/model parallelism, and optimizer-state partitioning trade communication for feasible scale | card-backed draft |
| Accelerator-friendly dense matrix multiplication | TPU and accelerator Transformer era | Attention Is All You Need, BERT, T5, TPU datacenter analysis | Transformers and TPU-style workloads emphasize batched dense matmul and compiler-friendly layouts | card-backed draft |
| Compute/data/model allocation | Hyperscale dense LLM training | GPT-3, Scaling Laws, Chinchilla, PaLM | Scaling laws and compute-optimal training decide whether to spend budget on parameters, tokens, or longer training | card-backed draft |
| Conditional compute and sparse activation | Sparse and memory-efficient scaling | MoE, GShard, Switch Transformer | Mixture-of-experts increases total parameters while activating a sparse subset per token | card-backed draft |
| Attention IO and memory hierarchy | Sparse and memory-efficient scaling | FlashAttention, FlashAttention-2 | IO-aware exact attention tiles the computation through the SRAM/HBM hierarchy, avoiding a materialized full attention matrix in HBM and reducing memory traffic | card-backed draft |
| Sampling cost and generative-model throughput | Generative media compute | DDPM, latent diffusion, DiT, StyleGAN | Diffusion, GAN, VAE, and autoregressive image models trade accelerator training throughput, latent compression, and sampling cost differently | card-backed draft |
| Inference-time allocation and behavior shaping | Inference-time compute and post-training | RAG, InstructGPT, chain-of-thought, ReAct | Retrieval, preference optimization, chain-of-thought reasoning, and tool calls spend inference-time or post-training compute on top of a pretrained base model | card-backed draft |
| Deployment memory and latency | Efficient and edge inference | distillation, MobileNet, LoRA, GPTQ, speculative decoding | Compression, efficient architectures, adapters, quantization, and draft-model decoding reduce serving cost or adaptation memory | card-backed draft |
| Hardware parameter anchors | Cross-regime | mainstream accelerator era map, paper compute device extraction | Source reports define the device, memory, interconnect, and pod/GPU scale used to interpret reading cards | sourced draft |
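The "Distributed communication and model fit" row cites ZeRO's optimizer-state partitioning. The per-device memory arithmetic behind that idea can be sketched as follows.

This is a minimal sketch that assumes mixed-precision Adam with the following per-parameter costs:

- 2 bytes for fp16 weights
- 2 bytes for fp16 gradients
- 12 bytes for fp32 optimizer state (master weights, momentum, variance)

It ignores activations and temporary buffers. The function name `zero_model_state_bytes` is hypothetical. "Stage 0" is used here only as a label for the unsharded data-parallel baseline.

```python
def zero_model_state_bytes(n_params: float, n_devices: int, stage: int,
                           optimizer_bytes_per_param: int = 12) -> float:
    """Per-device bytes of model state under an assumed mixed-precision Adam layout.

    Assumed accounting (hypothetical helper, not a library API):
    2 B/param fp16 weights, 2 B/param fp16 gradients, and
    `optimizer_bytes_per_param` B/param of fp32 optimizer state.
    Activations, buffers, and fragmentation are ignored.
    """
    weights = 2.0 * n_params
    grads = 2.0 * n_params
    optim = float(optimizer_bytes_per_param) * n_params
    if stage == 0:  # plain data parallelism: every replica holds a full copy
        return weights + grads + optim
    if stage == 1:  # shard optimizer states across data-parallel ranks
        return weights + grads + optim / n_devices
    if stage == 2:  # also shard gradients
        return weights + (grads + optim) / n_devices
    if stage == 3:  # also shard the fp16 weights themselves
        return (weights + grads + optim) / n_devices
    raise ValueError("stage must be 0, 1, 2, or 3")


# Usage: a 7.5B-parameter model spread over 64 data-parallel devices.
for stage in range(4):
    gb = zero_model_state_bytes(7.5e9, 64, stage) / 1e9
    print(f"stage {stage}: {gb:.2f} GB per device")
```

Each stage trades extra gather or reduce-scatter communication for a smaller replicated footprint. This is the communication-for-scale trade the ledger row describes.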
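The "Compute/data/model allocation" row can be made concrete with two widely used rules of thumb:

- Dense Transformer training costs roughly C ≈ 6·N·D FLOPs for N parameters and D training tokens.
- A Chinchilla-style compute-optimal run uses on the order of 20 tokens per parameter.

The sketch below assumes both heuristics. The helper name and the default ratio are illustrative, not a fitted scaling law.

```python
import math


def compute_optimal_split(compute_flops: float,
                          tokens_per_param: float = 20.0) -> tuple[float, float]:
    """Split a FLOP budget into (parameters, tokens).

    Assumptions (heuristics, not fitted coefficients):
      * training cost C ~= 6 * N * D
      * compute-optimal data D ~= tokens_per_param * N
    Substituting D = r * N into C = 6 * N * D gives C = 6 * r * N**2.
    """
    n_params = math.sqrt(compute_flops / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens


# Usage: the budget implied by 70B parameters trained on 1.4T tokens.
budget = 6 * 70e9 * 1.4e12
n, d = compute_optimal_split(budget)
print(f"params ~ {n:.3g}, tokens ~ {d:.3g}")
```

Because N grows only with the square root of C under these assumptions, doubling the budget scales both parameters and tokens by about 1.41x rather than spending it all on model size.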
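The "Conditional compute and sparse activation" row says mixture-of-experts layers activate only a subset of parameters per token. A minimal top-k routing sketch makes that concrete.

Several parts of the sketch are assumptions:

- The router is a plain linear layer.
- The gate is a softmax renormalized over only the selected experts. This is one common choice; exact gate normalization, capacity limits, and load-balancing losses differ between GShard (top-2) and Switch Transformer (top-1) and are omitted.
- All function names are hypothetical.

```python
import math


def top_k_gate(logits: list[float], k: int) -> list[tuple[int, float]]:
    """Return (expert_index, weight) for the k largest router logits,
    with softmax weights renormalized over the selected experts only."""
    chosen = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in chosen)  # subtract max for numerical stability
    exps = [math.exp(logits[i] - m) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


def moe_layer(x, router_weights, experts, k=2):
    """Route one token vector `x` to k experts and mix their outputs.
    `router_weights` holds one weight row per expert; `experts` are callables.
    Only the selected experts are evaluated: that is the sparse activation."""
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in router_weights]
    out = [0.0] * len(x)
    for idx, gate in top_k_gate(logits, k):
        y = experts[idx](x)
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out


# Usage: 4 toy experts (each scales its input); record which ones actually run.
calls = []


def make_expert(i: int, scale: float):
    def expert(x):
        calls.append(i)
        return [scale * xi for xi in x]
    return expert


experts = [make_expert(i, float(i + 1)) for i in range(4)]
router = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.5, 0.5]]
y = moe_layer([2.0, 1.0], router, experts, k=2)
print(y, "experts run:", calls)
```

With 4 experts and k=2, every expert's weights count toward total parameters, but only half of them are touched for this token.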
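The "Attention IO and memory hierarchy" row rests on the fact that exact softmax attention can be computed one key/value block at a time with an online softmax. Under that scheme, each query keeps only a running max, a running denominator, and a running output. No full row of scores is ever stored.

This pure-Python sketch shows only that recurrence; real FlashAttention kernels run it in GPU SRAM. A few simplifications apply:

- `block_size` is a hypothetical stand-in for the on-chip tile size.
- The 1/sqrt(d) score scaling is omitted for brevity.
- The function names are illustrative.

```python
import math


def naive_attention(q, k, v):
    """Reference softmax(q k^T) v that materializes each full score row."""
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) for kj in k]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) / total
                    for c in range(len(v[0]))])
    return out


def tiled_attention(q, k, v, block_size=2):
    """Exact attention over key/value blocks using an online softmax.
    Per query: running max, running softmax denominator, running weighted sum."""
    d_v = len(v[0])
    out = []
    for qi in q:
        running_max = -math.inf
        running_sum = 0.0
        acc = [0.0] * d_v
        for start in range(0, len(k), block_size):
            k_blk = k[start:start + block_size]
            v_blk = v[start:start + block_size]
            scores = [sum(a * b for a, b in zip(qi, kj)) for kj in k_blk]
            new_max = max(running_max, max(scores))
            # Rescale earlier statistics to the new max (exp(-inf) == 0.0 on block 1).
            rescale = math.exp(running_max - new_max)
            running_sum *= rescale
            acc = [a * rescale for a in acc]
            for s, vj in zip(scores, v_blk):
                w = math.exp(s - new_max)
                running_sum += w
                for c in range(d_v):
                    acc[c] += w * vj[c]
            running_max = new_max
        out.append([a / running_sum for a in acc])
    return out


# Usage: the tiled result matches the reference up to floating-point error.
q = [[1.0, 0.0], [0.5, -1.0]]
k = [[1.0, 2.0], [0.0, 1.0], [-1.0, 0.5]]
v = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
print(naive_attention(q, k, v))
print(tiled_attention(q, k, v, block_size=2))
```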