Hardware dataset
AI Accelerator Timeline
A hardware timeline for AI methods, from the GTX 580 used for AlexNet to Blackwell Ultra and Ironwood. The 17 entries track device class, memory, interconnect, and access model across mainstream AI accelerators.
Source: history/sources/data/mainstream_accelerator_eras.csv
Citations checked against NVIDIA and Google primary sources.
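As a minimal sketch of how the source CSV might be read, the snippet below parses rows into dictionaries with Python's standard `csv` module. The function name `load_timeline` and the sample header names (`name`, `device_class`) are illustrative assumptions; the real column names in the CSV are not shown on this page.

```python
import csv
import io


def load_timeline(stream):
    """Read timeline rows from an open CSV text stream into a list of dicts."""
    return list(csv.DictReader(stream))


# Hypothetical two-row sample. The header names are assumptions for
# illustration only, not the real columns of the source CSV.
sample = io.StringIO(
    "name,device_class\n"
    "NVIDIA GTX 580 3GB,2-GPU paper setup\n"
    "NVIDIA Tesla K40,single data-center GPU\n"
)
rows = load_timeline(sample)
print(len(rows), rows[0]["name"])
```

With the real file, the same function could be called on `open(path, newline="", encoding="utf-8")`, and `len(rows)` compared against the entry count stated in the summary above.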
NVIDIA GTX 580 3GB
2-GPU paper setup
paper anchor: AlexNet reports two GTX 580 3GB GPUs and identifies GPU memory as the limiting factor. Full board specs need a separate archived vendor/board source.
NVIDIA Tesla K40
single data-center GPU
widely adopted pre-HBM research GPU: Representative of 2014-2015 CUDA research servers before HBM and tensor cores. Figures follow the GPU Boost peak convention of NVIDIA's Volta comparison table; base-clock board datasheets can report lower p…
NVIDIA Tesla K80
dual-GPU data-center board / research cluster GPU
paper-verified mid-2010s research GPU: Added because local reading cards repeatedly identify K80 as a 2015-2016 training and inference comparison device. Numeric board parameters should not be used for quantitative plots until an archived …
NVIDIA Tesla P100
research cluster GPU
widely adopted Pascal/HBM GPU: The PCIe datasheet gives lower P100 figures; NVIDIA's Volta comparison table gives the SXM2-class peak figures.
Google Cloud TPU v2
Cloud TPU training accelerator / TPUv2 mesh
paper-verified Google Cloud TPU training anchor: Added because GPipe and Mesh TensorFlow both use TPUv2. Keep the exact chip-level peak conservative, because current Google docs point to an ACM architecture paper for detailed specs.
NVIDIA Tesla V100 SXM2/SXM3
training GPU
widely adopted Volta tensor-core GPU: Primary 2017 anchor for Volta tensor-core multi-GPU training; later 2019-2020 V100 clusters support hyperscale dense LLM cards such as Megatron-LM and GPT-3.
Google Cloud TPU v3
TPU pod chip
Google TPU transformer-era anchor: Good anchor for BERT-era and early large transformer TPU training.
NVIDIA A100 40GB/80GB
training GPU
widely adopted Ampere LLM GPU: BF16 and 80 GB HBM2e made this the main open-report LLM training GPU for 2021-2023.
Google Cloud TPU v4
TPU pod chip
Google TPU pod training anchor: TPU v4 was deployed in 2020 and is a strong source for large-scale TPU interconnect/topology constraints.
NVIDIA H100 SXM
Hopper training/inference GPU
widely adopted Hopper frontier GPU: Transformer Engine and FP8 shift the bottleneck toward memory capacity, KV cache, and scale-up fabric.
Google Cloud TPU v5e
TPU cloud chip
Google Cloud cost-efficient TPU anchor: Useful anchor for economical training/inference, not the frontier-memory configuration.
Google Cloud TPU v5p
TPU pod chip
Google frontier TPU pod anchor: High-memory TPU generation for large dense and MoE training. Google comparison docs list FP8 at the same peak as BF16; do not infer an Ironwood-style 2x FP8/BF16 ratio.
NVIDIA H200 SXM
Hopper memory-expanded GPU
widely adopted memory-heavy inference GPU: Same broad compute class as H100, but HBM capacity/bandwidth changes long-context and inference economics.
Google Cloud TPU v6e Trillium
TPU cloud chip
Google TPU training/fine-tuning/serving anchor: Google positions v6e for transformer, text-to-image, CNN training, fine-tuning, and serving. The v6e page lists BF16 and INT8; the TPU7x comparison table also lists v6e FP8 at 918 TFLOPS.
NVIDIA HGX B200
8-GPU Blackwell system
frontier candidate Blackwell system: Treat as system-level because NVIDIA's current public HGX table reports B200 in 8-GPU units.
Google Cloud TPU7x Ironwood
TPU pod chip
frontier candidate TPU7x: Google calls TPU7x the latest Cloud TPU and positions it for dense and MoE training plus decode-heavy inference.
NVIDIA HGX B300
8-GPU Blackwell Ultra system
frontier candidate Blackwell Ultra system: Included for 2026 tracking, but most published research may still report H100/H200/A100 until B300 systems are broadly used. NVIDIA HGX lists FP4 as Sparse | Dense = 144 | 108, a 1.33:1 ratio, so this…
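The sparse-to-dense FP4 ratio quoted in the B300 note can be checked with exact arithmetic. This is a small sketch only: the function name `sparse_to_dense_ratio` is hypothetical, and the two inputs are taken as unitless values in whatever unit the HGX table uses.

```python
from fractions import Fraction


def sparse_to_dense_ratio(sparse_peak, dense_peak):
    """Return the sparse/dense peak ratio as an exact fraction."""
    return Fraction(sparse_peak, dense_peak)


# Values from the B300 note: Sparse | Dense = 144 | 108.
ratio = sparse_to_dense_ratio(144, 108)
print(ratio, round(float(ratio), 2))  # prints "4/3 1.33"
```

A ratio of 4/3 is well below the 2:1 sparse-to-dense convention, which is why a dense figure should not be derived by halving the sparse one for this entry.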