Parallelism and sharding

Compute interpretation

Model, data, pipeline, and optimizer-state partitioning methods that split a large model's training work and memory across devices so that it fits on distributed accelerator clusters.
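A rough sketch of why partitioning optimizer state, gradients, and parameters helps a model fit. It assumes mixed-precision Adam with 2 bytes each for fp16 parameters and gradients and 12 bytes of fp32 optimizer state per parameter (the accounting used in the ZeRO paper), sharded evenly across N data-parallel devices. It ignores activations, buffers, and fragmentation. The function name and stage labels are hypothetical, not any library's API.

```python
# Assumed per-parameter byte counts under mixed-precision Adam:
# fp16 params, fp16 grads, fp32 master params + momentum + variance.
PARAM_BYTES = 2
GRAD_BYTES = 2
OPTIM_BYTES = 12


def per_device_bytes(num_params: float, num_devices: int, stage: str) -> float:
    """Model-state bytes per device for a hypothetical partitioning stage.

    stage: "none"   -> every device replicates params, grads, optimizer state
           "os"     -> only optimizer state is sharded across devices
           "os+g"   -> optimizer state and gradients are sharded
           "os+g+p" -> optimizer state, gradients, and parameters are sharded
    """
    n = num_devices
    if stage == "none":
        return (PARAM_BYTES + GRAD_BYTES + OPTIM_BYTES) * num_params
    if stage == "os":
        # Params and grads stay replicated; optimizer state is split n ways.
        return (PARAM_BYTES + GRAD_BYTES) * num_params + OPTIM_BYTES * num_params / n
    if stage == "os+g":
        # Only params stay replicated.
        return PARAM_BYTES * num_params + (GRAD_BYTES + OPTIM_BYTES) * num_params / n
    if stage == "os+g+p":
        # Everything is split n ways.
        return (PARAM_BYTES + GRAD_BYTES + OPTIM_BYTES) * num_params / n
    raise ValueError(f"unknown stage: {stage}")


if __name__ == "__main__":
    # Example: a 7.5B-parameter model on 64 data-parallel devices.
    for stage in ("none", "os", "os+g", "os+g+p"):
        gb = per_device_bytes(7.5e9, 64, stage) / 1e9
        print(f"{stage:>7}: {gb:.2f} GB")
```

Under these assumptions, sharding only the optimizer state cuts roughly 120 GB of replicated model state to about 31.4 GB per device. The accounting covers model state only; activation memory is a separate cost that pipeline and model partitioning also reduce.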

Supporting reading cards

Obsolete or less central under later compute

Track this only through linked reading cards; do not treat this method page as standalone evidence.