Mixture of experts

Compute interpretation

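A conditional-compute architecture that increases the total parameter count without activating all weights for every token: a learned router sends each token to a small subset of expert sub-networks, so per-token compute scales with the experts selected rather than with the full parameter count.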
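To make the "more parameters, fewer active weights" idea concrete, here is a minimal toy sketch of top-k routing. It makes several assumptions that are not taken from this page: each expert is a single linear map (real experts are usually small MLPs), the router is one linear layer, and gate weights come from a softmax over the selected top-k logits only (one common gating variant among several). The names `ToyMoE`, `forward` and `param_counts` are hypothetical and for illustration only. This is not a reference implementation of any specific MoE paper.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def matvec(w, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


class ToyMoE:
    """Hypothetical toy MoE layer: n_experts linear experts, top-k routing."""

    def __init__(self, d_model, n_experts, k, seed=0):
        rng = random.Random(seed)
        self.d_model = d_model
        self.n_experts = n_experts
        self.k = k
        # Router: one logit per expert, computed from the token vector.
        self.router = [[rng.gauss(0.0, 1.0) for _ in range(d_model)]
                       for _ in range(n_experts)]
        # Each expert is a single d_model x d_model linear map (assumption).
        self.experts = [[[rng.gauss(0.0, 0.1) for _ in range(d_model)]
                         for _ in range(d_model)]
                        for _ in range(n_experts)]
        # Counts how many times each expert was actually evaluated.
        self.calls = [0] * n_experts

    def forward(self, x):
        """Route one token vector x to its top-k experts and mix their outputs."""
        logits = matvec(self.router, x)
        top = sorted(range(self.n_experts), key=lambda i: logits[i],
                     reverse=True)[:self.k]
        # Renormalize gate weights over the selected experts only.
        gates = softmax([logits[i] for i in top])
        out = [0.0] * self.d_model
        for g, i in zip(gates, top):
            self.calls[i] += 1  # only the k selected experts are evaluated
            y = matvec(self.experts[i], x)
            out = [o + g * yi for o, yi in zip(out, y)]
        return out, top

    def param_counts(self):
        """Return (total parameters, parameters used per token)."""
        per_expert = self.d_model * self.d_model
        router = self.n_experts * self.d_model
        total = self.n_experts * per_expert + router
        active = self.k * per_expert + router
        return total, active


if __name__ == "__main__":
    moe = ToyMoE(d_model=4, n_experts=8, k=2)
    out, top = moe.forward([1.0, 0.5, -0.2, 0.3])
    print("experts used:", top)
    print("total vs active params:", moe.param_counts())
```

With 8 experts and `k=2` in this toy, `param_counts()` returns `(160, 64)`: the layer stores 160 weights, but each token only touches the router plus two experts. Raising `n_experts` grows the first number while the second stays fixed, which is the trade-off the definition above describes.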

Supporting reading cards

Obsolete or less central under later compute

Track this only through linked reading cards; do not treat this method page as standalone evidence.