Memory-efficient attention

Compute interpretation

Attention implementations that treat the GPU memory hierarchy and IO traffic as first-class constraints.
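
As a rough illustration of the description above, the sketch below shows one widely used pattern for cutting attention's memory traffic: processing keys and values in blocks while maintaining a running softmax, so the full score matrix is never stored. This is a minimal pure-Python sketch under stated assumptions, not any particular library's kernel. The names `naive_attention`, `tiled_attention`, and `block_size` are hypothetical and chosen for illustration. Real GPU implementations also tile the queries and keep each block in on-chip memory, which this sketch does not model.

```python
import math


def naive_attention(q, k, v):
    """Reference version: builds the full score row for every query."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for qi in q:
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([sum(w * vj[d] for w, vj in zip(weights, v)) / total
                    for d in range(len(v[0]))])
    return out


def tiled_attention(q, k, v, block_size=2):
    """Hypothetical sketch: visits keys/values one block at a time.

    Only one block of scores exists at any moment. A running maximum and
    running sum let earlier partial results be rescaled as new blocks arrive.
    """
    scale = 1.0 / math.sqrt(len(q[0]))
    dv = len(v[0])
    out = []
    for qi in q:
        running_max = -math.inf
        running_sum = 0.0
        acc = [0.0] * dv
        for start in range(0, len(k), block_size):
            k_blk = k[start:start + block_size]
            v_blk = v[start:start + block_size]
            scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k_blk]
            new_max = max(running_max, max(scores))
            # Rescale what was accumulated under the old maximum.
            # On the first block this is exp(-inf), which is 0.0.
            correction = math.exp(running_max - new_max)
            running_sum *= correction
            acc = [a * correction for a in acc]
            for s, vj in zip(scores, v_blk):
                w = math.exp(s - new_max)
                running_sum += w
                for d in range(dv):
                    acc[d] += w * vj[d]
            running_max = new_max
        out.append([a / running_sum for a in acc])
    return out
```

Both functions compute the same softmax-weighted sum; the tiled version trades a small amount of extra arithmetic (the rescaling step) for never holding more than `block_size` scores per query at once.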

Supporting reading cards

Obsolete or less central under later compute

Track this only through linked reading cards; do not treat this method page as standalone evidence.