Sequence-to-sequence models

Compute interpretation

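Encoder-decoder sequence modeling built on recurrent networks. Their GPU workloads parallelized across the batch and hidden units but ran sequentially across time steps; this held until the Transformer made dense attention dominant.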
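The sketch below illustrates the compute pattern this page refers to. It makes several assumptions. It uses a plain tanh RNN, whereas real systems typically used LSTM or GRU cells. It uses toy random weights, no attention, and hypothetical function names (`encode`, `decode`, `rnn_step`). The encoder folds the source into a fixed-size state one time step at a time, and the decoder unrolls from that state. Each step is a small dense product that a GPU can batch across sequences, but step t cannot begin until step t-1 has finished.

```python
import math
import random


def matvec(w, v):
    # Dense matrix-vector product; on a GPU this becomes a matrix-matrix
    # product over a whole batch of sequences (the parallel part).
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def rnn_step(w_x, w_h, x, h):
    # One tanh recurrence: h_t = tanh(W_x x_t + W_h h_{t-1}).
    return [math.tanh(a + b) for a, b in zip(matvec(w_x, x), matvec(w_h, h))]


def encode(w_x, w_h, inputs, hidden_size):
    # Sequential over time: each step needs the state from the previous step.
    h = [0.0] * hidden_size
    for x in inputs:
        h = rnn_step(w_x, w_h, x, h)
    return h  # fixed-size summary of the whole source sequence


def decode(w_y, w_h, w_out, h, start, steps):
    # Unroll from the encoder state, feeding each output back in as the next input.
    y, outputs = start, []
    for _ in range(steps):
        h = rnn_step(w_y, w_h, y, h)
        y = matvec(w_out, h)
        outputs.append(y)
    return outputs


def random_matrix(rows, cols, rng):
    # Hypothetical toy weights, uniformly drawn from [-0.5, 0.5].
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
in_dim, hidden, out_dim = 3, 4, 3
enc_wx, enc_wh = random_matrix(hidden, in_dim, rng), random_matrix(hidden, hidden, rng)
dec_wy, dec_wh = random_matrix(hidden, out_dim, rng), random_matrix(hidden, hidden, rng)
dec_wout = random_matrix(out_dim, hidden, rng)

source = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
state = encode(enc_wx, enc_wh, source, hidden)
outputs = decode(dec_wy, dec_wh, dec_wout, state, [0.0] * out_dim, steps=2)
print(len(state), len(outputs), len(outputs[0]))  # 4 2 3
```

The two Python `for` loops over time steps are the serial bottleneck. A Transformer encoder instead processes all source positions in one dense attention computation, which keeps GPUs busier on long sequences.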

Supporting reading cards

Obsolete or less central under later compute

Track this only through linked reading cards; do not treat this method page as standalone evidence.