Sequence-to-sequence models
Compute interpretation
Encoder-decoder sequence modeling, run as GPU-parallel recurrent workloads before the Transformer made dense attention dominant.
Supporting reading cards
- Neural Machine Translation by Jointly Learning to Align and Translate (2014, single_gpu_deep_learning)
- Sequence to Sequence Learning with Neural Networks (2014, single_gpu_deep_learning)
- Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation (2016, multi_gpu_dense_training)
Obsolete or less central under later compute
Track this only through linked reading cards; do not treat this method page as standalone evidence.