Speculative decoding

Compute interpretation

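Inference acceleration method: a small draft model proposes several tokens, and the large target model verifies them in one parallel pass, trading extra draft-model compute for lower latency on the target model.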
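
A minimal sketch of the draft-then-verify loop, to make the compute trade concrete. This card does not say which variant is meant, so the sketch assumes the common speculative sampling rule: accept a drafted token with probability min(1, p/q), and on rejection resample from the normalized residual max(0, p - q). The names `speculative_step`, `residual`, `toy_draft`, and `toy_target` are hypothetical, and the "models" are fixed toy distributions over a three-token vocabulary rather than real networks.

```python
import random


def residual(p, q):
    """Normalized max(0, p - q): the distribution used after a rejection."""
    diff = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    total = sum(diff)
    # If p == q the residual is empty; no rejection can occur in that case,
    # so returning p is only a guard against division by zero.
    return [d / total for d in diff] if total > 0 else list(p)


def sample(probs, rng):
    """Draw one token index from a probability list."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def speculative_step(context, draft_probs, target_probs, k, rng):
    """Draft k tokens, verify them against the target, return emitted tokens."""
    # 1. The draft model proposes k tokens one at a time (cheap, sequential).
    proposed, q_dists = [], []
    ctx = list(context)
    for _ in range(k):
        q = draft_probs(ctx)
        tok = sample(q, rng)
        proposed.append(tok)
        q_dists.append(q)
        ctx.append(tok)

    # 2. The target model scores all k + 1 positions. A real system would do
    #    this in one batched forward pass; the toy version just loops.
    p_dists = [target_probs(list(context) + proposed[:i]) for i in range(k + 1)]

    # 3. Accept each drafted token with probability min(1, p/q) and stop at
    #    the first rejection, replacing that token with a residual sample.
    emitted = []
    for i, tok in enumerate(proposed):
        p, q = p_dists[i], q_dists[i]
        if rng.random() < min(1.0, p[tok] / q[tok]):
            emitted.append(tok)
        else:
            emitted.append(sample(residual(p, q), rng))
            return emitted

    # 4. Every draft was accepted: the target contributes one extra token.
    emitted.append(sample(p_dists[k], rng))
    return emitted


# Hypothetical toy models: context-independent distributions over 3 tokens.
def toy_target(ctx):
    return [0.7, 0.2, 0.1]


def toy_draft(ctx):
    return [0.5, 0.3, 0.2]


if __name__ == "__main__":
    tokens = speculative_step([0], toy_draft, toy_target, k=4, rng=random.Random(0))
    print(tokens)  # between 1 and 5 tokens per target pass
```

Each call emits between 1 and k + 1 tokens for a single target-model pass, which is where the latency saving comes from; the cost is the k extra draft-model calls, some of which are wasted when a token is rejected.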

Supporting reading cards

Obsolete or less central under later compute

Track this only through linked reading cards; do not treat this method page as standalone evidence.