RLHF and preference optimization

Compute interpretation

A family of post-training methods that spends additional compute on optimization and annotation to shape model behavior.

Supporting reading cards

Obsolete or less central under later compute

Track this method family only through its linked reading cards; do not treat this method page as standalone evidence.