Speculative Decoding
An inference acceleration technique where a small, fast "draft" model generates multiple candidate tokens that a larger "verifier" model then checks in a single parallel pass. Because the verifier can score a batch of candidate tokens in roughly the time it takes to generate one, overall generation speeds up by 2–3x in typical setups. Crucially, the output is unchanged: the verification step accepts only tokens the large model would have produced itself, so the result matches what the verifier alone would generate. Speculative decoding is especially effective when memory bandwidth is the bottleneck, as on consumer hardware, because each pass over the model's weights now produces several tokens' worth of useful work instead of one.
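The loop above can be sketched with toy deterministic "models". This is a minimal illustration of the greedy variant, not a real implementation: `target_model` and `draft_model` are hypothetical stand-ins for neural LMs, and the "parallel" verification is written as a plain loop for clarity (on a GPU it would be one batched forward pass).

```python
def target_model(token):
    # Toy "large" model: deterministically maps a token to the next one.
    return (token * 3 + 1) % 10

def draft_model(token):
    # Cheap draft model: agrees with the target most of the time,
    # but diverges on multiples of 4 to simulate imperfection.
    if token % 4 == 0:
        return (token + 1) % 10  # deliberately wrong sometimes
    return (token * 3 + 1) % 10

def speculative_decode(start, length, k=4):
    """Greedy speculative decoding: the draft proposes k tokens,
    the target verifies all k positions at once, and the longest
    matching prefix is accepted. On a mismatch, the target's own
    token is substituted, so the output equals pure target decoding."""
    out = [start]
    while len(out) < length + 1:
        # 1. Draft k candidate tokens autoregressively (cheap).
        cand, last = [], out[-1]
        for _ in range(k):
            last = draft_model(last)
            cand.append(last)
        # 2. Target scores every candidate position; in a real system
        #    these k evaluations happen in one parallel batch.
        prev = out[-1]
        for c in cand:
            t = target_model(prev)
            if t == c:
                out.append(c)       # draft token accepted
                prev = c
            else:
                out.append(t)       # mismatch: keep the target's token
                break               # discard the rest of the draft
    return out[: length + 1]
```

Note the lossless property: because every accepted token was checked against the target and every rejected one is replaced by the target's choice, `speculative_decode` produces exactly the sequence the target model would have generated on its own, only faster when the draft's acceptance rate is high.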