Google boosts Gemma 4 speed with new MTP drafter models
- Source: Ars Technica
- Time: 5:13 PM
- Weight: 94/100
Google has introduced Multi-Token Prediction (MTP) drafter models for its Gemma 4 open AI series, significantly increasing local inference speeds. By employing a technique known as speculative decoding, these experimental models allow for up to a threefold increase in token generation speed.
The system works by using a lightweight drafter model to predict several future tokens, which the primary model then verifies in a single parallel pass. Because autoregressive decoding is typically memory-bandwidth-bound rather than compute-bound, this verification reuses compute cycles that would otherwise sit idle. The performance gains vary across hardware configurations, with Google reporting speedups of up to 3.1x on Pixel smartphones and 2.5x on Apple's M4 silicon.
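The draft-then-verify loop described above can be illustrated with a toy sketch. This is not Google's implementation; the `draft_model` and `target_model` functions below are hypothetical stand-ins (simple deterministic token rules) for a small drafter and a large target model, and the verification loop stands in for what would be one batched forward pass on real hardware:

```python
def draft_model(tokens):
    # Hypothetical cheap drafter: guesses the next token as last + 1, capped at 5.
    return min(tokens[-1] + 1, 5)

def target_model(tokens):
    # Hypothetical expensive target model: last + 1, capped at 7.
    # It disagrees with the drafter once values exceed 5.
    return min(tokens[-1] + 1, 7)

def speculative_decode(context, k=4, max_new=8):
    """Greedy speculative decoding sketch.

    The drafter proposes k tokens autoregressively (cheap); the target
    checks every proposed position. In a real system all k checks run
    in one parallel forward pass, which is where the speedup comes from.
    The output always matches what greedy decoding with the target
    alone would produce.
    """
    tokens = list(context)
    while len(tokens) - len(context) < max_new:
        # 1. Drafter speculates k tokens ahead.
        draft = []
        for _ in range(k):
            draft.append(draft_model(tokens + draft))
        # 2. Target verifies each position; accept the matching prefix.
        accepted = []
        for i in range(k):
            t = target_model(tokens + draft[:i])
            accepted.append(t)
            if t != draft[i]:
                break  # first mismatch: keep the target's token, discard the rest
        tokens.extend(accepted)
    return tokens
```

When the drafter agrees with the target (tokens up to 5 here), a whole block of k tokens is accepted per target pass; once they diverge, progress falls back to one verified token per pass, which is why reported speedups depend on how often the drafter guesses correctly.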