With MTP enabled, Qwen3.6-27B-MTP-UD-Q5_K_XL on my 7900XTX goes from 32 t/s to 50-72 t/s, depending on how predictable the task is. That is roughly a 1.5x speedup on creative tasks and up to 2.2x on math.
MTP does not change output quality; the only cost is a few hundred MB of extra VRAM. You will need to download a GGUF model with MTP support to use it.
My parameters:
; Context memory usage
ctx-size = 65536
ctk = q8_0
ctv = q8_0
; Prompt processing speed
batch-size = 1024
ubatch-size = 1024
; Speculative decoding
np = 1
spec-type = draft-mtp
spec-draft-n-max = 3
Edit: I did some more testing using Unsloth's parameters, and with spec-draft-n-max = 6 I can get up to 82 t/s, a 2.56x speedup, on the same math prompt. But this comes at the cost of the creative writing task, which now falls below 40 t/s.
It seems like this should be tuned per prompt, similar to the sampling parameters.
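A rough way to see why the best spec-draft-n-max depends on the prompt is the standard speculative decoding estimate. This is a toy sketch, not how llama.cpp measures anything: it assumes each drafted token is accepted independently with probability accept_prob, and draft_cost (the cost of one draft step relative to one full forward pass) is a made-up illustrative value, not a measurement.

```python
def expected_tokens_per_pass(accept_prob: float, draft_len: int) -> float:
    """Expected tokens emitted per verification pass of the main model.

    Assumes every drafted token is accepted independently with probability
    accept_prob (a simplifying assumption; real acceptance varies by token).
    """
    if accept_prob >= 1.0:
        return float(draft_len + 1)
    # Geometric series: 1 + a + a^2 + ... + a^draft_len
    return (1.0 - accept_prob ** (draft_len + 1)) / (1.0 - accept_prob)


def estimated_speedup(accept_prob: float, draft_len: int, draft_cost: float = 0.1) -> float:
    """Toy speedup estimate; draft_cost is a hypothetical relative cost per drafted token."""
    return expected_tokens_per_pass(accept_prob, draft_len) / (1.0 + draft_len * draft_cost)


def best_draft_len(accept_prob: float, candidates=range(1, 9), draft_cost: float = 0.1) -> int:
    """Draft length in candidates that maximizes the toy speedup estimate."""
    return max(candidates, key=lambda n: estimated_speedup(accept_prob, n, draft_cost))


if __name__ == "__main__":
    # Illustrative acceptance rates only: low for creative text, high for math.
    for a in (0.5, 0.7, 0.9):
        n = best_draft_len(a)
        print(f"accept_prob={a}: best draft length {n}, "
              f"estimated speedup {estimated_speedup(a, n):.2f}x")
```

Under these assumptions a predictable prompt (high acceptance) keeps benefiting from longer drafts, while an unpredictable one peaks at a short draft and then pays for drafted tokens that get thrown away, which matches the direction of the results above.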
Using MTP combined with tensor parallelism, I was able to go from running Qwen3.6 27B at ~7 t/s to ~30 t/s, which I think is an insane boost (3x RTX 2000e Ada).
This does 18tps on 2x R9700:
[Qwen3.6-27B-Q8_0-Code-256K]
m = /models/Qwen3.6-27B/Qwen3.6-27B-Q8_0.gguf
mmproj = /models/Qwen3.6-27B/mmproj-BF16.gguf
chat-template-kwargs = {"preserve_thinking": true}
ctx-size = 262144
temp = 0.6
top-p = 0.95
top-k = 20
min-p = 0.0
presence-penalty = 0.0
repeat-penalty = 1.0
This does 39tps on the same hardware:
[Qwen3.6-27B-MTP-Q8_0-Code-256K]
m = /models/Qwen3.6-27B-MTP/Qwen3.6-27B-Q8_0.gguf
mmproj = /models/Qwen3.6-27B-MTP/mmproj-BF16.gguf
spec-type = draft-mtp
spec-draft-n-max = 2
chat-template-kwargs = {"preserve_thinking": true}
ctx-size = 262144
temp = 0.6
top-p = 0.95
top-k = 20
min-p = 0.0
presence-penalty = 0.0
repeat-penalty = 1.0
😱
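For a quick side-by-side, the throughput numbers quoted in this thread can be turned into speedup ratios with a few lines of Python. The figures are copied from the comments above; the labels are only shorthand for this sketch, and the setups differ (quant, context size, GPU count), so the ratios are not directly comparable across rows.

```python
def speedup(baseline_tps: float, mtp_tps: float) -> float:
    """Ratio of MTP throughput to baseline throughput."""
    return mtp_tps / baseline_tps


# (label, baseline t/s, MTP t/s) as reported in the thread
reports = [
    ("7900XTX, creative, n-max=3", 32, 50),
    ("7900XTX, math, n-max=3", 32, 72),
    ("7900XTX, math, n-max=6", 32, 82),
    ("2x R9700, Q8_0, n-max=2", 18, 39),
    ("3x RTX 2000e Ada, MTP + tensor parallelism", 7, 30),
]

for label, base, mtp in reports:
    print(f"{label}: {base} -> {mtp} t/s ({speedup(base, mtp):.2f}x)")
```

Note that the last row combines MTP with tensor parallelism, so its ratio cannot be attributed to MTP alone.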
https://unsloth.ai/docs/models/qwen3.6#mtp-guide
Unsloth made a guide and has graphs with comparisons



