
Benchmarks

Local LLM speed results across models, backends, hardware, and power profiles. Decode tok/s is the headline metric; latency, raw engine runs, and workload context stay visible in their own views.

1181 source rows · 835 matching source rows · latest run May 21, 2026 · schemas v1-v4 · source content/benchmarks/runs/

Full row-level explorer. This is the place for raw shapes, hardware probes, cache/ppfix rows, dense power caps, and reruns.

| Model | Quant | Badges | GPU | Power cap | Driver | Backend | Profile | Workload | Decode tok/s | TTFT | TPOT (ms) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 280 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-280w | chat1 | 174.9 | 38 ms | 5.7 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 270 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-270w | chat1 | 173.1 | 38 ms | 5.8 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 280 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-280w | codegen1 | 171.4 | 128 ms | 5.8 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 260 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-260w | chat1 | 171.3 | 37 ms | 5.8 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 280 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-280w | rag1 | 170.6 | 367 ms | 5.9 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 270 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-270w | codegen1 | 169.6 | 125 ms | 5.9 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 280 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-280w | agent1 | 169.6 | 250 ms | 5.9 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 280 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-280w | agent4 | 169.6 | 4.42 s | 5.9 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 250 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-250w | chat1 | 169.2 | 37 ms | 5.9 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 270 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-270w | agent1 | 168.0 | 229 ms | 6.0 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 270 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-270w | rag1 | 168.0 | 361 ms | 6.0 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 270 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-270w | agent4 | 168.0 | 4.30 s | 6.0 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 260 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-260w | codegen1 | 166.9 | 124 ms | 6.0 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 260 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-260w | rag1 | 166.7 | 340 ms | 6.0 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 260 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-260w | agent4 | 165.1 | 4.30 s | 6.1 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 260 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-260w | agent1 | 165.1 | 250 ms | 6.1 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 250 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-250w | codegen1 | 164.9 | 131 ms | 6.1 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 240 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-240w | chat1 | 164.7 | 40 ms | 6.1 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 250 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-250w | rag1 | 164.6 | 328 ms | 6.1 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 250 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-250w | agent4 | 163.5 | 4.37 s | 6.1 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 250 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-250w | agent1 | 163.4 | 238 ms | 6.1 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 240 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-240w | codegen1 | 162.1 | 117 ms | 6.2 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 230 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-230w | chat1 | 161.8 | 38 ms | 6.2 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 240 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-240w | rag1 | 161.3 | 389 ms | 6.2 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 240 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-240w | agent1 | 161.0 | 230 ms | 6.2 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 220 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-220w | chat1 | 161.0 | 39 ms | 6.2 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 240 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-240w | agent4 | 160.6 | 3.87 s | 6.2 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 230 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-230w | rag1 | 158.8 | 351 ms | 6.3 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 230 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-230w | codegen1 | 158.4 | 120 ms | 6.3 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 230 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-230w | agent4 | 156.8 | 4.80 s | 6.4 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 230 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-230w | agent1 | 156.3 | 292 ms | 6.4 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 210 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-210w | chat1 | 153.6 | 38 ms | 6.5 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 220 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-220w | rag1 | 153.3 | 331 ms | 6.5 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 220 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-220w | agent4 | 152.6 | 4.69 s | 6.6 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 220 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-220w | codegen1 | 152.1 | 131 ms | 6.6 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 220 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-220w | agent1 | 151.0 | 231 ms | 6.6 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 210 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-210w | rag1 | 146.3 | 344 ms | 6.8 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 200 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-200w | chat1 | 142.9 | 45 ms | 7.0 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 210 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-210w | codegen1 | 142.4 | 129 ms | 7.0 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 210 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-210w | agent4 | 142.1 | 5.34 s | 7.0 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 210 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-210w | agent1 | 141.7 | 237 ms | 7.1 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 200 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-200w | rag1 | 133.5 | 340 ms | 7.5 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 200 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-200w | agent1 | 128.7 | 249 ms | 7.8 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 200 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-200w | agent4 | 128.2 | 5.41 s | 7.8 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 200 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-200w | codegen1 | 127.9 | 127 ms | 7.8 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 190 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-190w | chat1 | 127.0 | 43 ms | 7.9 |
| Flash | Q4_K_XL | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 200 W | 590 | llama.cpp 59778f0 (cuda) | baseline | chat1 | 123.7 | 48 ms | 8.1 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 190 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-190w | rag1 | 119.6 | 383 ms | 8.4 |
| Flash | Q4_K_XL | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 200 W | 590 | llama.cpp 59778f0 (cuda) | baseline | codegen1 | 119.3 | 116 ms | 8.4 |
| Flash | Q4_K_XL | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 200 W | 590 | llama.cpp 59778f0 (cuda) | baseline | rag1 | 118.9 | 206 ms | 8.4 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 190 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-190w | agent4 | 114.5 | 6.02 s | 8.7 |
| Flash | Q4_K_XL | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 200 W | 590 | llama.cpp 59778f0 (cuda) | baseline | agent1 | 114.5 | 237 ms | 8.7 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 190 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-190w | agent1 | 113.9 | 248 ms | 8.8 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 190 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-190w | codegen1 | 112.9 | 122 ms | 8.9 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 180 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-180w | chat1 | 110.9 | 44 ms | 9.0 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 180 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-180w | rag1 | 104.2 | 370 ms | 9.6 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 180 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-180w | agent4 | 100.7 | 7.26 s | 9.9 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 180 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-180w | codegen1 | 100.1 | 124 ms | 10.0 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 180 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-180w | agent1 | 100.0 | 300 ms | 10.0 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 170 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-170w | chat1 | 96.8 | 45 ms | 10.3 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 170 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-170w | rag1 | 89.8 | 370 ms | 11.1 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 170 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-170w | agent1 | 87.5 | 250 ms | 11.4 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 170 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-170w | agent4 | 85.7 | 8.30 s | 11.7 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 170 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-170w | codegen1 | 85.2 | 127 ms | 11.7 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 160 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-160w | chat1 | 79.8 | 48 ms | 12.5 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 160 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-160w | rag1 | 75.4 | 374 ms | 13.3 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 160 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-160w | agent1 | 71.6 | 293 ms | 14.0 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 160 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-160w | agent4 | 71.5 | 9.85 s | 14.0 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 160 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-160w | codegen1 | 70.4 | 123 ms | 14.2 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 150 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-150w | chat1 | 64.4 | 55 ms | 15.5 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 150 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-150w | rag1 | 58.2 | 448 ms | 17.2 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 150 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-150w | agent1 | 58.2 | 287 ms | 17.2 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 150 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-150w | codegen1 | 58.1 | 131 ms | 17.2 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 150 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-150w | agent4 | 58.0 | 9.98 s | 17.2 |
| Flash | Q4_K_XL | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 200 W | 590 | llama.cpp 59778f0 (cuda) | baseline | agent4 | 55.1 | 1.05 s | 18.2 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 140 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-140w | rag1 | 52.6 | 429 ms | 19.0 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 140 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-140w | chat1 | 50.9 | 68 ms | 19.6 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 140 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-140w | codegen1 | 46.3 | 131 ms | 21.6 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 140 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-140w | agent4 | 45.5 | 13.16 s | 22.0 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 140 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-140w | agent1 | 44.3 | 314 ms | 22.6 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 130 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-130w | chat1 | 35.6 | 106 ms | 28.1 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 100 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-100w | chat1 | 35.6 | 116 ms | 28.1 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 120 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-120w | chat1 | 35.6 | 117 ms | 28.1 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 110 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-110w | chat1 | 35.6 | 116 ms | 28.1 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 130 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-130w | codegen1 | 35.3 | 137 ms | 28.4 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 120 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-120w | codegen1 | 35.3 | 215 ms | 28.4 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 100 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-100w | codegen1 | 35.3 | 210 ms | 28.4 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 110 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-110w | codegen1 | 35.2 | 216 ms | 28.4 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 130 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-130w | agent1 | 35.2 | 299 ms | 28.4 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 130 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-130w | agent4 | 35.2 | 17.74 s | 28.4 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 110 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-110w | agent1 | 35.1 | 352 ms | 28.5 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 110 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-110w | agent4 | 35.1 | 19.35 s | 28.5 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 110 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-110w | rag1 | 35.1 | 563 ms | 28.5 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 120 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-120w | rag1 | 35.1 | 501 ms | 28.5 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 130 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-130w | rag1 | 35.1 | 470 ms | 28.5 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 100 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-100w | rag1 | 35.1 | 503 ms | 28.5 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 100 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-100w | agent4 | 35.1 | 18.29 s | 28.5 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 120 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-120w | agent1 | 35.0 | 359 ms | 28.6 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 120 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-120w | agent4 | 34.9 | 16.01 s | 28.6 |
| 4b-it | Q4_K_M | legacy, stack comparable | GeForce RTX 3090 · 24 GiB | 100 W | 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-100w | agent1 | 34.9 | 348 ms | 28.6 |
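
The chat1 rows above form a power-cap sweep for the 4b-it Q4_K_M model on the RTX 3090. As a rough sketch of how such a sweep might be summarized, the snippet below divides decode tok/s by the recorded cap and reports how much of the 280 W rate each cap retains. The function and variable names are hypothetical, and the cap is a configured limit rather than measured power draw, so the ratio is only a proxy for efficiency.

```python
# Decode tok/s for the chat1 workload, keyed by recorded power cap in watts.
# Values are copied from the 4b-it Q4_K_M rows listed above.
CHAT1_DECODE_BY_CAP_W = {
    280: 174.9, 270: 173.1, 260: 171.3, 250: 169.2, 240: 164.7,
    230: 161.8, 220: 161.0, 210: 153.6, 200: 142.9, 190: 127.0,
    180: 110.9, 170: 96.8, 160: 79.8, 150: 64.4, 140: 50.9,
    130: 35.6, 120: 35.6, 110: 35.6, 100: 35.6,
}


def tokens_per_cap_watt(decode_by_cap):
    """Return {cap_w: decode tok/s divided by the cap in watts}."""
    return {cap: rate / cap for cap, rate in decode_by_cap.items()}


def retained_fraction(decode_by_cap):
    """Return {cap_w: decode rate relative to the rate at the highest cap}."""
    top_rate = decode_by_cap[max(decode_by_cap)]
    return {cap: rate / top_rate for cap, rate in decode_by_cap.items()}


if __name__ == "__main__":
    per_watt = tokens_per_cap_watt(CHAT1_DECODE_BY_CAP_W)
    retained = retained_fraction(CHAT1_DECODE_BY_CAP_W)
    for cap in sorted(CHAT1_DECODE_BY_CAP_W, reverse=True):
        print(f"{cap:>3} W  {per_watt[cap]:.3f} tok/s per W  "
              f"{retained[cap]:.0%} of 280 W rate")
```

With the rows copied here, the ratio peaks around the 210 W to 220 W caps, and the 200 W row keeps roughly 82% of the 280 W decode rate.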
Decode tok/s - Headline speed metric
TTFT / TPOT - Latency context
Raw vs workload - Separate comparison contracts
Notes badge key
hardware comparable

Use these rows for GPU-to-GPU comparisons when the model, quant, backend, driver family, power policy, and benchmark shape match closely.

stack comparable

Use these rows to compare a similar software stack. They are useful, but backend, server path, driver, cache, or power settings may still influence the number.

stack realistic

Treat these as real workload measurements, not pure hardware rankings. They include prompt mix, API/server overhead, cache behavior, and local software details.

legacy - Older workload harness row.
350 W cap - Recorded GPU power limit.
drv 590 - GPU driver branch.
reasoning - Reasoning-token model.
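
To make the "hardware comparable" criterion concrete, here is a minimal sketch that lists which of the named fields differ between two rows. The `BenchRow` type, its field names, and the use of exact equality are assumptions made for illustration; the site's own schema and its notion of "match closely" may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchRow:
    """Hypothetical row shape; field names are not taken from the site's schema."""
    gpu: str
    model: str
    quant: str
    backend: str
    driver_branch: str
    power_profile: str
    workload: str
    decode_tok_s: float


# Fields the badge key says should match for a GPU-to-GPU comparison.
MATCH_FIELDS = ("model", "quant", "backend", "driver_branch",
                "power_profile", "workload")


def mismatched_fields(a, b):
    """Return the names of the match fields on which the two rows differ."""
    return [name for name in MATCH_FIELDS if getattr(a, name) != getattr(b, name)]


# Three chat1 rows copied from the listing above.
row_280 = BenchRow("GeForce RTX 3090", "4b-it", "Q4_K_M",
                   "llama.cpp cuda-4f13cb7 (cuda)", "590",
                   "baseline-pl-280w", "chat1", 174.9)
row_270 = BenchRow("GeForce RTX 3090", "4b-it", "Q4_K_M",
                   "llama.cpp cuda-4f13cb7 (cuda)", "590",
                   "baseline-pl-270w", "chat1", 173.1)
row_flash = BenchRow("GeForce RTX 3090", "Flash", "Q4_K_XL",
                     "llama.cpp 59778f0 (cuda)", "590",
                     "baseline", "chat1", 123.7)

print(mismatched_fields(row_280, row_270))
print(mismatched_fields(row_280, row_flash))
```

An empty list would suggest the two rows are candidates for a direct comparison; any listed field is a reason to read the difference in decode rate with caution.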
Metric guide
Decode tok/s - Generation rate. Raw rows come from the engine benchmark; API rows use token intervals when available.
TTFT - Time to first token. This includes prompt processing and server/API overhead.
TPOT / ITL - Time per output token after the first token. Lower is better.
Raw Engine - llama-bench style cases intended for hardware-normalized comparison across rigs.
Workload / API - Stack-realistic measurements that include backend, server, cache, driver, and prompt behavior.
Power badges - A cap badge shows the recorded power limit. The row metadata records the cap relative to the recorded max.
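
The decode rate and TPOT describe the same quantity from two directions. The third number in each row above appears consistent with 1000 divided by decode tok/s (for example, 174.9 tok/s gives about 5.7 ms), assuming that column is TPOT in milliseconds. The sketch below shows the conversion and a rough end-to-end estimate built from TTFT and TPOT; the function names are hypothetical, and API rows that measure token intervals directly may not follow this relation exactly.

```python
def tpot_ms_from_decode_rate(decode_tok_s):
    """Convert a decode rate in tokens per second to milliseconds per output token."""
    if decode_tok_s <= 0:
        raise ValueError("decode rate must be positive")
    return 1000.0 / decode_tok_s


def estimated_response_ms(ttft_ms, tpot_ms, output_tokens):
    """Estimate total latency: first token after TTFT, then one TPOT per further token."""
    if output_tokens < 1:
        raise ValueError("output_tokens must be at least 1")
    return ttft_ms + tpot_ms * (output_tokens - 1)


if __name__ == "__main__":
    # 280 W chat1 row: 174.9 tok/s, TTFT 38 ms.
    tpot = tpot_ms_from_decode_rate(174.9)
    print(f"TPOT: {tpot:.1f} ms")
    print(f"Estimated time for 256 output tokens: "
          f"{estimated_response_ms(38, tpot, 256):.0f} ms")
```

The estimate ignores queueing and any change in decode rate over the course of a response, so treat it as a back-of-the-envelope figure rather than a measured latency.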