
Benchmarks

Local LLM speed results across models, backends, hardware, and power profiles. Decode tok/s is the headline metric; latency, raw engine runs, and workload context stay visible in their own views.

1181 source rows · 835 matching source rows · latest run May 21, 2026 · schemas v1-v4 · source content/benchmarks/runs/

Full row-level explorer. This is the place for raw shapes, hardware probes, cache/ppfix rows, dense power caps, and reruns.

Shared by every row below: model 4b-it; quant Q4_K_M; badges legacy, stack comparable; GPU GeForce RTX 3090 · 24 GiB; backend llama.cpp (cuda).

| Power | Driver | Build | Run | Workload | Decode tok/s | TTFT | TPOT (ms) |
|---|---|---|---|---|---|---|---|
| cap 430 W | 590 | cuda-4f13cb7 | baseline-pl-430w | chat1 | 185.2 | 37 ms | 5.4 |
| cap 410 W | 590 | cuda-4f13cb7 | baseline-pl-410w | chat1 | 185.1 | 38 ms | 5.4 |
| 450 W max | 595 | cuda-3e12fbd | baseline-pl-450w-595-r2 | chat1 | 185.0 | 26 ms | 5.4 |
| cap 420 W | 590 | cuda-4f13cb7 | baseline-pl-420w | chat1 | 184.9 | 37 ms | 5.4 |
| 450 W max | 590 | cuda-4f13cb7 | baseline-pl-450w | chat1 | 184.7 | 38 ms | 5.4 |
| cap 400 W | 590 | cuda-4f13cb7 | baseline-pl-400w | chat1 | 184.6 | 37 ms | 5.4 |
| cap 440 W | 590 | cuda-4f13cb7 | baseline-pl-440w | chat1 | 184.6 | 39 ms | 5.4 |
| cap 380 W | 590 | cuda-4f13cb7 | baseline-pl-380w | chat1 | 184.0 | 37 ms | 5.4 |
| cap 350 W | 595 | cuda-3e12fbd | baseline-pl-350w-595-r2 | chat1 | 183.4 | 29 ms | 5.5 |
| cap 370 W | 590 | cuda-4f13cb7 | baseline-pl-370w | chat1 | 183.3 | 37 ms | 5.5 |
| cap 390 W | 590 | cuda-4f13cb7 | baseline-pl-390w | chat1 | 183.3 | 37 ms | 5.5 |
| cap 360 W | 590 | cuda-4f13cb7 | baseline-pl-360w | chat1 | 183.2 | 39 ms | 5.5 |
| cap 350 W | 590 | cuda-4f13cb7 | baseline-pl-350w | chat1 | 182.2 | 38 ms | 5.5 |
| cap 340 W | 590 | cuda-4f13cb7 | baseline-pl-340w | chat1 | 181.9 | 37 ms | 5.5 |
| 450 W max | 595 | cuda-3e12fbd | baseline-pl-450w-595-r2 | codegen1 | 181.5 | 29 ms | 5.5 |
| 450 W max | 590 | cuda-4f13cb7 | baseline-pl-450w | codegen1 | 181.3 | 129 ms | 5.5 |
| cap 430 W | 590 | cuda-4f13cb7 | baseline-pl-430w | codegen1 | 181.2 | 135 ms | 5.5 |
| cap 420 W | 590 | cuda-4f13cb7 | baseline-pl-420w | codegen1 | 181.2 | 143 ms | 5.5 |
| cap 440 W | 590 | cuda-4f13cb7 | baseline-pl-440w | codegen1 | 181.1 | 121 ms | 5.5 |
| cap 410 W | 590 | cuda-4f13cb7 | baseline-pl-410w | codegen1 | 181.0 | 120 ms | 5.5 |
| cap 330 W | 590 | cuda-4f13cb7 | baseline-pl-330w | chat1 | 180.8 | 39 ms | 5.5 |
| cap 400 W | 590 | cuda-4f13cb7 | baseline-pl-400w | codegen1 | 180.8 | 125 ms | 5.5 |
| 450 W max | 590 | cuda-4f13cb7 | baseline-pl-450w | rag1 | 180.6 | 339 ms | 5.5 |
| cap 380 W | 590 | cuda-4f13cb7 | baseline-pl-380w | codegen1 | 180.4 | 130 ms | 5.5 |
| cap 420 W | 590 | cuda-4f13cb7 | baseline-pl-420w | rag1 | 180.3 | 424 ms | 5.5 |
| 450 W max | 595 | cuda-3e12fbd | baseline-pl-450w-595-r2 | rag1 | 180.2 | 231 ms | 5.5 |
| cap 370 W | 590 | cuda-4f13cb7 | baseline-pl-370w | codegen1 | 180.2 | 129 ms | 5.5 |
| 450 W max | 595 | cuda-3e12fbd | baseline-pl-450w-595-r2 | agent1 | 180.1 | 203 ms | 5.6 |
| cap 410 W | 590 | cuda-4f13cb7 | baseline-pl-410w | rag1 | 180.1 | 333 ms | 5.6 |
| 450 W max | 590 | cuda-4f13cb7 | baseline-pl-450w | agent1 | 180.1 | 223 ms | 5.6 |
| cap 320 W | 590 | cuda-4f13cb7 | baseline-pl-320w | chat1 | 180.0 | 42 ms | 5.6 |
| cap 390 W | 590 | cuda-4f13cb7 | baseline-pl-390w | codegen1 | 180.0 | 131 ms | 5.6 |
| cap 430 W | 590 | cuda-4f13cb7 | baseline-pl-430w | rag1 | 180.0 | 326 ms | 5.6 |
| cap 440 W | 590 | cuda-4f13cb7 | baseline-pl-440w | agent1 | 179.9 | 226 ms | 5.6 |
| 450 W max | 590 | cuda-4f13cb7 | baseline-pl-450w | agent4 | 179.8 | 3.87 s | 5.6 |
| cap 350 W | 595 | cuda-3e12fbd | baseline-pl-350w-595-r2 | codegen1 | 179.8 | 29 ms | 5.6 |
| cap 430 W | 590 | cuda-4f13cb7 | baseline-pl-430w | agent4 | 179.7 | 3.86 s | 5.6 |
| 450 W max | 595 | cuda-3e12fbd | baseline-pl-450w-595-r2 | agent4 | 179.7 | 3.72 s | 5.6 |
| cap 380 W | 590 | cuda-4f13cb7 | baseline-pl-380w | rag1 | 179.7 | 348 ms | 5.6 |
| cap 440 W | 590 | cuda-4f13cb7 | baseline-pl-440w | rag1 | 179.7 | 354 ms | 5.6 |
| cap 430 W | 590 | cuda-4f13cb7 | baseline-pl-430w | agent1 | 179.7 | 223 ms | 5.6 |
| cap 420 W | 590 | cuda-4f13cb7 | baseline-pl-420w | agent4 | 179.6 | 4.07 s | 5.6 |
| cap 360 W | 590 | cuda-4f13cb7 | baseline-pl-360w | codegen1 | 179.5 | 119 ms | 5.6 |
| cap 420 W | 590 | cuda-4f13cb7 | baseline-pl-420w | agent1 | 179.5 | 223 ms | 5.6 |
| cap 390 W | 590 | cuda-4f13cb7 | baseline-pl-390w | rag1 | 179.4 | 323 ms | 5.6 |
| cap 400 W | 590 | cuda-4f13cb7 | baseline-pl-400w | agent1 | 179.4 | 227 ms | 5.6 |
| cap 310 W | 590 | cuda-4f13cb7 | baseline-pl-310w | chat1 | 179.3 | 38 ms | 5.6 |
| cap 440 W | 590 | cuda-4f13cb7 | baseline-pl-440w | agent4 | 179.2 | 4.20 s | 5.6 |
| cap 410 W | 590 | cuda-4f13cb7 | baseline-pl-410w | agent1 | 179.2 | 251 ms | 5.6 |
| cap 400 W | 590 | cuda-4f13cb7 | baseline-pl-400w | agent4 | 179.2 | 4.00 s | 5.6 |
| cap 410 W | 590 | cuda-4f13cb7 | baseline-pl-410w | agent4 | 179.1 | 4.01 s | 5.6 |
| cap 390 W | 590 | cuda-4f13cb7 | baseline-pl-390w | agent1 | 179.1 | 246 ms | 5.6 |
| cap 380 W | 590 | cuda-4f13cb7 | baseline-pl-380w | agent1 | 179.0 | 221 ms | 5.6 |
| cap 390 W | 590 | cuda-4f13cb7 | baseline-pl-390w | agent4 | 179.0 | 4.31 s | 5.6 |
| cap 350 W | 595 | cuda-3e12fbd | baseline-pl-350w-595-r2 | rag1 | 178.9 | 235 ms | 5.6 |
| cap 380 W | 590 | cuda-4f13cb7 | baseline-pl-380w | agent4 | 178.7 | 3.80 s | 5.6 |
| cap 400 W | 590 | cuda-4f13cb7 | baseline-pl-400w | rag1 | 178.6 | 341 ms | 5.6 |
| cap 350 W | 590 | cuda-4f13cb7 | baseline-pl-350w | codegen1 | 178.6 | 127 ms | 5.6 |
| cap 360 W | 590 | cuda-4f13cb7 | baseline-pl-360w | rag1 | 178.4 | 345 ms | 5.6 |
| cap 370 W | 590 | cuda-4f13cb7 | baseline-pl-370w | agent4 | 178.3 | 4.08 s | 5.6 |
| cap 340 W | 590 | cuda-4f13cb7 | baseline-pl-340w | codegen1 | 178.3 | 125 ms | 5.6 |
| cap 370 W | 590 | cuda-4f13cb7 | baseline-pl-370w | agent1 | 178.2 | 234 ms | 5.6 |
| cap 370 W | 590 | cuda-4f13cb7 | baseline-pl-370w | rag1 | 178.2 | 382 ms | 5.6 |
| cap 360 W | 590 | cuda-4f13cb7 | baseline-pl-360w | agent4 | 178.1 | 4.10 s | 5.6 |
| cap 330 W | 590 | cuda-4f13cb7 | baseline-pl-330w | codegen1 | 177.8 | 132 ms | 5.6 |
| cap 300 W | 590 | cuda-4f13cb7 | baseline-pl-300w | chat1 | 177.8 | 36 ms | 5.6 |
| cap 350 W | 590 | cuda-4f13cb7 | baseline-pl-350w | rag1 | 177.7 | 387 ms | 5.6 |
| cap 350 W | 590 | cuda-4f13cb7 | baseline-pl-350w | agent1 | 177.7 | 228 ms | 5.6 |
| cap 360 W | 590 | cuda-4f13cb7 | baseline-pl-360w | agent1 | 177.7 | 240 ms | 5.6 |
| cap 350 W | 595 | cuda-3e12fbd | baseline-pl-350w-595-r2 | agent1 | 177.4 | 218 ms | 5.6 |
| cap 340 W | 590 | cuda-4f13cb7 | baseline-pl-340w | rag1 | 177.3 | 338 ms | 5.6 |
| cap 350 W | 595 | cuda-3e12fbd | baseline-pl-350w-595-r2 | agent4 | 177.3 | 4.01 s | 5.6 |
| cap 350 W | 590 | cuda-4f13cb7 | baseline-pl-350w | agent4 | 177.1 | 4.16 s | 5.6 |
| cap 340 W | 590 | cuda-4f13cb7 | baseline-pl-340w | agent4 | 176.9 | 4.17 s | 5.7 |
| cap 320 W | 590 | cuda-4f13cb7 | baseline-pl-320w | codegen1 | 176.7 | 131 ms | 5.7 |
| cap 290 W | 590 | cuda-4f13cb7 | baseline-pl-290w | chat1 | 176.7 | 37 ms | 5.7 |
| cap 340 W | 590 | cuda-4f13cb7 | baseline-pl-340w | agent1 | 176.7 | 262 ms | 5.7 |
| cap 330 W | 590 | cuda-4f13cb7 | baseline-pl-330w | agent1 | 176.1 | 261 ms | 5.7 |
| cap 330 W | 590 | cuda-4f13cb7 | baseline-pl-330w | agent4 | 176.1 | 4.24 s | 5.7 |
| cap 330 W | 590 | cuda-4f13cb7 | baseline-pl-330w | rag1 | 176.0 | 339 ms | 5.7 |
| cap 310 W | 590 | cuda-4f13cb7 | baseline-pl-310w | codegen1 | 175.6 | 119 ms | 5.7 |
| cap 320 W | 590 | cuda-4f13cb7 | baseline-pl-320w | rag1 | 175.5 | 330 ms | 5.7 |
| cap 320 W | 590 | cuda-4f13cb7 | baseline-pl-320w | agent1 | 175.3 | 227 ms | 5.7 |
| cap 320 W | 590 | cuda-4f13cb7 | baseline-pl-320w | agent4 | 175.0 | 4.13 s | 5.7 |
| cap 300 W | 590 | cuda-4f13cb7 | baseline-pl-300w | codegen1 | 174.4 | 120 ms | 5.7 |
| cap 310 W | 590 | cuda-4f13cb7 | baseline-pl-310w | rag1 | 174.3 | 351 ms | 5.7 |
| cap 310 W | 590 | cuda-4f13cb7 | baseline-pl-310w | agent4 | 173.5 | 4.22 s | 5.8 |
| cap 310 W | 590 | cuda-4f13cb7 | baseline-pl-310w | agent1 | 173.3 | 251 ms | 5.8 |
| cap 300 W | 590 | cuda-4f13cb7 | baseline-pl-300w | rag1 | 173.2 | 341 ms | 5.8 |
| cap 300 W | 590 | cuda-4f13cb7 | baseline-pl-300w | agent4 | 173.0 | 4.26 s | 5.8 |
| cap 290 W | 590 | cuda-4f13cb7 | baseline-pl-290w | codegen1 | 172.9 | 127 ms | 5.8 |
| cap 300 W | 590 | cuda-4f13cb7 | baseline-pl-300w | agent1 | 172.9 | 231 ms | 5.8 |
| cap 290 W | 590 | cuda-4f13cb7 | baseline-pl-290w | rag1 | 172.2 | 343 ms | 5.8 |
| cap 290 W | 590 | cuda-4f13cb7 | baseline-pl-290w | agent4 | 171.1 | 4.21 s | 5.8 |
| cap 290 W | 590 | cuda-4f13cb7 | baseline-pl-290w | agent1 | 170.9 | 291 ms | 5.9 |
| cap 200 W | 595 | cuda-3e12fbd | baseline-pl-200w-595-r2 | chat1 | 142.9 | 41 ms | 7.0 |
| cap 200 W | 595 | cuda-3e12fbd | baseline-pl-200w-595-r2 | rag1 | 138.1 | 372 ms | 7.2 |
| cap 200 W | 595 | cuda-3e12fbd | baseline-pl-200w-595-r2 | agent1 | 128.4 | 246 ms | 7.8 |
| cap 200 W | 595 | cuda-3e12fbd | baseline-pl-200w-595-r2 | agent4 | 126.9 | 5.64 s | 7.9 |
| cap 200 W | 595 | cuda-3e12fbd | baseline-pl-200w-595-r2 | codegen1 | 126.7 | 125 ms | 7.9 |
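
One way to read the power-cap rows above is to ask how much decode rate survives at a lower cap. The sketch below does that arithmetic for three chat1 rows recorded on driver 595, with values copied from the listing. The helper name `retention` is hypothetical, and the wattages are recorded power limits, not measured draw, so the result says nothing about actual energy use.

```python
# Decode tok/s for the chat1 workload on driver 595, copied from the rows above.
# Keys are recorded power limits in watts; 450 W is the recorded max.
chat1_decode_tok_s = {450: 185.0, 350: 183.4, 200: 142.9}


def retention(decode_by_cap, max_watts):
    """Map each cap to (fraction of max power limit, fraction of max decode rate)."""
    base = decode_by_cap[max_watts]
    return {
        cap: (cap / max_watts, tok_s / base)
        for cap, tok_s in decode_by_cap.items()
    }


for cap, (power_frac, speed_frac) in sorted(retention(chat1_decode_tok_s, 450).items()):
    print(f"{cap} W: {power_frac:.1%} of max limit, {speed_frac:.1%} of max decode rate")
```

On these three rows the 350 W cap keeps roughly 99% of the decode rate, while the 200 W cap keeps roughly 77%.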
Decode tok/s - Headline speed metric
TTFT / TPOT - Latency context
Raw vs workload - Separate comparison contracts
Notes badge key
hardware comparable

Use these rows for GPU-to-GPU comparisons when the model, quant, backend, driver family, power policy, and benchmark shape match closely.
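
As a rough illustration of that rule, the sketch below treats two rows as hardware comparable only when every listed field is identical. This is an assumption: exact equality is a stricter stand-in for "match closely", and the `Row` type and its field names are hypothetical, not the site's schema.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Row:
    # Hypothetical row shape; field names are illustrative only.
    gpu: str
    model: str
    quant: str
    backend: str
    driver: str
    power_policy: str
    workload: str


# Fields that must agree before a GPU-to-GPU comparison is meaningful.
MATCH_FIELDS = ("model", "quant", "backend", "driver", "power_policy", "workload")


def hardware_comparable(a, b):
    """True when both rows agree on every field in MATCH_FIELDS."""
    return all(getattr(a, name) == getattr(b, name) for name in MATCH_FIELDS)


row = Row("GeForce RTX 3090", "4b-it", "Q4_K_M", "llama.cpp (cuda)", "590", "cap 350 W", "chat1")
other_gpu = replace(row, gpu="another GPU (hypothetical)")
other_driver = replace(row, driver="595")

print(hardware_comparable(row, other_gpu))     # only the GPU differs
print(hardware_comparable(row, other_driver))  # driver branch differs
```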

stack comparable

Use these rows to compare a similar software stack. They are useful, but backend, server path, driver, cache, or power settings may still influence the number.

stack realistic

Treat these as real workload measurements, not pure hardware rankings. They include prompt mix, API/server overhead, cache behavior, and local software details.

legacy - Older workload harness row.
350 W cap - Recorded GPU power limit.
drv 590 - GPU driver branch.
reasoning - Reasoning-token model.
Metric guide
Decode tok/s - Generation rate. Raw rows come from the engine benchmark; API rows use token intervals when available.
TTFT - Time to first token. This includes prompt processing and server/API overhead.
TPOT / ITL - Time per output token after the first token. Lower is better.
Raw Engine - llama-bench style cases intended for hardware-normalized comparison across rigs.
Workload / API - Stack-realistic measurements that include backend, server, cache, driver, and prompt behavior.
Power badges - A cap badge shows the recorded power limit. The row metadata records the cap relative to the recorded max.
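
The metrics above relate to each other by simple arithmetic, sketched below. Two assumptions are baked in: that TPOT is reported in milliseconds (the listed values are consistent with 1000 divided by decode tok/s, but the page does not state the unit), and that a response takes TTFT plus one TPOT per remaining token. Both function names and the 256-token output length are hypothetical.

```python
def tpot_ms(decode_tok_s):
    """Time per output token in milliseconds, assuming TPOT = 1000 / decode tok/s."""
    return 1000.0 / decode_tok_s


def estimated_response_s(ttft_s, decode_tok_s, output_tokens):
    """Rough end-to-end time: the first token lands at TTFT, the rest at the decode rate."""
    return ttft_s + (output_tokens - 1) / decode_tok_s


# chat1 row at cap 430 W: 185.2 decode tok/s, TTFT 37 ms.
print(round(tpot_ms(185.2), 1))
# Assumed 256-token reply; the output length is not part of the listing.
print(round(estimated_response_s(0.037, 185.2, 256), 2))
```

The estimate ignores queueing and any server overhead after the first token, so it is best read as a lower bound for a single request.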