Skip to content

Benchmarks

Local LLM speed results across models, backends, hardware, and power profiles. Decode tok/s is the headline metric; latency, raw engine runs, and workload context stay visible in their own views.

1181 source rows835 matching source rowslatest run May 21, 2026schemas v1-v4source content/benchmarks/runs/
Filters
Advanced filters

Full row-level explorer. This is the place for raw shapes, hardware probes, cache/ppfix rows, dense power caps, and reruns.

350MQ4_K_Mlegacystack comparable
GeForce RTX 3090 · 24 GiBcap 350 Wdrv 590
llama.cpp cuda-4f13cb7 (cuda)baseline-pl-350wchat1
940.7
9ms1.1
350MQ4_K_Mlegacystack comparable
GeForce RTX 3090 · 24 GiB450 W maxdrv 590
llama.cpp cuda-4f13cb7 (cuda)baseline-pl-450wcodegen1
936.3
14ms1.1
350MQ4_K_Mlegacystack comparable
GeForce RTX 3090 · 24 GiB450 W maxdrv 590
llama.cpp cuda-4f13cb7 (cuda)baseline-pl-450wagent1
935.5
15ms1.1
350MQ4_K_Mlegacystack comparable
GeForce RTX 3090 · 24 GiB450 W maxdrv 590
llama.cpp cuda-4f13cb7 (cuda)baseline-pl-450wagent4
935.5
279ms1.1
350MQ4_K_Mlegacystack comparable
GeForce RTX 3090 · 24 GiBcap 350 Wdrv 590
llama.cpp cuda-4f13cb7 (cuda)baseline-pl-350wcodegen1
934.6
14ms1.1
350MQ4_K_Mlegacystack comparable
GeForce RTX 3090 · 24 GiBcap 350 Wdrv 590
llama.cpp cuda-4f13cb7 (cuda)baseline-pl-350wagent4
928.5
275ms1.1
350MQ4_K_Mlegacystack comparable
GeForce RTX 3090 · 24 GiBcap 350 Wdrv 590
llama.cpp cuda-4f13cb7 (cuda)baseline-pl-350wagent1
927.6
17ms1.1
350MQ4_K_Mlegacystack comparable
GeForce RTX 3090 · 24 GiB450 W maxdrv 590
llama.cpp cuda-4f13cb7 (cuda)baseline-pl-450wchat1
925.1
8ms1.1
350MQ4_K_Mlegacystack comparable
GeForce RTX 3090 · 24 GiBcap 350 Wdrv 590
llama.cpp cuda-4f13cb7 (cuda)baseline-pl-350wrag1
924.2
22ms1.1
350MQ4_K_Mlegacystack comparable
GeForce RTX 3090 · 24 GiB450 W maxdrv 590
llama.cpp cuda-4f13cb7 (cuda)baseline-pl-450wrag1
922.5
18ms1.1
350MQ4_K_Mlegacystack comparable
GeForce RTX 3090 · 24 GiBcap 200 Wdrv 590
llama.cpp 59778f0 (cuda)baselinechat1
845.3
9ms1.2
350MQ4_K_Mlegacystack comparable
GeForce RTX 3090 · 24 GiBcap 200 Wdrv 590
llama.cpp 59778f0 (cuda)baselinecodegen1
836.8
12ms1.2
350MQ4_K_Mlegacystack comparable
GeForce RTX 3090 · 24 GiBcap 200 Wdrv 590
llama.cpp 59778f0 (cuda)baselinerag1
822.4
23ms1.2
350MQ4_K_Mlegacystack comparable
GeForce RTX 3090 · 24 GiBcap 200 Wdrv 590
llama.cpp 59778f0 (cuda)baselineagent1
821.0
28ms1.2
350MQ4_K_Mlegacystack comparable
GeForce RTX 3090 · 24 GiBcap 200 Wdrv 590
llama.cpp 59778f0 (cuda)baselineagent4
477.6
206ms2.1
8B-A1BQ4_K_Mlegacystack comparable
GeForce RTX 3090 · 24 GiB450 W maxdrv 590
llama.cpp cuda-4f13cb7 (cuda)baseline-pl-450wchat1
439.9
21ms2.3
8B-A1BQ4_K_Mlegacystack comparable
GeForce RTX 3090 · 24 GiB450 W maxdrv 590
llama.cpp cuda-4f13cb7 (cuda)baseline-pl-450wcodegen1
438.0
34ms2.3
8B-A1BQ4_K_Mlegacystack comparable
GeForce RTX 3090 · 24 GiB450 W maxdrv 590
llama.cpp cuda-4f13cb7 (cuda)baseline-pl-450wrag1
436.7
52ms2.3
8B-A1BQ4_K_Mlegacystack comparable
GeForce RTX 3090 · 24 GiB450 W maxdrv 590
llama.cpp cuda-4f13cb7 (cuda)baseline-pl-450wagent1
435.2
22ms2.3
8B-A1BQ4_K_Mlegacystack comparable
GeForce RTX 3090 · 24 GiBcap 350 Wdrv 590
llama.cpp cuda-4f13cb7 (cuda)baseline-pl-350wchat1
433.7
21ms2.3
8B-A1BQ4_K_Mlegacystack comparable
GeForce RTX 3090 · 24 GiB450 W maxdrv 590
llama.cpp cuda-4f13cb7 (cuda)baseline-pl-450wagent4
432.0
1.53s2.3
8B-A1BQ4_K_Mlegacystack comparable
GeForce RTX 3090 · 24 GiBcap 350 Wdrv 590
llama.cpp cuda-4f13cb7 (cuda)baseline-pl-350wcodegen1
430.3
33ms2.3
8B-A1BQ4_K_Mlegacystack comparable
GeForce RTX 3090 · 24 GiBcap 350 Wdrv 590
llama.cpp cuda-4f13cb7 (cuda)baseline-pl-350wagent1
428.3
23ms2.3
8B-A1BQ4_K_Mlegacystack comparable
GeForce RTX 3090 · 24 GiBcap 350 Wdrv 590
llama.cpp cuda-4f13cb7 (cuda)baseline-pl-350wagent4
427.2
1.61s2.3
8B-A1BQ4_K_Mlegacystack comparable
GeForce RTX 3090 · 24 GiBcap 350 Wdrv 590
llama.cpp cuda-4f13cb7 (cuda)baseline-pl-350wrag1
425.0
53ms2.4
8B-A1BQ4_K_Mlegacystack comparable
GeForce RTX 3090 · 24 GiBcap 200 Wdrv 590
llama.cpp 59778f0 (cuda)baselinechat1
351.9
24ms2.8
8B-A1BQ4_K_Mlegacystack comparable
GeForce RTX 3090 · 24 GiBcap 200 Wdrv 590
llama.cpp 59778f0 (cuda)baselinerag1
344.2
62ms2.9
8B-A1BQ4_K_Mlegacystack comparable
GeForce RTX 3090 · 24 GiBcap 200 Wdrv 590
llama.cpp 59778f0 (cuda)baselinecodegen1
339.1
39ms2.9
8B-A1BQ4_K_Mlegacystack comparable
GeForce RTX 3090 · 24 GiBcap 200 Wdrv 590
llama.cpp 59778f0 (cuda)baselineagent1
333.6
73ms3.0
30B-A3B-InstructthinkQ4_K_XLlegacystack comparable
GeForce RTX 3090 · 24 GiB450 W maxdrv 590
llama.cpp cuda-4f13cb7 (cuda)baseline-pl-450wchat1
203.0
86ms4.9
30B-A3B-InstructthinkQ4_K_XLlegacystack comparable
GeForce RTX 3090 · 24 GiBcap 350 Wdrv 590
llama.cpp cuda-4f13cb7 (cuda)baseline-pl-350wchat1
201.7
89ms5.0
30B-A3B-InstructthinkQ4_K_XLlegacystack comparable
GeForce RTX 3090 · 24 GiB450 W maxdrv 590
llama.cpp cuda-4f13cb7 (cuda)baseline-pl-450wcodegen1
199.8
125ms5.0
30B-A3B-InstructthinkQ4_K_XLlegacystack comparable
GeForce RTX 3090 · 24 GiB450 W maxdrv 590
llama.cpp cuda-4f13cb7 (cuda)baseline-pl-450wrag1
198.9
205ms5.0
30B-A3B-InstructthinkQ4_K_XLlegacystack comparable
GeForce RTX 3090 · 24 GiB450 W maxdrv 590
llama.cpp cuda-4f13cb7 (cuda)baseline-pl-450wagent1
198.6
157ms5.0
30B-A3B-InstructthinkQ4_K_XLlegacystack comparable
GeForce RTX 3090 · 24 GiB450 W maxdrv 590
llama.cpp cuda-4f13cb7 (cuda)baseline-pl-450wagent4
198.1
2.46s5.0
30B-A3B-InstructthinkQ4_K_XLlegacystack comparable
GeForce RTX 3090 · 24 GiBcap 350 Wdrv 590
llama.cpp cuda-4f13cb7 (cuda)baseline-pl-350wcodegen1
197.8
120ms5.1
30B-A3B-InstructthinkQ4_K_XLlegacystack comparable
GeForce RTX 3090 · 24 GiBcap 350 Wdrv 590
llama.cpp cuda-4f13cb7 (cuda)baseline-pl-350wrag1
196.9
236ms5.1
30B-A3B-InstructthinkQ4_K_XLlegacystack comparable
GeForce RTX 3090 · 24 GiBcap 350 Wdrv 590
llama.cpp cuda-4f13cb7 (cuda)baseline-pl-350wagent4
196.1
2.34s5.1
30B-A3B-InstructthinkQ4_K_XLlegacystack comparable
GeForce RTX 3090 · 24 GiBcap 350 Wdrv 590
llama.cpp cuda-4f13cb7 (cuda)baseline-pl-350wagent1
196.0
164ms5.1
30B-A3B-ReasoningthinkQ4_K_Mlegacystack comparable
GeForce RTX 3090 · 24 GiB450 W maxdrv 590
llama.cpp cuda-4f13cb7 (cuda)baseline-pl-450wchat1
183.9
159ms5.4
30B-A3B-ReasoningthinkQ4_K_Mlegacystack comparable
GeForce RTX 3090 · 24 GiB450 W maxdrv 590
llama.cpp cuda-4f13cb7 (cuda)baseline-pl-450wrag1
183.4
291ms5.5
30B-A3B-ReasoningthinkQ4_K_Mlegacystack comparable
GeForce RTX 3090 · 24 GiB450 W maxdrv 590
llama.cpp cuda-4f13cb7 (cuda)baseline-pl-450wcodegen1
183.1
191ms5.5
30B-A3B-ReasoningthinkQ4_K_Mlegacystack comparable
GeForce RTX 3090 · 24 GiB450 W maxdrv 590
llama.cpp cuda-4f13cb7 (cuda)baseline-pl-450wagent1
182.5
208ms5.5
30B-A3B-ReasoningthinkQ4_K_Mlegacystack comparable
GeForce RTX 3090 · 24 GiB450 W maxdrv 590
llama.cpp cuda-4f13cb7 (cuda)baseline-pl-450wagent4
181.9
4.83s5.5
30B-A3B-ReasoningthinkQ4_K_Mlegacystack comparable
GeForce RTX 3090 · 24 GiBcap 350 Wdrv 590
llama.cpp cuda-4f13cb7 (cuda)baseline-pl-350wchat1
181.8
145ms5.5
30B-A3B-ReasoningthinkQ4_K_Mlegacystack comparable
GeForce RTX 3090 · 24 GiBcap 350 Wdrv 590
llama.cpp cuda-4f13cb7 (cuda)baseline-pl-350wrag1
181.0
328ms5.5
30B-A3B-ReasoningthinkQ4_K_Mlegacystack comparable
GeForce RTX 3090 · 24 GiBcap 350 Wdrv 590
llama.cpp cuda-4f13cb7 (cuda)baseline-pl-350wcodegen1
180.8
204ms5.5
30B-A3B-ReasoningthinkQ4_K_Mlegacystack comparable
GeForce RTX 3090 · 24 GiBcap 350 Wdrv 590
llama.cpp cuda-4f13cb7 (cuda)baseline-pl-350wagent1
180.2
189ms5.5
30B-A3B-ReasoningthinkQ4_K_Mlegacystack comparable
GeForce RTX 3090 · 24 GiBcap 350 Wdrv 590
llama.cpp cuda-4f13cb7 (cuda)baseline-pl-350wagent4
180.1
4.68s5.6
30B-A3B-InstructthinkQ4_K_XLlegacystack comparable
GeForce RTX 3090 · 24 GiBcap 200 Wdrv 590
llama.cpp 59778f0 (cuda)baselinechat1
165.8
72ms6.0
30B-A3B-InstructthinkQ4_K_XLlegacystack comparable
GeForce RTX 3090 · 24 GiBcap 200 Wdrv 590
llama.cpp 59778f0 (cuda)baselinerag1
161.0
220ms6.2
30B-A3B-InstructthinkQ4_K_XLlegacystack comparable
GeForce RTX 3090 · 24 GiBcap 200 Wdrv 590
llama.cpp 59778f0 (cuda)baselinecodegen1
157.3
165ms6.4
30B-A3B-ReasoningthinkQ4_K_Mlegacystack comparable
GeForce RTX 3090 · 24 GiBcap 200 Wdrv 590
llama.cpp 59778f0 (cuda)baselinechat1
154.9
156ms6.5
30B-A3B-InstructthinkQ4_K_XLlegacystack comparable
GeForce RTX 3090 · 24 GiBcap 200 Wdrv 590
llama.cpp 59778f0 (cuda)baselineagent1
154.9
161ms6.5
7B-InstructQ4_K_Mlegacystack comparable
GeForce RTX 3090 · 24 GiB450 W maxdrv 590
llama.cpp cuda-4f13cb7 (cuda)baseline-pl-450wchat1
151.9
26ms6.6
7B-InstructQ4_K_Mlegacystack comparable
GeForce RTX 3090 · 24 GiB450 W maxdrv 590
llama.cpp cuda-4f13cb7 (cuda)baseline-pl-450wcodegen1
150.1
47ms6.7
7B-InstructQ4_K_Mlegacystack comparable
GeForce RTX 3090 · 24 GiB450 W maxdrv 590
llama.cpp cuda-4f13cb7 (cuda)baseline-pl-450wrag1
149.8
104ms6.7
30B-A3B-ReasoningthinkQ4_K_Mlegacystack comparable
GeForce RTX 3090 · 24 GiBcap 200 Wdrv 590
llama.cpp 59778f0 (cuda)baselinerag1
149.4
439ms6.7
7B-InstructQ4_K_Mlegacystack comparable
GeForce RTX 3090 · 24 GiB450 W maxdrv 590
llama.cpp cuda-4f13cb7 (cuda)baseline-pl-450wagent1
148.9
75ms6.7
7B-InstructQ4_K_Mlegacystack comparable
GeForce RTX 3090 · 24 GiB450 W maxdrv 590
llama.cpp cuda-4f13cb7 (cuda)baseline-pl-450wagent4
148.6
4.36s6.7
35B-A3BthinkQ4_K_Mlegacystack comparable
GeForce RTX 3090 · 24 GiB450 W maxdrv 590
llama.cpp cuda-4f13cb7 (cuda)baseline-pl-450wchat1
148.5
125ms6.7
7B-InstructQ4_K_Mlegacystack comparable
GeForce RTX 3090 · 24 GiBcap 350 Wdrv 590
llama.cpp cuda-4f13cb7 (cuda)baseline-pl-350wchat1
147.9
36ms6.8
35B-A3BthinkQ4_K_Mlegacystack comparable
GeForce RTX 3090 · 24 GiBcap 350 Wdrv 590
llama.cpp cuda-4f13cb7 (cuda)baseline-pl-350wchat1
147.6
124ms6.8
35B-A3BthinkQ4_K_Mlegacystack comparable
GeForce RTX 3090 · 24 GiB450 W maxdrv 590
llama.cpp cuda-4f13cb7 (cuda)baseline-pl-450wcodegen1
147.4
159ms6.8
35B-A3BthinkQ4_K_Mlegacystack comparable
GeForce RTX 3090 · 24 GiB450 W maxdrv 590
llama.cpp cuda-4f13cb7 (cuda)baseline-pl-450wrag1
147.3
398ms6.8
35B-A3BthinkQ4_K_Mlegacystack comparable
GeForce RTX 3090 · 24 GiB450 W maxdrv 590
llama.cpp cuda-4f13cb7 (cuda)baseline-pl-450wagent1
147.0
234ms6.8
35B-A3BthinkQ4_K_Mlegacystack comparable
GeForce RTX 3090 · 24 GiBcap 350 Wdrv 590
llama.cpp cuda-4f13cb7 (cuda)baseline-pl-350wrag1
146.9
356ms6.8
7B-InstructQ4_K_Mlegacystack comparable
GeForce RTX 3090 · 24 GiBcap 350 Wdrv 590
llama.cpp cuda-4f13cb7 (cuda)baseline-pl-350wcodegen1
146.8
47ms6.8
35B-A3BthinkQ4_K_Mlegacystack comparable
GeForce RTX 3090 · 24 GiBcap 350 Wdrv 590
llama.cpp cuda-4f13cb7 (cuda)baseline-pl-350wcodegen1
146.7
174ms6.8
35B-A3BthinkQ4_K_Mlegacystack comparable
GeForce RTX 3090 · 24 GiB450 W maxdrv 590
llama.cpp cuda-4f13cb7 (cuda)baseline-pl-450wagent4
146.6
5.82s6.8
7B-InstructQ4_K_Mlegacystack comparable
GeForce RTX 3090 · 24 GiBcap 350 Wdrv 590
llama.cpp cuda-4f13cb7 (cuda)baseline-pl-350wrag1
146.4
108ms6.8
35B-A3BthinkQ4_K_Mlegacystack comparable
GeForce RTX 3090 · 24 GiBcap 350 Wdrv 590
llama.cpp cuda-4f13cb7 (cuda)baseline-pl-350wagent1
146.0
222ms6.9
35B-A3BthinkQ4_K_Mlegacystack comparable
GeForce RTX 3090 · 24 GiBcap 350 Wdrv 590
llama.cpp cuda-4f13cb7 (cuda)baseline-pl-350wagent4
145.7
5.83s6.9
7B-InstructQ4_K_Mlegacystack comparable
GeForce RTX 3090 · 24 GiBcap 350 Wdrv 590
llama.cpp cuda-4f13cb7 (cuda)baseline-pl-350wagent1
145.6
71ms6.9
7B-InstructQ4_K_Mlegacystack comparable
GeForce RTX 3090 · 24 GiBcap 350 Wdrv 590
llama.cpp cuda-4f13cb7 (cuda)baseline-pl-350wagent4
145.4
3.82s6.9
30B-A3B-ReasoningthinkQ4_K_Mlegacystack comparable
GeForce RTX 3090 · 24 GiBcap 200 Wdrv 590
llama.cpp 59778f0 (cuda)baselineagent1
143.3
462ms7.0
30B-A3B-ReasoningthinkQ4_K_Mlegacystack comparable
GeForce RTX 3090 · 24 GiBcap 200 Wdrv 590
llama.cpp 59778f0 (cuda)baselinecodegen1
142.3
217ms7.0
35B-A3BthinkQ4_K_XLlegacystack comparable
GeForce RTX 3090 · 24 GiBcap 200 Wdrv 590
llama.cpp 59778f0 (cuda)baselinechat1
126.6
123ms7.9
8B-A1BQ4_K_Mlegacystack comparable
GeForce RTX 3090 · 24 GiBcap 200 Wdrv 590
llama.cpp 59778f0 (cuda)baselineagent4
124.6
339ms8.0
7B-InstructQ4_K_Mlegacystack comparable
GeForce RTX 3090 · 24 GiBcap 200 Wdrv 590
llama.cpp 59778f0 (cuda)baselinechat1
91.2
36ms11.0
7B-InstructQ4_K_Mlegacystack comparable
GeForce RTX 3090 · 24 GiBcap 200 Wdrv 590
llama.cpp 59778f0 (cuda)baselinerag1
87.4
118ms11.4
7B-InstructQ4_K_Mlegacystack comparable
GeForce RTX 3090 · 24 GiBcap 200 Wdrv 590
llama.cpp 59778f0 (cuda)baselineagent1
85.4
76ms11.7
7B-InstructQ4_K_Mlegacystack comparable
GeForce RTX 3090 · 24 GiBcap 200 Wdrv 590
llama.cpp 59778f0 (cuda)baselinecodegen1
81.8
53ms12.2
30B-A3B-InstructthinkQ4_K_XLlegacystack comparable
GeForce RTX 3090 · 24 GiBcap 200 Wdrv 590
llama.cpp 59778f0 (cuda)baselineagent4
72.6
1.94s13.8
7B-InstructQ4_K_Mlegacystack comparable
GeForce RTX 3090 · 24 GiBcap 200 Wdrv 590
llama.cpp 59778f0 (cuda)baselineagent4
50.7
1.04s19.7
27BthinkQ4_K_Mlegacystack comparable
GeForce RTX 3090 · 24 GiB450 W maxdrv 590
llama.cpp cuda-4f13cb7 (cuda)baseline-pl-450wchat1
42.7
258ms23.4
27BthinkQ4_K_Mlegacystack comparable
GeForce RTX 3090 · 24 GiB450 W maxdrv 590
llama.cpp cuda-4f13cb7 (cuda)baseline-pl-450wrag1
42.4
762ms23.6
27BthinkQ4_K_Mlegacystack comparable
GeForce RTX 3090 · 24 GiB450 W maxdrv 590
llama.cpp cuda-4f13cb7 (cuda)baseline-pl-450wcodegen1
42.2
358ms23.7
27BthinkQ4_K_Mlegacystack comparable
GeForce RTX 3090 · 24 GiB450 W maxdrv 590
llama.cpp cuda-4f13cb7 (cuda)baseline-pl-450wagent1
42.1
507ms23.7
27BthinkQ4_K_Mlegacystack comparable
GeForce RTX 3090 · 24 GiB450 W maxdrv 590
llama.cpp cuda-4f13cb7 (cuda)baseline-pl-450wagent4
42.0
19.29s23.8
27BthinkQ4_K_Mlegacystack comparable
GeForce RTX 3090 · 24 GiBcap 350 Wdrv 590
llama.cpp cuda-4f13cb7 (cuda)baseline-pl-350wchat1
41.7
236ms24.0
27BthinkQ4_K_Mlegacystack comparable
GeForce RTX 3090 · 24 GiBcap 350 Wdrv 590
llama.cpp cuda-4f13cb7 (cuda)baseline-pl-350wrag1
41.4
785ms24.1
27BthinkQ4_K_Mlegacystack comparable
GeForce RTX 3090 · 24 GiBcap 350 Wdrv 590
llama.cpp cuda-4f13cb7 (cuda)baseline-pl-350wcodegen1
41.4
305ms24.2
27BthinkQ4_K_Mlegacystack comparable
GeForce RTX 3090 · 24 GiBcap 350 Wdrv 590
llama.cpp cuda-4f13cb7 (cuda)baseline-pl-350wagent1
41.3
506ms24.2
27BthinkQ4_K_Mlegacystack comparable
GeForce RTX 3090 · 24 GiBcap 350 Wdrv 590
llama.cpp cuda-4f13cb7 (cuda)baseline-pl-350wagent4
41.3
19.67s24.2
27BthinkQ4_K_XLlegacystack comparable
GeForce RTX 3090 · 24 GiBcap 200 Wdrv 590
llama.cpp 59778f0 (cuda)baselinechat1
22.9
253ms43.7
27BthinkQ4_K_XLlegacystack comparable
GeForce RTX 3090 · 24 GiBcap 200 Wdrv 590
llama.cpp 59778f0 (cuda)baselinerag1
21.9
912ms45.8
27BthinkQ4_K_XLlegacystack comparable
GeForce RTX 3090 · 24 GiBcap 200 Wdrv 590
llama.cpp 59778f0 (cuda)baselinecodegen1
21.6
346ms46.3
27BthinkQ4_K_XLlegacystack comparable
GeForce RTX 3090 · 24 GiBcap 200 Wdrv 590
llama.cpp 59778f0 (cuda)baselineagent1
21.5
611ms46.5
27BthinkQ4_K_XLlegacystack comparable
GeForce RTX 3090 · 24 GiBcap 200 Wdrv 590
llama.cpp 59778f0 (cuda)baselineagent4
10.9
3.91s91.5
Decode tok/s
Headline speed metric
TTFT / TPOT
Latency context
Raw vs workload
Separate comparison contracts
Notes badge key
hardware comparable

Use these rows for GPU-to-GPU comparisons when the model, quant, backend, driver family, power policy, and benchmark shape match closely.

stack comparable

Use these rows to compare a similar software stack. They are useful, but backend, server path, driver, cache, or power settings may still influence the number.

stack realistic

Treat these as real workload measurements, not pure hardware rankings. They include prompt mix, API/server overhead, cache behavior, and local software details.

legacyOlder workload harness row.
350 W capRecorded GPU power limit.
drv 590GPU driver branch.
reasoningReasoning-token model.
Metric guide
Decode tok/s - Generation rate. Raw rows come from the engine benchmark; API rows use token intervals when available.
TTFT - Time to first token. This includes prompt processing and server/API overhead.
TPOT / ITL - Time per output token after the first token. Lower is better.
Raw Engine - llama-bench style cases intended for hardware-normalized comparison across rigs.
Workload / API - Stack-realistic measurements that include backend, server, cache, driver, and prompt behavior.
Power badges - A cap badge shows the recorded power limit. The row metadata records the cap relative to the recorded max.