Qwen3.6 27B-MTP

Q4_K_M·27B params·GGUF
reasoning
checkpoint: unsloth/Qwen3.6-27B-MTP-GGUF:Q4_K_M

All runs (127)

| Hardware | Backend | Mode | Shape | Conc. | Gen tok/s | Prefill tok/s | TTFT | TPOT (ms) | Prompt tok | Out tok | Total | VRAM Δ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GeForce RTX 3090 · 24 GiB · 450 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | MTP n=2 | codegen | 1 | 63.7 | 195.4 | 335ms | 0.1 | 62 | 1000 | 15.69s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 450 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | MTP n=2 | agent | 1 | 61.7 | 1157.5 | 525ms | 0.1 | 599 | 500 | 8.11s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 450 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | MTP n=2 | chat | 1 | 59.5 | 119.7 | 259ms | 0.1 | 30 | 100 | 1.68s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 450 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | MTP n=3 | codegen | 1 | 59.2 | 175.8 | 372ms | 0.1 | 62 | 1000 | 16.88s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 450 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | MTP n=3 | chat | 1 | 58.7 | 123.8 | 259ms | 0.1 | 30 | 100 | 1.70s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 450 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | MTP n=3 | agent | 1 | 57.8 | 1208.4 | 537ms | 0.1 | 599 | 500 | 8.66s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 450 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | MTP n=2 | rag | 1 | 55.9 | 1186.1 | 879ms | 0.1 | 842 | 200 | 3.58s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 450 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | MTP n=3 | rag | 1 | 54.6 | 1096.3 | 934ms | 0.1 | 842 | 200 | 3.66s | 0.000 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 200 W × 2 · drv 595 | llama.cpp direct (cuda) | MTP n=3 | ctx1k_answer | 1 | 50.2 | 656.8 | 1.48s | 0.1 | 969 | 500 | 9.96s | 0.000 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 200 W × 2 · drv 595 | llama.cpp direct (cuda) | MTP n=2 | ctx1k_answer | 1 | 47.5 | 671.1 | 1.45s | 0.1 | 969 | 500 | 10.54s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 450 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline | codegen | 1 | 40.4 | 188.9 | 355ms | 23.7 | 62 | 1000 | 24.76s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 450 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline | agent | 1 | 39.3 | 1196.7 | 505ms | 23.8 | 599 | 500 | 12.73s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 450 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline | chat | 1 | 38.7 | 127.0 | 238ms | 23.4 | 30 | 100 | 2.58s | 0.000 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 200 W × 2 · drv 595 | llama.cpp direct (cuda) | MTP n=3 | ctx4k_answer | 1 | 38.0 | 827.9 | 4.57s | 0.1 | 3778 | 500 | 13.15s | 0.000 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 200 W × 2 · drv 595 | llama.cpp direct (cuda) | MTP n=2 | ctx4k_answer | 1 | 36.7 | 823.7 | 4.59s | 0.1 | 3778 | 500 | 13.62s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 450 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline | rag | 1 | 35.6 | 1303.0 | 738ms | 23.6 | 842 | 200 | 5.62s | 0.000 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 200 W × 2 · drv 595 | llama.cpp direct (cuda) | baseline | ctx1k_answer | 1 | 35.2 | 824.9 | 1.18s | 25.3 | 969 | 500 | 14.21s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 200 W · drv 590 | llama.cpp 4f13cb7-mtp (cuda) | mtp-3-pl-200w | chat | 1 | 34.2 | 109.6 | 283ms | 0.1 | 30 | 100 | 2.92s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 200 W · drv 590 | llama.cpp 4f13cb7-mtp (cuda) | mtp-2-pl-200w | chat | 1 | 32.0 | 113.4 | 271ms | 0.1 | 30 | 100 | 3.12s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 200 W · drv 590 | llama.cpp 4f13cb7-mtp (cuda) | mtp-2-pl-200w | codegen | 1 | 31.8 | 185.9 | 377ms | 0.1 | 62 | 1000 | 31.41s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 200 W · drv 590 | llama.cpp 4f13cb7-mtp (cuda) | mtp-3-pl-200w | rag | 1 | 31.2 | 856.1 | 1.05s | 0.1 | 842 | 200 | 6.40s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 200 W · drv 590 | llama.cpp 4f13cb7-mtp (cuda) | mtp-3-pl-200w | agent | 1 | 31.2 | 964.4 | 621ms | 0.1 | 599 | 500 | 16.03s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 200 W · drv 590 | llama.cpp 4f13cb7-mtp (cuda) | mtp-3-pl-200w | codegen | 1 | 31.1 | 182.4 | 384ms | 0.1 | 62 | 1000 | 32.18s | 0.000 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 200 W × 2 · drv 595 | llama.cpp direct (cuda) | baseline | ctx4k_answer | 1 | 30.8 | 1238.8 | 3.05s | 25.5 | 3778 | 500 | 16.22s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 200 W · drv 590 | llama.cpp 4f13cb7-mtp (cuda) | mtp-2-pl-200w | agent | 1 | 30.4 | 972.1 | 616ms | 0.0 | 599 | 500 | 16.44s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 200 W · drv 590 | llama.cpp 4f13cb7-mtp (cuda) | mtp-2-pl-200w | rag | 1 | 29.8 | 860.6 | 1.05s | 0.0 | 842 | 200 | 6.71s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 450 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | MTP n=2 | agent | 4 | 25.3 | 57.3 | 12.42s | 0.1 | 599 | 500 | 20.39s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 450 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | MTP n=3 | agent | 4 | 24.0 | 59.7 | 13.43s | 0.1 | 599 | 500 | 21.54s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 200 W · drv 590 | llama.cpp 4f13cb7-mtp (cuda) | baseline-pl-200w | codegen | 1 | 21.5 | 191.7 | 345ms | 45.9 | 62 | 1000 | 46.44s | 0.000 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified · drv 7 | llama.cpp direct (rocm) | MTP n=3 | ctx1k_answer | 1 | 21.3 | 286.6 | 3.38s | 0.0 | 969 | 500 | 23.50s | 0.013 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified | llama.cpp 4f13cb7-mtp (rocm) | MTP n=3 | chat | 1 | 21.2 | 78.2 | 386ms | 3.0 | 30 | 100 | 4.72s | 0.006 GiB |
| GeForce RTX 3090 · 24 GiB · 200 W · drv 590 | llama.cpp 4f13cb7-mtp (cuda) | baseline-pl-200w | chat | 1 | 21.1 | 112.9 | 266ms | 43.9 | 30 | 100 | 4.74s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 200 W · drv 590 | llama.cpp 4f13cb7-mtp (cuda) | baseline-pl-200w | agent | 1 | 21.0 | 984.2 | 609ms | 46.2 | 599 | 500 | 23.77s | 0.000 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified | llama.cpp 4f13cb7-mtp (rocm) | MTP n=3 | codegen | 1 | 20.6 | 133.5 | 475ms | 0.1 | 62 | 1000 | 48.47s | 0.041 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified | llama.cpp 4f13cb7-mtp (rocm) | MTP n=3 | agent | 1 | 20.4 | 1984.4 | 302ms | 0.0 | 599 | 500 | 24.51s | 0.022 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 200 W × 2 · drv 595 | llama.cpp direct (cuda) | baseline | ctx16k_answer | 1 | 20.2 | 1344.6 | 11.27s | 26.6 | 15154 | 500 | 24.81s | 0.000 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified | llama.cpp 4f13cb7-mtp (rocm) | MTP n=2 | codegen | 1 | 20.0 | 136.7 | 460ms | 0.1 | 62 | 1000 | 49.96s | 0.044 GiB |
| GeForce RTX 3090 · 24 GiB · 200 W · drv 590 | llama.cpp 4f13cb7-mtp (cuda) | baseline-pl-200w | rag | 1 | 20.0 | 1035.7 | 923ms | 44.9 | 842 | 200 | 10.01s | 0.000 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified | llama.cpp 4f13cb7-mtp (rocm) | MTP n=3 | rag | 1 | 19.9 | 426.2 | 1.69s | 0.0 | 842 | 200 | 10.05s | 0.011 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified | llama.cpp 4f13cb7-mtp (rocm) | MTP n=2 | chat | 1 | 19.7 | 81.6 | 375ms | 0.0 | 30 | 100 | 5.07s | 0.006 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified | llama.cpp 4f13cb7-mtp (rocm) | MTP n=2 | agent | 1 | 19.4 | 1949.4 | 315ms | 0.0 | 599 | 500 | 25.73s | 0.024 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified · drv 7 | llama.cpp direct (rocm) | MTP n=2 | ctx1k_answer | 1 | 19.3 | 287.4 | 3.37s | 0.0 | 969 | 500 | 25.91s | 0.013 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified | llama.cpp 4f13cb7-mtp (rocm) | MTP n=2 | rag | 1 | 18.8 | 432.5 | 1.68s | 0.0 | 842 | 200 | 10.66s | 0.012 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 200 W × 2 · drv 595 | llama.cpp direct (cuda) | MTP n=2 | ctx16k_answer | 1 | 17.2 | 791.3 | 19.15s | 0.1 | 15154 | 500 | 29.01s | 0.000 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 200 W × 2 · drv 595 | llama.cpp direct (cuda) | MTP n=3 | ctx16k_answer | 1 | 17.2 | 790.1 | 19.18s | 0.1 | 15154 | 500 | 29.05s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 450 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline | agent | 4 | 16.2 | 35.9 | 19.73s | 23.8 | 599 | 500 | 32.01s | 0.000 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified · drv 7 | llama.cpp direct (rocm) | MTP n=3 | ctx4k_answer | 1 | 15.3 | 303.8 | 12.44s | 0.0 | 3778 | 500 | 32.64s | 0.012 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified · drv 7 | llama.cpp direct (rocm) | MTP n=2 | ctx4k_answer | 1 | 14.1 | 302.7 | 12.48s | 0.0 | 3778 | 500 | 35.35s | 0.014 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 200 W × 2 · drv 595 | llama.cpp direct (cuda) | baseline | ctx32k_answer | 1 | 13.1 | 1269.6 | 23.87s | 28.0 | 30305 | 500 | 38.03s | 0.000 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified | llama.cpp 4f13cb7-mtp (rocm) | baseline | codegen | 1 | 12.0 | 146.7 | 428ms | 82.7 | 62 | 1000 | 83.17s | 0.016 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified | llama.cpp 4f13cb7-mtp (rocm) | baseline | agent | 1 | 12.0 | 2090.1 | 287ms | 82.8 | 599 | 500 | 41.69s | 0.009 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified | llama.cpp 4f13cb7-mtp (rocm) | baseline | chat | 1 | 11.7 | 87.3 | 345ms | 82.4 | 30 | 100 | 8.52s | 0.003 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified · drv 7 | llama.cpp direct (rocm) | baseline | ctx1k_answer | 1 | 11.3 | 319.1 | 3.04s | 82.8 | 969 | 500 | 44.42s | 0.005 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified | llama.cpp 4f13cb7-mtp (rocm) | baseline | rag | 1 | 11.1 | 470.6 | 1.51s | 82.8 | 842 | 200 | 18.01s | 0.006 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 200 W × 2 · drv 595 | llama.cpp direct (cuda) | MTP n=3 | ctx32k_answer | 1 | 9.9 | 737.1 | 41.12s | 0.1 | 30305 | 500 | 50.70s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 200 W · drv 590 | llama.cpp 4f13cb7-mtp (cuda) | baseline-pl-200w | agent | 4 | 9.8 | - | 4.04s | 89.5 | - | 341 | 34.74s | 0.040 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 200 W × 2 · drv 595 | llama.cpp direct (cuda) | MTP n=2 | ctx32k_answer | 1 | 9.8 | 739.3 | 40.99s | 0.1 | 30305 | 500 | 51.28s | 0.000 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified · drv 7 | llama.cpp direct (rocm) | baseline | ctx4k_answer | 1 | 9.4 | 339.2 | 11.14s | 83.7 | 3778 | 500 | 53.00s | 0.006 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified | llama.cpp 4f13cb7-mtp (rocm) | MTP n=3 | agent | 4 | 8.4 | 18.8 | 36.60s | 0.1 | 599 | 500 | 61.49s | 0.090 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified | llama.cpp 4f13cb7-mtp (rocm) | MTP n=2 | agent | 4 | 8.2 | 21.9 | 38.32s | 0.1 | 599 | 500 | 63.14s | 0.097 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 200 W × 2 · drv 595 | llama.cpp direct (cuda) | baseline | ctx64k_answer | 1 | 7.1 | 1107.2 | 54.68s | 30.5 | 60601 | 500 | 70.34s | 0.000 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified · drv 7 | llama.cpp direct (rocm) | MTP n=3 | ctx16k_answer | 1 | 6.4 | 280.1 | 54.12s | 0.0 | 15154 | 500 | 77.75s | 0.013 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified · drv 7 | llama.cpp direct (rocm) | MTP n=2 | ctx16k_answer | 1 | 6.3 | 280.2 | 54.09s | 0.0 | 15154 | 500 | 79.44s | 0.014 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 200 W × 2 · drv 595 | llama.cpp direct (cuda) | baseline | ctx1k_probe | 1 | 6.3 | 870.2 | 1.09s | 24.9 | 951 | 8 | 1.27s | 0.000 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified · drv 7 | llama.cpp direct (rocm) | baseline | ctx16k_answer | 1 | 5.4 | 311.1 | 48.72s | 86.7 | 15154 | 500 | 92.05s | 0.006 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 200 W × 2 · drv 595 | llama.cpp direct (cuda) | MTP n=3 | ctx1k_probe | 1 | 5.4 | 693.4 | 1.37s | 0.1 | 951 | 8 | 1.49s | 0.010 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 200 W × 2 · drv 595 | llama.cpp direct (cuda) | MTP n=2 | ctx1k_probe | 1 | 5.3 | 694.6 | 1.37s | 0.1 | 951 | 8 | 1.50s | 0.000 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified | llama.cpp 4f13cb7-mtp (rocm) | baseline | agent | 4 | 5.0 | 12.6 | 62.69s | 82.8 | 599 | 500 | 104.08s | 0.037 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 200 W × 2 · drv 595 | llama.cpp direct (cuda) | MTP n=3 | ctx64k_answer | 1 | 4.7 | 631.4 | 95.98s | 0.1 | 60601 | 500 | 105.45s | 0.000 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 200 W × 2 · drv 595 | llama.cpp direct (cuda) | MTP n=2 | ctx64k_answer | 1 | 4.7 | 632.7 | 95.67s | 0.1 | 60601 | 500 | 105.88s | 0.000 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 200 W × 2 · drv 595 | llama.cpp direct (cuda) | baseline | ctx100k_answer | 1 | 4.2 | 937.4 | 100.91s | 34.4 | 94590 | 500 | 118.33s | 0.000 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified · drv 7 | llama.cpp direct (rocm) | MTP n=3 | ctx32k_answer | 1 | 3.4 | 250.0 | 121.24s | 0.0 | 30305 | 500 | 147.14s | 0.013 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified · drv 7 | llama.cpp direct (rocm) | MTP n=2 | ctx32k_answer | 1 | 3.4 | 249.5 | 121.48s | 0.1 | 30305 | 500 | 148.56s | 0.015 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified · drv 7 | llama.cpp direct (rocm) | baseline | ctx32k_answer | 1 | 3.2 | 276.3 | 109.69s | 90.9 | 30305 | 500 | 155.13s | 0.005 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 200 W × 2 · drv 595 | llama.cpp direct (cuda) | baseline | ctx128k_answer | 1 | 3.1 | 852.8 | 142.02s | 36.5 | 121117 | 500 | 160.72s | 0.000 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 200 W × 2 · drv 595 | llama.cpp direct (cuda) | MTP n=3 | ctx100k_answer | 1 | 2.7 | 536.0 | 176.47s | 0.1 | 94590 | 500 | 187.99s | 0.000 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 200 W × 2 · drv 595 | llama.cpp direct (cuda) | MTP n=2 | ctx100k_answer | 1 | 2.7 | 535.0 | 176.82s | 0.1 | 94590 | 500 | 188.56s | 0.000 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 200 W × 2 · drv 595 | llama.cpp direct (cuda) | baseline | ctx4k_probe | 1 | 2.4 | 1256.4 | 3.05s | 27.4 | 3835 | 8 | 3.29s | 0.000 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified · drv 7 | llama.cpp direct (rocm) | baseline | ctx1k_probe | 1 | 2.3 | 336.1 | 2.83s | 82.8 | 951 | 8 | 3.42s | 0.002 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified · drv 7 | llama.cpp direct (rocm) | MTP n=3 | ctx1k_probe | 1 | 2.3 | 299.4 | 3.18s | 0.0 | 951 | 8 | 3.50s | 0.003 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified · drv 7 | llama.cpp direct (rocm) | MTP n=2 | ctx1k_probe | 1 | 2.3 | 299.5 | 3.17s | 0.0 | 951 | 8 | 3.55s | 0.003 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 200 W × 2 · drv 595 | llama.cpp direct (cuda) | MTP n=3 | ctx128k_answer | 1 | 1.9 | 478.8 | 252.98s | 0.1 | 121117 | 500 | 263.24s | 0.000 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 200 W × 2 · drv 595 | llama.cpp direct (cuda) | MTP n=2 | ctx128k_answer | 1 | 1.9 | 476.8 | 254.00s | 0.1 | 121117 | 500 | 265.22s | 0.000 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 200 W × 2 · drv 595 | llama.cpp direct (cuda) | MTP n=3 | ctx4k_probe | 1 | 1.7 | 833.5 | 4.60s | 0.1 | 3835 | 8 | 4.72s | 0.000 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 200 W × 2 · drv 595 | llama.cpp direct (cuda) | MTP n=2 | ctx4k_probe | 1 | 1.7 | 821.7 | 4.67s | 0.1 | 3835 | 8 | 4.81s | 0.000 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified · drv 7 | llama.cpp direct (rocm) | baseline | ctx64k_answer | 1 | 1.6 | 224.9 | 269.20s | 99.3 | 60601 | 500 | 318.82s | 0.005 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified · drv 7 | llama.cpp direct (rocm) | MTP n=3 | ctx64k_answer | 1 | 1.5 | 204.8 | 295.96s | 0.0 | 60601 | 500 | 322.74s | 0.010 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified · drv 7 | llama.cpp direct (rocm) | MTP n=2 | ctx64k_answer | 1 | 1.5 | 204.7 | 296.05s | 0.0 | 60601 | 500 | 326.27s | 0.009 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified · drv 7 | llama.cpp direct (rocm) | baseline | ctx100k_answer | 1 | 0.9 | 185.8 | 509.02s | 108.6 | 94590 | 500 | 563.37s | 0.006 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified · drv 7 | llama.cpp direct (rocm) | MTP n=3 | ctx100k_answer | 1 | 0.9 | 170.1 | 556.16s | 0.0 | 94590 | 500 | 585.80s | 0.010 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified · drv 7 | llama.cpp direct (rocm) | MTP n=2 | ctx100k_answer | 1 | 0.8 | 169.4 | 558.28s | 0.0 | 94590 | 500 | 590.24s | 0.010 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 200 W × 2 · drv 595 | llama.cpp direct (cuda) | baseline | ctx16k_probe | 1 | 0.7 | 1350.7 | 11.21s | 26.5 | 15136 | 8 | 11.40s | 0.000 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified · drv 7 | llama.cpp direct (rocm) | baseline | ctx4k_probe | 1 | 0.7 | 340.5 | 11.26s | 85.8 | 3835 | 8 | 11.89s | 0.004 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified · drv 7 | llama.cpp direct (rocm) | baseline | ctx128k_answer | 1 | 0.6 | 163.8 | 739.48s | 115.9 | 121117 | 500 | 797.46s | 0.005 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified · drv 7 | llama.cpp direct (rocm) | MTP n=3 | ctx4k_probe | 1 | 0.6 | 302.9 | 12.66s | 0.0 | 3835 | 8 | 12.98s | 0.002 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified · drv 7 | llama.cpp direct (rocm) | MTP n=2 | ctx4k_probe | 1 | 0.6 | 304.1 | 12.61s | 3.6 | 3835 | 8 | 13.02s | 0.001 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified · drv 7 | llama.cpp direct (rocm) | MTP n=3 | ctx128k_answer | 1 | 0.6 | 150.2 | 806.44s | 0.0 | 121117 | 500 | 837.93s | 0.010 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified · drv 7 | llama.cpp direct (rocm) | MTP n=2 | ctx128k_answer | 1 | 0.6 | 149.5 | 810.10s | 0.0 | 121117 | 500 | 844.87s | 0.010 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 200 W × 2 · drv 595 | llama.cpp direct (cuda) | MTP n=3 | ctx16k_probe | 1 | 0.4 | 795.0 | 19.04s | 0.1 | 15136 | 8 | 19.16s | 0.000 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 200 W × 2 · drv 595 | llama.cpp direct (cuda) | MTP n=2 | ctx16k_probe | 1 | 0.4 | 793.2 | 19.08s | 0.0 | 15136 | 8 | 19.24s | 0.000 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 200 W × 2 · drv 595 | llama.cpp direct (cuda) | baseline | ctx32k_probe | 1 | 0.3 | 1269.9 | 23.85s | 27.7 | 30287 | 8 | 24.05s | 0.000 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 200 W × 2 · drv 595 | llama.cpp direct (cuda) | MTP n=3 | ctx32k_probe | 1 | 0.2 | 739.3 | 40.97s | 0.1 | 30287 | 8 | 41.10s | 0.000 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 200 W × 2 · drv 595 | llama.cpp direct (cuda) | MTP n=2 | ctx32k_probe | 1 | 0.2 | 738.5 | 41.01s | 0.0 | 30287 | 8 | 41.17s | 0.000 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified · drv 7 | llama.cpp direct (rocm) | baseline | ctx16k_probe | 1 | 0.2 | 312.4 | 48.44s | 87.2 | 15136 | 8 | 49.07s | 0.002 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified · drv 7 | llama.cpp direct (rocm) | MTP n=2 | ctx16k_probe | 1 | 0.1 | 281.4 | 53.80s | 0.1 | 15136 | 8 | 54.24s | 0.002 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified · drv 7 | llama.cpp direct (rocm) | MTP n=3 | ctx16k_probe | 1 | 0.1 | 279.9 | 54.08s | 0.0 | 15136 | 8 | 54.42s | 0.003 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 200 W × 2 · drv 595 | llama.cpp direct (cuda) | baseline | ctx64k_probe | 1 | 0.1 | 1098.8 | 55.14s | 31.2 | 60585 | 8 | 55.37s | 0.000 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 200 W × 2 · drv 595 | llama.cpp direct (cuda) | MTP n=3 | ctx64k_probe | 1 | 0.1 | 635.1 | 95.39s | 0.1 | 60585 | 8 | 95.58s | 0.000 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 200 W × 2 · drv 595 | llama.cpp direct (cuda) | MTP n=2 | ctx64k_probe | 1 | 0.1 | 633.4 | 95.65s | 0.0 | 60585 | 8 | 95.82s | 0.000 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 200 W × 2 · drv 595 | llama.cpp direct (cuda) | baseline | ctx100k_probe | 1 | 0.1 | 939.3 | 100.77s | 33.4 | 94645 | 8 | 101.01s | 0.000 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified · drv 7 | llama.cpp direct (rocm) | baseline | ctx32k_probe | 1 | 0.1 | 276.3 | 109.63s | 91.1 | 30287 | 8 | 110.28s | 0.001 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified · drv 7 | llama.cpp direct (rocm) | MTP n=2 | ctx32k_probe | 1 | 0.1 | 250.1 | 121.13s | 0.0 | 30287 | 8 | 121.58s | 0.003 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified · drv 7 | llama.cpp direct (rocm) | MTP n=3 | ctx32k_probe | 1 | 0.1 | 249.8 | 121.26s | 0.0 | 30287 | 8 | 121.61s | 0.002 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 200 W × 2 · drv 595 | llama.cpp direct (cuda) | baseline | ctx128k_probe | 1 | 0.1 | 856.6 | 141.38s | 36.8 | 121099 | 8 | 141.66s | 0.000 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 200 W × 2 · drv 595 | llama.cpp direct (cuda) | MTP n=2 | ctx100k_probe | 1 | 0.0 | 537.7 | 176.03s | 0.1 | 94645 | 8 | 176.22s | 0.000 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 200 W × 2 · drv 595 | llama.cpp direct (cuda) | MTP n=3 | ctx100k_probe | 1 | 0.0 | 536.4 | 176.45s | 0.0 | 94645 | 8 | 176.66s | 0.000 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 200 W × 2 · drv 595 | llama.cpp direct (cuda) | MTP n=2 | ctx128k_probe | 1 | 0.0 | 477.9 | 253.42s | 0.1 | 121099 | 8 | 253.61s | 0.000 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 200 W × 2 · drv 595 | llama.cpp direct (cuda) | MTP n=3 | ctx128k_probe | 1 | 0.0 | 479.0 | 252.82s | 0.0 | 121099 | 8 | 252.96s | 0.000 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified · drv 7 | llama.cpp direct (rocm) | baseline | ctx64k_probe | 1 | 0.0 | 224.8 | 269.46s | 99.5 | 60585 | 8 | 270.17s | 0.002 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified · drv 7 | llama.cpp direct (rocm) | MTP n=2 | ctx64k_probe | 1 | 0.0 | 204.8 | 295.80s | 0.0 | 60585 | 8 | 296.33s | 0.003 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified · drv 7 | llama.cpp direct (rocm) | MTP n=3 | ctx64k_probe | 1 | 0.0 | 204.8 | 295.90s | 0.0 | 60585 | 8 | 296.29s | 0.003 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified · drv 7 | llama.cpp direct (rocm) | baseline | ctx100k_probe | 1 | 0.0 | 185.8 | 509.52s | 108.8 | 94645 | 8 | 510.31s | 0.002 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified · drv 7 | llama.cpp direct (rocm) | MTP n=2 | ctx100k_probe | 1 | 0.0 | 169.7 | 557.77s | 0.0 | 94645 | 8 | 558.32s | 0.002 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified · drv 7 | llama.cpp direct (rocm) | MTP n=3 | ctx100k_probe | 1 | 0.0 | 169.9 | 557.01s | 0.0 | 94645 | 8 | 557.61s | 0.003 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified · drv 7 | llama.cpp direct (rocm) | baseline | ctx128k_probe | 1 | 0.0 | 163.6 | 740.26s | 115.9 | 121099 | 8 | 741.09s | 0.001 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified · drv 7 | llama.cpp direct (rocm) | MTP n=2 | ctx128k_probe | 1 | 0.0 | 149.9 | 807.88s | 0.0 | 121099 | 8 | 808.42s | 0.003 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified · drv 7 | llama.cpp direct (rocm) | MTP n=3 | ctx128k_probe | 1 | 0.0 | 150.2 | 806.45s | 0.0 | 121099 | 8 | 806.93s | 0.002 GiB |
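
For the single-request rows (Conc. 1), the listed Gen tok/s appears to be consistent with output tokens divided by total wall time, so it includes prefill time as well as decode time. The page itself does not state this definition; it is inferred from the numbers, and the Conc. 4 rows do not follow it exactly. The sketch below, with hypothetical helper names, recomputes the rate and the MTP n=2 speedup over baseline from four single RTX 3090 (450 W) rows copied from the table. Small mismatches are possible because the listed totals are themselves rounded (for example, 100 / 2.58 gives 38.76 against a listed 38.7 for the chat baseline).

```python
# Sketch only: helper names are illustrative, not part of the benchmark harness.

def gen_tok_per_s(out_tokens: int, total_s: float) -> float:
    """End-to-end generation rate: output tokens over total wall time."""
    return out_tokens / total_s


def speedup(baseline_total_s: float, mtp_total_s: float) -> float:
    """Wall-time speedup of an MTP run over the matching baseline run."""
    return baseline_total_s / mtp_total_s


# (shape, out tok, baseline total s, MTP n=2 total s)
# Values copied from the RTX 3090 · 450 W · conc. 1 rows of the table.
ROWS = [
    ("codegen", 1000, 24.76, 15.69),
    ("agent", 500, 12.73, 8.11),
    ("chat", 100, 2.58, 1.68),
    ("rag", 200, 5.62, 3.58),
]


def summarize(rows):
    """Return (shape, baseline tok/s, MTP tok/s, speedup) per row, rounded."""
    return [
        (
            shape,
            round(gen_tok_per_s(out, base_s), 1),
            round(gen_tok_per_s(out, mtp_s), 1),
            round(speedup(base_s, mtp_s), 2),
        )
        for shape, out, base_s, mtp_s in rows
    ]


if __name__ == "__main__":
    for shape, base_rate, mtp_rate, ratio in summarize(ROWS):
        print(f"{shape:8s} baseline {base_rate:5.1f} tok/s  MTP n=2 {mtp_rate:5.1f} tok/s  x{ratio}")
```

Because the rate is end-to-end, the `*_probe` shapes (8 output tokens after a long prompt) report very low Gen tok/s even when prefill throughput is high; those rows are better read through the Prefill tok/s and TTFT columns.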

Environment

GeForce RTX 3090 · 24 GiB
cpu: AMD EPYC 7302P 16-Core Processor
gpu: NVIDIA GeForce RTX 3090
arch: NVIDIA
vram: 24 GiB (system 64.0 GiB)
power: 200 W / 450 W max (44% cap)
backend: llama.cpp 4f13cb7-mtp (cuda)
server: lemonade unknown
os: Ubuntu 24.04 LTS
kernel: 6.17.13-7-pve
driver: 590.48.01
python: 3.12.3
containerized: true
runs/cell: 5
warmups: 2
endpoint: /v1/chat/completions
streaming: true

GeForce RTX 3090 · 24 GiB
cpu: AMD EPYC 7302P 16-Core Processor
gpu: NVIDIA GeForce RTX 3090
arch: NVIDIA
vram: 24 GiB (system 64.0 GiB)
power: 450 W / 450 W max
pcie: Gen 4 x16 / Gen 4 x16 max
clocks: gfx 1980/2100 MHz · mem 9501 MHz
temp: 38°C idle · 83°C peak
peak draw: 436 W
backend: llama.cpp cuda-4f13cb7 (cuda)
server: lemonade unknown
os: Ubuntu 24.04 LTS
kernel: 6.17.13-7-pve
driver: NVIDIA 590.48.01 + CUDA 13.1
libc: 2.39
python: 3.12.3
containerized: true
llama.cpp: version: 18 (4f13cb7) built with GNU 13.3.0 for Linux x86_64
build flags: GGML_CUDA=ON CMAKE_BUILD_TYPE=Release
runs/cell: 5
warmups: 2
endpoint: /v1/chat/completions
streaming: true

2× GeForce RTX 3090 · 24 GiB each
cpu: AMD EPYC 7302P 16-Core Processor
gpu: NVIDIA GeForce RTX 3090 × 2
arch: NVIDIA
vram: 48 GiB (system 64.0 GiB)
power: 200 W × 2 / 450 W × 2 max (44% cap)
pcie: Gen 4 x16 / Gen 4 x16 max
clocks: gfx 1800/2100 MHz · mem 9501 MHz
temp: 41°C idle · 53°C peak
peak draw: 195 W
backend: llama.cpp direct (cuda)
server: lemonade unknown
os: Ubuntu 24.04 LTS
driver: NVIDIA 595.71.05 + CUDA 13.2
libc: 2.39
python: 3.12.3
llama.cpp: version: 18 (4f13cb7) built with GNU 13.3.0 for Linux x86_64
build flags: GGML_CUDA=ON CMAKE_BUILD_TYPE=Release
runs/cell: 3
warmups: 1
endpoint: /v1/chat/completions
streaming: true

Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)
cpu: AMD RYZEN AI MAX+ 395 w/ Radeon 8060S
gpu: AMD Radeon 8060S
arch: Strix Halo (gfx1151)
vram: 96 GiB (system 31.1 GiB, unified)
pcie: Gen 4 x16 / Gen 4 x16 max
clocks: gfx 1158 MHz · mem 1000 MHz
temp: 47°C idle · 77°C peak
peak draw: 103 W
backend: llama.cpp direct (rocm)
server: lemonade unknown
os: Ubuntu 24.04 LTS
driver: ROCm 7.2.3
libc: 2.39
python: 3.12.3
llama.cpp: version: 1 (4f13cb7) built with Clang 22.0.0 for Linux x86_64
build flags: GGML_HIP=ON AMDGPU_TARGETS=gfx1151 CMAKE_BUILD_TYPE=Release
runs/cell: 3
warmups: 1
endpoint: /v1/chat/completions
streaming: true

Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)
cpu: AMD RYZEN AI MAX+ 395 w/ Radeon 8060S
gpu: AMD Radeon 8060S
arch: Strix Halo (gfx1151)
vram: 96 GiB (system 31.1 GiB, unified)
backend: llama.cpp 4f13cb7-mtp (rocm)
server: lemonade unknown
os: Ubuntu 24.04 LTS
kernel: 7.0.2-2-pve
python: 3.12.3
containerized: true
runs/cell: 5
warmups: 2
endpoint: /v1/chat/completions
streaming: true
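
All environments above measure against a streaming /v1/chat/completions endpoint, which is what makes TTFT and TPOT observable from the client side. The page does not say how the harness derives these two columns, so the following is only a sketch of one common definition: TTFT as the delay from request start to the first streamed token, and TPOT as the mean gap between consecutive token arrivals. The function name and the timestamp inputs are hypothetical; collecting the timestamps from a real stream is left out.

```python
# Sketch under assumptions: this is a common client-side definition of
# TTFT/TPOT, not necessarily the one used to produce the table above.
from statistics import mean


def ttft_and_tpot_ms(request_start_s: float, token_times_s: list[float]) -> tuple[float, float]:
    """Derive TTFT and TPOT (both in milliseconds) from arrival timestamps.

    request_start_s: time the request was sent, in seconds.
    token_times_s:   arrival time of each streamed token/chunk, in seconds,
                     in arrival order.
    """
    if not token_times_s:
        raise ValueError("no tokens were received")

    # Time to first token: request start to first arrival.
    ttft_ms = (token_times_s[0] - request_start_s) * 1000.0

    # Time per output token: mean gap between consecutive arrivals.
    gaps_ms = [(b - a) * 1000.0 for a, b in zip(token_times_s, token_times_s[1:])]
    tpot_ms = mean(gaps_ms) if gaps_ms else 0.0

    return ttft_ms, tpot_ms


if __name__ == "__main__":
    # Synthetic timestamps: first token after 250 ms, then one every 25 ms.
    ttft, tpot = ttft_and_tpot_ms(0.0, [0.250, 0.275, 0.300, 0.325])
    print(f"TTFT {ttft:.0f} ms, TPOT {tpot:.1f} ms")
```

With a gap-based definition like this one, a server that emits several tokens in a single streamed chunk would show very small inter-arrival gaps, so TPOT values from different decoding modes may not be directly comparable.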