Qwen3.6 27B-MTP

Q4_K_M · 27B params · GGUF
reasoning
checkpoint: unsloth/Qwen3.6-27B-MTP-GGUF:Q4_K_M

All runs (43)

Hardware | Backend | Mode | Shape | Conc. | Gen tok/s | Prefill tok/s | TTFT | TPOT (ms) | Prompt tok | Out tok | Total | VRAM Δ
GeForce RTX 3090 · 24 GiB · 450 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | MTP n=2 | codegen | 1 | 63.7 | 195.4 | 335ms | 0.1 | 62 | 1000 | 15.69s | 0.000 GiB
GeForce RTX 3090 · 24 GiB · 450 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | MTP n=2 | agent | 1 | 61.7 | 1157.5 | 525ms | 0.1 | 599 | 500 | 8.11s | 0.000 GiB
GeForce RTX 3090 · 24 GiB · 450 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | MTP n=2 | chat | 1 | 59.5 | 119.7 | 259ms | 0.1 | 30 | 100 | 1.68s | 0.000 GiB
GeForce RTX 3090 · 24 GiB · 450 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | MTP n=3 | codegen | 1 | 59.2 | 175.8 | 372ms | 0.1 | 62 | 1000 | 16.88s | 0.000 GiB
GeForce RTX 3090 · 24 GiB · 450 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | MTP n=3 | chat | 1 | 58.7 | 123.8 | 259ms | 0.1 | 30 | 100 | 1.70s | 0.000 GiB
GeForce RTX 3090 · 24 GiB · 450 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | MTP n=3 | agent | 1 | 57.8 | 1208.4 | 537ms | 0.1 | 599 | 500 | 8.66s | 0.000 GiB
GeForce RTX 3090 · 24 GiB · 450 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | MTP n=2 | rag | 1 | 55.9 | 1186.1 | 879ms | 0.1 | 842 | 200 | 3.58s | 0.000 GiB
GeForce RTX 3090 · 24 GiB · 450 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | MTP n=3 | rag | 1 | 54.6 | 1096.3 | 934ms | 0.1 | 842 | 200 | 3.66s | 0.000 GiB
GeForce RTX 3090 · 24 GiB · 450 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline | codegen | 1 | 40.4 | 188.9 | 355ms | 23.7 | 62 | 1000 | 24.76s | 0.000 GiB
GeForce RTX 3090 · 24 GiB · 450 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline | agent | 1 | 39.3 | 1196.7 | 505ms | 23.8 | 599 | 500 | 12.73s | 0.000 GiB
GeForce RTX 3090 · 24 GiB · 450 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline | chat | 1 | 38.7 | 127.0 | 238ms | 23.4 | 30 | 100 | 2.58s | 0.000 GiB
GeForce RTX 3090 · 24 GiB · 450 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline | rag | 1 | 35.6 | 1303.0 | 738ms | 23.6 | 842 | 200 | 5.62s | 0.000 GiB
GeForce RTX 3090 · 24 GiB · 200 W · drv 590 | llama.cpp 4f13cb7-mtp (cuda) | mtp-3-pl-200w | chat | 1 | 34.2 | 109.6 | 283ms | 0.1 | 30 | 100 | 2.92s | 0.000 GiB
GeForce RTX 3090 · 24 GiB · 200 W · drv 590 | llama.cpp 4f13cb7-mtp (cuda) | mtp-2-pl-200w | chat | 1 | 32.0 | 113.4 | 271ms | 0.1 | 30 | 100 | 3.12s | 0.000 GiB
GeForce RTX 3090 · 24 GiB · 200 W · drv 590 | llama.cpp 4f13cb7-mtp (cuda) | mtp-2-pl-200w | codegen | 1 | 31.8 | 185.9 | 377ms | 0.1 | 62 | 1000 | 31.41s | 0.000 GiB
GeForce RTX 3090 · 24 GiB · 200 W · drv 590 | llama.cpp 4f13cb7-mtp (cuda) | mtp-3-pl-200w | rag | 1 | 31.2 | 856.1 | 1.05s | 0.1 | 842 | 200 | 6.40s | 0.000 GiB
GeForce RTX 3090 · 24 GiB · 200 W · drv 590 | llama.cpp 4f13cb7-mtp (cuda) | mtp-3-pl-200w | agent | 1 | 31.2 | 964.4 | 621ms | 0.1 | 599 | 500 | 16.03s | 0.000 GiB
GeForce RTX 3090 · 24 GiB · 200 W · drv 590 | llama.cpp 4f13cb7-mtp (cuda) | mtp-3-pl-200w | codegen | 1 | 31.1 | 182.4 | 384ms | 0.1 | 62 | 1000 | 32.18s | 0.000 GiB
GeForce RTX 3090 · 24 GiB · 200 W · drv 590 | llama.cpp 4f13cb7-mtp (cuda) | mtp-2-pl-200w | agent | 1 | 30.4 | 972.1 | 616ms | 0.0 | 599 | 500 | 16.44s | 0.000 GiB
GeForce RTX 3090 · 24 GiB · 200 W · drv 590 | llama.cpp 4f13cb7-mtp (cuda) | mtp-2-pl-200w | rag | 1 | 29.8 | 860.6 | 1.05s | 0.0 | 842 | 200 | 6.71s | 0.000 GiB
GeForce RTX 3090 · 24 GiB · 450 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | MTP n=2 | agent | 4 | 25.3 | 57.3 | 12.42s | 0.1 | 599 | 500 | 20.39s | 0.000 GiB
GeForce RTX 3090 · 24 GiB · 450 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | MTP n=3 | agent | 4 | 24.0 | 59.7 | 13.43s | 0.1 | 599 | 500 | 21.54s | 0.000 GiB
GeForce RTX 3090 · 24 GiB · 200 W · drv 590 | llama.cpp 4f13cb7-mtp (cuda) | baseline-pl-200w | codegen | 1 | 21.5 | 191.7 | 345ms | 45.9 | 62 | 1000 | 46.44s | 0.000 GiB
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified | llama.cpp 4f13cb7-mtp (rocm) | MTP n=3 | chat | 1 | 21.2 | 78.2 | 386ms | 3.0 | 30 | 100 | 4.72s | 0.006 GiB
GeForce RTX 3090 · 24 GiB · 200 W · drv 590 | llama.cpp 4f13cb7-mtp (cuda) | baseline-pl-200w | chat | 1 | 21.1 | 112.9 | 266ms | 43.9 | 30 | 100 | 4.74s | 0.000 GiB
GeForce RTX 3090 · 24 GiB · 200 W · drv 590 | llama.cpp 4f13cb7-mtp (cuda) | baseline-pl-200w | agent | 1 | 21.0 | 984.2 | 609ms | 46.2 | 599 | 500 | 23.77s | 0.000 GiB
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified | llama.cpp 4f13cb7-mtp (rocm) | MTP n=3 | codegen | 1 | 20.6 | 133.5 | 475ms | 0.1 | 62 | 1000 | 48.47s | 0.041 GiB
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified | llama.cpp 4f13cb7-mtp (rocm) | MTP n=3 | agent | 1 | 20.4 | 1984.4 | 302ms | 0.0 | 599 | 500 | 24.51s | 0.022 GiB
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified | llama.cpp 4f13cb7-mtp (rocm) | MTP n=2 | codegen | 1 | 20.0 | 136.7 | 460ms | 0.1 | 62 | 1000 | 49.96s | 0.044 GiB
GeForce RTX 3090 · 24 GiB · 200 W · drv 590 | llama.cpp 4f13cb7-mtp (cuda) | baseline-pl-200w | rag | 1 | 20.0 | 1035.7 | 923ms | 44.9 | 842 | 200 | 10.01s | 0.000 GiB
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified | llama.cpp 4f13cb7-mtp (rocm) | MTP n=3 | rag | 1 | 19.9 | 426.2 | 1.69s | 0.0 | 842 | 200 | 10.05s | 0.011 GiB
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified | llama.cpp 4f13cb7-mtp (rocm) | MTP n=2 | chat | 1 | 19.7 | 81.6 | 375ms | 0.0 | 30 | 100 | 5.07s | 0.006 GiB
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified | llama.cpp 4f13cb7-mtp (rocm) | MTP n=2 | agent | 1 | 19.4 | 1949.4 | 315ms | 0.0 | 599 | 500 | 25.73s | 0.024 GiB
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified | llama.cpp 4f13cb7-mtp (rocm) | MTP n=2 | rag | 1 | 18.8 | 432.5 | 1.68s | 0.0 | 842 | 200 | 10.66s | 0.012 GiB
GeForce RTX 3090 · 24 GiB · 450 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline | agent | 4 | 16.2 | 35.9 | 19.73s | 23.8 | 599 | 500 | 32.01s | 0.000 GiB
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified | llama.cpp 4f13cb7-mtp (rocm) | baseline | codegen | 1 | 12.0 | 146.7 | 428ms | 82.7 | 62 | 1000 | 83.17s | 0.016 GiB
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified | llama.cpp 4f13cb7-mtp (rocm) | baseline | agent | 1 | 12.0 | 2090.1 | 287ms | 82.8 | 599 | 500 | 41.69s | 0.009 GiB
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified | llama.cpp 4f13cb7-mtp (rocm) | baseline | chat | 1 | 11.7 | 87.3 | 345ms | 82.4 | 30 | 100 | 8.52s | 0.003 GiB
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified | llama.cpp 4f13cb7-mtp (rocm) | baseline | rag | 1 | 11.1 | 470.6 | 1.51s | 82.8 | 842 | 200 | 18.01s | 0.006 GiB
GeForce RTX 3090 · 24 GiB · 200 W · drv 590 | llama.cpp 4f13cb7-mtp (cuda) | baseline-pl-200w | agent | 4 | 9.8 |  | 4.04s | 89.5 |  | 341 | 34.74s | 0.040 GiB
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified | llama.cpp 4f13cb7-mtp (rocm) | MTP n=3 | agent | 4 | 8.4 | 18.8 | 36.60s | 0.1 | 599 | 500 | 61.49s | 0.090 GiB
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified | llama.cpp 4f13cb7-mtp (rocm) | MTP n=2 | agent | 4 | 8.2 | 21.9 | 38.32s | 0.1 | 599 | 500 | 63.14s | 0.097 GiB
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified | llama.cpp 4f13cb7-mtp (rocm) | baseline | agent | 4 | 5.0 | 12.6 | 62.69s | 82.8 | 599 | 500 | 104.08s | 0.037 GiB
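
The run list above invites a direct comparison of MTP against baseline. The following is a minimal sketch that computes the speedup ratio from the Gen tok/s values of the RTX 3090 (450 W, concurrency 1) rows. The names `GEN_TOK_S`, `speedups`, and `gen_tok_s` are illustrative and not part of the benchmark tooling. The relation Gen tok/s ≈ Out tok / Total is an observation from the single-stream rows, not something the page states, so treat it as an assumption.

```python
# Gen tok/s values copied from the RTX 3090 (450 W), concurrency 1 rows above.
GEN_TOK_S = {
    "codegen": {"baseline": 40.4, "MTP n=2": 63.7, "MTP n=3": 59.2},
    "agent": {"baseline": 39.3, "MTP n=2": 61.7, "MTP n=3": 57.8},
    "chat": {"baseline": 38.7, "MTP n=2": 59.5, "MTP n=3": 58.7},
    "rag": {"baseline": 35.6, "MTP n=2": 55.9, "MTP n=3": 54.6},
}


def speedups(rows):
    """Return {shape: {mode: Gen tok/s relative to that shape's baseline}}."""
    result = {}
    for shape, modes in rows.items():
        base = modes["baseline"]
        result[shape] = {
            mode: round(value / base, 2)
            for mode, value in modes.items()
            if mode != "baseline"
        }
    return result


def gen_tok_s(out_tokens, total_seconds):
    """Assumed definition: output tokens divided by end-to-end request time."""
    return out_tokens / total_seconds


if __name__ == "__main__":
    for shape, ratios in speedups(GEN_TOK_S).items():
        print(shape, ratios)
    # Cross-check against the first row (1000 output tokens in 15.69 s).
    print(round(gen_tok_s(1000, 15.69), 1))
```

On these rows MTP n=2 comes out ahead of MTP n=3 for every shape, which the ratios make easy to see at a glance.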

Environment

GeForce RTX 3090 · 24 GiB
cpu: AMD EPYC 7302P 16-Core Processor
gpu: NVIDIA GeForce RTX 3090
arch: NVIDIA
vram: 24 GiB (system 64.0 GiB)
power: 200 W / 450 W max (44% cap)
backend: llama.cpp 4f13cb7-mtp (cuda)
server: lemonade unknown
os: Ubuntu 24.04 LTS
kernel: 6.17.13-7-pve
driver: 590.48.01
python: 3.12.3
containerized: true
runs/cell: 5
warmups: 2
endpoint: /v1/chat/completions
streaming: true
GeForce RTX 3090 · 24 GiB
cpu: AMD EPYC 7302P 16-Core Processor
gpu: NVIDIA GeForce RTX 3090
arch: NVIDIA
vram: 24 GiB (system 64.0 GiB)
power: 450 W / 450 W max
pcie: Gen 4 x16 / Gen 4 x16 max
clocks: gfx 1980/2100 MHz · mem 9501 MHz
temp: 38°C idle · 83°C peak
peak draw: 436 W
backend: llama.cpp cuda-4f13cb7 (cuda)
server: lemonade unknown
os: Ubuntu 24.04 LTS
kernel: 6.17.13-7-pve
driver: NVIDIA 590.48.01 + CUDA 13.1
libc: 2.39
python: 3.12.3
containerized: true
llama.cpp: version: 18 (4f13cb7) built with GNU 13.3.0 for Linux x86_64
build flags: GGML_CUDA=ON CMAKE_BUILD_TYPE=Release
runs/cell: 5
warmups: 2
endpoint: /v1/chat/completions
streaming: true
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)
cpu: AMD RYZEN AI MAX+ 395 w/ Radeon 8060S
gpu: AMD Radeon 8060S
arch: Strix Halo (gfx1151)
vram: 96 GiB (system 31.1 GiB, unified)
backend: llama.cpp 4f13cb7-mtp (rocm)
server: lemonade unknown
os: Ubuntu 24.04 LTS
kernel: 7.0.2-2-pve
python: 3.12.3
containerized: true
runs/cell: 5
warmups: 2
endpoint: /v1/chat/completions
streaming: true
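
Every environment above lists 5 runs per cell and 2 warmups. As a rough illustration of what such a protocol could look like, here is a sketch that discards the warmup requests and averages the measured ones. The function `measure_cell` and its `run_once` callback are hypothetical, and the page does not say how the 5 runs are aggregated, so the mean used here is an assumption (a median would be an equally plausible choice).

```python
import statistics
import time


def measure_cell(run_once, runs=5, warmups=2):
    """Time `run_once` after discarding warmups.

    `run_once` is a hypothetical callable that performs one request and
    returns the number of output tokens it generated.
    Returns (mean tok/s, list of per-run tok/s).
    """
    # Warmup requests are issued but not recorded.
    for _ in range(warmups):
        run_once()

    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        n_tokens = run_once()
        elapsed = time.perf_counter() - start
        samples.append(n_tokens / elapsed)

    # Assumption: runs are aggregated with an arithmetic mean.
    return statistics.mean(samples), samples
```

In a real harness, `run_once` would send a streaming request to the `/v1/chat/completions` endpoint and count the tokens received; that part is left out here because the client code is not shown on the page.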